Krebs has a write-up on the discovery of the “Trojan Source” vulnerability. What makes this vulnerability unusual is that it affects most compilers and many software development environments, because the underlying issue lies in the Unicode standard rather than in any one tool. Unicode is the standard that assigns a code to every character, regardless of language, so that computers can exchange text consistently. The problem was found in Unicode's bidirectional (Bidi) override characters, which control the order in which characters are displayed. These overrides exist so that text can switch between a left-to-right script and a right-to-left one, such as English and Arabic. Bidi overrides can be placed in comments and strings, which is a problem because most programming languages allow comments, whose text the compiler ignores entirely, and string literals, which can contain special or control characters. As quoted from the research paper, “Therefore, by placing Bidi override characters exclusively within comments and strings, we can smuggle them into source code in a manner that most compilers will accept. Our key insight is that we can reorder source code characters in such a way that the resulting display order also represents syntactically valid source code”. In other words, the code a reviewer sees on screen can differ from the code the compiler actually processes. The paper shows that this issue applies to almost all programming languages, which gives vendors an opportunity to get ahead of it before it becomes a widespread problem.
https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/#more-57367
This was an interesting read. I thought the following was particularly concerning:
“Any developer who copies code from an untrusted source into a protected code base may inadvertently introduce an invisible vulnerability”
This introduces an insidious attack vector given the prevalence of copying/pasting code from forums, etc.
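One way to guard against this when pasting code from a forum is to scan the snippet for Bidi control characters before it goes into the code base. The following is only a minimal sketch in Python, not a complete defense: the function name find_bidi_controls and the sample snippet are made up for illustration, and the character ranges (U+202A to U+202E and U+2066 to U+2069) are the standard Unicode embedding, override, and isolate controls.

```python
import unicodedata

# Unicode Bidi control characters: embeddings and overrides (U+202A..U+202E)
# and isolates (U+2066..U+2069).
BIDI_CONTROLS = {chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def find_bidi_controls(text):
    """Return (line, column, name) for each Bidi control character in text.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


# Made-up snippet: the comment hides a right-to-left override and an isolate.
snippet = "access = 'user'  # \u202e check admin \u2066"

for line_no, col, name in find_bidi_controls(snippet):
    print(f"line {line_no}, col {col}: {name}")
```

A check like this only flags the characters; whether a given use is legitimate (for example, real right-to-left text in a string) would still need a human to decide.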