Theory vs. Practice
Diagnosis is not the end, but the beginning of practice.
ARM "Memory Tagging Extension" (MTE) vs SLIMalloc
As a designer of CPUs based in the U.K., ARM has tried to address the "memory safety" issues that the NSA, rightly in our humble opinion, attributes to the OS vendors ("their vulnerabilities") properly documented by Paul Hsieh's Microsoft Watch page.
Having received a request to compare the new security features of the recent ARM CPUs to SLIMalloc, we have searched and finally found a description of their method.
ARM has designed MTE with the assistance of GOOGLE but, since most of the published documents are marketing nonsense, it takes some time and dedication to find enough facts to evaluate their combined work.
As the left picture suggests, this is a coloring scheme. For this to work, every memory access MUST be made through a pointer that carries the SAME COLOR as the memory area it attempts to reach.
ARM MTE memory-access violations trigger a CPU-generated fault that crashes the process (unlike SLIMalloc, which gracefully blocks, documents and recovers from errors).
Do ARM CPUs boosted by MTE make our OS and applications safer (and run faster) like SLIMalloc does? Let's have a look!
- The ARM "Memory Tagging Extension" Design
Unlike SLIMalloc, which is transparent for all CPUs, OS kernels and usermode layers, using ARM MTE requires manufacturing and buying recent ARM CPUs – as well as modifications in the OS kernel and usermode system libraries (as if they were not already carrying more fat than butter).
This digression was required to understand what follows (and which would have granted jail-time a few decades ago):
16-byte blocks are associated with a random (or pseudo-random – "both methods are supported") 4-bit value (a mere 16 possible different values). Consecutive blocks use the same random value in the "fast and less-precise" version and different ones in the "slower but more precise" version.
These values are stored in an unused byte of the pointers, making it possible to check whether a buffer is underflowed or overflowed, re-used after being freed, freed twice, etc.
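The coloring scheme just described can be pictured with a toy model. This is only a sketch under stated assumptions, not ARM's implementation: the `TaggedHeap` class, its method names, the bump-allocation strategy and the choice of bit 56 as the tag position are all hypothetical and serve the illustration only.

```python
import secrets

GRANULE = 16      # bytes covered by one tag (from the text above)
TAG_VALUES = 16   # a 4-bit tag gives 16 colors
TAG_SHIFT = 56    # assumption: the tag sits in the pointer's top byte


class TaggedHeap:
    """Toy bump allocator with one 4-bit tag per 16-byte granule."""

    def __init__(self, size):
        self.tags = [0] * (size // GRANULE)
        self.next_granule = 0

    def malloc(self, nbytes, tag=None):
        granules = -(-nbytes // GRANULE)  # round up to whole granules
        if tag is None:
            tag = secrets.randbelow(TAG_VALUES)  # random color
        start = self.next_granule
        for g in range(start, start + granules):
            self.tags[g] = tag
        self.next_granule += granules
        return (tag << TAG_SHIFT) | (start * GRANULE)  # tagged pointer

    def access_ok(self, pointer):
        """True when the pointer's color matches the color of the memory."""
        tag = (pointer >> TAG_SHIFT) & 0xF
        address = pointer & ((1 << TAG_SHIFT) - 1)
        return self.tags[address // GRANULE] == tag


heap = TaggedHeap(1024)
a = heap.malloc(40, tag=5)     # occupies 3 granules (48 bytes)
b = heap.malloc(16, tag=9)     # neighbor with a different color
print(heap.access_ok(a + 39))  # inside the block: True
print(heap.access_ok(a + 48))  # overflow into b, colors differ: False
```

Passing `tag=5` for both allocations makes the second check return True as well: in this model an overflow into a neighbor of the same color goes unnoticed.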
At this stage, severe by-design problems are already apparent. But there's worse because it seems that everything that could have been designed wrongly has been executed wrongly in this project. So, without any loss of enthusiasm, let's continue this exciting journey in "cloud-cuckoo-land".
Cloud-cuckoo-land: to think that things that are completely impossible might happen, rather than understanding how things really are: "when referees make contentious decisions players are going to be upset, and anyone who thinks otherwise is living in cloud cuckoo land."
– Cambridge Advanced Learner's Dictionary
I rather see this place as a foggy forest where people are lost, spending their life calling for help by saying 'cuckoo, cuckoo'. They occasionally get an answer but never manage to locate or meet anyone because:
Finance has an incentive to hide its sins so that what the rule of law defines as a nasty scam is not held against it – and the music keeps playing ("he who pays the piper calls the tune").
The ones paying the musicians (politicians, judges) can decide what music they want to hear (an expression I found appropriate and therefore borrowed from Baron Maurice Harry Peston, Member of the U.K. House of Lords – who has all my gratitude).
- Why ARM "Memory Tagging Extension" is (Utterly) Unsafe
The "Short" Story (actually longer but easier to understand)
This is UNSAFE because once any given application has allocated more than 16 memory blocks THERE WILL BE POSSIBLE HACKS DUE TO THE SAME TAG BEING USED FOR MANY DIFFERENT BLOCKS. It is not unusual for programs to allocate millions of blocks (that are split and merged hundreds of thousands of times during the life of the application)... so you can figure how fast and brutally the damage will come.
The 2019 GOOGLE "ARM Memory Tagging Extension and How It Improves C/C++ Memory Safety" paper claims that a "random tag will be different from another random tag with a probability of 15/16 or ~93.7%".
This claim is outrageous because a maximum of 16 write attempts is all it takes to destroy, 100% of the time, this misleading "93% safety guarantee" celebrated by GOOGLE:
That's a lock that can be opened by a banana as long as you try up to 16 times (the first attempt can succeed, or the 5th).
If such a lock existed, would you trust a vendor claiming it offers a "93.7% safety guarantee"?
If such a lock existed, would you merely consider buying it?
Pirates will enjoy a 100% success-rate when bypassing this "security" – by just trying all the 16 colors.
Doing so only requires changing the upper byte of their pointer until the pointer no longer crashes the program – and performs a nefarious OOB (Out-Of-Bounds) write or UAF (Use-After-Free) to compromise your device.
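But, even worse, since a (pseudo-)random generator is used, duplicated tags should be expected WELL BEFORE 16 BLOCKS ARE TAGGED: assuming uniform and independent 4-bit tags, the birthday bound makes a duplicate more likely than not once 5 blocks are tagged, and very likely (about 88%) once 8 BLOCKS ARE TAGGED (accelerating the occurrence of successful hacks).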
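How quickly duplicated tags show up can be checked with a short birthday-style calculation. The sketch below assumes that tags are drawn uniformly and independently from the 16 possible values, which the published material does not guarantee; the function name `p_duplicate` is ours.

```python
from fractions import Fraction


def p_duplicate(n_blocks, n_tags=16):
    """Probability that at least two of n_blocks share the same tag."""
    p_all_different = Fraction(1)
    for i in range(n_blocks):
        # The i-th block must avoid the i colors already in use.
        p_all_different *= Fraction(max(n_tags - i, 0), n_tags)
    return 1 - p_all_different


for n in (2, 4, 5, 8, 16, 17):
    print(n, round(float(p_duplicate(n)), 4))
```

With two blocks the result is 1/16, the complement of the 15/16 figure quoted above, and with 17 blocks a duplicate is certain.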
Memory heap SPRAYING attacks (a well-established method used to abuse Web Browsers) will succeed with a very high probability (demonstrating the UNSAFETY of the ARM/GOOGLE MTE probability-based approach: 16 ATTACKS ALWAYS SUCCEED, and 8 are most likely to succeed).
This is also UNSAFE because BLOCKS < 16 BYTES are NOT PROTECTED AT ALL (unless the minimum allocation size is raised to 16 bytes, which wastes even more memory).
Further, the UAF (use-after-free) vulnerability is STILL NOT ADDRESSED because re-allocating a subset of freed memory DOES NOT re-tag the reallocated memory. Why bother, right? After all, UAF is GOOGLE's favorite and all-time winning vulnerability (at the expense of end-users) – let's not change that!
With such a level of deception (selling a false sense of security is by definition a scam), it is no wonder that the Cyber-Security market is projected to triple by 2030 (at the expense of the taxpayer).
The "Long" Story (actually shorter but that requires skills)
Since violations trigger a program crash, trying all the 16 colors in a row is (theoretically) not possible, and the success-rate of any isolated attack will be 6.25% (100/16)... which is not exactly the definition of safety (and will delight script-kiddies).
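The arithmetic behind these success-rates can be written down in a few lines. This is a sketch assuming 16 equally likely colors; the two function names are ours, and the "independent" case assumes that every failed attempt crashes the process and that colors are drawn afresh afterwards.

```python
def p_success_independent(attempts, n_tags=16):
    """Each attempt is a fresh 1-in-16 guess (the target is re-colored after a crash)."""
    return 1 - (1 - 1 / n_tags) ** attempts


def p_success_exhaustive(attempts, n_tags=16):
    """The target keeps its color and every attempt tries a new one."""
    return min(attempts, n_tags) / n_tags


for k in (1, 8, 16):
    print(k, round(p_success_independent(k), 4), p_success_exhaustive(k))
```

A single guess gives 1/16 = 6.25% in both cases. The exhaustive column reaches 100% at 16 attempts, which is the situation where faults can be survived or colors do not change between attempts.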
Worse, steps can be taken to raise the attacker's success-rate from this already disastrous "security guarantee" to 100%:
- if you can handle the CPU fault (like SLIMalloc does for fatal CPU and kernel errors) then you can try all the colors in a row (granting access in one single attack);
- when a read-OOB is possible, reading a block's color before doing a write-OOB (a technique widely exploited in Web Browsers and Web servers, where memory allocations can be requested or forced) brings your success-rate to 100%, either locally or remotely (granting access in one single attack);
- when a PRNG is used to assign block colors (the ARM MTE specifications say that "both a PRNG and a TRNG are supported" but there are no details about why, when and how the weaker RNG version will be involved), getting a few block colors will deterministically let you find the color of the block of interest (granting access in one single attack).
As you see, the problem with ARM MTE is well beyond its "probabilistic security" design... and it is impossible to imagine that GOOGLE did not think about all these issues before advising ARM to adopt such a poor solution.
- Why ARM "Memory Tagging Extension" is Slow
This is SLOW because every single 16-byte block MUST be associated with a tag (wasting RAM and CPU) at the time every new memory allocation is made – and the tag must then be checked whenever memory is accessed.
Frankly, at this stage, you might wonder why they are storing this kind of metadata rather than the allocated block size, which would allow exact verifications instead of probabilistic ones:
"Oh, I will now check if we write beyond the block size – but, well, how long is this particular block?
We don't know exactly?
Not a problem, I will give you my best opinion based on the lack of facts we have been given... by ourselves while we had access to the actual and complete information that we have so brilliantly chosen not to use".
If this logic sounds odd to you then don't feel bad: that's the evidence that, in contrast with the ARM/GOOGLE MTE, you are enjoying all your mental capacity.
In its 2019 paper "ARM Memory Tagging Extension and How It Improves C/C++ Memory Safety" GOOGLE states that "MTE's memory overhead is 3%-5%" (the authors do not specify whether that's for the "fast and less-precise" version or for the "slow and more precise" version, but they invite developers to "avoid making large memory allocations" – wise advice that will certainly convince software engineers that their fate is in good hands).
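The storage side of this figure can be sanity-checked from the numbers given earlier (one 4-bit tag per 16-byte block). The sketch below assumes the tags are stored densely and that every allocation is rounded up to whole 16-byte blocks; both function names and the sample request sizes are ours, not measurements.

```python
GRANULE = 16   # bytes per tagged block
TAG_BITS = 4   # bits of tag per block


def tag_storage_overhead():
    """Extra tag memory per byte of tagged memory."""
    return TAG_BITS / (GRANULE * 8)


def padding_overhead(request_sizes):
    """Memory lost by rounding each request up to whole 16-byte blocks."""
    requested = sum(request_sizes)
    rounded = sum(-(-size // GRANULE) * GRANULE for size in request_sizes)
    return (rounded - requested) / requested


print(tag_storage_overhead())        # 4 bits per 128 bits
print(padding_overhead([24, 40]))    # hypothetical request sizes
```

The tags alone cost 4/128 = 3.125%, the lower end of the quoted range. Rounding losses come on top and depend entirely on the sizes a program asks for.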
Note that, despite being implemented in the CPU, ARM "Memory Tagging Extension" generates a performance overhead (while SLIMalloc, all implemented in software, actually accelerates programs). If SLIMalloc were implemented in the CPU, it would speed up programs even more than it does today!
Ahem, I don't want to be rude but, hey, since it is perfectly possible to do the job properly (like SLIMalloc did in 2020), why the hell should you waste tens of millions of dollars annually (modifying and licensing a CPU design is not for the faint of heart) for something that has so obviously been designed to fail?
In comparison, SLIMalloc (first published in 2020, and upgraded in 2023 to make C/C++ "memory-safe") is:
- Well-Designed because it:
performs systematic checks (not probabilistic checks).
- Safe because it:
relies on every block's exact allocated size (not "16-byte" increments!).
- Fast because it:
is as fast as or faster than the fastest (unsafe) memory allocators written by GOOGLE, FACEBOOK and MICROSOFT.
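The difference between a check based on 16-byte increments and a check based on the exact allocated size can be illustrated as follows. This is a hypothetical sketch, not SLIMalloc's code nor ARM's: both functions are ours, and the tag-style check assumes the best case, where the neighboring block carries a different color.

```python
GRANULE = 16


def granule_check(offset, alloc_size):
    """Tag-style check: passes anywhere inside the 16-byte blocks in use."""
    covered = -(-alloc_size // GRANULE) * GRANULE  # size rounded up
    return 0 <= offset < covered


def exact_check(offset, alloc_size):
    """Size-based check: passes only inside the bytes actually requested."""
    return 0 <= offset < alloc_size


# A 20-byte allocation occupies two 16-byte blocks (32 bytes).
for offset in (19, 20, 31, 32):
    print(offset, granule_check(offset, 20), exact_check(offset, 20))
```

Offsets 20 to 31 lie beyond the 20 requested bytes yet pass the tag-style check, because they fall in the same 16-byte block as valid data. The exact check rejects them.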
In 2023, hundreds of articles written about SLIMalloc are still censored on LinkedIn and search engines.
Cui Bono? (who benefits?)
Clearly not the end-users (nor the taxpayer funding the poor MTE research that endangers the critical infrastructure... and the reputation of its billionaire authors – if the media were telling the facts instead of merely relaying marketing contents).
To say it gently, the ARM "Memory Tagging Extension" is a shame (it is both slow and unsafe). The less gentle version is that all the invested public money seems to be increasingly wasted by attributing all public contracts and subsidies to the least capable (and most expensive) market players... that are not shy about delivering absolutely useless capacity.
Forever stacking layers of proven vacuity and vulnerability on top of other layers is a good way to augment the surface of vulnerability (and therefore increase the perceived need for more solutions). But someone has to pay the bills (for these pointless and flawed-by-design "solutions", on top of the ever-growing costs of recurring failure).
This ever-growing complexity is the reason why 'modern' (monolithic macro-kernel) OSes have so many problems, and it conflicts with the 50-year-old UNIX philosophy, which promoted a self-sufficient modular architecture (where each component does one thing properly) rather than myriads of redundant and ever-obsolete interleaved dependencies.
Under the pressure of finance (eager to exact as much as possible from the ecosystem, a deviation from its original role of funding the making of products that then could be sold), the industry has embraced the inefficient and ever-breaking way of doing things called "planned obsolescence" as artificially creating problems increases the size of the market (a necessary condition for the musicians to keep playing: paying back debt requires an ever-growing monetary mass, also known as 'inflation'):
Inflation is the one form of taxation that can be imposed without legislation.
A little bit of inflation, like a little bit of pregnancy, has the awkward habit of growing.
It goes without saying that, if History is of any guidance in this matter, such a run-away tactic always ends in tears, ruining everybody (everybody but finance, as they know exactly when to massively-create and convert fiat money into tangible assets).
As if Western countries aimed at failing as fast as possible in their race to lose the World leadership in every possible category (all but "the most ridiculous self-complacent unlimitedly-funded robber-baron", to quote a venerable U.K. institution... with GOOGLE at its board).
The GOOGLE Chrome Internet Browser has so far compromised the security of unsuspecting end-users with 9 MEMORY-UNSAFETY zero-days since January 2024:
- CVE-2024-0519 - Out-of-bounds memory access in V8
- CVE-2024-2886 - Use-after-free in WebCodecs
- CVE-2024-2887 - Type confusion in WebAssembly
- CVE-2024-3159 - Out-of-bounds memory access in V8
- CVE-2024-4671 - Use-after-free in Visuals
- CVE-2024-4761 - Out-of-bounds write in V8
- CVE-2024-4947 - Type confusion in V8
- CVE-2024-5274 - Type confusion in V8
GOOGLE "Address Sanitizer" (ASAN) claims to detect:
- Use after free (dangling pointer dereference)
- Heap buffer overflow
- Stack buffer overflow
- Global buffer overflow
- Use after return
- Use after scope
- Initialization order bugs
- Memory leaks (compared side-by-side with SLIMalloc)
Either GOOGLE ASAN does not work, or GOOGLE does not want to secure GOOGLE Chrome. Both cases should be a concern.