> {0} initializer in C or C++ for unions no longer guarantees clearing of the whole union (except for static storage duration initialization), it just initializes the first union member to zero. If initialization of the whole union including padding bits is desirable, use {} (valid in C23 or C++) or use -fzero-init-padding-bits=unions option to restore old GCC behavior.
This is going to silently break so much existing code, especially union based type punning in C code. {0} used to guarantee full zeroing and {} did not, and step by step we've flipped the situation to the reverse. The only sensible thing, in terms of not breaking old code, would be to have both {0} and {} zero initialize the whole union.
I'm sure this change was discussed in depth on the mailing list, but it's absolutely mind boggling to me
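For code that needs every byte of a union cleared regardless of how a given compiler treats `= {0}`, one portable option is an explicit memset. A minimal sketch, assuming a made-up `union packet` whose first member is smaller than its largest member (the type and helper names are hypothetical, not from any real project):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical union for illustration: the first member is smaller
   than the largest member, which is the case the GCC 15 note is about. */
union packet {
    unsigned char tag;        /* first member: 1 byte */
    unsigned char raw[16];    /* largest member: 16 bytes */
};

/* Zero every byte of the object representation, independent of how
   the compiler interprets "= {0}". */
static void packet_clear(union packet *p)
{
    memset(p, 0, sizeof *p);
}

/* Return 1 if every byte of *p is zero, 0 otherwise. */
static int packet_is_all_zero(const union packet *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    for (size_t i = 0; i < sizeof *p; i++) {
        if (bytes[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling `packet_clear(&p)` right after declaring `p` gives the old "whole union is zero" result without relying on the initializer syntax or on `-fzero-init-padding-bits=unions`.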
mtklein 3 hours ago [-]
This was my instinct too, until I got this little tickle in the back of my head that maybe I remembered that Clang was already acting like this, so maybe it won't be so bad. Notice 32-bit wzr vs 64-bit xzr:
Ah, I can confirm what I see elsewhere in the thread, this is no longer true in Clang. That first clang was Apple Clang 17---who knows what version that actually is---and here is Clang 20:
$ /opt/homebrew/opt/llvm/bin/clang-20 -O1 -c union.c -o union.o && objdump -d union.o
union.o: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000000000000 <ltmp0>:
0: f900001f str xzr, [x0]
4: d65f03c0 ret
0000000000000008 <_create_d>:
8: f900001f str xzr, [x0]
c: d65f03c0 ret
Do distros have tooling to deal with this type of change?
I imagine it would be very useful to be able to search through all the C/C++ source files for all the packages in the distro in a semantic manner, so that it understands typedefs and preprocessor macros etc. The search query for this change would be something like "find all union types whose first member is not its largest member, then find all lines of code where that type is initialized with `{0}`".
ryao 45 minutes ago [-]
As a retired Gentoo developer, I can say not really as far as I know. There could be static analysis tools that can find this, but I am not aware of anyone who runs them on the entire distribution.
mastax 36 minutes ago [-]
In theory it's just an extension of IDE tooling. A CLI with a little query language wrapping libclang. In practice I'm sure it's a nightmare just to get 20,000 packages' build systems wrangled such that the right source files get indexed by libclang, and all the endless plumbing for downloading packages and reporting results, and on and on.
ryao 32 minutes ago [-]
Distribution build systems typically operate outside of an IDE. I suspect that it would be a nightmare to get 20,000 packages to compile in an IDE.
It is possible in theory to write a compiler plugin to generate an error when code that does this is found and it would make it easy to find all of the instances in all packages by building with `make -k`, provided that the code is not hidden behind an unused package flag.
ogoffart 3 hours ago [-]
> This is going to silently break so much existing code
The code was already broken. It was an undefined behavior.
That's a problem with C and its undefined behavior minefields.
ryao 3 hours ago [-]
GCC has long been known to define undefined behavior in C unions. In particular, type punning in unions is undefined behavior under the C and C++ standards, but GCC (and Clang) define it.
flohofwoe 1 hours ago [-]
> type punning in unions is undefined behavior under the C and C++ standards
Union type punning is entirely valid in C, but UB in C++ (one of the surprisingly many subtle but still fundamental differences between C and C++). There's specifically a (somewhat obscure) footnote about this in the C standard, which also has been more clarified in one of the recent C standards.
ryao 1 hours ago [-]
There is no footnote about it in the C standard. Someone proposed adding one to standardize the behavior, but it was never accepted. Ever since then, people keep quoting it even though it is a rejected amendment.
jcranmer 1 hours ago [-]
Footnote 107 in C23, on page 75 in §6.5.2.3:
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
(though this footnote has been present as far back as C99, albeit with different numbers as the standard has added more text in the intervening 24 years).
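For readers who have not seen the pattern the footnote describes, here is a minimal sketch in C: store through one union member, read through another. The function name is hypothetical, and the sketch assumes `float` is 32 bits wide (checked at compile time). Both members have the same size, so the padding question from the GCC 15 note does not arise here.

```c
#include <stdint.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret the bits of a float as a 32-bit unsigned integer by
   storing through one union member and reading through another. */
static uint32_t float_bits_via_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;      /* store through the float member */
    return pun.u;   /* read the same bytes through the integer member */
}
```

On an IEEE 754 platform, `float_bits_via_union(1.0f)` yields `0x3F800000`. This is C only; the same code in C++ is the case the thread describes as UB.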
ryao 59 minutes ago [-]
The GCC developers disagree with your interpretation:
> Type punning via unions is undefined behavior in both c and c++.
I'm not sure tbh what's there to 'interpret' or how a compiler developer could misread that, the wording is quite clear.
ryao 44 minutes ago [-]
It is an excerpt being taken out of context. Of course it is quite clear. Taking it out of context ignores everything else that the standard says. That interpretation is wrong as far as compiler authors are concerned.
trealira 19 minutes ago [-]
The context is that it's a footnote. The footnote is referenced in this paragraph:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (106), and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
106) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
In that same document, union type punning is explicitly listed under Annex J.1, Unspecified Behavior:
(11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
The standard is extremely clear and explicit that it's not undefined behavior.
ryao 12 minutes ago [-]
This is not considering the document as a whole. I will defer to the GCC developers on what the document means on this.
mtklein 3 hours ago [-]
I have always thought that punning through a union was legal in C but UB in C++, and that punning through incompatible pointer casting was UB in both.
I am basing this entirely on memory and the wikipedia article on type punning. I welcome extremely pedantic feedback.
jcranmer 1 hours ago [-]
> punning through a union was legal in C
In C89, it was implementation-defined. In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex. From C11 on, the annex was fixed.
> but UB in C++
C++11 adopted "unrestricted unions", which added the concept of an active member: it is UB to access any member other than the active one unless you first make it active. Except active members rely on constructors and destructors, which primitive types don't have, so the standard isn't particularly clear on what happens here. The current consensus is that it's UB.
C++20 added std::bit_cast which is a much safer interface to type punning than unions.
> punning through incompatible pointer casting was UB in both
There is a general rule that accessing an object through an 'incompatible' lvalue is illegal in both languages. In general, changing the const or volatile qualifier on the object is legal, as is reading via a different signed or unsigned variant, and char pointers can read anything.
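Two ways of looking at an object's bytes that stay within the rule above, sketched in C (which has no `std::bit_cast`): copying the representation with `memcpy`, and reading through an `unsigned char` pointer. Function names are hypothetical and the first one assumes a 32-bit `float`.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a uint32_t.
   Assumption: both types are the same size (checked at compile time). */
static uint32_t float_bits_via_memcpy(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "size mismatch");
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Character-type pointers may inspect any object: count how many
   bytes of the object's representation are nonzero. */
static size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        n += (bytes[i] != 0);
    return n;
}
```

Optimizing compilers typically turn the fixed-size `memcpy` into a plain register move, so this form usually costs nothing compared with a union or a pointer cast.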
ryao 57 minutes ago [-]
The GCC developers disagree as of last December:
> Type punning via unions is undefined behavior in both c and c++.
> In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex.
In C99, union type punning was put under Annex J.1, which is unspecified behavior, not undefined behavior. Unspecified behavior is basically implementation-defined behavior, except that the implementor is not required to document the behavior.
ryao 34 minutes ago [-]
We can use UB to refer to both. :)
trealira 23 minutes ago [-]
Maybe, but we were talking about "undefined behavior," not "UB," so the point is moot.
There has been plenty of misinformation spread on that. One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked (after I had a bug report closed due to UB). I could find the bug report if I look for it, but I would rather not do the search.
trealira 2 hours ago [-]
From a draft of the C23 standard, this is what it has to say about union type punning:
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
In past standards, it said "trap representation" rather than "non-value representation," but in none of them did it say that union type punning was undefined behavior. If you have a PDF of any standard or draft standard, just doing a search for "type punning" should direct you to this footnote quickly.
So I'm going to say that if the GCC developer explicitly said that union type punning was undefined behavior in C, then they were wrong, because that's not what the C standard says.
amboar 1 hours ago [-]
Section J.1 _Unspecified_ behavior says
> (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
So it's a little more constrained in the ramifications, but the outcomes may still be surprising. It's a bit unfortunate that "UB" aliases to both "Undefined behavior" and "Unspecified behavior" given they have subtly different definitions.
From section 4 we have:
> A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.4.
ryao 56 minutes ago [-]
Here is what was said:
> Type punning via unions is undefined behavior in both c and c++.
Feel free to start a discussion on the GCC mailing list.
trealira 43 minutes ago [-]
I actually might, although not now. Thanks for the link. I'm surprised he directly contradicted the C standard, rather than it just being a misunderstanding.
ryao 40 minutes ago [-]
According to another comment, the C standard contradicts the C standard on this:
Taking snippets of the C standard out of context of the whole seems to result in misunderstandings on this.
trealira 38 minutes ago [-]
It doesn't. That commenter is saying that in C99, it was unspecified behavior. Since C11 onward, it's been removed from the unspecified behavior annex and type punning is allowed, though it may generate a trap/non-value representation. It was never undefined behavior, which is different.
Edit: no, it's still in the unspecified behavior annex, that's my mistake. It's still not undefined, though.
ryao 36 minutes ago [-]
Most of the C code I write is C99 code, so it is undefined behavior either way for me (if I care about compilers other than GCC and Clang).
That said, I am going to defer to the GCC developers on this since I do not have time to make sense of all versions of the C standard.
trealira 33 minutes ago [-]
That's fair. In the end, what matters is how C is implemented in practice on the platforms your code targets, not what the C standard says.
> One of the GCC developers told me explicitly that type punning through a union was UB in C, but defined by GCC when I asked
I just was citing the source of this for reference.
ryao 38 minutes ago [-]
I see. Carry on then. :)
mat_epice 2 hours ago [-]
EDIT: This comment is wrong, see fsmv’s comment below. Leaving for posterity because I’m no coward!
- - -
Undefined behavior only means that the spec leaves a particular situation undefined and that the compiler implementor can do whatever they want. Every compiler defines undefined behavior, whether it’s documented (or easy to qualify, or deterministic) or not.
It is in poor taste that gcc has had widely used, documented behaviors that are changing, especially in a point release.
fsmv 1 hours ago [-]
I think you're confusing unspecified and undefined behavior. UB could do something randomly different every time and unspecified must choose an option.
In a lot of cases in optimizing compilers they just assume UB doesn't exist. Yes technically the compiler does do something but there's still a big difference between the two.
mat_epice 1 hours ago [-]
Thanks, you’re right, I was mistaken.
grandempire 2 hours ago [-]
When you have a big system many people rely on you generally try to look for ways to keep their code working - not look for the changes you’re contractually allowed to make.
GCC probably has a better justification than “we are allowed to”.
arp242 47 minutes ago [-]
> GCC probably has a better justification than “we are allowed to”.
Maybe, but I've seen GCC people justify such changes with little more than "it's UB, we can change it, end of story", so I wouldn't assume it.
myrmidon 2 hours ago [-]
I honestly feel that "uninitialized by default" is strictly a mistake, a relic from the days when C was basically cross-platform assembly language.
Zero-initialized-by-default for everything would be an extremely beneficial tradeoff IMO.
Maybe with a __noinit attribute or somesuch for the few cases where you don't need a variable to be initialized AND the compiler is too stupid to optimize the zero-initialization away on its own.
This would not even break existing code, just lead to a few easily fixed performance regressions, but it would make it significantly harder to introduce undefined and difficult to spot behavior by accident (because very often code assumes zero-initialization and gets it purely by chance, and this is also most likely to happen in the edge cases that might not be covered by tests under memory sanitizer if you even have those).
For malloc, you could use a custom allocator, or replace all the calls with calloc.
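The calloc route can be as small as a wrapper. A sketch with a hypothetical name, `zmalloc`, that is not part of any standard library:

```c
#include <stdlib.h>

/* Hypothetical drop-in for malloc(size) that returns zero-filled
   memory, or NULL on failure. Release the result with free(). */
static void *zmalloc(size_t size)
{
    return calloc(1, size);
}
```

One caveat: calloc guarantees all bytes are zero, which on mainstream platforms means integers are 0, pointers are null and floats are 0.0, but the C standard does not promise that for pointers and floating-point types.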
myrmidon 21 minutes ago [-]
Very nice, did not know about this!
The only problem with vendor extensions like this is that you can't really rely on them, so you're still kinda forced to keep all the (redundant) zero initialization; solving it at the language level is much nicer. Maybe with C2030...
bjourne 53 minutes ago [-]
There are many low-level devices where initialization is very expensive. It may mean that you need two passes through memory instead of one, making whatever code you are running twice as slow.
myrmidon 40 minutes ago [-]
I would argue that these cases are pretty rare, and you could always get nominal performance with the __noinit hint, but I think this would seldom even be needed.
If you have instances of zero-initialized structs where you set individual fields after the initialization, all modern compilers will already elide the dead stores in the typical cases, and data of relevant size that is supposed to stay uninitialized for long is rare and a bit of an anti-pattern in my opinion anyway.
modeless 35 minutes ago [-]
Ok, those developers can use a compiler flag. We need defaults that work better for the vast majority.
elromulous 1 hours ago [-]
Devil's advocate: this would be unacceptable for os kernels and super performance critical code (e.g. hft).
TuxSH 12 minutes ago [-]
> this would be unacceptable for os kernels
Depends on the boundary. I can give a non-Linux, microkernel example (but one that was/is shipped on tens of millions of devices):
- prior to 11.0, Nintendo 3DS kernel SVC (syscall) implementations did not clear output parameters, leading to extremely trivial leaks. Unprivileged processes could easily retrieve kernel-mode stack addresses, which made exploit code much easier to write, example here: https://github.com/TuxSH/universal-otherapp/blob/master/sour...
- Nintendo started clearing all temporary registers on the Switch kernel at some point (iirc x0-x7 and some more); on the 3DS they never did that, and you can leak kernel object addresses quite easily (iirc by reading r2). This made an entire class of use-after-free and arbwrite bugs easier to exploit (call SvcCreateSemaphore 3 times, get sema kernel object address, use one of the now-patched exploits that can cause a double-decref on the KSemaphore, call SvcWaitSynchronization, profit)
more generally:
- uncleared padding in structures + copy to user = infoleak
so one at least ought to be careful when crossing privilege boundaries
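To make the padding point concrete, here is a sketch of one defensive pattern: build the outgoing bytes in a buffer that is cleared first, then copy each field to its offset, so that no stale padding bytes from the struct can leave the privileged side. `struct reply` and `reply_serialize` are made-up names for illustration, not taken from any kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply structure. On typical ABIs there are padding
   bytes between status and value. */
struct reply {
    uint8_t  status;
    uint64_t value;
};

/* Write *r into out so that every byte not covered by a field is
   zero. out must hold at least sizeof(struct reply) bytes. */
static void reply_serialize(unsigned char *out, const struct reply *r)
{
    memset(out, 0, sizeof(struct reply));
    memcpy(out + offsetof(struct reply, status), &r->status, sizeof r->status);
    memcpy(out + offsetof(struct reply, value),  &r->value,  sizeof r->value);
}
```

The more common shortcut is to memset the struct itself and then assign its members. That works with mainstream compilers, but the C standard says padding bytes take unspecified values when a member is stored, which is why this sketch copies field by field into a separate buffer.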
myrmidon 1 hours ago [-]
No, just throw the __noinit attribute at every place where it's needed.
You probably would not even need it in a lot of instances because the compiler would elide lots of dead stores (zeroing) even without hinting.
sidkshatriya 1 hours ago [-]
Would you rather have an HFT trade go correctly and a few nanoseconds slower, or a few nanoseconds faster but with some edge case bugs related to variable initialisation?
You might claim that you can have both, but bugs are more inevitable in the uninitialised-by-default scenario. I doubt that variable initialisation is the thing that would slow down HFT. I would posit it is things like network latency that would dominate.
pjmlp 1 hours ago [-]
It is acceptable enough for Windows, Android and macOS, which have been doing it for at least the last five years.
That is the usual fearmongering when security improvements are done to C and C++.
ryao 3 hours ago [-]
> This is going to silently break so much existing code
How much code actually uses unions this way?
> especially union based type punning in C code
I have never done type punning via the GNU C compiler extension in a way that would break because of this. I always assign a value to it and then get out the value from a new type. Do you know of any code that does things differently to be affected by this?
ndiddy 3 hours ago [-]
> How much code actually uses unions this way?
I see this change caused Mbed-TLS to start failing its test suite when compiled with GCC 15: https://github.com/Mbed-TLS/mbedtls/issues/9814 (kinda scary since it's a security library). Hopefully other projects with less rigorous test suites aren't using {0} in that way. The Github issue mentions that Clang tried a similar optimization a while ago and backed it out after user complaints, so maybe the same thing will happen with GCC.
ryao 3 hours ago [-]
GCC’s developers have a strong insistence on standards conformance (minus situations where they explicitly choose to deviate, like type punning in unions) over the status quo. We already went through a much more severe shift with strict aliasing enforcement by GCC and they never changed course. I do not expect this to be any different.
Calavar 3 hours ago [-]
I would guess a lot. People aren't intimately familiar with the standard, and people are lazy when it comes to writing boilerplate like initialization code. And up until now, it just worked, so even a good test suite wouldn't catch it.
EDIT: I initially mentioned type punning for arithmetic, but this compiler change wouldn't affect that
ryao 3 hours ago [-]
How would that be broken by this? The union will be zero initialized regardless because this change only affects situations where the union members are of different lengths, but for integer to float, the union members should always be the same length or bad things will happen.
Calavar 3 hours ago [-]
I realized my mistake and I think I edited my comment a split second before you replied, but you're right. That particular type punning scenario wouldn't be affected by this change because 1) the members are the same size, so there's no padding bits 2) the specific union member is going to be initialized to the input parameter, not with the syntax sugar for aggregate zero initialization.
ryao 3 hours ago [-]
Well, under your original version, I could see someone filling in bit fields in the float like the exponent and sign while leaving the mantissa zeroed, but given that the integer and float would be the same length, there is no section that would be left uninitialized by this change.
In order for this change to leave something uninitialized, you would need to have a member of the union after the first member that is longer than the first member. Code that does that and relies on {0} to zero the union seems incredibly rare to me.
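That condition can be checked mechanically for a given union. A rough sketch using a hypothetical macro (the caller has to name the first member; the macro cannot verify that it really is first):

```c
/* Hypothetical helper: evaluates to 1 when the named member spans the
   whole union, so initializing it leaves no bytes uncovered.
   The caller must pass the union's FIRST member; this is not checked. */
#define FIRST_MEMBER_COVERS_UNION(type, first) \
    (sizeof(((type *)0)->first) == sizeof(type))

/* First member is the largest: "= {0}" covers every byte. */
union covered {
    unsigned char a[8];
    unsigned char b[4];
};

/* First member is smaller than a later one: the case described above. */
union exposed {
    unsigned char tag;
    unsigned char raw[16];
};
```

Here `FIRST_MEMBER_COVERS_UNION(union covered, a)` is 1 and `FIRST_MEMBER_COVERS_UNION(union exposed, tag)` is 0. The operand of `sizeof` is never evaluated, so the null pointer in the macro is not dereferenced. Wrapping the macro in `_Static_assert` would turn it into a build-time check.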
mistrial9 3 hours ago [-]
using UNION was always considered sketchy IMHO. This is trivia for security exploiters?
grandempire 2 hours ago [-]
No. This is how sum types are implemented.
And from a runtime perspective it’s going to be a struct with perhaps more padding. You’ll need more details about your specific threat model to explain why that’s bad.
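For anyone unfamiliar with the idiom, a sum type in C is usually a tag plus a union, roughly like this sketch (all names are made up for illustration):

```c
/* Tag saying which union member currently holds a value. */
enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

/* A "sum type": exactly one of the union members is meaningful at a
   time, selected by kind. */
struct shape {
    enum shape_kind kind;
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } as;
};

/* Dispatch on the tag and only touch the matching member. */
static double shape_area(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->as.circle.radius * s->as.circle.radius;
    case SHAPE_RECT:
        return s->as.rect.width * s->as.rect.height;
    }
    return 0.0; /* unknown tag */
}
```

Code written this way always reads the member it last wrote, so it involves no type punning and does not depend on how the rest of the union was initialized.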
mistrial9 2 hours ago [-]
a quick search says that std::variant is the modern replacement to implement your niche feature "sum types"
jlouis 1 hours ago [-]
Not a niche feature. Fundamental for any decent language with a type system.
grandempire 2 hours ago [-]
That’s for C++. And how is std::variant implemented?
Actually, it does use a union, in both libstdc++ [0] and libc++ [1]. (Underneath a lengthy stack of base classes, since it wouldn't be C++ if it weren't painful to match the specified semantics.)
So instead it has a buffer large enough to hold all the types? That’s what union does.
Still waiting to hear the security concerns.
VyseofArcadia 3 hours ago [-]
I feel like once a language is standardized (or reaches 1.0), that's it. You're done. No more changes. You wanna make improvements? Try out some new ideas? Fine, do that in a new language.
I can deal with the footguns if they aren't cheekily mutating over the years. I feel like in C++ especially we barely have the time to come to terms with the unintended consequences of the previous language revision before the next one drops a whole new load of them on us.
seritools 3 hours ago [-]
> If the size of the new type is larger than the size of the last-written type, the contents of the excess bytes are unspecified (and may be a trap representation). Before C99 TC3 (DR 283) this behavior was undefined, but commonly implemented this way.
> When initializing a union, the initializer list must have only one member, which initializes the first member of the union unless a designated initializer is used(since C99).
→ = {0} initializes the first union variant, and bytes outside of that first variant are unspecified. Seems like GCC 15.1 follows the 26 year old standard correctly. (not sure how much has changed from C89 here)
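The designated-initializer part of that quote suggests another way to get more than the first member zeroed: name the largest member explicitly. A sketch with a hypothetical `union blob`:

```c
/* Hypothetical union: the first member is smaller than the largest. */
union blob {
    unsigned char tag;        /* first member */
    unsigned char raw[16];    /* largest member */
};

/* Return a blob whose largest member is fully zeroed. The designated
   initializer selects raw instead of the first member, and array
   elements without an explicit initializer are set to zero. */
static union blob blob_zeroed(void)
{
    union blob b = { .raw = { 0 } };
    return b;
}
```

This zeroes the bytes of `raw`; it says nothing about trailing padding, if a union happens to have any beyond its largest member.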
pjmlp 3 hours ago [-]
Programming languages are products, that is like saying you want to keep using vi 1.0.
Maybe C should have stopped at K&R C from UNIX V6; at least that would have spared the world from having it adopted outside UNIX.
rgoulter 2 hours ago [-]
I liked the idea I heard: internet audiences demand progress, but internet audiences hate change.
ryao 47 minutes ago [-]
If C++ had never been invented, that might have been the case.
ryao 3 hours ago [-]
I suspect this change was motivated by standards conformance.
fuhsnn 3 hours ago [-]
The wording of the GCC maintainer was "the standard doesn't require it" when they informed the Linux kernel mailing list.
> I feel like once a language is standardized (or reaches 1.0), that's it. You're done. No more changes. You wanna make improvements? Try out some new ideas? Fine, do that in a new language.
Thank goodness this is not how the software world works overall. I'm not sure you understand the implications of what you ask for.
> if they aren't cheekily mutating over the years
You're complaining about languages mutating, then mention C++ which has added stuff but maintained backwards compatibility over the course of many standards (aside from a few hiccups like auto_ptr, which was also short lived), with a high aversion to modifying existing stuff.
hulitu 3 hours ago [-]
It's careless development. Why think something in advance when you can fix it later. It works so well for Microsoft, Google and lately Apple. /s
The release cycle of a software speaks a lot about its quality. Move fast, break things has become the new development process.
The next major version of the GNU Compiler Collection (GCC), 15.1, is expected to be released in April or May 2025.
GCC 15 greatly improved the modules code. For instance, module std is now supported (even in C++20 mode).
artemonster 2 hours ago [-]
those were the greatest improvements of all time. all of them. :D
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
https://news.ycombinator.com/item?id=43794268
That said, using “the code compiles in godbolt” as proof that it is not relying on what the standard specifies to be UB is fallacious.
> Type punning via unions is undefined behavior in both c and c++.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
I just was citing the source of this for reference.
- - -
Undefined behavior only means that the spec leaves a particular situation undefined and that the compiler implementor can do whatever they want. Every compiler defines undefined behavior, whether it’s documented (or easy to qualify, or deterministic) or not.
It is in poor taste that gcc has had widely used, documented behaviors that are changing, especially in a point release.
In a lot of cases in optimizing compilers they just assume UB doesn't exist. Yes technically the compiler does do something but there's still a big difference between the two.
GCC probably has a better justification than “we are allowed to”.
Maybe, but I've seen GCC people justify such changes with little more than "it's UB, we can change it, end of story", so I wouldn't assume it.
Zero-initialized-by-default for everything would be an extremely beneficial tradeoff IMO.
Maybe with a __noinit attribute or somesuch for the few cases where you don't need a variable to be initialized AND the compiler is too stupid to optimize the zero-initialization away on its own.
This would not even break existing code, just lead to a few easily fixed performance regressions, but it would make it significantly harder to introduce undefined and difficult to spot behavior by accident (because very often code assumes zero-initialization and gets it purely by chance, and this is also most likely to happen in the edge cases that might not be covered by tests under memory sanitizer if you even have those).
For malloc, you could use a custom allocator, or replace all the calls with calloc.
The only problem with vendor extensions like this is that you can't really rely on them, so you're still kinda forced to keep all the (redundant) zero initialization; solving it at the language level is much nicer. Maybe with C2030...
If you have instances of zero-initialized structs where you set individual fields after the initialization, all modern compilers will already elide the dead stores in the typical cases, and data of relevant size that is supposed to stay uninitialized for long is rare and a bit of an anti-pattern in my opinion anyway.
Depends on the boundary. I can give a non-Linux, microkernel example (but that was/is shipped on dozens of millions of devices):
- prior to 11.0, Nintendo 3DS kernel SVC (syscall) implementations did not clear output parameters, leading to extremely trivial leaks. Unprivileged processes could easily retrieve kernel-mode stack addresses, which made exploit code much easier to write, example here: https://github.com/TuxSH/universal-otherapp/blob/master/sour...
- Nintendo started clearing all temporary registers on the Switch kernel at some point (iirc x0-x7 and some more); on the 3DS they never did that, and you can leak kernel object addresses quite easily (iirc by reading r2). This made an entire class of use-after-free and arbwrite bugs easier to exploit (call SvcCreateSemaphore 3 times, get sema kernel object address, use one of the now-patched exploits that can cause a double-decref on the KSemaphore, call SvcWaitSynchronization, profit)
more generally:
- uncleared padding in structures + copy to user = infoleak
so one at least ought to be careful where crossing privilege boundaries
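To make the padding point concrete, here is a minimal sketch in C. The names `struct reply`, `fill_reply` and `reply_padding_is_zero` are made up for illustration, and the `memcpy` only stands in for a real copy-to-user. It assumes a typical ABI where `int` is aligned to 4 bytes, so there is padding after `kind`. Strictly speaking the standard lets member stores leave padding bytes with unspecified values, so the `memset` is the conventional mitigation rather than a formal guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reply struct; on common ABIs there is padding after `kind`. */
struct reply {
    char kind;
    int  value;
};

/* Stand-in for a copy across a privilege boundary: here just a memcpy
 * into `out`, which must hold at least sizeof(struct reply) bytes. */
void fill_reply(unsigned char *out, char kind, int value)
{
    struct reply r;
    memset(&r, 0, sizeof r);   /* clear padding before setting fields */
    r.kind = kind;
    r.value = value;
    memcpy(out, &r, sizeof r); /* padding bytes travel with the struct */
}

/* Returns 1 if the bytes between `kind` and `value` are all zero. */
int reply_padding_is_zero(const unsigned char *buf)
{
    for (size_t i = sizeof(char); i < offsetof(struct reply, value); i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Without the `memset`, whatever was on the stack in those padding bytes would be copied out along with the fields.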
You probably would not even need it in a lot of instances because the compiler would elide lots of dead stores (zeroing) even without hinting.
You might claim that you can have both, but bugs are more inevitable in the uninitialised-by-default scenario. I doubt that variable initialisation is the thing that would slow down HFT. I would posit it is things like network latency that would dominate.
That is the usual fearmongering when security improvements are done to C and C++.
How much code actually uses unions this way?
> especially union based type punning in C code
I have never done type punning via the GNU C compiler extension in a way that would break because of this. I always assign a value to it and then get out the value from a new type. Do you know of any code that does things differently to be affected by this?
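For reference, the pattern I mean looks roughly like this. It is only a sketch: `float_bits` is a made-up helper name, and it assumes `float` is a 32-bit IEEE 754 type, which the `static_assert` checks only for size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is 32 bits wide (IEEE 754 single on mainstream targets). */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: store through one member, read through another. */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;      /* every byte that pun.u reads is written here */
    return pun.u;   /* reinterpret the same bytes as an integer */
}
```

Since the union is always assigned before it is read, it never depends on what `{0}` does or does not clear.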
I see this change caused Mbed-TLS to start failing its test suite when compiled with GCC 15: https://github.com/Mbed-TLS/mbedtls/issues/9814 (kinda scary since it's a security library). Hopefully other projects with less rigorous test suites aren't using {0} in that way. The Github issue mentions that Clang tried a similar optimization a while ago and backed it out after user complaints, so maybe the same thing will happen with GCC.
EDIT: I initially mentioned type punning for arithmetic, but this compiler change wouldn't affect that
In order for this change to leave something uninitialized, you would need to have a member of the union after the first member that is longer than the first member. Code that does that and relies on {0} to zero the union seems incredibly rare to me.
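A sketch of the shape that would be affected, with made-up names (`union small_first` and its helpers). The first member is one byte and a later member is eight, so initializing only the first member leaves seven bytes that `{0}` is no longer promised to clear under GCC 15. Code that needs every byte zeroed can say so explicitly with `memset`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical union: the first member is smaller than the largest one. */
union small_first {
    char          tag;     /* first member: 1 byte */
    unsigned char raw[8];  /* later, larger member */
};

/* Zero every byte, independent of how the compiler treats {0}. */
void small_first_clear(union small_first *u)
{
    memset(u, 0, sizeof *u);
}

/* Returns 1 if all bytes of the union are zero. */
int small_first_is_zero(const union small_first *u)
{
    const unsigned char *p = (const unsigned char *)u;
    for (size_t i = 0; i < sizeof *u; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```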
And from a runtime perspective it’s going to be a struct with perhaps more padding. You’ll need more details about your specific threat model to explain why that’s bad.
[0] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3...
[1] https://github.com/llvm/llvm-project/blob/llvmorg-20.1.3/lib...
Still waiting to hear the security concerns.
I can deal with the footguns if they aren't cheekily mutating over the years. I feel like in C++ especially we barely have the time to come to terms with the unintended consequences of the previous language revision before the next one drops a whole new load of them on us.
https://en.cppreference.com/w/c/language/union
> When initializing a union, the initializer list must have only one member, which initializes the first member of the union unless a designated initializer is used(since C99).
https://en.cppreference.com/w/c/language/struct_initializati...
→ = {0} initializes the first union variant, and bytes outside of that first variant are unspecified. Seems like GCC 15.1 follows the 26 year old standard correctly. (not sure how much has changed from C89 here)
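As a small illustration of the designated-initializer part of that rule (C99 and later), here is a sketch with a made-up union `wide_last`. Naming the largest member in the initializer means every byte of that member is written, whatever happens to bytes outside the first member with a plain `{0}`:

```c
/* Hypothetical union whose largest member is not the first one. */
union wide_last {
    char          tag;     /* first member, what a plain {0} initializes */
    unsigned char raw[8];  /* largest member, spans the whole union here */
};

/* Initialize through the largest member so all 8 bytes are set to zero. */
union wide_last wide_last_zeroed(void)
{
    union wide_last u = { .raw = {0} };
    return u;
}
```

This only covers the whole object because `raw` happens to be as large as the union; a union with trailing padding would still need `memset` or one of the options from the release notes.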
Maybe C should have stopped at K&R C from UNIX V6; at least that would have spared the world from having it adopted outside UNIX.
https://lore.kernel.org/linux-toolchains/Z0hRrrNU3Q+ro2T7@tu...
https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
Thank goodness this is not how the software world works overall. I'm not sure you understand the implications of what you ask for.
> if they aren't cheekily mutating over the years
You're complaining about languages mutating, then mention C++ which has added stuff but maintained backwards compatibility over the course of many standards (aside from a few hiccups like auto_ptr, which was also short lived), with a high aversion to modifying existing stuff.
The release cycle of a piece of software says a lot about its quality. "Move fast, break things" has become the new development process.
> C: #embed preprocessing directive support.
> C++: P1967R14, #embed (PR119065)
See also:
https://news.ycombinator.com/item?id=32201951 - Embed is in C23 (2022-07-23)
It would be nice to know what these great improvements actually are.