EmbeddedRelated.com
Forums
The 2026 Embedded Online Conference

Use of volatile in structure definitions to force word accesses

Started by Tim Wescott September 23, 2015
On Thu, 24 Sep 2015 23:26:22 -0400, DJ Delorie wrote:

> Tim Wescott <seemywebsite@myfooter.really> writes:
>> I, for one, would really like to see one of our resident compiler
>> experts answer the question at the bottom.
>
> Well, I'm the resident compiler guy who fixed it in gcc :-)
>
> As others have noted, you need the version of gcc that supports
> -fstrict-volatile-bitfields, *and* you need to have a well-defined (i.e.
> unambiguous) struct (no holes, consistent types, etc), *and* you need to
> have a port that supports it (ARM does), *and* it has to be a
> size/alignment that the hardware supports (duh), *and* the
> -fstrict-volatile-bitfields option needs to be enabled (a later version
> of gcc has a third setting that means "unless the ABI says otherwise"
> but the result is the same *if* your struct is well-defined) (-fs-v-b
> might (should?) be the default for ports that support it, I don't recall
> if they argued to change that).
>
> So a struct like this should be OK:
>
> volatile struct foo {
>   int32_t a:17;
>   int32_t b:5;
>   int32_t c:10;
> };
>
> but this would not be:
>
> volatile struct foo {
>   char a:8;
>   int32_t b:24;
> };
>
> But, the only portable reliable way to guarantee the right type of
> access is to not use bitfields.
Thanks DJ, for the work as well as the comments.

I'm willing to put up with the non-portability caused by bitfields, because at this point I don't see moving away from gcc for any reason (well, unless some client pays me to, and if so they can pay for all the rest, too).

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
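DJ's closing point, that the only portable way to guarantee the access width is to avoid bitfields, can be sketched as below. This is a minimal illustration, not code from the thread: the names `B_SHIFT`, `B_MASK`, `read_b` and `write_b` are hypothetical, and the bit positions assume that field `a` of the "OK" struct occupies the low bits, which the language does not guarantee for a real bitfield.

```cpp
#include <cstdint>

// Hypothetical layout mirroring the "OK" struct, ASSUMING field a sits in
// the low bits: a = bits 0..16, b = bits 17..21, c = bits 22..31.
constexpr std::uint32_t B_SHIFT = 17u;
constexpr std::uint32_t B_MASK  = 0x1Fu << B_SHIFT;   // 5-bit field b

// Read field b through a single volatile read of the whole 32-bit word.
inline std::uint32_t read_b(const volatile std::uint32_t &reg)
{
    return (reg & B_MASK) >> B_SHIFT;
}

// Update field b: one volatile read of the word, then one volatile write.
inline void write_b(volatile std::uint32_t &reg, std::uint32_t value)
{
    std::uint32_t tmp = reg;                                // read the word
    tmp = (tmp & ~B_MASK) | ((value << B_SHIFT) & B_MASK);  // replace field b
    reg = tmp;                                              // write the word
}
```

Because the register is only ever touched as a whole `uint32_t`, the access width does not depend on compiler options such as -fstrict-volatile-bitfields.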
On 25/09/15 19:03, Tim Wescott wrote:
> On Thu, 24 Sep 2015 23:26:22 -0400, DJ Delorie wrote:
>
>> Tim Wescott <seemywebsite@myfooter.really> writes:
>>> I, for one, would really like to see one of our resident compiler
>>> experts answer the question at the bottom.
>>
>> Well, I'm the resident compiler guy who fixed it in gcc :-)
>>
>> As others have noted, you need the version of gcc that supports
>> -fstrict-volatile-bitfields, *and* you need to have a well-defined (i.e.
>> unambiguous) struct (no holes, consistent types, etc), *and* you need to
>> have a port that supports it (ARM does), *and* it has to be a
>> size/alignment that the hardware supports (duh), *and* the
>> -fstrict-volatile-bitfields option needs to be enabled (a later version
>> of gcc has a third setting that means "unless the ABI says otherwise"
>> but the result is the same *if* your struct is well-defined) (-fs-v-b
>> might (should?) be the default for ports that support it, I don't recall
>> if they argued to change that).
>>
>> So a struct like this should be OK:
>>
>> volatile struct foo {
>>   int32_t a:17;
>>   int32_t b:5;
>>   int32_t c:10;
>> };
>>
>> but this would not be:
>>
>> volatile struct foo {
>>   char a:8;
>>   int32_t b:24;
>> };
>>
>> But, the only portable reliable way to guarantee the right type of
>> access is to not use bitfields.
>
> Thanks DJ, for the work as well as the comments.
>
> I'm willing to put up with the non-portability caused by bitfields,
> because at this point I don't see moving away from gcc for any reason
> (well, unless some client pays me to, and if so they can pay for all the
> rest, too).
>
As I see it, there is only one reasonable choice for an alternative to gcc for ARM development, and that is clang/llvm. Clang is in many ways gcc's strongest competitor - and also its closest partner, as the developers work together on features such as the sanitizers. A key point here is that ARM have chosen clang/llvm as their compiler for the official ARM development tools of the future (replacing the Keil compiler). Of course, ARM continues to support gcc development - they want to sell cores, not compilers! But if you do change to clang in the future, you can expect your bitfields to continue working - clang make a point of keeping as good compatibility with gcc as possible.
Tim Wescott <seemywebsite@myfooter.really> wrote:

> I'm willing to put up with the non-portability caused by bitfields,
> because at this point I don't see moving away from gcc for any reason
> (well, unless some client pays me to, and if so they can pay for all the
> rest, too).
I've been looking at improving my skills with more modern techniques, and this is what I'm planning on using:
<https://github.com/andersm/register_templates>

It combines the expressiveness of bit fields with the efficiency and control of the traditional macro approach, and should be portable. The caveats are that it's C++14 and so needs a modern compiler. It also absolutely needs to be compiled with optimizations turned on for everything to be precomputed at compile time, and I've found some GCC ports to be fiddly about that (e.g. the SuperH port of GCC 5.2.0 would not precompute the bitmask at -Os, but the RX port had no such problems).

-a
On Wednesday, 30 September 2015 05:41:46 UTC-3, Anders....@kapsi.spam.stop.fi.invalid wrote:
> Tim Wescott <seemywebsite@myfooter.really> wrote:
>
> > I'm willing to put up with the non-portability caused by bitfields,
> > because at this point I don't see moving away from gcc for any reason
> > (well, unless some client pays me to, and if so they can pay for all the
> > rest, too).
>
> I've been looking at improving my skills with more modern techniques,
> and this is what I'm planning on using:
> <https://github.com/andersm/register_templates>
>
> It combines the expressiveness of bit fields with the efficiency and
> control of the traditional macro approach, and should be portable. The
> caveats are that it's C++14 and so needs a modern compiler. It also
> absolutely needs to be compiled with optimizations turned on for
> everything to be precomputed at compile time, and I've found some GCC
> ports to be fiddly about that (e.g. the SuperH port of GCC 5.2.0 would
> not precompute the bitmask at -Os, but the RX port had no such
> problems).
>
> -a
You can ease the optimizer's work if you make
mask = (1 << width) - 1

--
Francisco
On 02/10/15 02:34, frantas@gmail.com wrote:
> On Wednesday, 30 September 2015 05:41:46 UTC-3, Anders....@kapsi.spam.stop.fi.invalid wrote:
>> Tim Wescott <seemywebsite@myfooter.really> wrote:
>>
>>> I'm willing to put up with the non-portability caused by bitfields,
>>> because at this point I don't see moving away from gcc for any reason
>>> (well, unless some client pays me to, and if so they can pay for all the
>>> rest, too).
>>
>> I've been looking at improving my skills with more modern techniques,
>> and this is what I'm planning on using:
>> <https://github.com/andersm/register_templates>
>>
>> It combines the expressiveness of bit fields with the efficiency and
>> control of the traditional macro approach, and should be portable. The
>> caveats are that it's C++14 and so needs a modern compiler. It also
>> absolutely needs to be compiled with optimizations turned on for
>> everything to be precomputed at compile time, and I've found some GCC
>> ports to be fiddly about that (e.g. the SuperH port of GCC 5.2.0 would
>> not precompute the bitmask at -Os, but the RX port had no such
>> problems).
>>
>> -a
>
> You can ease the optimizer's work if you make
> mask = (1 << width) - 1
>
To put that in context, that would be:

    template<typename reg_type>
    constexpr reg_type generate_mask(unsigned int width, unsigned int position)
    {
        return (width ?
            ((((((reg_type) 1) << 1) << (width - 1)) - 1) << position) : 0);
    }

This has the advantage that it is C++11 compatible, and does not need C++14 (C++14 allows more general constexpr functions than C++11).

For those that are wondering about the weirdness of shifting first by 1, then by (width - 1), it is to avoid undefined behaviour in the code. The C and C++ standards make "a << b" undefined if b is greater than or equal to the maximum of the width of the type a and the width of an int. That is, on a 32-bit system "a << 32" is undefined if a is 32-bit. If "a" is a uint16_t, then "a << 16" is evaluated on the promoted 32-bit int, so the shift count itself is within range, and reducing the result back to 16 bits gives 0. But even though for "uint32_t a", "a << 32" will almost certainly be evaluated to 0, it is still undefined. "(a << 1) << 31", on the other hand, /is/ defined, and it will be 0.

    #include <stdint.h>

    template<typename reg_type>
    constexpr reg_type generate_mask(unsigned int width, unsigned int position)
    {
        return (width ?
            ((((((reg_type) 1) << 1) << (width - 1)) - 1) << position) : 0);
    }

    // 0x18 is binary 11000; a hex literal keeps the checks valid in C++11
    static_assert(generate_mask<uint8_t>(2, 3) == 0x18, "Check mask");
    static_assert(generate_mask<uint8_t>(8, 0) == 0xff, "Check mask");
    static_assert(generate_mask<uint8_t>(8, 4) == 0xf0, "Check mask");
    static_assert(generate_mask<uint8_t>(0, 4) == 0x00, "Check mask");
    static_assert(generate_mask<uint64_t>(32, 16) == 0x0000ffffffff0000,
        "Check mask");
    static_assert(generate_mask<uint64_t>(64, 0) == 0xffffffffffffffff,
        "Check mask");
David Brown <david.brown@hesbynett.no> wrote:
> On 02/10/15 02:34, frantas@gmail.com wrote:
>> You can ease the optimizer's work if you make
>> mask = (1 << width) - 1
Yep, good observation. With this change the mask is precomputed at all optimization levels above 0.
> For those that are wondering about the weirdness of shifting first by 1,
> then by (width - 1), it is to avoid undefined behaviour in the code.
> The C and C++ standards make "a << b" undefined if b is greater than
> or equal to the maximum of the width of the type a and the width of an
> int.
The static assertions in the reg_t template would trigger a build failure in this case.

-a
On 04/10/15 20:12, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 02/10/15 02:34, frantas@gmail.com wrote:
>
>>> You can ease the optimizer's work if you make
>>> mask = (1 << width) - 1
>
> Yep, good observation. With this change the mask is precomputed at all
> optimization levels above 0.
>
>> For those that are wondering about the weirdness of shifting first by 1,
>> then by (width - 1), it is to avoid undefined behaviour in the code.
>> The C and C++ standards make "a << b" undefined if b is greater than
>> or equal to the maximum of the width of the type a and the width of an
>> int.
>
> The static assertions in the reg_t template would trigger a build
> failure in this case.
>
No, they would not trigger here. (Your code would not have this problem, since it does the shifts one at a time - but it /would/ have this issue if you did the shifts all in a single shift expression.)

Your use of static assertions is a good idea, of course, but you should try to get the details right on every corner case.

In this case, if you have a 32-bit int, with width equal to 32 and position 0, then (1 << width) is undefined - but your static assertion will not trigger until width is /greater/ than 32.

If you are curious as to why (1u << 32) is undefined (for a 32-bit int C), the reason is that many cpus have a rotation instruction for calculating x << y but the implementation details will vary when y is 32 or more. On some hardware, the result will always be 0 - but on other hardware, the result will be calculated with y % 32, and thus x << y will be x if y is 32. By leaving the result as undefined, the C standards let compilers use a single fast instruction - if the behaviour had been defined, then some compilers would have to add extra checks for x << y calculations.
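The "extra checks" mentioned above can be made concrete. The following is a minimal sketch of a left shift that is defined for every count, assuming unsigned operand types only; the name `shl_or_zero` is hypothetical and does not come from the thread or from register_templates.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: a left shift that yields 0 once every bit has been
// shifted out, instead of relying on undefined behaviour for large counts.
template <typename T>
constexpr T shl_or_zero(T value, unsigned int count)
{
    static_assert(std::numeric_limits<T>::is_integer &&
                  !std::numeric_limits<T>::is_signed,
                  "unsigned integer types only");
    // The explicit comparison is the extra check a compiler would have to
    // emit if the language defined the result of oversized shift counts.
    return (count >= static_cast<unsigned int>(std::numeric_limits<T>::digits))
               ? T(0)
               : static_cast<T>(value << count);
}

static_assert(shl_or_zero<std::uint32_t>(1u, 31) == 0x80000000u, "top bit");
static_assert(shl_or_zero<std::uint32_t>(1u, 32) == 0u, "fully shifted out");
static_assert(shl_or_zero<std::uint16_t>(1u, 16) == 0u, "fully shifted out");
```

When the count is a compile-time constant the comparison folds away; with a run-time count it costs a compare and a select, which is the overhead the standards avoid imposing on every shift.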
David Brown <david.brown@hesbynett.no> wrote:
> On 04/10/15 20:12, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
(snip)
>> Yep, good observation. With this change the mask is precomputed at all
>> optimization levels above 0.
(snip)
> In this case, if you have a 32-bit int, with width equal to 32 and
> position 0, then (1 << width) is undefined - but your static assertion
> will not trigger until width is /greater/ than 32.
> If you are curious as to why (1u << 32) is undefined (for a 32-bit int
> C), the reason is that many cpus have a rotation instruction for
> calculating x << y but the implementation details will vary when y is 32
> or more.
Rotate (by one) instructions were common on some early CPUs. That included both N bit and N+1 (rotate through carry), where combinations of such would shift between registers.

A little later, and as processors got more powerful, the multiple bit shifts appeared. For some, there would be a processor loop shifting one bit at a time, such that the shift time was proportional (plus a small constant) to the shift amount. Consider that a 36 bit machine might allow for a 2**36 bit shift.

More specifically, the 8086 allows for a 256 bit shift, as the shift amount is in an 8 bit register. With the 80286 the shift amount was changed to modulo 32, for one to reduce the possible instruction execution time, but also it allows for a barrel shifter instead of a processor loop.

For another specific case, IBM S/360 does both single (32 bit) and double register (64 bit) shifts modulo 64. Early versions of Hercules used the C shift operator on 32 bit values, not realizing that it needed to check for appropriate shift values. Figuring that out was my biggest contribution to Hercules.
> On some hardware, the result will always be 0 - but on other
> hardware, the result will be calculated with y % 32, and thus x << y
> will be x if y is 32. By leaving the result as undefined, the C
> standards let compilers use a single fast instruction - if the behaviour
> had been defined, then some compilers would have to add extra checks for
> x << y calculations.
If more processors didn't do shifts modulo the register size, I suspect C wouldn't have specified it that way. But they do, and so does C.

-- glen
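An emulator that has to reproduce the hardware rules glen describes cannot pass the guest's shift count straight to the C or C++ operator. The sketch below shows the kind of check involved for logical left shifts of a 32-bit value; the function names are hypothetical, the semantics are taken from the description above, and this is not code from Hercules.

```cpp
#include <cstdint>

// x86 style (80286 onwards, per the post above): the count is reduced
// modulo 32, so a count of 32 leaves the value unchanged.
constexpr std::uint32_t shl_x86_style(std::uint32_t value, unsigned int count)
{
    return value << (count & 31u);
}

// S/360 style single-register logical shift: the count is reduced modulo 64,
// so counts 32..63 shift every bit out and the result is 0.
constexpr std::uint32_t shl_s360_style(std::uint32_t value, unsigned int count)
{
    return ((count & 63u) >= 32u) ? 0u : (value << (count & 63u));
}

static_assert(shl_x86_style(1u, 32) == 1u, "count wraps to 0");
static_assert(shl_s360_style(1u, 32) == 0u, "all bits shifted out");
static_assert(shl_s360_style(1u, 64) == 1u, "count wraps to 0");
```

In both functions the count that finally reaches `<<` is always below 32, so the C++ shift itself stays within defined behaviour.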
David Brown <david.brown@hesbynett.no> wrote:
> On 04/10/15 20:12, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
>> The static assertions in the reg_t template would trigger a build
>> failure in this case.
> No, they would not trigger here.
You are of course right, I was looking at the wrong shift.
> If you are curious as to why (1u << 32) is undefined (for a 32-bit int
> C)
Shifting unsigned integers is defined:

"The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1*2^E2, reduced modulo one more than the maximum value representable in the result type."

"The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a non-negative value, the value of the result is the integral part of the quotient of E1/2^E2."

Accordingly, calculating the mask as an unsigned long long should avoid problems and be more readable.

-a
On 06/10/15 14:29, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 04/10/15 20:12, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
>>> The static assertions in the reg_t template would trigger a build
>>> failure in this case.
>> No, they would not trigger here.
>
> You are of course right, I was looking at the wrong shift.
>
>> If you are curious as to why (1u << 32) is undefined (for a 32-bit int
>> C)
>
> Shifting unsigned integers is defined:
> "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
> are zero-filled. If E1 has an unsigned type, the value of the result is
> E1*2^E2, reduced modulo one more than the maximum value representable in
> the result type."
> "The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
> an unsigned type or if E1 has a signed type and a non-negative value,
> the value of the result is the integral part of the quotient of
> E1/2^E2."
I'm guessing that is from 5.8 in the C++ standards (N3337 is the freely available version for C++11). But you forgot the paragraph above:

"The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."

So the left operand (1u in this case) is promoted if necessary (i.e., if it is smaller than "int" or "unsigned int" it gets bumped up to "int" or "unsigned int"). If the right operand is greater than or equal to the length in bits of the promoted left operand, the result is undefined.

Thus on a 32-bit system, ((uint16_t) 1) << 16 is defined: the operand is promoted to a 32-bit int, the shift gives 65536, and reducing that result back to 16 bits gives 0. But ((uint32_t) 1) << 32 is /not/ defined - it could reasonably be 0, it could reasonably be 1, or the compiler could assume that the programmer does not care what the result is.

(The standards for C have similar wording, and the same effect.)
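The promotion rule can be checked at compile time. This is a small sketch that assumes a 32-bit int, as on the ARM targets discussed in this thread; the first assertion makes that assumption explicit, and the names `shifted` and `truncated` are only for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// ASSUMPTION: int has 31 value bits plus a sign bit (a 32-bit int).
static_assert(std::numeric_limits<int>::digits == 31,
              "sketch assumes a 32-bit int");

// A uint16_t operand is promoted to int before the shift, so the type of
// the shift expression is int, not uint16_t.
static_assert(std::is_same<decltype(static_cast<std::uint16_t>(1) << 16),
                           int>::value,
              "left operand is promoted to int");

// The shift itself is in range for a 32-bit int and yields 65536 ...
constexpr int shifted = static_cast<std::uint16_t>(1) << 16;
static_assert(shifted == 65536, "value before narrowing");

// ... and only the conversion back to 16 bits produces 0.
constexpr std::uint16_t truncated = static_cast<std::uint16_t>(shifted);
static_assert(truncated == 0, "value after narrowing");
```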
>
> Accordingly, calculating the mask as an unsigned long long should avoid
> problems and be more readable.
>
You could certainly use ((uint64_t) 1u) for the shift - that would work up to 63 bits. But you would hit the same problem at 64 bits.

It is better to have the somewhat ugly code here - it is correct, will work regardless of the bit size of the system, and will still give optimal code (assuming a sane compiler and sensible compiler options).
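The difference between the two approaches at the 64-bit boundary can be sketched as follows. Both functions are hypothetical 64-bit-only specialisations written for this comparison, with the position argument left out for brevity; they are not part of register_templates.

```cpp
#include <cstdint>

// Single shift carried out in 64 bits: well defined only for width <= 63.
constexpr std::uint64_t mask_single_shift(unsigned int width)
{
    return (static_cast<std::uint64_t>(1) << width) - 1u;
}

// Two-step shift as in generate_mask: well defined for width 0..64, because
// neither individual shift count can reach 64.
constexpr std::uint64_t mask_two_step(unsigned int width)
{
    return width
        ? (((static_cast<std::uint64_t>(1) << 1) << (width - 1u)) - 1u)
        : 0u;
}

static_assert(mask_single_shift(63) == mask_two_step(63),
              "both agree up to 63 bits");
static_assert(mask_two_step(64) == 0xffffffffffffffffu,
              "full-width mask is well defined");
// mask_single_shift(64) is deliberately not evaluated here: a shift count
// of 64 is undefined, so it is not a valid constant expression and a
// conforming compiler is expected to reject it inside a static_assert.
```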