Reply by Dave Nadler April 5, 2020
On Thursday, February 6, 2020 at 7:26:52 AM UTC-5, pozz wrote:
> [1] http://www.nadler.com/embedded/newlibAndFreeRTOS.html
Pozz, you mentioned you're trying to use ST's RubeMX. Be careful, extremely buggy support libraries and examples!! Latest unbelievable foolishness here: https://community.st.com/s/question/0D50X0000CBmXufSQF/newlibmalloc-locking-mechanism-to-be-threadsafe Avoid the ST suggestions!!! Aaaaarrrrggggg.....
Reply by David Brown February 9, 2020
On 09/02/2020 04:42, Paul Rubin wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> No, it hasn't accessed unallocated memory at all. It did not access
>> anything. The compiler could see that either the malloc worked fine
>> and the result would be the value 6,
>
> It's a question of how the compiler implemented the compile-time
> execution. I hope that it didn't really allocate 6 bytes of memory at
> compile time, and then write past it. But it makes me wonder.
I'm lost here. Are you talking about the mistake I made in allocating 4 bytes, rather than 4 * sizeof(int)? It makes no difference to the code generated when that is corrected, and I don't see where "6 bytes" comes from.

The use of "malloc" in the source code bears no direct relation to having to allocate dynamic memory in the compiler. Compile-time execution means figuring out what the effect of the code is, and simulating it at compile time - it does /not/ mean executing it directly. And in particular, barring bugs in the compiler, it does not mean executing undefined behaviour in the compiler - though it can mean ignoring it in the source code. (It would have been nice if the compiler had spotted my mistake and told me about it, however.)
Reply by Paul Rubin February 8, 2020
David Brown <david.brown@hesbynett.no> writes:
> No, it hasn't accessed unallocated memory at all. It did not access
> anything. The compiler could see that either the malloc worked fine
> and the result would be the value 6,
It's a question of how the compiler implemented the compile-time execution. I hope that it didn't really allocate 6 bytes of memory at compile time, and then write past it. But it makes me wonder.
Reply by David Brown February 8, 2020
On 07/02/2020 17:00, Paul Rubin wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> The point is that because the compiler knows what malloc and free do -
>> they are specified in the standards - it can use that knowledge for
>> optimisation.
>
> In this case it has accessed unallocated memory. I wonder if there is
> an exploit. Hmm.
No, it hasn't accessed unallocated memory at all. It did not access anything. The compiler could see that either the malloc worked fine and the result would be the value 6, or the malloc would fail (and return 0) in which case the program would have undefined behaviour (accessing a null pointer). The compiler can assume that the programmer doesn't care what happens when executing undefined behaviour, and thus giving a result of 6 is perfectly acceptable there too. So the best code is simply to return 6 without any work at run time. Ironically, if I had checked the result of malloc() for a null pointer, it could not have made this optimisation!
Reply by Paul Rubin February 7, 2020
David Brown <david.brown@hesbynett.no> writes:
> The point is that because the compiler knows what malloc and free do -
> they are specified in the standards - it can use that knowledge for
> optimisation.
In this case it has accessed unallocated memory. I wonder if there is an exploit. Hmm.
Reply by Dave Nadler February 7, 2020
On Friday, February 7, 2020 at 8:31:10 AM UTC-5, pozz wrote:
> After your considerations, why use a printf that uses heap? There are
> other good implementations that don't use heap at all and so are
> intrinsically thread-safe.
There's a list of alternate printf implementations on my web page you referenced. Other library functions like strtok use malloc and friends. Again, whatever you do, check the map and MAKE SURE you don't accidentally drag in non-thread-safe uses of library malloc family... Hope that helps, Best Regards, Dave
Reply by pozz February 7, 2020
On 07/02/2020 10:36, upsidedown@downunder.com wrote:
> On Thu, 6 Feb 2020 13:26:47 +0100, pozz <pozzugno@gmail.com> wrote:
>
>> Usually arm gcc compiler uses newlib (or newlib-nano) for standard C
>> libraries (memset, malloc, printf, time and so on).
>>
>> I sometimes replace newlib functions, because I don't like them. First
>> of all, I replace snprintf because newlib implementation uses malloc and
>> I don't like to use malloc, mostly if it can be avoided.
>> And for printf-like functions, there are a few implementations that
>> don't use malloc.
>
> While dynamic memory fragmentation can be a serious issue in systems
> that needs to run a long time (years or decades) without reboots. For
> this reason it is a good idea to avoid using malloc and free (or at
> least avoid using free :-). Fragmentation occurs when variable size
> allocations with different lifetimes are used.
>
> However, functions like printf may allocate some resources at entry
> and release them at exit and the heap state is the same before the
> printf function after it has been exited. In fact in this case dynamic
> memory is used in the same way as stacks. Much of the functionality
> could have been implemented using stack allocation. For some
> historical reasons (very small stacks on some early processors),
> C-language malloc/free is used much more frequently compared to other
> languages using stack work space.
>
> In a single task system or in multitasking environment with private
> heaps using this kind of stack-like usage should not cause
> fragmentation. However in a multitasking environment with a single
> shared heap, memory fragmentation can occur, if some other task makes
> long lasting allocations while printf is being executed. So in
> reality, the whole printf function should be protected against task
> switching.
After your considerations, why use a printf that uses heap? There are other good implementations that don't use heap at all and so are intrinsically thread-safe.
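To make "doesn't use heap at all" concrete, here is a minimal sketch of the sort of building block such printf replacements rely on: it formats an unsigned integer into a buffer supplied by the caller, touching only that buffer and its own stack frame. The name u32_to_dec and its return convention are invented for illustration and are not taken from any of the implementations mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and interface are assumptions).
 * Writes value as decimal text into buf, NUL-terminated.
 * Returns the number of digits written, or 0 if buf is too small.
 * No heap and no static state: only the caller's buffer and the stack.
 */
size_t u32_to_dec(char *buf, size_t size, uint32_t value)
{
    char tmp[10];               /* UINT32_MAX has 10 decimal digits */
    size_t n = 0;

    /* Produce the digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value != 0u);

    if (size < n + 1u) {
        return 0;               /* no room for digits plus terminator */
    }

    /* Copy them out in the right order. */
    for (size_t i = 0; i < n; i++) {
        buf[i] = tmp[n - 1u - i];
    }
    buf[n] = '\0';
    return n;
}
```

Because every call works on its own stack frame and the caller's buffer, two tasks can run it concurrently without any locking, provided they do not share the destination buffer.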
Reply by David Brown February 7, 2020
On 07/02/2020 10:04, Paul Rubin wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> int * p = malloc(N);
>
> (cough) that allocates N bytes, not N ints.
Just checking that you were paying attention :-)
>> gcc compiles test to:
>>
>> test:
>>         mov     eax, 6
>>         ret
>
> Wow! I think it saw the consts and basically ran the code at compile
> time.
Yes, exactly. The point is that because the compiler knows what malloc and free do - they are specified in the standards - it can use that knowledge for optimisation. (The exact point at which it will change from run-time calculation to compile-time calculation is dependent on the compiler, target, options, etc.)
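The source of the function is not quoted in this part of the thread, so the following is only a guess at its shape: allocate a small array, store some constants, add them up, free the array. The function name follows the "test" label in the assembly quoted above; the values 1, 2 and 3 are assumptions chosen so that the sum is 6, and the allocation uses the corrected size.

```c
#include <stdlib.h>

/*
 * Hypothetical reconstruction, NOT the code from the original post.
 * The result of malloc is deliberately left unchecked, as in the
 * discussion: if malloc fails, the stores below are undefined behaviour.
 */
int test(void)
{
    int *p = malloc(4 * sizeof *p);   /* 4 ints, not 4 bytes */

    p[0] = 1;
    p[1] = 2;
    p[2] = 3;

    int sum = p[0] + p[1] + p[2];

    free(p);
    return sum;
}
```

With optimisation enabled, a compiler that treats malloc and free as known functions may drop the allocation entirely and just return the constant. Whether it does so depends on the compiler, target and options.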
Reply by February 7, 2020
On Thu, 6 Feb 2020 13:26:47 +0100, pozz <pozzugno@gmail.com> wrote:

>Usually arm gcc compiler uses newlib (or newlib-nano) for standard C
>libraries (memset, malloc, printf, time and so on).
>
>I sometimes replace newlib functions, because I don't like them. First
>of all, I replace snprintf because newlib implementation uses malloc and
>I don't like to use malloc, mostly if it can be avoided.
>And for printf-like functions, there are a few implementations that
>don't use malloc.
Dynamic memory fragmentation can be a serious issue in systems that need to run a long time (years or decades) without reboots. For this reason it is a good idea to avoid using malloc and free (or at least avoid using free :-). Fragmentation occurs when variable size allocations with different lifetimes are used.

However, functions like printf may allocate some resources at entry and release them at exit, so the heap state after printf has returned is the same as it was before the call. In this case dynamic memory is used in the same way as a stack. Much of the functionality could have been implemented using stack allocation. For historical reasons (very small stacks on some early processors), malloc/free is used much more frequently in C than in other languages, which use stack work space instead.

In a single-task system, or in a multitasking environment with private heaps, this kind of stack-like usage should not cause fragmentation. However, in a multitasking environment with a single shared heap, memory fragmentation can occur if some other task makes long-lasting allocations while printf is being executed. So in reality, the whole printf function should be protected against task switching.
Reply by David Brown February 7, 2020
On 06/02/2020 23:23, pozz wrote:
> On 06/02/2020 21:41, David Brown wrote:
>> On 06/02/2020 16:06, pozz wrote:
>>> On 06/02/2020 14:58, David Brown wrote:
>>
>>>
>>>> nor can you implement memcpy
>>>> or memmove, due to the type aliasing and effective type rules. If you
>>>> want to be sure of problem-free code that is safe regardless of
>>>> optimisation, link-time optimisation, new generations of compilers,
>>>> etc., then you'll be quite careful and make good use of gcc attributes.
>>>
>>> What about copying byte by byte?
>>> Here[1] you can see newlib memcpy implementation. If
>>> PREFER_SIZE_OVER_SPEED or __OPTIMIZE_SIZE__ is defined, the
>>> implementation is really copying byte by byte.
>>>
>>> I don't really know how newlib used by my compiler (CubeIDE from ST)
>>> was compiled, maybe I'm using dumb version of memcpy already.
>>>
>>> This is an extract from a listing:
>>>
>>> 08025850 <memcpy>:
>>>   8025850:    b510         push    {r4, lr}
>>>   8025852:    1e43         subs    r3, r0, #1
>>>   8025854:    440a         add     r2, r1
>>>   8025856:    4291         cmp     r1, r2
>>>   8025858:    d100         bne.n   802585c <memcpy+0xc>
>>>   802585a:    bd10         pop     {r4, pc}
>>>   802585c:    f811 4b01    ldrb.w  r4, [r1], #1
>>>   8025860:    f803 4f01    strb.w  r4, [r3, #1]!
>>>   8025864:    e7f7         b.n     8025856 <memcpy+0x6>
>>>
>>> I'm not an expert of assembly, but it seems to me it is implemented
>>> in the simple and not optimized way.
>>>
>>
>> It is not the actual copying that is the problem - copying by char is
>> simple and safe (though often inefficient). The issue is that the C
>> standards say memcpy also copies the effective type in certain
>> circumstances - there is no way to specify that in C, and it is
>> therefore a special feature of the library memcpy.
>
> Could you make an example? I didn't understand.
Suppose you have a block "b" of memory allocated on the heap with malloc - it has no "declared type" because it was not part of a C-defined object. You are free to store data of any kind in "b", and it takes on a type based on the access you used to store to "b" (unless you use character type access, which leaves it untyped). Let's say you treat "b" as an array of floats and fill it up - now its effective type is float[].

Suppose you have another C object or array "s" with a specific type from somewhere - such as an array of char* pointers. You want to copy the contents of "s" into "b". You could do this in several ways:

1. Read from "s" as char* pointers, converting to a float using a union, and write it to "b". Then "b" is still an array of floats and the compiler knows that any access to it as an array of char* pointers is undefined behaviour - it can assume it can't happen. Beware the nasal demons!

2. Make a pointer "char* * p = (char* *) b" and use that as the destination when copying from "s" to "b". The compiler knows that "b" is now an array of char* pointers, and can be accessed as such. Everything works, but you need a specific copying function each time.

3. Make a generic function that copies using unsigned char, and call that to copy from s to b. Then b takes on the effective type of s, and so b is now an array of char* pointers. Everything works, but copying is inefficient.

4. Make a generic function that copies using uint32_t for speed, and call that to copy from s to b. Then b becomes an array of uint32_t, and accessing it as an array of char* pointers is undefined behaviour. Nasal demons again.

5. Call the standard library memcpy. Then b gets the effective type of s, and everything works. This is true whether the compiler generates a local loop, or calls the library function, and it is true whether the copying is done by byte or in larger lumps. The library memcpy is special here - you cannot duplicate that behaviour in standard C.
This kind of thing - type based alias analysis and the effective type rules in C - is difficult to get right. And it is not often that the compiler can use this extra information for optimisation. But sometimes it can. And sometimes it uses it for an optimisation that is correct according to the C code you wrote, but not according to what you wanted. Understanding the rules is hard, and sometimes playing by the rules is even harder, so one solution is to change the rules. The "-fno-strict-aliasing" flag in gcc changes the semantics of C to say that the effective type of an object is always the type used to access it - this simplifies things a lot here, at the cost of occasionally missed optimisation opportunities. For example, the Linux kernel is always compiled with "-fno-strict-aliasing". (Note that this flag does not help with the aliasing issue with home-made malloc, as that's a different thing entirely.)
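As a rough illustration of options 3 and 5 from the list, the sketch below fills a malloc'ed block as floats, then overwrites it with an array of string pointers and reads it back as such. The names copy_bytes and second_entry and the array contents are invented for the example; both paths rely on the effective type rule for copies made by memcpy or as an array of character type.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Option 3: generic copy through unsigned char (hypothetical helper). */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0) {
        *d++ = *s++;
    }
}

/*
 * Hypothetical demo: returns the second string of "s" after copying the
 * pointer array into heap block "b", or NULL if malloc fails.
 */
const char *second_entry(int use_memcpy)
{
    static const char *s[2] = { "alpha", "beta" };

    void *b = malloc(sizeof s);
    if (b == NULL) {
        return NULL;
    }

    /* Store through a float lvalue: that part of b is now float. */
    float *f = b;
    f[0] = 1.0f;

    if (use_memcpy) {
        memcpy(b, s, sizeof s);      /* option 5: library memcpy */
    } else {
        copy_bytes(b, s, sizeof s);  /* option 3: character-type copy */
    }

    /* b now carries the effective type of s, so this read is fine. */
    const char **p = b;
    const char *result = p[1];

    free(b);
    return result;
}
```

Replacing the loop in copy_bytes with one that moves uint32_t units would turn this into option 4, and the final read through p would then be the undefined access described in the list.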
> Anyway as you can see, newlib just implements memcpy in pure C language
> when compiled without optimizations. Are you saying it's bugged?
No - it can be treated as special because it is the standard library for your implementation.
>> A homemade memcpy does not have that same feature. (In a similar
>> vein, there is no way to get memory in standard C that has "no
>> declared type" except via the library malloc and friends - a homemade
>> malloc won't do.) I am not sure what the best solution is here.
>>
>> Anyway, for memcpy make sure the compiler can use the builtin versions
>> where possible (avoid -ffreestanding, or use -fbuiltin) as this will
>> give far better code.
>>
>
> I don't use -ffreestanding, but I don't know if I'm using -fbuiltin.
> Anyway you are suggesting to use builtin functions that are functions
> built *in* the compiler and not in the newlib.
>
> Another reason to consider newlib useless.
When you use one of the common "small" functions in the C standard library, like memcpy, memset, strcat, etc., the compiler knows what they do without knowing the source. If it can make smaller or faster code inline with the same effect as specified in the standards, then it may do so. Typically for memcpy that means the compiler knows the size of the copy and the alignments at compile time. For example:

    uint32_t rawfloat(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof(u));
        return u;
    }

This will be turned into a register move (if needed, depending on the cpu), with nothing stored in memory and no library calls made. And unlike faffing around with pointer casts, it is correct C code. And unlike using a type-punning union, it is correct C++ code as well as correct C code.

But more general calls to memcpy will be passed on to the library function.
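For anyone who wants to try this, here is the same function as a compilable unit with the headers it needs, a compile-time size check, and the reverse conversion. The floatfromraw name and the static_assert are additions for illustration, and the bit patterns mentioned afterwards assume the target uses IEEE 754 single precision for float.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* The trick only makes sense if the two types have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float is expected to be 32 bits wide");

/* Return the object representation of f as a 32-bit integer. */
uint32_t rawfloat(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Hypothetical inverse: rebuild a float from its raw bits. */
float floatfromraw(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

On an IEEE 754 target, rawfloat(1.0f) gives 0x3F800000, and floatfromraw(rawfloat(x)) gives back x for any value that is not a NaN.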