Reply by pozz June 28, 20192019-06-28
Il 27/06/2019 20:23, Niklas Holsti ha scritto:
> On 19-06-27 19:37 , pozz wrote: >> Il 27/06/2019 16:47, Niklas Holsti ha scritto: >>> On 19-06-27 16:59 , pozz wrote: >>>> Il 27/06/2019 13:42, upsidedown@downunder.com ha scritto: >>>>> On Thu, 27 Jun 2019 13:08:20 +0200, pozz <pozzugno@gmail.com> wrote: >>> >>> &#2013266080; [snip] >>> >>>>>> However the dynamic allocation algorithm appears very strange to >>>>>> me. I >>>>>> expected that malloc() returns always the start of the heap if the >>>>>> heap >>>>>> is completely free. It is so at the beginning, however when some >>>>>> allocations and deallocations happen, even if the heap is completely >>>>>> deallocated (all the allocations were freed), a new allocation >>>>>> doesn't >>>>>> return the start of the heap. >>>>>> How to explain this behaviour? >>>>> >>>>> malloc() needs to store the actual allocated size (user requested >>>>> size+possible align to word/longword boundary), [...snip...]&#2013266080; thus >>>>> the user >>>>> receiver never starts at the start of heap, but after this descriptor. >>>> >>>> Yes, I know.&#2013266080; When I wrote that malloc() returns a different address >>>> than the start of the heap, I meant it is *very* different. >>>> >>>> In other words, if the heap is at 0x1000, after many >>>> allocations/deallocations, when the heap is again completely free, >>>> malloc() returns 0x1100, 256 bytes from the start of the heap. >>> >>> If it is always 256 bytes, whatever the pattern of allocations and >>> releases, then I would guess that your heap manager allocates a >>> 256-byte block for its own use, on first call, and keeps that block >>> allocated thereafter. >> >> No, it isn't my case. >> >> >>> If the offset varies depending on the size and order of allocations >>> and releases, your are probably seeing an effect of the allocation >>> algorithm which tries to find the "best" (or a "good enough") free >>> block to allocate. Even if all heap blocks have been released, this >>> "good" block may not be at the start of the heap, probably because the >>> allocation algorithm does not coalesce all adjacent free blocks. >> >> Oh bad! >> >> >>> It is quite complex (and can be processor-heavy) to structure and >>> manage the heap area in such a way that releasing all allocated blocks >>> returns the heap to its original state as a single, large, empty area. >> >> Complex? Really? > > Ok, not complex in the special case when _all_ blocks are released, but > complex in general to detect and coalesce adjacent free blocks while > _some_ blocks remain allocated. > >>> Instead, the heap area tends to be left as a collection of "free" >>> blocks, of various sizes, and the next malloc() call picks one of >>> these blocks to use for the new allocation. >> >> However is strange to me. If I have some allocated and freed blocks >> scattered on the heap space, I could understand.&#2013266080; But when there aren't >> allocated blocks, but only free blocks, I think it's very simple to >> fallback to an initial state without *any* free blocks. > > Yes, this special case could be detected by keeping count of the number > of blocks allocated but not yet released, and reinitializing the heap > when releasing a block makes this count zero. However, in programs which > use the heap for more than one kind of data I expect this case to happen > rarely.
I see.
>>> But the details depend on the particular allocation algorithm, and >>> there are _many_ different algorithms. >> >> I'm using newlib and I think the allocator is this[1]. > > That seems to be using the "first-fit" algorithm and a single free-block > list holding blocks of any size. The free-block list is sorted into > address order which lets the algorithm coalesce adjacent free blocks > when such appear. However, it means that free() must traverse the > (potentially long) free-block list when inserting the released block > into the list at the proper place, which can take a while; Order(N) > where N = length of free-block list. > > After a glance at the code, it seems to me that this coalescing > algorithm should result in a single, large free block after all > allocated blocks are released, so I don't understand why that does not > happen in your case.
I don't know too. I'm not sure I'm using that allocator, because it is included in newlib-nano, that are the libraries I selected in NXP MCUXpresso IDE.
Reply by pozz June 28, 20192019-06-28
Il 27/06/2019 23:23, Paul Rubin ha scritto:
> pozz <pozzugno@gmail.com> writes: >> However the dynamic allocation algorithm appears very strange to me. I >> expected that malloc() returns always the start of the heap if the >> heap is completely free. > > I wouldn't necessarily expect that, but your best bet is to inspect the > malloc library source code if you want to know what it is doing. > > For your purpose I wonder if you could use an arena-style allocator > instead of a lot of small mallocs and frees. That means allocate a > single chunk of memory, then sequentially allocate from within the > chunk, then free the whole chunk. > > It's great that you have TCP and TLS working in so little ram,
Yes, it has been a very difficult project.
> but I > wonder if you could make your life easier by using a bigger MCU with > more memory, so you don't have to manage it quite so carefully.
Yes, I know. However we have an already designed and produced board that we wanted to reuse with TLS. Next hardware will have much more RAM :-)
Reply by Paul Rubin June 27, 20192019-06-27
pozz <pozzugno@gmail.com> writes:
> However the dynamic allocation algorithm appears very strange to me. I > expected that malloc() returns always the start of the heap if the > heap is completely free.
I wouldn't necessarily expect that, but your best bet is to inspect the malloc library source code if you want to know what it is doing. For your purpose I wonder if you could use an arena-style allocator instead of a lot of small mallocs and frees. That means allocate a single chunk of memory, then sequentially allocate from within the chunk, then free the whole chunk. It's great that you have TCP and TLS working in so little ram, but I wonder if you could make your life easier by using a bigger MCU with more memory, so you don't have to manage it quite so carefully.
Reply by George Neuner June 27, 20192019-06-27
On Thu, 27 Jun 2019 17:47:52 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:


>The only heap-allocation algorithm I know that always returns the heap >to its initial state is the "buddy algorithm" >(https://en.wikipedia.org/wiki/Buddy_memory_allocation) but there may be >other, newer ones.
Or older ones. <grin> Mark-Release allocators also return to the initial state. For those that don't recognize this ancient term, Mark-Release is stack allocation from the heap. George
Reply by Niklas Holsti June 27, 20192019-06-27
On 19-06-27 19:37 , pozz wrote:
> Il 27/06/2019 16:47, Niklas Holsti ha scritto: >> On 19-06-27 16:59 , pozz wrote: >>> Il 27/06/2019 13:42, upsidedown@downunder.com ha scritto: >>>> On Thu, 27 Jun 2019 13:08:20 +0200, pozz <pozzugno@gmail.com> wrote: >> >> [snip] >> >>>>> However the dynamic allocation algorithm appears very strange to me. I >>>>> expected that malloc() returns always the start of the heap if the >>>>> heap >>>>> is completely free. It is so at the beginning, however when some >>>>> allocations and deallocations happen, even if the heap is completely >>>>> deallocated (all the allocations were freed), a new allocation doesn't >>>>> return the start of the heap. >>>>> How to explain this behaviour? >>>> >>>> malloc() needs to store the actual allocated size (user requested >>>> size+possible align to word/longword boundary), [...snip...] thus >>>> the user >>>> receiver never starts at the start of heap, but after this descriptor. >>> >>> Yes, I know. When I wrote that malloc() returns a different address >>> than the start of the heap, I meant it is *very* different. >>> >>> In other words, if the heap is at 0x1000, after many >>> allocations/deallocations, when the heap is again completely free, >>> malloc() returns 0x1100, 256 bytes from the start of the heap. >> >> If it is always 256 bytes, whatever the pattern of allocations and >> releases, then I would guess that your heap manager allocates a >> 256-byte block for its own use, on first call, and keeps that block >> allocated thereafter. > > No, it isn't my case. > > >> If the offset varies depending on the size and order of allocations >> and releases, your are probably seeing an effect of the allocation >> algorithm which tries to find the "best" (or a "good enough") free >> block to allocate. Even if all heap blocks have been released, this >> "good" block may not be at the start of the heap, probably because the >> allocation algorithm does not coalesce all adjacent free blocks. > > Oh bad! > > >> It is quite complex (and can be processor-heavy) to structure and >> manage the heap area in such a way that releasing all allocated blocks >> returns the heap to its original state as a single, large, empty area. > > Complex? Really?
Ok, not complex in the special case when _all_ blocks are released, but complex in general to detect and coalesce adjacent free blocks while _some_ blocks remain allocated.
>> Instead, the heap area tends to be left as a collection of "free" >> blocks, of various sizes, and the next malloc() call picks one of >> these blocks to use for the new allocation. > > However is strange to me. If I have some allocated and freed blocks > scattered on the heap space, I could understand. But when there aren't > allocated blocks, but only free blocks, I think it's very simple to > fallback to an initial state without *any* free blocks.
Yes, this special case could be detected by keeping count of the number of blocks allocated but not yet released, and reinitializing the heap when releasing a block makes this count zero. However, in programs which use the heap for more than one kind of data I expect this case to happen rarely.
>> But the details depend on the particular allocation algorithm, and >> there are _many_ different algorithms. > > I'm using newlib and I think the allocator is this[1].
That seems to be using the "first-fit" algorithm and a single free-block list holding blocks of any size. The free-block list is sorted into address order which lets the algorithm coalesce adjacent free blocks when such appear. However, it means that free() must traverse the (potentially long) free-block list when inserting the released block into the list at the proper place, which can take a while; Order(N) where N = length of free-block list. After a glance at the code, it seems to me that this coalescing algorithm should result in a single, large free block after all allocated blocks are released, so I don't understand why that does not happen in your case. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by Joe Chisolm June 27, 20192019-06-27
On Thu, 27 Jun 2019 13:08:20 +0200, pozz wrote:

> I normally don't use dynamic allocation in embedded projects, however > there are situations where this isn't possible. > > I'm working on a project with LPC1768 Cortex-M3 MCU that features > 32kB+32kB RAM. I'm using lwip+mbedTLS to make a TLS connection to AWS > IoT Core through Ethernet. > > After a precise allocation of data blocks (incoming Ethernet packets, > outgoing packets and so on), it seems working. However I'm using dynamic > allocation, mainly for outgoing packets: when some application (MQTT, > DNS client, ...) wants to transmit, lwip requests some space from the > heap. > > lwip can be configured to use a set of memory pools of different sizes > to avoid dynamic allocation, but it's very difficult to configure. > During execution I saw allocation requests of very different sizes: 5-10 > bytes, 50-100 bytes, 800 bytes, 1500 bytes. > > However the dynamic allocation algorithm appears very strange to me. I > expected that malloc() returns always the start of the heap if the heap > is completely free. It is so at the beginning, however when some > allocations and deallocations happen, even if the heap is completely > deallocated (all the allocations were freed), a new allocation doesn't > return the start of the heap. > How to explain this behaviour? > > I noticed my allocation pattern is very simple and, after 1-5 > allocations, there are other 1-5 deallocations and I eventually have a > completely free heap, when all the outgoing packets are serially shifted > out and/or acked from the remote side. > > For example: > - malloc(3)=addr1 - free(addr1) -> free heap - malloc(5)=addr2 - > malloc(60)=add3 - free(addr2) > - free(addr3) -> free heap - malloc(800)=addr4 - malloc(5)=addr5 - > malloc(32)=addr6 - free(addr5) > - malloc(80)=addr7 - free(addr6) > - free(addr7) > - free(addr4) -> free heap > > With this pattern, I think I can simplify the allocation algorithm so > that it returns always the start of the heap when it is completely free > and reduce fragmentation of complex algorithm.
It could be trying to do a best fit algorithm. Also depends if it's trying to merge adjacent blocks. Might try malloc(3) malloc(5) free both malloc(8) But I suspect since there is still plenty of free space in the heap malloc will take the quick way and alloc a new block. It wont do the expensive best fit or merge until it's out of space. -- Chisolm Republic of Texas
Reply by pozz June 27, 20192019-06-27
Il 27/06/2019 16:47, Niklas Holsti ha scritto:
> On 19-06-27 16:59 , pozz wrote: >> Il 27/06/2019 13:42, upsidedown@downunder.com ha scritto: >>> On Thu, 27 Jun 2019 13:08:20 +0200, pozz <pozzugno@gmail.com> wrote: > > &#2013266080;[snip] > >>>> However the dynamic allocation algorithm appears very strange to me. I >>>> expected that malloc() returns always the start of the heap if the heap >>>> is completely free. It is so at the beginning, however when some >>>> allocations and deallocations happen, even if the heap is completely >>>> deallocated (all the allocations were freed), a new allocation doesn't >>>> return the start of the heap. >>>> How to explain this behaviour? >>> >>> malloc() needs to store the actual allocated size (user requested >>> size+possible align to word/longword boundary), [...snip...]&#2013266080; thus >>> the user >>> receiver never starts at the start of heap, but after this descriptor. >> >> Yes, I know.&#2013266080; When I wrote that malloc() returns a different address >> than the start of the heap, I meant it is *very* different. >> >> In other words, if the heap is at 0x1000, after many >> allocations/deallocations, when the heap is again completely free, >> malloc() returns 0x1100, 256 bytes from the start of the heap. > > If it is always 256 bytes, whatever the pattern of allocations and > releases, then I would guess that your heap manager allocates a 256-byte > block for its own use, on first call, and keeps that block allocated > thereafter.
No, it isn't my case.
> If the offset varies depending on the size and order of allocations and > releases, your are probably seeing an effect of the allocation algorithm > which tries to find the "best" (or a "good enough") free block to > allocate. Even if all heap blocks have been released, this "good" block > may not be at the start of the heap, probably because the allocation > algorithm does not coalesce all adjacent free blocks.
Oh bad!
> It is quite complex (and can be processor-heavy) to structure and manage > the heap area in such a way that releasing all allocated blocks returns > the heap to its original state as a single, large, empty area.
Complex? Really?
> Instead, > the heap area tends to be left as a collection of "free" blocks, of > various sizes, and the next malloc() call picks one of these blocks to > use for the new allocation.
However is strange to me. If I have some allocated and freed blocks scattered on the heap space, I could understand. But when there aren't allocated blocks, but only free blocks, I think it's very simple to fallback to an initial state without *any* free blocks.
> But the details depend on the particular > allocation algorithm, and there are _many_ different algorithms.
I'm using newlib and I think the allocator is this[1].
> The only heap-allocation algorithm I know that always returns the heap > to its initial state is the "buddy algorithm" > (https://en.wikipedia.org/wiki/Buddy_memory_allocation) but there may be > other, newer ones.
I tried to use mbedTLS buffer memory allocator[2] and it seems it works better than newlib allocator. I had a problem during a TLS connection that was caused by an out of memory. After some allocations/deallocations, the heap was completely free. mbedTLS requests two allocations: 300 and 1300 bytes. I had around 2000 bytes of heap space, but the malloc failed. The 300-bytes block was allocated at an offset of 500 bytes from the beginning of the heap. Of course, 1300 bytes couldn't be allocated. Increasing heap space solved the problem. But using mbedTLS allocator, solved the issue too. [1] https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/nano-mallocr.c;h=13b72c99ffd7007b53c2e3270a56da237857742a;hb=HEAD [2] https://github.com/ARMmbed/mbedtls/blob/development/library/memory_buffer_alloc.c
Reply by Niklas Holsti June 27, 20192019-06-27
On 19-06-27 16:59 , pozz wrote:
> Il 27/06/2019 13:42, upsidedown@downunder.com ha scritto: >> On Thu, 27 Jun 2019 13:08:20 +0200, pozz <pozzugno@gmail.com> wrote:
[snip]
>>> However the dynamic allocation algorithm appears very strange to me. I >>> expected that malloc() returns always the start of the heap if the heap >>> is completely free. It is so at the beginning, however when some >>> allocations and deallocations happen, even if the heap is completely >>> deallocated (all the allocations were freed), a new allocation doesn't >>> return the start of the heap. >>> How to explain this behaviour? >> >> malloc() needs to store the actual allocated size (user requested >> size+possible align to word/longword boundary), [...snip...] thus the user >> receiver never starts at the start of heap, but after this descriptor. > > Yes, I know. When I wrote that malloc() returns a different address > than the start of the heap, I meant it is *very* different. > > In other words, if the heap is at 0x1000, after many > allocations/deallocations, when the heap is again completely free, > malloc() returns 0x1100, 256 bytes from the start of the heap.
If it is always 256 bytes, whatever the pattern of allocations and releases, then I would guess that your heap manager allocates a 256-byte block for its own use, on first call, and keeps that block allocated thereafter. If the offset varies depending on the size and order of allocations and releases, your are probably seeing an effect of the allocation algorithm which tries to find the "best" (or a "good enough") free block to allocate. Even if all heap blocks have been released, this "good" block may not be at the start of the heap, probably because the allocation algorithm does not coalesce all adjacent free blocks. It is quite complex (and can be processor-heavy) to structure and manage the heap area in such a way that releasing all allocated blocks returns the heap to its original state as a single, large, empty area. Instead, the heap area tends to be left as a collection of "free" blocks, of various sizes, and the next malloc() call picks one of these blocks to use for the new allocation. But the details depend on the particular allocation algorithm, and there are _many_ different algorithms. The only heap-allocation algorithm I know that always returns the heap to its initial state is the "buddy algorithm" (https://en.wikipedia.org/wiki/Buddy_memory_allocation) but there may be other, newer ones. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by pozz June 27, 20192019-06-27
Il 27/06/2019 13:42, upsidedown@downunder.com ha scritto:
> On Thu, 27 Jun 2019 13:08:20 +0200, pozz <pozzugno@gmail.com> wrote: > >> I normally don't use dynamic allocation in embedded projects, however >> there are situations where this isn't possible. >> >> I'm working on a project with LPC1768 Cortex-M3 MCU that features >> 32kB+32kB RAM. I'm using lwip+mbedTLS to make a TLS connection to AWS >> IoT Core through Ethernet. >> >> After a precise allocation of data blocks (incoming Ethernet packets, >> outgoing packets and so on), it seems working. However I'm using dynamic >> allocation, mainly for outgoing packets: when some application (MQTT, >> DNS client, ...) wants to transmit, lwip requests some space from the heap. >> >> lwip can be configured to use a set of memory pools of different sizes >> to avoid dynamic allocation, but it's very difficult to configure. >> During execution I saw allocation requests of very different sizes: 5-10 >> bytes, 50-100 bytes, 800 bytes, 1500 bytes. >> >> However the dynamic allocation algorithm appears very strange to me. I >> expected that malloc() returns always the start of the heap if the heap >> is completely free. It is so at the beginning, however when some >> allocations and deallocations happen, even if the heap is completely >> deallocated (all the allocations were freed), a new allocation doesn't >> return the start of the heap. >> How to explain this behaviour? > > malloc() needs to store the actual allocated size (user requested > size+possible align to word/longword boundary), so that free() can > deallocate the total amount allocated. In addition dynamic allocation > is often allocated as linked lists so some descriptors may be used. > One convenient place to store this auxiliary data is at the top of the > heap and only then add the user requested buffer, thus the user > receiver never starts at the start of heap, but after this descriptor.
Yes, I know. When I wrote that malloc() returns a different address than the start of the heap, I meant it is *very* different. In other words, if the heap is at 0x1000, after many allocations/deallocations, when the heap is again completely free, malloc() returns 0x1100, 256 bytes from the start of the heap.
> >> >> I noticed my allocation pattern is very simple and, after 1-5 >> allocations, there are other 1-5 deallocations and I eventually have a >> completely free heap, when all the outgoing packets are serially shifted >> out and/or acked from the remote side. >> >> For example: >> - malloc(3)=addr1 >> - free(addr1) -> free heap >> - malloc(5)=addr2 >> - malloc(60)=add3 >> - free(addr2) >> - free(addr3) -> free heap >> - malloc(800)=addr4 >> - malloc(5)=addr5 >> - malloc(32)=addr6 >> - free(addr5) >> - malloc(80)=addr7 >> - free(addr6) >> - free(addr7) >> - free(addr4) -> free heap >> >> With this pattern, I think I can simplify the allocation algorithm so >> that it returns always the start of the heap when it is completely free >> and reduce fragmentation of complex algorithm. >
Reply by June 27, 20192019-06-27
On Thu, 27 Jun 2019 13:08:20 +0200, pozz <pozzugno@gmail.com> wrote:

>I normally don't use dynamic allocation in embedded projects, however >there are situations where this isn't possible. > >I'm working on a project with LPC1768 Cortex-M3 MCU that features >32kB+32kB RAM. I'm using lwip+mbedTLS to make a TLS connection to AWS >IoT Core through Ethernet. > >After a precise allocation of data blocks (incoming Ethernet packets, >outgoing packets and so on), it seems working. However I'm using dynamic >allocation, mainly for outgoing packets: when some application (MQTT, >DNS client, ...) wants to transmit, lwip requests some space from the heap. > >lwip can be configured to use a set of memory pools of different sizes >to avoid dynamic allocation, but it's very difficult to configure. >During execution I saw allocation requests of very different sizes: 5-10 >bytes, 50-100 bytes, 800 bytes, 1500 bytes. > >However the dynamic allocation algorithm appears very strange to me. I >expected that malloc() returns always the start of the heap if the heap >is completely free. It is so at the beginning, however when some >allocations and deallocations happen, even if the heap is completely >deallocated (all the allocations were freed), a new allocation doesn't >return the start of the heap. >How to explain this behaviour?
malloc() needs to store the actual allocated size (user requested size+possible align to word/longword boundary), so that free() can deallocate the total amount allocated. In addition dynamic allocation is often allocated as linked lists so some descriptors may be used. One convenient place to store this auxiliary data is at the top of the heap and only then add the user requested buffer, thus the user receiver never starts at the start of heap, but after this descriptor.
> >I noticed my allocation pattern is very simple and, after 1-5 >allocations, there are other 1-5 deallocations and I eventually have a >completely free heap, when all the outgoing packets are serially shifted >out and/or acked from the remote side. > >For example: >- malloc(3)=addr1 >- free(addr1) -> free heap >- malloc(5)=addr2 >- malloc(60)=add3 >- free(addr2) >- free(addr3) -> free heap >- malloc(800)=addr4 >- malloc(5)=addr5 >- malloc(32)=addr6 >- free(addr5) >- malloc(80)=addr7 >- free(addr6) >- free(addr7) >- free(addr4) -> free heap > >With this pattern, I think I can simplify the allocation algorithm so >that it returns always the start of the heap when it is completely free >and reduce fragmentation of complex algorithm.