Hi,
in recent days I was reading an article describing
how to optimise instruction cache behaviour by modifying
the placement of functions in memory. The article can
be found at:
http://www.dspdesignline.com/howto/processors_fpga/202804122
I'm a little bit confused about the effectiveness of the
memory layout achieved by the described algorithm.
Unfortunately, there is no detailed reasoning about why the
chains it finds (functions placed contiguously in memory)
improve instruction cache performance.
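For what it's worth, here is my rough sketch of how I understand such a chain-building pass to work: a greedy merge over call-graph edges sorted by call frequency, in the style of Pettis-Hansen code placement. The node IDs and weights below are invented for illustration, not taken from the article's figures:

```python
# Greedy chain building over a weighted call graph (a sketch of my
# understanding, not the article's exact algorithm). Each chain is a
# simple path of functions intended to be laid out contiguously.

def build_chains(edges):
    """edges: list of (caller, callee, call_count) tuples."""
    chain_of = {}   # node -> the chain (list) currently containing it
    chains = []
    for a, b, _w in sorted(edges, key=lambda e: -e[2]):
        for n in (a, b):
            if n not in chain_of:
                c = [n]
                chains.append(c)
                chain_of[n] = c
        ca, cb = chain_of[a], chain_of[b]
        # Merge only when 'a' ends its chain and 'b' starts another,
        # so every chain stays a simple path (no branching).
        if ca is not cb and ca[-1] == a and cb[0] == b:
            ca.extend(cb)
            chains.remove(cb)
            for n in cb:
                chain_of[n] = ca
    return chains

# Invented example: heavy edges 3->8, 8->12, 12->18 get merged
# into one chain; the lighter 3->12 edge is then skipped.
print(build_chains([(3, 8, 10), (8, 12, 7), (12, 18, 5), (3, 12, 2)]))
```

With these made-up weights the pass produces the single chain [3, 8, 12, 18], which matches the shape of the chains in the article's figure.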
For example, I don't see an advantage in the third chain
3-8-12-18 (see figure 3). After node 8 executes, a large
number of instructions from other nodes will be fetched into
the cache before node 12 is executed, so prefetching would
not help here (unless one assumes that all the considered
functions are very small, which is not realistic).
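To make my concern concrete, here is a toy direct-mapped i-cache model (all sizes and addresses invented). It shows the effect contiguous placement is presumably after: two functions that alternate frequently stop evicting each other once they no longer map to the same cache sets. My doubt is that this benefit disappears when enough other code runs between the chained functions to flush the cache anyway:

```python
# Toy direct-mapped instruction cache: count misses for a trace of
# executed code regions. Parameters are invented for illustration.

LINE = 32          # bytes per cache line
SETS = 64          # 64 sets * 32 B = 2 KiB direct-mapped cache

def misses(trace):
    """trace: iterable of (start_addr, size) code regions run in order."""
    cache = {}     # set index -> tag currently resident
    n = 0
    for start, size in trace:
        for addr in range(start, start + size, LINE):
            s = (addr // LINE) % SETS
            tag = addr // (LINE * SETS)
            if cache.get(s) != tag:   # conflict or cold miss
                cache[s] = tag
                n += 1
    return n

F_SIZE = 1024      # each function occupies 1 KiB of code
CALLS = 100        # the two functions alternate 100 times

# Conflicting layout: both functions map onto the same cache sets,
# so each call evicts the other function's lines.
apart = [(0, F_SIZE), (SETS * LINE, F_SIZE)] * CALLS
# Chained layout: the functions are contiguous and fit in the cache
# together, so only the first iteration misses.
together = [(0, F_SIZE), (F_SIZE, F_SIZE)] * CALLS

print(misses(apart))     # every line misses on every call
print(misses(together))  # only cold misses on the first pass
```

In this toy setup the conflicting layout takes 6400 misses versus 64 for the chained one. But that only works because the two functions fit in the cache together, which is exactly the "all functions are very small" assumption I find unrealistic for the 3-8-12-18 chain.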
Can anyone shed some light on this?
Thank you.
Regards,
Tim