Hi,
in recent days I was reading an article describing
how to optimise instruction cache behaviour by modifying
the placement of functions in memory. The article can
be found at:
http://www.dspdesignline.com/howto/processors_fpga/202804122
I'm a little bit confused about the effectiveness of the
memory layout achieved by the described algorithm.
Unfortunately, there is no detailed reasoning about why the
chains it finds (functions placed contiguously in memory)
improve instruction cache performance.
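For what it's worth, here is my rough sketch of how I understand such a chain-building pass to work: a greedy merge over call-graph edges sorted by call frequency, in the style of Pettis-Hansen code placement. The node IDs and weights below are invented for illustration, not taken from the article's figures:

```python
# Greedy chain building over a weighted call graph (a sketch of my
# understanding, not the article's exact algorithm). Each chain is a
# simple path of functions intended to be laid out contiguously.

def build_chains(edges):
    """edges: list of (caller, callee, call_count) tuples."""
    chain_of = {}   # node -> the chain (list) currently containing it
    chains = []
    for a, b, _w in sorted(edges, key=lambda e: -e[2]):
        for n in (a, b):
            if n not in chain_of:
                c = [n]
                chains.append(c)
                chain_of[n] = c
        ca, cb = chain_of[a], chain_of[b]
        # Merge only when 'a' ends its chain and 'b' starts another,
        # so every chain stays a simple path (no branching).
        if ca is not cb and ca[-1] == a and cb[0] == b:
            ca.extend(cb)
            chains.remove(cb)
            for n in cb:
                chain_of[n] = ca
    return chains

# Invented example: heavy edges 3->8, 8->12, 12->18 get merged
# into one chain; the lighter 3->12 edge is then skipped.
print(build_chains([(3, 8, 10), (8, 12, 7), (12, 18, 5), (3, 12, 2)]))
```

With these made-up weights the pass produces the single chain [3, 8, 12, 18], which matches the shape of the chains in the article's figure.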
For example, I don't see an advantage in the third chain
3-8-12-18 (see figure 3). After node 8 executes, a large
number of instructions from other nodes will be fetched into
the cache before node 12 is executed, so prefetching would
not help here (unless one assumes that all the considered
functions are very small, which is not realistic).
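To make my concern concrete, here is a toy direct-mapped i-cache model (all sizes and addresses invented). It shows the effect contiguous placement is presumably after: two functions that alternate frequently stop evicting each other once they no longer map to the same cache sets. My doubt is that this benefit disappears when enough other code runs between the chained functions to flush the cache anyway:

```python
# Toy direct-mapped instruction cache: count misses for a trace of
# executed code regions. Parameters are invented for illustration.

LINE = 32          # bytes per cache line
SETS = 64          # 64 sets * 32 B = 2 KiB direct-mapped cache

def misses(trace):
    """trace: iterable of (start_addr, size) code regions run in order."""
    cache = {}     # set index -> tag currently resident
    n = 0
    for start, size in trace:
        for addr in range(start, start + size, LINE):
            s = (addr // LINE) % SETS
            tag = addr // (LINE * SETS)
            if cache.get(s) != tag:   # conflict or cold miss
                cache[s] = tag
                n += 1
    return n

F_SIZE = 1024      # each function occupies 1 KiB of code
CALLS = 100        # the two functions alternate 100 times

# Conflicting layout: both functions map onto the same cache sets,
# so each call evicts the other function's lines.
apart = [(0, F_SIZE), (SETS * LINE, F_SIZE)] * CALLS
# Chained layout: the functions are contiguous and fit in the cache
# together, so only the first iteration misses.
together = [(0, F_SIZE), (F_SIZE, F_SIZE)] * CALLS

print(misses(apart))     # every line misses on every call
print(misses(together))  # only cold misses on the first pass
```

In this toy setup the conflicting layout takes 6400 misses versus 64 for the chained one. But that only works because the two functions fit in the cache together, which is exactly the "all functions are very small" assumption I find unrealistic for the 3-8-12-18 chain.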
Can anyone shed some light on this?
Thank you.
Regards,
Tim