EmbeddedRelated.com

Optimizing Instruction Cache

Started by Tim Frink November 1, 2007
Hi,

I'm interested in compiler optimisations aimed at improving
instruction cache behavior. For data caches there are numerous
compiler optimisations, many of them operating on loops. However,
for instruction caches I've found only a very few via Google.

Could you give me any hints/references for any (further)
instruction cache optimizations? I appreciate any answers.

Best regards,
Tim
Tim Frink wrote:
> I'm interested in compiler optimisations aiming at the improvement
> of the instruction cache behavior. For data caches there are
> numerous compiler optimisations, many of them operating on loops.
> However, for instruction caches I've just found some very few on
> google.
>
> Could you give my any hints/references for any (further)
> instruction cache optimizations? I appreciate any answers.
I don't think there is much here. Instructions are normally executed
in sequence, barring jumps. The instruction cache is some sort of
buffer holding a portion of the instruction segment. This works as
long as the next instruction to be executed lies within that cache.
So the things that will help include taut code and short jump
distances. Procedure/function calls will usually wipe out the cache,
so you want to concentrate the loops in the lowest-level procedures.

You might do something by letting the cache hold several different
areas, and selecting which to update under what conditions. But doing
this intelligently requires program flow analysis.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
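The advice about concentrating loops in the lowest-level procedures can be sketched roughly as follows (a structural illustration with invented function names, not embedded code): a caller-side loop re-enters call/return code on every iteration, while a loop inside the leaf keeps the hot instruction footprint small and resident.

```python
def scale_one(x, k):
    # Leaf handles one element; the caller's loop re-enters it every
    # pass, touching call/return instructions on each iteration.
    return x * k

def scale_all(xs, k):
    # Loop lives inside the leaf: one call, then a tight loop whose
    # small instruction footprint stays resident in the I-cache.
    return [x * k for x in xs]

out_call_per_element = [scale_one(x, 3) for x in range(5)]
out_loop_in_leaf = scale_all(range(5), 3)
assert out_call_per_element == out_loop_in_leaf == [0, 3, 6, 9, 12]
```

Both versions compute the same result; the difference is only in how much code the hot path touches per element.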
"Tim Frink" <plfriko@yahoo.de> wrote in message
news:5oul8nFokv7tU2@mid.individual.net...
> Hi, > > I'm interested in compiler optimisations aiming at the > improvement of the instruction cache behavior. For data > caches there are numerous compiler optimisations, many > of them operating on loops. However, for instruction caches > I've just found some very few on google. > > Could you give my any hints/references for any (further) > instruction cache optimizations? I appreciate any answers. >
The whole point of using a cache is not being bothered with the
optimization. If you want to optimize the execution by hand, you can
load the critical parts into L1 RAM and lock them there. A 4+ way
cache does a good job of optimization by itself. The code just has to
avoid the unlikely situations that cause cache stalls; thus the linker
has to arrange the code in a way that prevents jumps to distances
close to the size of the cache.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
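The point about jump distances can be made concrete with a toy check (the cache geometry here is assumed, not from the thread): in a set-associative cache, two code addresses contend for the same set exactly when their distance is a near-multiple of the span covered by one way, so a post-link pass could flag frequent jumps between such addresses.

```python
LINE = 32                 # assumed cache line size in bytes
SETS = 128                # assumed number of sets
WAY_SPAN = LINE * SETS    # 4 KiB of address space covered by one way

def same_set(a, b):
    """True when code addresses a and b map to the same cache set."""
    return (a // LINE) % SETS == (b // LINE) % SETS

# A jump target a whole multiple of the way span away lands in the
# caller's own set, so the two code lines evict each other.
assert same_set(0x0000, 4 * WAY_SPAN)
# A target at an offset that is not a multiple of the way span maps
# to a different set: no conflict.
assert not same_set(0x0000, 0x2100)
```

A linker with a branch profile could use a check like this to keep hot caller/callee pairs out of each other's sets.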
On Nov 1, 2:46 pm, Tim Frink <plfr...@yahoo.de> wrote:
> Hi,
>
> I'm interested in compiler optimisations aiming at the improvement
> of the instruction cache behavior. For data caches there are
> numerous compiler optimisations, many of them operating on loops.
> However, for instruction caches I've just found some very few on
> google.
>
> Could you give my any hints/references for any (further)
> instruction cache optimizations? I appreciate any answers.
I seem to recall the DEC MIPS linker reordered basic blocks to improve
code cache performance. I don't know if that's still a useful idea
with today's architectures.

Code is comparatively small and you have little control over access
sequence (beyond basic block order). Data can be huge, and all kinds
of access-sequence fiddling is possible. That's why you see much more
work on data cache optimizations.

Here's an old reference; maybe a place to start:

Scott McFarling. Program optimization for instruction caches. Third
International Symposium on Architectural Support for Programming
Languages and Operating Systems, pp. 183-191, April 1989. Published as
Computer Architecture News 17 (2), Operating Systems Review 23
(special issue), SIGPLAN Notices 24 (special issue).
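The core idea behind that kind of basic-block reordering can be sketched as a greedy layout pass (the block names and profile counts below are invented): lay each block's hottest unplaced successor immediately after it, so the common path runs straight-line through the I-cache and taken branches are rare.

```python
# Invented branch profile: block -> {successor: taken count}
succ_counts = {
    "entry": {"hot": 95, "cold": 5},
    "hot":   {"loop": 90, "exit": 10},
    "loop":  {"hot": 80, "exit": 20},
    "cold":  {"exit": 100},
}

def hot_layout(start):
    """Greedy chain: always fall through to the hottest unplaced successor."""
    order, placed = [start], {start}
    while True:
        succs = succ_counts.get(order[-1], {})
        nxt = max((b for b in succs if b not in placed),
                  key=succs.get, default=None)
        if nxt is None:
            return order
        order.append(nxt)
        placed.add(nxt)

chain = hot_layout("entry")
# Blocks off the hot path are appended after the straight-line chain.
all_blocks = ["entry", "hot", "loop", "cold", "exit"]
layout = chain + [b for b in all_blocks if b not in chain]
assert layout == ["entry", "hot", "loop", "exit", "cold"]
```

Real implementations (McFarling's among them) also account for loop structure and cache geometry, but the fall-through-on-the-hot-edge principle is the same.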
Tim Frink wrote:
> Hi,
>
> I'm interested in compiler optimisations aiming at the improvement
> of the instruction cache behavior. For data caches there are
> numerous compiler optimisations, many of them operating on loops.
> However, for instruction caches I've just found some very few on
> google.
>
> Could you give my any hints/references for any (further)
> instruction cache optimizations? I appreciate any answers.
>
> Best regards,
> Tim
Here's a recent article on the topic: http://www.embedded.com/columns/technicalinsights/202602959

CBFalconer wrote:

> Tim Frink wrote:
>
>> I'm interested in compiler optimisations aiming at the improvement
>> of the instruction cache behavior. For data caches there are
>> numerous compiler optimisations, many of them operating on loops.
>> However, for instruction caches I've just found some very few on
>> google.
>>
>> Could you give my any hints/references for any (further)
>> instruction cache optimizations? I appreciate any answers.
>
> I don't think there is much here. Instructions are normally
> executed in sequence, barring jumps. The instruction cache is some
> sort of buffer holding a portion of the instruction segment. This
> works as long as the instruction to be next executed lies within
> that cache. So the things that will help it include taut code and
> short jump distances. Procedure/function calls will obviously wipe
> out the cache (usually), so you want to concentrate the loops in
> the lowest level procedures.
>
> You might do something by letting the cache hold several different
> areas, and selecting which to update under what conditions. But
> doing this intelligently requires program flow analysis.
This optimization is in vain. The program executes in a virtual
address space, whereas the cache operates on physical addresses. So
the compiler and linker can't have any idea of how the pages are
going to be mapped in every particular case.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Vladimir Vassilevsky wrote:
> CBFalconer wrote:
>
>> Tim Frink wrote:
>>
>>> I'm interested in compiler optimisations aiming at the improvement
>>> of the instruction cache behavior. ...
>>> ...
>> ...
>> You might do something by letting the cache hold several different
>> areas, and selecting which to update under what conditions. But
>> doing this intelligently requires program flow analysis.
>
> This optimization in vain.
>
> The program is executed in the virtual address space whereas the
> cache is operating with the physical addresses.
Some systems have caches that use virtual addresses, for example the
LEON series of SPARC processors. The cache tags then have a context
(process number) field in addition to the address field, so the
hardware knows which process (which virtual address space) owns a
given cache line.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
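The context-tag idea can be sketched like this (the field layout and sizes are invented for illustration): with virtual tags, a hit requires the context/ASID to match as well as the address tag, so the same virtual address in two different processes does not alias to one line.

```python
LINE = 32    # assumed line size in bytes
SETS = 128   # assumed number of sets

def tag_fields(asid, vaddr):
    """Split a virtual address into (set index, tag) and attach the ASID."""
    return {"asid": asid,
            "index": (vaddr // LINE) % SETS,
            "tag": vaddr // (LINE * SETS)}

cached_line = tag_fields(asid=7, vaddr=0x1000)

# Same process, same virtual address: the full tag comparison hits.
assert tag_fields(7, 0x1000) == cached_line
# Another process using the same virtual address: ASID differs, miss.
assert tag_fields(9, 0x1000) != cached_line
```

Without the ASID field, the second lookup would falsely hit on the first process's code.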
On Nov 5, 12:21 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> CBFalconer wrote:
>
>> Tim Frink wrote:
>>
>>> I'm interested in compiler optimisations aiming at the improvement
>>> of the instruction cache behavior. For data caches there are
>>> numerous compiler optimisations, many of them operating on loops.
>>> However, for instruction caches I've just found some very few on
>>> google.
>>>
>>> Could you give my any hints/references for any (further)
>>> instruction cache optimizations? I appreciate any answers.
>>
>> I don't think there is much here. Instructions are normally
>> executed in sequence, barring jumps. The instruction cache is some
>> sort of buffer holding a portion of the instruction segment. This
>> works as long as the instruction to be next executed lies within
>> that cache. So the things that will help it include taut code and
>> short jump distances. Procedure/function calls will obviously wipe
>> out the cache (usually), so you want to concentrate the loops in
>> the lowest level procedures.
>>
>> You might do something by letting the cache hold several different
>> areas, and selecting which to update under what conditions. But
>> doing this intelligently requires program flow analysis.
>
> This optimization in vain.
>
> The program is executed in the virtual address space whereas the
> cache is operating with the physical addresses. So the compiler and
> linker can't have any idea of how the pages are going to be mapped
> in every particular case.
Assuming fairly typical cache behavior: It would be possible to position collections of routines with carefully selected page offsets so as to avoid conflict misses. Say you had ten small routines that called each other. Aligning each of them at the beginning of a 4K page would make for terrible cache thrashing with typical two- or four-way set associative i-caches. Aligning them on different 512 byte boundaries would likely allow all of them to fit in the cache.
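That placement argument checks out in a toy model (the geometry is assumed: a 16 KiB, 4-way set-associative I-cache with 32-byte lines, so one way spans 4 KiB):

```python
from collections import Counter

LINE = 32
WAYS = 4
SETS = 16 * 1024 // (WAYS * LINE)   # 128 sets; one way spans 4 KiB

def set_index(addr):
    return (addr // LINE) % SETS

# Ten routines each aligned on a 4 KiB page boundary: every entry point
# maps to set 0, so only WAYS of them can be resident at once -> thrashing.
page_aligned = [set_index(i * 4096) for i in range(10)]
assert all(s == 0 for s in page_aligned)

# The same ten routines staggered on distinct 512-byte offsets spread
# across sets; no set holds more than WAYS lines, so all ten can coexist.
staggered = [set_index(i * 512) for i in range(10)]
assert max(Counter(staggered).values()) <= WAYS
```

With other geometries the exact offsets change, but the principle holds: stagger hot routines so their entry points fall in different sets.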