EmbeddedRelated.com

Optimizing Instruction Cache

Started by Tim Frink November 1, 2007
Hi,

I'm interested in compiler optimisations aimed at improving
instruction cache behavior. For data caches there are numerous
compiler optimisations, many of them operating on loops. However,
for instruction caches I've found only a very few via Google.

Could you give me any hints/references for any (further)
instruction cache optimizations? I appreciate any answers.

Best regards,
Tim
Tim Frink wrote:
> I'm interested in compiler optimisations aiming at the improvement
> of the instruction cache behavior. For data caches there are
> numerous compiler optimisations, many of them operating on loops.
> However, for instruction caches I've just found some very few on
> google.
>
> Could you give my any hints/references for any (further)
> instruction cache optimizations? I appreciate any answers.
I don't think there is much here. Instructions are normally executed
in sequence, barring jumps. The instruction cache is some sort of
buffer holding a portion of the instruction segment. This works as
long as the next instruction to be executed lies within that cache.
So the things that will help include taut code and short jump
distances. Procedure/function calls will usually wipe out the cache,
so you want to concentrate the loops in the lowest-level procedures.

You might do something by letting the cache hold several different
areas, and selecting which to update under what conditions. But doing
this intelligently requires program flow analysis.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
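The advice about concentrating loops in the lowest-level procedures can be sketched roughly as follows (a structural illustration with invented function names, not embedded code): a caller-side loop re-enters call/return code on every iteration, while a loop inside the leaf keeps the hot instruction footprint small and resident.

```python
def scale_one(x, k):
    # Leaf handles one element; the caller's loop re-enters it every
    # pass, touching call/return instructions on each iteration.
    return x * k

def scale_all(xs, k):
    # Loop lives inside the leaf: one call, then a tight loop whose
    # small instruction footprint stays resident in the I-cache.
    return [x * k for x in xs]

out_call_per_element = [scale_one(x, 3) for x in range(5)]
out_loop_in_leaf = scale_all(range(5), 3)
assert out_call_per_element == out_loop_in_leaf == [0, 3, 6, 9, 12]
```

Both versions compute the same result; the difference is only in how much code the hot path touches per element.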
"Tim Frink" <plfriko@yahoo.de> wrote in message
news:5oul8nFokv7tU2@mid.individual.net...
> Hi, > > I'm interested in compiler optimisations aiming at the > improvement of the instruction cache behavior. For data > caches there are numerous compiler optimisations, many > of them operating on loops. However, for instruction caches > I've just found some very few on google. > > Could you give my any hints/references for any (further) > instruction cache optimizations? I appreciate any answers. >
The whole point of using a cache is not being bothered with the
optimization. If you want to optimize the execution by hand, you can
load the critical parts into L1 RAM and lock them there. A 4+ way
cache does a good job of optimization by itself. The code just has to
avoid the unlikely situations that cause cache stalls; thus the linker
has to arrange the code in a way that prevents jumps to distances
close to the size of the cache.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
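The point about jump distances can be made concrete with a toy check (the cache geometry here is assumed, not from the thread): in a set-associative cache, two code addresses contend for the same set exactly when their distance is a near-multiple of the span covered by one way, so a post-link pass could flag frequent jumps between such addresses.

```python
LINE = 32                 # assumed cache line size in bytes
SETS = 128                # assumed number of sets
WAY_SPAN = LINE * SETS    # 4 KiB of address space covered by one way

def same_set(a, b):
    """True when code addresses a and b map to the same cache set."""
    return (a // LINE) % SETS == (b // LINE) % SETS

# A jump target a whole multiple of the way span away lands in the
# caller's own set, so the two code lines evict each other.
assert same_set(0x0000, 4 * WAY_SPAN)
# A target at an offset that is not a multiple of the way span maps
# to a different set: no conflict.
assert not same_set(0x0000, 0x2100)
```

A linker with a branch profile could use a check like this to keep hot caller/callee pairs out of each other's sets.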
On Nov 1, 2:46 pm, Tim Frink <plfr...@yahoo.de> wrote:
> Hi,
>
> I'm interested in compiler optimisations aiming at the improvement
> of the instruction cache behavior. For data caches there are
> numerous compiler optimisations, many of them operating on loops.
> However, for instruction caches I've just found some very few on
> google.
>
> Could you give my any hints/references for any (further)
> instruction cache optimizations? I appreciate any answers.
I seem to recall the DEC MIPS linker reordered basic blocks to improve
code cache performance. I don't know if that's still a useful idea
with today's architectures.

Code is comparatively small and you have little control over access
sequence (beyond basic block order). Data can be huge, and all kinds
of access-sequence fiddling is possible. That's why you see much more
work on data cache optimizations.

Here's an old reference; maybe a place to start:

Scott McFarling. Program optimization for instruction caches. Third
International Symposium on Architectural Support for Programming
Languages and Operating Systems, pp. 183-191, April 1989. Published as
Computer Architecture News 17 (2), Operating Systems Review 23
(special issue), SIGPLAN Notices 24 (special issue).
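The core idea behind that kind of basic-block reordering can be sketched as a greedy layout pass (the block names and profile counts below are invented): lay each block's hottest unplaced successor immediately after it, so the common path runs straight-line through the I-cache and taken branches are rare.

```python
# Invented branch profile: block -> {successor: taken count}
succ_counts = {
    "entry": {"hot": 95, "cold": 5},
    "hot":   {"loop": 90, "exit": 10},
    "loop":  {"hot": 80, "exit": 20},
    "cold":  {"exit": 100},
}

def hot_layout(start):
    """Greedy chain: always fall through to the hottest unplaced successor."""
    order, placed = [start], {start}
    while True:
        succs = succ_counts.get(order[-1], {})
        nxt = max((b for b in succs if b not in placed),
                  key=succs.get, default=None)
        if nxt is None:
            return order
        order.append(nxt)
        placed.add(nxt)

chain = hot_layout("entry")
# Blocks off the hot path are appended after the straight-line chain.
all_blocks = ["entry", "hot", "loop", "cold", "exit"]
layout = chain + [b for b in all_blocks if b not in chain]
assert layout == ["entry", "hot", "loop", "exit", "cold"]
```

Real implementations (McFarling's among them) also account for loop structure and cache geometry, but the fall-through-on-the-hot-edge principle is the same.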
Tim Frink wrote:
> Hi,
>
> I'm interested in compiler optimisations aiming at the improvement
> of the instruction cache behavior. For data caches there are
> numerous compiler optimisations, many of them operating on loops.
> However, for instruction caches I've just found some very few on
> google.
>
> Could you give my any hints/references for any (further)
> instruction cache optimizations? I appreciate any answers.
>
> Best regards,
> Tim
Here's a recent article on the topic: http://www.embedded.com/columns/technicalinsights/202602959

CBFalconer wrote:

> Tim Frink wrote:
>
>> I'm interested in compiler optimisations aiming at the improvement
>> of the instruction cache behavior. For data caches there are
>> numerous compiler optimisations, many of them operating on loops.
>> However, for instruction caches I've just found some very few on
>> google.
>>
>> Could you give my any hints/references for any (further)
>> instruction cache optimizations? I appreciate any answers.
>
> I don't think there is much here. Instructions are normally
> executed in sequence, barring jumps. The instruction cache is some
> sort of buffer holding a portion of the instruction segment. This
> works as long as the instruction to be next executed lies within
> that cache. So the things that will help it include taut code and
> short jump distances. Procedure/function calls will obviously wipe
> out the cache (usually), so you want to concentrate the loops in
> the lowest level procedures.
>
> You might do something by letting the cache hold several different
> areas, and selecting which to update under what conditions. But
> doing this intelligently requires program flow analysis.
This optimization is in vain. The program executes in a virtual
address space, whereas the cache operates on physical addresses. So
the compiler and linker can't have any idea of how the pages are
going to be mapped in every particular case.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Vladimir Vassilevsky wrote:
> CBFalconer wrote:
>
>> Tim Frink wrote:
>>
>>> I'm interested in compiler optimisations aiming at the improvement
>>> of the instruction cache behavior. ...
>>> ...
>> ...
>> You might do something by letting the cache hold several different
>> areas, and selecting which to update under what conditions. But
>> doing this intelligently requires program flow analysis.
>
> This optimization in vain.
>
> The program is executed in the virtual address space whereas the
> cache is operating with the physical addresses.
Some systems have caches that use virtual addresses, for example the
LEON series of SPARC processors. The cache tags then have a context
(process number) field in addition to the address field, so the
hardware knows which process (which virtual address space) owns a
given cache line.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
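The context-tag idea can be sketched like this (the field layout and sizes are invented for illustration): with virtual tags, a hit requires the context/ASID to match as well as the address tag, so the same virtual address in two different processes does not alias to one line.

```python
LINE = 32    # assumed line size in bytes
SETS = 128   # assumed number of sets

def tag_fields(asid, vaddr):
    """Split a virtual address into (set index, tag) and attach the ASID."""
    return {"asid": asid,
            "index": (vaddr // LINE) % SETS,
            "tag": vaddr // (LINE * SETS)}

cached_line = tag_fields(asid=7, vaddr=0x1000)

# Same process, same virtual address: the full tag comparison hits.
assert tag_fields(7, 0x1000) == cached_line
# Another process using the same virtual address: ASID differs, miss.
assert tag_fields(9, 0x1000) != cached_line
```

Without the ASID field, the second lookup would falsely hit on the first process's code.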
On Nov 5, 12:21 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> CBFalconer wrote:
>
>> Tim Frink wrote:
>>
>>> I'm interested in compiler optimisations aiming at the improvement
>>> of the instruction cache behavior. For data caches there are
>>> numerous compiler optimisations, many of them operating on loops.
>>> However, for instruction caches I've just found some very few on
>>> google.
>>>
>>> Could you give my any hints/references for any (further)
>>> instruction cache optimizations? I appreciate any answers.
>>
>> I don't think there is much here. Instructions are normally
>> executed in sequence, barring jumps. The instruction cache is some
>> sort of buffer holding a portion of the instruction segment. This
>> works as long as the instruction to be next executed lies within
>> that cache. So the things that will help it include taut code and
>> short jump distances. Procedure/function calls will obviously wipe
>> out the cache (usually), so you want to concentrate the loops in
>> the lowest level procedures.
>>
>> You might do something by letting the cache hold several different
>> areas, and selecting which to update under what conditions. But
>> doing this intelligently requires program flow analysis.
>
> This optimization in vain.
>
> The program is executed in the virtual address space whereas the
> cache is operating with the physical addresses. So the compiler and
> linker can't have any idea of how the pages are going to be mapped
> in every particular case.
Assuming fairly typical cache behavior: It would be possible to position collections of routines with carefully selected page offsets so as to avoid conflict misses. Say you had ten small routines that called each other. Aligning each of them at the beginning of a 4K page would make for terrible cache thrashing with typical two- or four-way set associative i-caches. Aligning them on different 512 byte boundaries would likely allow all of them to fit in the cache.
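That placement argument checks out in a toy model (the geometry is assumed: a 16 KiB, 4-way set-associative I-cache with 32-byte lines, so one way spans 4 KiB):

```python
from collections import Counter

LINE = 32
WAYS = 4
SETS = 16 * 1024 // (WAYS * LINE)   # 128 sets; one way spans 4 KiB

def set_index(addr):
    return (addr // LINE) % SETS

# Ten routines each aligned on a 4 KiB page boundary: every entry point
# maps to set 0, so only WAYS of them can be resident at once -> thrashing.
page_aligned = [set_index(i * 4096) for i in range(10)]
assert all(s == 0 for s in page_aligned)

# The same ten routines staggered on distinct 512-byte offsets spread
# across sets; no set holds more than WAYS lines, so all ten can coexist.
staggered = [set_index(i * 512) for i in range(10)]
assert max(Counter(staggered).values()) <= WAYS
```

With other geometries the exact offsets change, but the principle holds: stagger hot routines so their entry points fall in different sets.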