Reply by larwe January 5, 2007
PeteS wrote:

> I somewhere had a document going into this (done at Stanford IIRC) in
> some detail and a doc from Broadcom looking at it for the parts I used
> (MIPS core based) because it impacted the internal bus significantly.
MIPS is fine. I am mostly illustrating problems, definitely not going into fine bit-level "this is how you fix it" solutions. If you come across this document by Wednesday or so, please do let me know about it. (I'm planning the presentation for sometime in the latter half of next week). Thanks!
Reply by PeteS January 5, 2007
larwe wrote:
> Once I finish preparing the materials, I'm giving a short lunchtime
> presentation at work about estimating CPU loading and latency.
>
> I'd like to add some discussion about the difficulty of doing a simple
> count-the-cycles analysis on multi-cached, pipelined RISC
> architectures, where things start to get nondeterministic. I'm
> particularly keen to describe how this might affect ARM7[xxx] and ARM9
> designs, because a lot of teams here are starting to migrate 8051 and
> other 8-bit designs into ARM micros.
>
> I don't mind groveling through the ARM ARM and working it out from
> first principles if I have to, but is there a reference that already
> discusses these issues? For instance, if you're running with the MMU in
> full swing, L1 and L2 page tables in use, can you lock your ISR's table
> entries in the TLBs so the MMU doesn't have to touch RAM to look them
> up? How to lock code into cache? Is a cache line fill aligned on a hard
> memory boundary or will it fill from an arbitrary starting address,
> based on where you just touched memory?
I somewhere had a document going into this (done at Stanford IIRC) in
some detail and a doc from Broadcom looking at it for the parts I used
(MIPS core based) because it impacted the internal bus significantly.
Even though it's MIPS based, the basic issues would be the same, one
might think. I'll dig around and see if I can find them.

Cheers

PeteS
Reply by larwe January 4, 2007
Vladimir Vassilevsky wrote:

> 1. Take a CPU which is good enough so you don't have to account for
> every bit and every cycle.
> 2. Find the average load.
Latency is really more important than overall loading, usually. The thrust of the presentation is really working on the small 8-bitters, but I want to illustrate in as much detail as possible how the simple analysis methods break down with complex micros.
> If it did fit in 8051, then you don't have to worry if it fits in the ARM.
Expansion and numerous complicated options are the reason they are migrating to ARM :)
> Don't do that. The whole point of using cache and TLBs is that you
> don't have to bother about the access to code and data. If you have
> to, then go find a faster CPU.
Not a good answer if you also have to guarantee latency with boundaries on both sides of the window.
Reply by Vladimir Vassilevsky January 4, 2007

larwe wrote:

> Once I finish preparing the materials, I'm giving a short lunchtime
> presentation at work about estimating CPU loading and latency.
>
> I'd like to add some discussion about the difficulty of doing a simple
> count-the-cycles analysis on multi-cached, pipelined RISC
> architectures, where things start to get nondeterministic.
That may be entertaining but not particularly useful.

1. Take a CPU which is good enough so you don't have to account for
every bit and every cycle.
2. Find the average load.
3. Assume the peak load is 3...4 times higher than the average.

That's it.

> I'm particularly keen to describe how this might affect ARM7[xxx] and
> ARM9 designs, because a lot of teams here are starting to migrate 8051
> and other 8-bit designs into ARM micros.
If it did fit in 8051, then you don't have to worry if it fits in the ARM.
> I don't mind groveling through the ARM ARM and working it out from
> first principles if I have to, but is there a reference that already
> discusses these issues? For instance, if you're running with the MMU in
> full swing, L1 and L2 page tables in use, can you lock your ISR's table
> entries in the TLBs so the MMU doesn't have to touch RAM to look them
> up? How to lock code into cache?
Don't do that. The whole point of using cache and TLBs is that you don't
have to bother about the access to code and data. If you have to, then
go find a faster CPU.

> Is a cache line fill aligned on a hard memory boundary or will it fill
> from an arbitrary starting address, based on where you just touched
> memory?
The cache lines are aligned on the boundary of their size.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Reply by larwe January 4, 2007
Once I finish preparing the materials, I'm giving a short lunchtime
presentation at work about estimating CPU loading and latency.

I'd like to add some discussion about the difficulty of doing a simple
count-the-cycles analysis on multi-cached, pipelined RISC
architectures, where things start to get nondeterministic. I'm
particularly keen to describe how this might affect ARM7[xxx] and ARM9
designs, because a lot of teams here are starting to migrate 8051 and
other 8-bit designs into ARM micros.

I don't mind groveling through the ARM ARM and working it out from
first principles if I have to, but is there a reference that already
discusses these issues? For instance, if you're running with the MMU in
full swing, L1 and L2 page tables in use, can you lock your ISR's table
entries in the TLBs so the MMU doesn't have to touch RAM to look them
up? How to lock code into cache? Is a cache line fill aligned on a hard
memory boundary or will it fill from an arbitrary starting address,
based on where you just touched memory?