## Counting clock cycles

Started 6 years ago, 4 replies, latest reply 6 years ago, 2998 views

I have a piece of code for which I am trying to work out the exact number of clock cycles it takes to execute. However, my experiments indicate that the number I calculated is incorrect. Please don't ask for details of how I run the experiment; what I need to know is where I am going wrong in counting the clock cycles. I am working with a TI LaunchPad for the MSP430. Is it applying any optimizations I am not aware of? Any other ideas would be welcome. The code and the generated assembly, with clock cycles, are given below. The initial value of i is 0.

    int get_sign(int i){
        int j = i + 1000;
        int y = 0;
        while (i < 100){
            if (y == 0){
                j = j - i;
                y = 1;
            }
            else{
                j = j*2 + i*2;
                y = 0;
            }
            i++;
        }
        return j;
    }

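| Assembly generated | Clock cycles | Repeat |
|---|---|---|
| `push r4` | 3 | |
| `mov r1, r4` | 1 | |
| `add #2, r4` | 1 | |
| `add #llo(-6), r1` | 2 | |
| `mov r15, -4(r4)` | 4 | |
| `mov -4(r4), r15` | 3 | |
| `add #1000, r15` | 2 | |
| `mov r15, -8(r4)` | 4 | |
| `mov #0, -6(r4)` | 4 | |
| `jmp` | 2 | |
| `cmp #0, -6(r4)` | 4 | x100 |
| `jne` | 2 | x100 |
| `sub -4(r4), -8(r4)` | 6 | x50 |
| `mov #1, -6(r4)` | 4 | x50 |
| `jmp` | 2 | x50 |
| `mov -8(r4), r15` | 3 | x50 |
| `add -4(r4), r15` | 3 | x50 |
| `mov r15, -8(r4)` | 4 | x50 |
| `rla -8(r4)` | 6 | x50 |
| `mov #0, -6(r4)` | 4 | x50 |
| `add #1, -4(r4)` | 4 | x100 |
| `cmp #100, -4(r4)` | 5 | x100 |
| `jl` | 2 | x100 |
| `mov -8(r4), r15` | 3 | |
| `add #6, r1` | 2 | |
| `pop r4` | 2 | |
| `ret` | 3 | |
| Total | 3336 | |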

There are various factors that can affect instruction timing.  Some are shown in the list below:

• Interrupts (the biggest offender)

If interrupts are enabled, an interrupt firing in the middle of the test will cause the timing to be wrong.

• Flash wait states

Reading flash is slower than reading RAM, so if your code is running from flash, you have to account for wait states whenever an instruction is fetched.  Burst mode further complicates things, since some processors can read ahead several instructions so that execution times are reduced.  Even RAM, if it is slow enough, may require wait states.

• Instruction caching

Some processors have instruction caches that can hold many instructions to eliminate the need to read them from flash.  Figuring out the timing is very difficult, since a taken jump may invalidate the cache, and the instructions then have to be read from flash again.

• Hardware register access

Reading data from a hardware register introduces variable clock cycles since the processor may have to execute wait states until the data is ready at the register.

These are only four examples out of many.  There are others such as bus arbitration, DRAM refresh, turbo modes, etc.  The best you can hope for is an approximation.  As for the MSP430, I'm not familiar with the processor but I'm pretty sure that it has one or more of the items I posted above.


Aaaand -- this is why I gave up on counting clock cycles a long long time ago.  I benchmark code instead, and allow for a wide margin in timing in actual use.  Not only does benchmarking give me (in my opinion) a more accurate picture, it also accounts for instances where I might be committing pure old-fashioned screwups, like failing to set up the flash memory accesses for the best-case processor speed.

I suppose that if I were working on something time-critical, on a processor that had an easy-to-determine cycle count, then I might count clock ticks -- but on the other hand, such a processor would, almost by definition, be much slower than a modern alternative.  So if I were doing that sort of thing with that sort of processor, and there wasn't some compelling reason (legacy hardware, radiation hardness, etc.) to keep that processor, I might advocate for something newer and faster.


You're not giving away much, are you?

How many clock cycles do you measure?

Is the error between what you measure and what you predict constant?

You haven't told us what processor you are using, what else is running on it, and so on.

I'm struggling to see why you care. I often measure how long things take, but I haven't needed to do anything complicated in an exact number of clock cycles for over 30 years (a battery-powered instrument using a very feeble early CMOS processor, with no on-chip timers, to generate sine waves and measure the response to them at the same time).

MK
