EmbeddedRelated.com
Forums

Accurate delay routine in assembly for ARM7, CortexM3 and M0

Started by Alexan_e June 7, 2012
An interesting observation, but I believe that falls under both "2" for the delays at startup (although here not the whole application is "no care", just the startup part), and "3" for timing of the read/write accesses. It would be quite strange to see the 100-200ns delays being solved by a timer/ISR, which at the end of the day, even on the 100-150MHz ARMs, would probably burn a number of cycles of the same order as NOP-based delays.

A library which seriously needs timing would probably have some mechanism through which the application could provide it, and then it's up to the application how it "creates" and "distributes" the timing among the libraries. That's one of the things I mean by "laborious but used when justified".

JW

> Add the IMHO most important reason: I want to write a library (let's say
> for an HD44780 display) and I want my library to be useable *without
> imposing restrictions on the rest of the program* (like: thou shalt not
> use TIMER0).
>
> Imagine using 5 libraries in your application that each claim one of the
> (3?) available timers :(

And once you use your library in a multitasking environment, things
get freaky.

Or someone uses it on a faster CPU and all of a sudden the
display driver does not work any more. IIRC: At the time of Turbo C,
Borland placed a NOP loop somewhere in the code, which made programs crash
on the 486 because of its better cache.

Picking your example: A display library needs two other "libraries": Timer
and IO.

Unless you are running really short of ROM, I see no need to write a
library specialized to only one chip.

--
42Bastian
+
| http://www.sciopta.com
| Fastest direct message passing kernel.
| IEC61508 certified.
+
--- In l..., Wouter van Ooijen wrote:
> Add the IMHO most important reason: I want to write a library (let's say
> for an HD44780 display) and I want my library to be useable *without

The OP asked for an "accurate delay", not "any delay".

This whole discussion started with the point that you cannot get "accurate delays" in software, for many reasons.



This will work as well for "any delay", but accurate it is not.

don

PS: What will the compiler remove ??
Accuracy is relative; it all depends on the situation/application.
Therefore even an "any delay" can be accurate. You need to define the timing bounds before you can call a timing mechanism (in)accurate. These bounds are often already defined for you by the external hardware you are trying to interface with.

ps: the compiler will remove what you tell it to remove
That is why I'm asking for an assembly routine which will have a very specific, predictable duration. I can easily use a C loop, but
it will be unpredictable.

I have started this discussion because I found such a function in the file area of the forum (delay.zip) but it was intended for
LPC2xxx and doesn't compile in uvision.
As I explained in a previous post, what I mean by accurate is a guaranteed minimum delay: for example, delay_us(5) should give
a delay of 5us if not interrupted, or more if it is interrupted for any reason, but it should never give 4.5us.
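
To make the "never shorter than requested" requirement concrete, here is a minimal sketch (not taken from the delay.zip routine) of how the iteration count for a fixed-cost busy loop could be rounded up. The name delay_iterations and the parameters cpu_hz and cycles_per_iter are assumptions for illustration only; the real cost per iteration depends on the core, the flash wait states and the exact loop, so it has to be measured or taken from the core's documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many iterations of a busy loop are needed so that
 * the loop lasts AT LEAST `us` microseconds.
 *   cpu_hz          - core clock in Hz (assumption supplied by the caller)
 *   cycles_per_iter - cost of one loop iteration in cycles (assumption;
 *                     measure it or look it up for the actual core)
 * Both divisions round up, so the result can only overshoot, never undershoot. */
uint32_t delay_iterations(uint32_t us, uint32_t cpu_hz, uint32_t cycles_per_iter)
{
    assert(cycles_per_iter != 0u);

    /* 64-bit intermediates so us * cpu_hz cannot overflow */
    uint64_t cycles = ((uint64_t)us * cpu_hz + 999999u) / 1000000u;
    uint64_t iters  = (cycles + cycles_per_iter - 1u) / cycles_per_iter;

    /* Saturate instead of wrapping if the request is too long */
    return (iters > UINT32_MAX) ? UINT32_MAX : (uint32_t)iters;
}
```

With a 72 MHz clock and a loop assumed to cost 3 cycles, delay_iterations(5, 72000000u, 3u) gives 120 iterations, i.e. 360 cycles or 5us; whenever the numbers do not divide evenly, the rounding goes towards a longer delay.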

Following up on the error I get, "(#1113: Inline assembler not permitted when generating Thumb code)": I have tried the
following function, which is actually included in CMSIS (it is the __NOP() function). It compiles fine as part of the
CMSIS library, but when I try to define it under a different name in my C code I get the "Inline assembler not permitted" error.



I'm trying to figure out what I should change to make my function compile too; there is probably a setting or a directive
that lets the function compile fine in CMSIS but not in my code.

Alex
On 8.6.2012 15:21, Alexan_e wrote:
> That is why I'm asking for an assembly routine which will have a very specific predictable duration, I can easily use a C loop but
> it will be unpredictable.
>

When I need a delay in some xy lib I usually make a void(*)(int)
pointer that is initialized at run-time to point to some delay
function. That way my xy lib is independent of the delay lib.
In case I specifically need a sw loop delay (for whatever reason) I
would make a function like this:



Of course, loops_per_ms needs calibration of some sort, either at run-time
using a hw timer or at compile time using some constant.
This is IMHO the best way to make a sw delay because C is easier to
maintain than ASM and calibration provides accuracy (probably better
than counting cycles manually in ASM code).
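
For illustration only (this is not the code from the post above, which is missing here), a minimal sketch of the two ideas described: a library that reaches its delay through a void(*)(int) pointer, and a software loop scaled by loops_per_ms. Apart from loops_per_ms, every name (delay_ms_fn, lcd_set_delay, sw_delay_ms, sw_delay_calibrate) and the placeholder start value are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delay hook of a hypothetical library "xy": the application decides at
 * run-time which delay function the library uses. */
typedef void (*delay_ms_fn)(int ms);
static delay_ms_fn lcd_delay_ms = NULL;

void lcd_set_delay(delay_ms_fn fn) { lcd_delay_ms = fn; }

/* Loop iterations per millisecond; placeholder value until calibrated. */
static uint32_t loops_per_ms = 1000u;

/* Software loop delay. The volatile counter stops the compiler from
 * removing the otherwise empty loop. */
void sw_delay_ms(int ms)
{
    for (int m = 0; m < ms; ++m) {
        for (volatile uint32_t i = 0; i < loops_per_ms; ++i) {
            /* burn time */
        }
    }
}

/* Calibration: the caller runs `loops_done` iterations, measures the
 * elapsed time with a hardware timer and passes both values in.
 * Rounds up so the calibrated delay errs on the long side. */
void sw_delay_calibrate(uint32_t loops_done, uint32_t elapsed_us)
{
    assert(elapsed_us != 0u);
    loops_per_ms = (uint32_t)(((uint64_t)loops_done * 1000u + elapsed_us - 1u)
                              / elapsed_us);
}
```

The application would call sw_delay_calibrate() once at startup and then lcd_set_delay(sw_delay_ms); the library itself never needs to know whether the delay comes from a loop or a timer.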
--- In l..., Alexan_e wrote:

> As I have explained in a previous post what I meant by accurate to have a min delay accuracy, for example delay_us(5) should give
> a delay of 5us if not interrupted or more if for any reason it is interrupted but it shouldn't give 4.5us in any case.
>

OK - we are getting closer to knowing what you really want, but your requirements could still be misinterpreted.

e.g. are you only interested in a minimum delay or do you also require that it shouldn't give 5.5us if uninterrupted either?

Is the maximum acceptable margin of error always 0.5us or does it depend on the magnitude of the delay i.e.

a) do you mean you want a timer that is accurate to +/- 0.5us

b) do you mean you want a timer that is accurate to +/- 10%

c) do you mean something else?

Regards,
Chris

Chris Burrows
Astrobe v4.2: Oberon for Cortex-M3
http://www.astrobe.com
Someone mentioned that if one builds libraries, then resources could
become scarce.

There are rules that must be followed for building modules to make
sense.

I build systems only out of software modules; being experienced in that, I can
give a few pieces of advice:

1) Modules should be platform independent.
2) In case a module needs to access a peripheral, it's preferable to abstract
that access using a callback or low-level calls, for two reasons:
1st: It makes the module truly platform independent.
2nd: By being platform independent, the module can be unit tested in a PC
environment, which makes debugging far more efficient and therefore raises
system quality.
ps: a low-level call / callback is, for example, when you need to read a byte
from an SD card, so you call a function that you expect to give you that
byte. In a unit test, this function will read from a pseudo SD card in the
test RAM; on a target platform, there will be a real function that
reads the SD card. Can you see the great power achieved by platform
independence?
3) In case you need a delay/UART routine, use callbacks: call a
function that you expect to write one char to a logging port, and build a
module-private delay routine that calls back to a function that returns a
system tick counter.

In the end, if you have a thousand modules using delays, only one
counter/timer will be used up.
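
Point 3 above could look roughly like the following sketch. It is only an illustration under assumptions: the names tick_fn, delay_bind_tick, delay_ticks and fake_tick are made up here, and the tick unit is whatever the application's counter provides. The fake tick source shows how the same module can be exercised on a PC without any hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback type: the application supplies a function returning a
 * free-running system tick counter. */
typedef uint32_t (*tick_fn)(void);

static tick_fn get_tick = NULL;

/* Called once by the application to connect the module to its tick source. */
void delay_bind_tick(tick_fn fn) { get_tick = fn; }

/* Module-private delay: waits until at least `ticks` ticks have elapsed.
 * The unsigned subtraction stays correct when the counter wraps around. */
void delay_ticks(uint32_t ticks)
{
    assert(get_tick != NULL);
    uint32_t start = get_tick();
    while ((uint32_t)(get_tick() - start) < ticks) {
        /* busy-wait */
    }
}

/* Test double for PC-side unit tests: advances by one tick per read and
 * starts just below the wrap-around point on purpose. */
static uint32_t fake_now = 0xFFFFFFF0u;
uint32_t fake_tick(void) { return fake_now++; }
```

On the target, delay_bind_tick() would instead be given a function that reads the one hardware counter shared by all modules.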

At.
My original request was an accurate delay because that is what the delay
routine I found was supposed to be but I couldn't use it.
Basically I was asking for someone experienced in assembly (and probably
a uvision user) to tell me what I should change in the given routine to
make it compile in the ARM compiler for a Cortex mcu but instead this
started a whole conversation about how wrong this type of delay is.

Many of you said that an accurate delay is not possible because the loop
may be interrupted. I agree, so I have rephrased my request as a
minimum delay when uninterrupted, although I think that goes without
saying for this type of delay.

I'm not asking for a time-critical delay, but I wouldn't want to get
random delays either. That is why I asked for a delay that is completely
written in assembly, so we know exactly how many CPU clocks it will take to
execute; then, depending on the delay needed, this routine will be
executed multiple times.
I have used that type of delay on AVR for years, so why is it so hard on
Cortex/ARM?

The routine should also be portable across M0, M3 and ARM7TDMI, but if this is
not possible then I can use a define to choose between two delay routines
depending on the core; so far I have none for Cortex.
The problem is still that I can't use any assembly instruction on Cortex,
since I get the error I described (Inline assembler not permitted), so
does anyone have a suggestion for how I can use assembly?

Paul suggested that If everything else fails I should write it using
UNIFIED ASSEMBLY LANGUAGE.
I searched in google, I think I have to write an .s file but I have no
idea what to write inside the file in order to be able to call it as a
function from C, I have no experience in assembly.

The first place I'm going to use this delay is in a HD44780 display,
there is nothing critical about it as long as the minimum delay is
achieved but that doesn't mean that every time I ask for 5us I should
get 10us.
I'm also going to use it in other external peripheral device libraries.

I don't have any specific tolerance requirement; if 1% is possible then
that is what I want, or maybe a higher or lower accuracy, but I don't want
to use a volatile variable loop in C for this delay since I think it
will give far worse accuracy than a predictable assembly loop.

Alex
> Paul suggested that If everything else fails I should write it using
> UNIFIED ASSEMBLY LANGUAGE.
> I searched in google, I think I have to write an .s file but I have no
> idea what to write inside the file in order to be able to call it as a
> function from C, I have no experience in assembly.

I gave you, written out, the complete code to plug into gas. I mean, what more can I do? I also gave C-level versions of the loop which you can run. The fact you have some problem with Keil is just that there is no standardised assembly language writing mechanism in C, and mnemonics in assemblers will usually be compatible across implementations, but the periphery of directives will not. Before UAL it was pretty much a nightmare writing code in the assembly language parts of the CrossWorks library. It's still not much fun, but it's manageable.

> The first place I'm going to use this delay is in a HD44780 display,
> there is nothing critical about it as long as the minimum delay is
> achieved but that doesn't mean that every time I ask for 5us I should
> get 10us.
> I'm also going to use it in other external peripheral device libraries.

You know, I dedicate a single timer to ticking at the core frequency--that way I know how long something takes and minimal delays where burning a few cycles isn't a problem are taken care of.

-- Paul.