I'm driving some hardware with the SAM9XE that has some hard real time
limits: I need to lower some pins, wait for *two microseconds* then
raise those pins.
Any longer and I let the magic smoke out. Any shorter and the right
thing fails to happen.
I'm doing this on the AT91SAM9XE-EK with code that looks like:
void Strobe( unsigned int bits )
{
    LowerPins(...); // an inline function, pin->pio->CODR = bits;
    for (int i = 0; i < 7; ++i)
    {
        __asm__ __volatile__( "nop\n" );
    }
    __asm__ __volatile__( "nop\n" );
    __asm__ __volatile__( "nop\n" );
    __asm__ __volatile__( "nop\n" );
    __asm__ __volatile__( "nop\n" );
    RaisePins(...); // an inline function, pin->pio->SODR = bits;
}
This function is off in its own object module.
Changing the "7" in there is distinctly non-linear, but I have the
feeling that that's because the optimizer sometimes chooses to unroll
the loop. However, it also seems like there's some sort of difference
in how long this takes to run depending on linking: The delay seems to
change every time I compile the project.
Anyone have any idea what's going on? I'm way too acquainted with the
'scope recently.
Dan
Super accurate short delays, SAM9XE and GCC?
Started by ●June 3, 2009
Reply by ●June 3, 2009
On Wed, Jun 3, 2009 at 2:43 PM, Dan Lyke wrote:
> I'm driving some hardware with the SAM9XE that has some hard real time
> limits: I need to lower some pins, wait for *two microseconds* then
> raise those pins.
>
> Any longer and I let the magic smoke out. Any shorter and the right
> thing fails to happen.
>
> I'm doing this on the AT91SAM9XE-EK with code that looks like:
>
> void Strobe( unsigned int bits )
> {
> LowerPins(...); // an inline function, pin->pio->CODR = bits;
> for (int i = 0; i < 7; ++i)
> {
> __asm__ __volatile__( "nop\n" );
> }
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> RaisePins(...); // an inline function, pin->pio->SODR = bits;
> }
>
> This function is off in its own object module.
>
> Changing the "7" in there is distinctly non-linear, but I have the
> feeling that that's because the optimizer sometimes chooses to unroll
> the loop. However, it also seems like there's some sort of difference
> in how long this takes to run depending on linking: The delay seems to
> change every time I compile the project.
>
> Anyone have any idea what's going on? I'm way too acquainted with the
> 'scope recently.
>
> Dan
>
Hi Dan, First thought is that you have to turn off any interrupts;
secondly, set the assembler to save its output and look at the assembly
code. Good luck.
Reply by ●June 3, 2009
On Wed, 3 Jun 2009 16:44:23 -0400
Eric Haver wrote:
> Hi Dan, First thought is that you have to turn off any interrupts,
> secondly, set the Assembler to save its output and look at the
> assembly code.
Yeah, interrupts are definitely off. No room for them in the hard real
time stuff, and I spent half an hour this morning cleaning off my desk,
mouse and half of my keyboard with denatured alcohol because of the kind
of thing that happens if I go too long on that pulse, definitely not a
customer experience we can allow...
And I haven't dug into the assembly yet because it's a single function
that hasn't changed, and yet the timing changes, which makes me think
there's something about alignment and linking that anyone who's worked
with the ARM before probably knows immediately, but that I don't.
Dan
Reply by ●June 3, 2009
Dan Lyke wrote:
> I'm driving some hardware with the SAM9XE that has some hard real time
> limits: I need to lower some pins, wait for *two microseconds* then
> raise those pins.
>
> Any longer and I let the magic smoke out. Any shorter and the right
> thing fails to happen.
>
> I'm doing this on the AT91SAM9XE-EK with code that looks like:
>
> void Strobe( unsigned int bits )
> {
> LowerPins(...); // an inline function, pin->pio->CODR = bits;
> for (int i = 0; i < 7; ++i)
> {
> __asm__ __volatile__( "nop\n" );
> }
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> RaisePins(...); // an inline function, pin->pio->SODR = bits;
> }
>
> This function is off in its own object module.
>
> Changing the "7" in there is distinctly non-linear, but I have the
> feeling that that's because the optimizer sometimes chooses to unroll
> the loop. However, it also seems like there's some sort of difference
> in how long this takes to run depending on linking: The delay seems to
> change every time I compile the project.
>
> Anyone have any idea what's going on? I'm way too acquainted with the
> 'scope recently.
>
I'm sure the performance will depend upon the alignment of the loop.
The STR9 suffers from similar problems: it showed distinctly strange
behaviour when the flash controller, flash burst queue, and ARM9 core
collided.
-- Paul.
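
One way to take the alignment variable out of the picture, offered as a sketch under assumptions rather than as anything from Dan's project: ask GCC to start the function on a 32-byte boundary (the cache line size of the ARM926EJ-S) and emit the delay as a single asm statement, so the optimizer cannot unroll or reshape it. The name strobe_aligned, the register pointers passed as parameters, and the count of eleven nops are all hypothetical; the real count has to come from the 'scope.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size; 32 bytes matches the ARM926EJ-S caches. */
#define CACHE_LINE_BYTES 32

static_assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0,
              "alignment must be a power of two");

/*
 * Hypothetical variant of Strobe(). The PIO registers are passed in as
 * pointers so the sketch does not depend on any particular header.
 *  - aligned():  the function always starts on a cache line boundary,
 *                whatever the linker places in front of it.
 *  - noinline:   the body is never copied into a caller at another offset.
 */
__attribute__((noinline, aligned(CACHE_LINE_BYTES)))
void strobe_aligned(volatile uint32_t *codr, volatile uint32_t *sodr,
                    uint32_t bits)
{
    *codr = bits;                 /* lower the pins */

    /* One asm statement: the compiler emits exactly these instructions.
     * Eleven nops is a placeholder, not a calibrated 2 us delay. */
    __asm__ __volatile__(
        "nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t"
        "nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t"
        "nop\n\t" "nop\n\t" "nop"
        ::: "memory");

    *sodr = bits;                 /* raise the pins */
}
```

This only pins down where the code sits relative to cache lines. Whether those lines are already in the cache on a given call, and how long the peripheral bus takes to carry out the two writes, are separate questions.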
Reply by ●June 3, 2009
Surely if the pulse timing is that critical you should use dedicated
hardware?
Frog
_____
From: A... [mailto:A...] On Behalf Of
Dan Lyke
Sent: Thursday, 4 June 2009 9:38 a.m.
To: A...
Subject: Re: [AT91SAM] Super accurate short delays, SAM9XE and GCC?
On Wed, 3 Jun 2009 16:44:23 -0400
Eric Haver wrote:
> Hi Dan, First thought is that you have to turn off any interrupts,
> secondly, set the Assembler to save its output and look at the
> assembly code.
Yeah, interrupts are definitely off. No room for them in the hard real
time stuff, and I spent half an hour this morning cleaning off my desk,
mouse and half of my keyboard with denatured alcohol because of the kind
of thing that happens if I go too long on that pulse, definitely not a
customer experience we can allow...
And I haven't dug into the assembly yet because it's a single function
that hasn't changed, and yet the timing changes, which makes me think
there's something about alignment and linking that anyone who's worked
with the ARM before probably knows immediately, but that I don't.
Dan
Reply by ●June 3, 2009
Dan Lyke schrieb:
> I'm doing this on the AT91SAM9XE-EK with code that looks like:
>
> void Strobe( unsigned int bits )
> {
> LowerPins(...); // an inline function, pin->pio->CODR = bits;
> for (int i = 0; i < 7; ++i)
> {
> __asm__ __volatile__( "nop\n" );
> }
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> RaisePins(...); // an inline function, pin->pio->SODR = bits;
> }
>
> This function is off in its own object module.
>
> Changing the "7" in there is distinctly non-linear, but I have the
> feeling that that's because the optimizer sometimes chooses to unroll
> the loop. However, it also seems like there's some sort of difference
> in how long this takes to run depending on linking: The delay seems to
> change every time I compile the project.
a) Don't write such code in C; use assembly (and not even inline assembly).
b) Does this part have I-TCM? If so, place the code in there.
Otherwise you get run-time differences due to the cache.
c) Lock interrupts.
--
42Bastian
------------------
Parts of this email are written with invisible ink.
Note: SPAM-only account, direct mail to bs42@...
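
For point (c), a minimal sketch of masking IRQ and FIQ around the pulse, assuming an ARM9 core running in ARM (not Thumb) state in a privileged mode. The helper names cpsr_masked, irq_save_and_mask and irq_restore are hypothetical; on any other target the two hardware helpers compile to no-ops so the file still builds on a host.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_I_BIT 0x80u   /* set: IRQ masked */
#define CPSR_F_BIT 0x40u   /* set: FIQ masked */

static_assert((CPSR_I_BIT | CPSR_F_BIT) == 0xC0u,
              "mask must match the immediate used in the asm below");

/* Pure helper: the CPSR value with both interrupt sources masked. */
static inline uint32_t cpsr_masked(uint32_t cpsr)
{
    return cpsr | CPSR_I_BIT | CPSR_F_BIT;
}

/* Mask IRQ and FIQ; return the previous CPSR so it can be restored. */
static inline uint32_t irq_save_and_mask(void)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t old, masked;
    __asm__ __volatile__(
        "mrs %0, cpsr\n\t"        /* read current status register   */
        "orr %1, %0, #0xC0\n\t"   /* set the I and F bits           */
        "msr cpsr_c, %1"          /* write back the control field   */
        : "=&r"(old), "=&r"(masked) : : "memory");
    return old;
#else
    return 0;                     /* host build: nothing to mask */
#endif
}

/* Put the control field of the CPSR back to a previously saved value. */
static inline void irq_restore(uint32_t old_cpsr)
{
#if defined(__arm__) && !defined(__thumb__)
    __asm__ __volatile__("msr cpsr_c, %0" : : "r"(old_cpsr) : "memory");
#else
    (void)old_cpsr;
#endif
}
```

A caller would save the state, run the strobe, and restore: `uint32_t s = irq_save_and_mask(); Strobe(bits); irq_restore(s);`. Restoring the saved value rather than unconditionally re-enabling keeps interrupts off if they were already off on entry.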
Reply by ●June 4, 2009
Hi Dan,
Assuming you can't poll a hardware counter, perhaps you could:
1) Always compile the function with -O0, or write the function in assembly code.
2) Place the function in non-cacheable memory. Alternatively, you may be able to lock the function into the icache, but I've never tried this.
Regards
Michael
>
> I'm driving some hardware with the SAM9XE that has some hard real time
> limits: I need to lower some pins, wait for *two microseconds* then
> raise those pins.
>
> Any longer and I let the magic smoke out. Any shorter and the right
> thing fails to happen.
>
> I'm doing this on the AT91SAM9XE-EK with code that looks like:
>
> void Strobe( unsigned int bits )
> {
> LowerPins(...); // an inline function, pin->pio->CODR = bits;
> for (int i = 0; i < 7; ++i)
> {
> __asm__ __volatile__( "nop\n" );
> }
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> RaisePins(...); // an inline function, pin->pio->SODR = bits;
> }
>
> This function is off in its own object module.
>
> Changing the "7" in there is distinctly non-linear, but I have the
> feeling that that's because the optimizer sometimes chooses to unroll
> the loop. However, it also seems like there's some sort of difference
> in how long this takes to run depending on linking: The delay seems to
> change every time I compile the project.
>
> Anyone have any idea what's going on? I'm way too acquainted with the
> 'scope recently.
>
> Dan
>
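
If a free-running hardware counter does turn out to be available, polling it could look roughly like the sketch below. Everything specific here is an assumption: a 16-bit up-counter that wraps, a counter clock supplied by the caller, and the helper names ns_to_ticks, elapsed_ticks16 and wait_ticks16. The resolution is one counter tick, so a slow counter clock cannot hit 2 microseconds precisely.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a short delay in nanoseconds to counter ticks, rounding up. */
static inline uint32_t ns_to_ticks(uint32_t counter_hz, uint32_t ns)
{
    assert(counter_hz > 0);
    uint64_t scaled = (uint64_t)ns * counter_hz;
    return (uint32_t)((scaled + 999999999u) / 1000000000u);
}

/* Ticks between two samples of a wrapping 16-bit counter. The unsigned
 * subtraction gives the right answer across a single wrap. */
static inline uint16_t elapsed_ticks16(uint16_t start, uint16_t now)
{
    return (uint16_t)(now - start);
}

/* Busy-wait until at least `ticks` counter ticks have passed.
 * `counter` stands in for the timer's count register (hypothetical). */
static inline void wait_ticks16(volatile const uint16_t *counter,
                                uint16_t ticks)
{
    uint16_t start = *counter;
    while (elapsed_ticks16(start, *counter) < ticks) {
        /* spin */
    }
}
```

Rounding up favours a pulse that is slightly long over one that is slightly short; with a hard upper limit as well as a lower one, the rounding direction and the polling overhead both need checking on the 'scope.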
Reply by ●June 4, 2009
Hi Dan,
What about writing these lines in an assembler file linked with your project?
Eric.
----- Original Message -----
From: nutleycottage
To: A...
Sent: Thursday, June 04, 2009 3:42 PM
Subject: [AT91SAM] Re: Super accurate short delays, SAM9XE and GCC?
Hi Dan,
Assuming you can't poll a hardware counter perhaps you could
1) Always compile the function with -O0/Write the function in assembly code.
2) Place the function in non-cacheable memory. Alternatively you may be able to lock the function into the icache but I've never tried this.
Regards
Michael
>
> I'm driving some hardware with the SAM9XE that has some hard real time
> limits: I need to lower some pins, wait for *two microseconds* then
> raise those pins.
>
> Any longer and I let the magic smoke out. Any shorter and the right
> thing fails to happen.
>
> I'm doing this on the AT91SAM9XE-EK with code that looks like:
>
> void Strobe( unsigned int bits )
> {
> LowerPins(...); // an inline function, pin->pio->CODR = bits;
> for (int i = 0; i < 7; ++i)
> {
> __asm__ __volatile__( "nop\n" );
> }
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> __asm__ __volatile__( "nop\n" );
> RaisePins(...); // an inline function, pin->pio->SODR = bits;
> }
>
> This function is off in its own object module.
>
> Changing the "7" in there is distinctly non-linear, but I have the
> feeling that that's because the optimizer sometimes chooses to unroll
> the loop. However, it also seems like there's some sort of difference
> in how long this takes to run depending on linking: The delay seems to
> change every time I compile the project.
>
> Anyone have any idea what's going on? I'm way too acquainted with the
> 'scope recently.
>
> Dan
>
Reply by ●June 4, 2009
On Thu, 4 Jun 2009 09:56:33 +1200
"Frog Twissell, Blue Sky Solutions" wrote:
> Surely if the pulse timing is that critical you should use dedicated
> hardware?
That's what the 9XE is for: handling all the timing critical bits. It'd
be nice to have an ASIC to do all this stuff, but that's not in the
cards. Could probably have spec'd out an FPGA, but there's enough
general purpose compute and memory shuffling that needs to happen in
this subsystem that just throwing a processor at it seemed like the
right (i.e., fastest and least expensive with quantity flexibility) thing
to do.
So Paul suggests that I need to learn a few things about alignment, and
probably delve into my linker files a bit to make sure that functions
are landing where I intend them to. And keep shuffling those no-ops
around and learn a little bit more about ARM assembly language. Sigh,
more bedtime reading...
Dan
Reply by ●June 4, 2009
On Thu, 4 Jun 2009 16:53:15 +0200
"Eric Pasquier" wrote:
> What about writing these lines in an assembler file linked with your
> project?
Now that I've got it in its own separate C file the changes on rebuild
seem to have settled down, but if it is, as several have suggested, a
linking alignment issue, then the assembler code wouldn't fix the
problem.
I've put in a request to Atmel for a little clarification on this;
hopefully I'll get all the caching and alignment issues ironed out soon.
Dan