pulse counter using LPC1768 proving to very challenging

Reply by Jon Kirwan ● June 7, 2011

On Tue, 07 Jun 2011 21:31:47 +0200, David Brown
<david.brown@removethis.hesbynett.no> wrote:

><snip>
>Long before you consider writing the ISR in assembly, check that the C 
>code is decently written (if you can't write appropriate C code for an 
>interrupt routine, you are unlikely to be able to write good assembly), 
>and check that you are using your compiler properly (and that you have a 
>decent compiler).  Used properly (which includes studying the generated 
>assembly), C on a processor like this should be very close to the speed 
>of optimal assembly.
><snip>

A decent warning for those not fluent in assembly.  Not for
those of us who know it cold.

C code is almost _never_ as good as hand-written assembly
code, and unless you are doing math expressions (which is NOT a
good idea in an interrupt routine), writing in assembly is not
much different from writing in C.  The same hardware needs
dealing with, and compilers usually have "constraints" that an
assembly programmer does not have at all.

We've been down this c vs assembly path a million times here.
Some points are good, but I hate broad brush stuff.  Look up
the discussion we had a few years ago on a GCD algorithm.  To
this day, not even the best x86 compilers can come close even
when ALL of the c-constraints must be fully observed by the
hand assembly coder.  Compilers cannot do topology inversion
and they handle status bits somewhat poorly.  There are many
other issues that may relate to interrupts, as well, where
there is no syntax in c for certain semantics.

How all this applies in the ARM case I'll leave to folks
better informed than me.  But I really don't like it when I
see "you are unlikely to be able to write good assembly" and
"should be very close to ... optimal assembly."  Most
particularly, when discussing interrupt routines.

I will leave it there.

Jon

Reply by Jim Granville ● June 7, 2011

On Jun 8, 1:16 am, "navman" <naveen_pn@n_o_s_p_a_m.yahoo.com> wrote:
> Hi,
> I'm trying to counter some pulses (2-10usec width) using the LPC1768
> Cortex-M3 microcontroller. There are 2 channels on which I have count the
> pulses on. I have to allow only pulses that are >=2us pulse width (so we
> cannot simply use the counter function).

You have not said what is driving this unusual spec, nor the repeat
rate, and SW alone may not be enough to reject all noise types,
i.e. two 1.8us pulses close together could pass.

 So an external filter, either Schmitt + RC, or a simple state-engine
in a SPLD/CPLD or HC163 counter may be needed.

 You need to minimise the SW & interrupt calls, by helping in HW, eg
capturing a value on each edge, but only INT on trailing edge, then
check the delta-time.

 Some of the new NXPs have a capture-clears-timer feature, which would
be very useful on this type of problem.
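A minimal sketch of the delta-time check described above, assuming a free-running 32-bit timer that captures on both edges and interrupts only on the trailing edge. The 100 MHz tick rate and the names TIMER_HZ, MIN_PULSE_TICKS, pulse_width_ticks, pulse_is_valid and on_trailing_edge are illustrative assumptions, not LPC1768 register or driver names; the two capture values are passed in as plain arguments so the logic can be checked on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed timer tick rate: 100 MHz, i.e. 100 ticks per microsecond. */
#define TIMER_HZ        100000000u
#define MIN_PULSE_US    2u
#define MIN_PULSE_TICKS (MIN_PULSE_US * (TIMER_HZ / 1000000u))

/* Count of accepted pulses (hypothetical; one such counter per channel). */
uint32_t valid_pulse_count;

/* Unsigned subtraction yields the correct width even if the timer
 * wrapped once between the leading and trailing captures. */
uint32_t pulse_width_ticks(uint32_t lead_capture, uint32_t trail_capture)
{
    return trail_capture - lead_capture;
}

/* True when the pulse lasted at least MIN_PULSE_US microseconds. */
bool pulse_is_valid(uint32_t lead_capture, uint32_t trail_capture)
{
    return pulse_width_ticks(lead_capture, trail_capture) >= MIN_PULSE_TICKS;
}

/* Body of a hypothetical trailing-edge handler: the hardware has already
 * latched both timestamps, so the software only compares and counts. */
void on_trailing_edge(uint32_t lead_capture, uint32_t trail_capture)
{
    if (pulse_is_valid(lead_capture, trail_capture)) {
        valid_pulse_count++;
    }
}
```

The point of the design is that the handler runs once per pulse and never reconfigures the capture edge, which keeps the time spent in the interrupt short.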

Reply by David Brown ● June 8, 2011

On 07/06/2011 21:45, Tim Wescott wrote:
> On 06/07/2011 12:31 PM, David Brown wrote:
>> On 07/06/11 17:52, Tim Wescott wrote:
>>> On Tue, 07 Jun 2011 15:49:22 +0200, David Brown wrote:
>>>
>>>> On 07/06/2011 15:33, Bruce Varley wrote:
>>>>> "navman"<naveen_pn@n_o_s_p_a_m.yahoo.com> wrote in message
>>>>> news:rMadnYYXWcwzuXPQnZ2dnUVZ_oydnZ2d@giganews.com...
>>>>>> Hi,
>>>>>> I'm trying to counter some pulses (2-10usec width) using the LPC1768
>>>>>> Cortex-M3 microcontroller. There are 2 channels on which I have count
>>>>>> the pulses on. I have to allow only pulses that are>=2us pulse width
>>>>>> (so we cannot simply use the counter function).
>>>>>>
>>>>>> But it is turning out to be an incredibly difficult feat to achieve
>>>>>> this on
>>>>>> a 100MHz Cortex-M3. The problem arises when we try to measure the
>>>>>> pulse width to allow only pulses lasting 2us or higher. We are trying
>>>>>> to use a capture pin and the capture interrupt. First we set it for a
>>>>>> falling edge and capture the timer value, then set it to rising edge
>>>>>> and again capture the timer value. Then take the difference between
>>>>>> two values to see if it
>>>>>> >2usec. But the processing itself is taking over 6-8usec. We also
>>>>>> tried simply using a external interrupt & reading timer registers
>>>>>> with each edge, but with the same results.
>>>>>>
>>>>>>
>>>> Can you connect the signal to two pins, so that one will capture times
>>>> on a falling edge, and the other will capture times and cause an
>>>> interrupt on the rising edge?
>>>>
>>>> Have you considered some analogue tricks, assuming you don't need too
>>>> much accuracy for your measurements? A diode, a capacitor and a couple
>>>> of resistors should let you charge up a capacitor during the pulse.
>>>> Measure the voltage on the capacitor with the ADC when the pulse is
>>>> complete. Or use an analogue comparator to trigger an interrupt on the
>>>> processor once the capacitor voltage is over a certain level.
>>>>
>>>>>> We cannot seem to understand how or why the processing is taking so
>>>>>> long there are hardly 3-4 "C" statements in the interrupt routine
>>>>>> (change edge of capture, take the difference in captured values and
>>>>>> compare if it is
>>>>>> >=2us). Any ideas how this feat could be accomplished on a LPC1768?
>>>>>
>>>>> I can't help with this specific micro, but I've encountered the
>>>>> problem
>>>>> on various other platforms, and solved it by using polling rather than
>>>>> interrupts. Avoiding the context saving associated with interrupts can
>>>>> save you a significant amount of time if your processing task is
>>>>> otherwise reasonably trivial, as yours seems to be. This solution does
>>>>> depend on being able to sacrifice some processing time for the polling
>>>>> loop (interrupts disabled), that will depend on the pulse frequency
>>>>> and
>>>>> what else the device has to contend with.
>>>>>
>>>>> You might also consider doing your time-critical coding in asm, if
>>>>> that's possible. Have you checked your listings to see how many
>>>>> removable instructions the compiler is inserting?
>>>>>
>>>>>
>>>>>
>>>> The compiler may be generating too much code for context saving. A
>>>> common cause of that is to call external functions from within the
>>>> interrupt function - since the compiler doesn't know what registers it
>>>> uses, it must save everything.
>>>>
>>>> And are you using appropriate flags for the compiler? Many people
>>>> complain their compiler code is poor, when it turns out they have
>>>> disabled optimisation...
>>>
>>> This is almost everything that I was going to suggest.
>>>
>>> Look at the assembly that your compiler is generating, and make sure
>>> that
>>> it's really as efficient as can possibly be. If it isn't, just go to the
>>> well and write the ISR in assembly language.
>>>
>>
>> Long before you consider writing the ISR in assembly, check that the C
>> code is decently written (if you can't write appropriate C code for an
>> interrupt routine, you are unlikely to be able to write good assembly),
>> and check that you are using your compiler properly (and that you have a
>> decent compiler). Used properly (which includes studying the generated
>> assembly), C on a processor like this should be very close to the speed
>> of optimal assembly.
>
> The only caveat would be if the compiler wasn't very good at generating
> efficient code -- but I'd have a hard time believing that for a Cortex
> processor. Not setting up the optimization flags correctly, and
> inadvertently writing inefficient C code -- yes.
>
> I'm just old and suspicious, and remembering too many bad experiences
> with compilers that _were_ crappy.
>

There are still some inefficient compilers around, but they are mainly 
for the smaller processors that are hard to work with.  On something 
like the Cortex, it's easy to generate reasonable code for short C 
functions.  The big differences are for things like automatic use of 
vector or DSP functions, smarter loop unrolling, interprocedural 
optimisations, etc. - but they should not make a difference in a case 
like this.

>>> Even on a 100MHz processor, 2us is an awfully short period of time.
>>> Doing some sort of preconditioning makes a lot of sense to me, although
>>> Schmitt trigger logic is rarely accurate enough for any practical
>>> purpose
>>> beyond glitch reduction. It should be possible to use a multivibrator
>>> (74xx126??) and some gates to do this if an RC and a Schmitt isn't
>>> accurate enough. An asynchronous clear counter would do the trick as
>>> well -- hold it in reset when the pulse is inactive, and trigger the
>>> micro whenever it counts to its 'carry out' value. Then you just need
>>> to feed it an appropriate clock to hit your "more than 2us" criterion.
>>>
>>> If your ISR pops off quickly enough, and if it doesn't waste too much
>>> time, spin in the ISR until the signal goes inactive, and check the
>>> time. If that ">2us" can mean "sometimes _much_ greater than 2us" then
>>> this obviously won't work.
>>>
>>> I like the "two pins" approach, if you can make it unambiguous. Make
>>> that microcontroller hardware work for you, if you can.
>>>
>>
>> I fully agree here. Software is not good at doing things with a few
>> microsecond timing - any processor fast enough to have plenty of
>> instruction cycles in that time will have unpredictable latencies due to
>> caches, pipelines, buffers, etc. But this should be fairly simple code -
>> with enough care, it could be done.
>
> However: you'll need to be exceedingly strict about your interrupt
> response time elsewhere. If you have a habit of turning off interrupts
> to make sure that operations are atomic, and _particularly_ if you're
> one of a team of programmers that do this, then you have to be Really
> Really Strict about just how long these intervals last.
>
> Because all it'll take is one guy turning off interrupts while he
> calculates pi to 100 decimal places in some bit of shared memory, and
> your little interval counter will fail.
>

Yes, there is always someone that thinks the UART receive interrupt 
routine is the best place to interpret incoming telegrams, act on them, 
and build up a reply...

Reply by navman ● June 8, 2011

Thanks for your valuable inputs. We tried toggling an IO pin inside
while(1) and see that it only generates pulses of 150ns width. So is there
something wrong with the clock configuration? 

We use the LPCXpresso compiler. I'll try to post the code here for the
clock init a little later.
---------------------------------------		
Posted through http://www.EmbeddedRelated.com

Reply by David Brown ● June 8, 2011

On 08/06/2011 12:44, navman wrote:
> Thanks for your valuable inputs. We tried toggling an IO pin inside
> while(1) and see that it only generates pulses of 150ns width. So is there
> something wrong with the clock configuration?
>

It /sounds/ likely that there is something wrong, but I am not sure what 
you should expect here.  Certainly for some ARM devices IO pin access is 
surprisingly slow.  I don't know this chip, so I'll let others give more 
definite answers.

> We use the LPCXpresso compiler. I'll try to post the code here for the
> clock init a little later.
>

LPCXpresso uses gcc, which will produce solid and efficient code.  But 
that depends on the compiler flags - if optimisation is turned off, you 
will get very big and slow object code.

Reply by Rich Webb ● June 8, 2011

On Wed, 08 Jun 2011 05:44:08 -0500, "navman"
<naveen_pn@n_o_s_p_a_m.yahoo.com> wrote:

>Thanks for your valuable inputs. We tried toggling an IO pin inside
>while(1) and see that it only generates pulses of 150ns width. So is there
>something wrong with the clock configuration? 

Take a look at the generated assembly to see how many actual
instructions are executed to implement the toggle feature. You should be
able to work back from there to the effective instruction cycle time.
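
The arithmetic can be sketched as follows. This is only an illustration: the function names are made up here, the 100 MHz core clock is the figure quoted for the part earlier in the thread, and the instruction count has to come from the actual listing.

```c
#include <stdint.h>

/* Number of core clock cycles in an interval, rounded to the nearest cycle. */
uint32_t ns_to_cycles(uint32_t interval_ns, uint32_t core_hz)
{
    uint64_t scaled = (uint64_t)interval_ns * core_hz;
    return (uint32_t)((scaled + 500000000u) / 1000000000u);
}

/* Average cycles spent per instruction when `instructions` instructions
 * execute during the interval. Returns 0 if the count is 0. */
uint32_t cycles_per_instruction(uint32_t interval_ns, uint32_t core_hz,
                                uint32_t instructions)
{
    if (instructions == 0u) {
        return 0u;
    }
    return ns_to_cycles(interval_ns, core_hz) / instructions;
}
```

For the 150 ns pulse reported above and an assumed 100 MHz core clock, ns_to_cycles(150, 100000000) gives 15 cycles. If the listing showed, say, three instructions per half-period, that would be 5 cycles per instruction. A figure well above what the listing suggests points at the clock setup rather than the code.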

-- 
Rich Webb     Norfolk, VA

Reply by Arlet Ottens ● June 8, 2011

On 06/08/2011 02:12 PM, David Brown wrote:
> On 08/06/2011 12:44, navman wrote:
>> Thanks for your valuable inputs. We tried toggling an IO pin inside
>> while(1) and see that it only generates pulses of 150ns width. So is
>> there
>> something wrong with the clock configuration?
>>
>
> It /sounds/ likely that there is something wrong, but I am not sure what
> you should expect here. Certainly for some ARM devices IO pin access is
> surprisingly slow. I don't know this chip, so I'll let others give more
> definite answers.

The LPC series use a special fast gpio interface, which is actually 
pretty good compared to older APB based GPIO interfaces.

I don't have a LPC17xx, but I just tried it on a LPC2478 which has a 
similar FGPIO interface (but an ARM7 core instead of Cortex-M3), doing:

while( 1 )
{
     FIO0SET = BITMASK;
     FIO0CLR = BITMASK;
}

This results in pulses of 2 cycles high, and 5 cycles low.

In assembly, this loop is implemented as 2 stores and a branch.

Toggling the same pin with:

while( 1 )
{
     FIO0PIN ^= BITMASK;
}

results in 9 cycles high, 9 cycles low for a load, exor, store, and 
branch. All of this using gcc -O2.

15 cycles for the pulse width seems a bit high in comparison.

Reply by Ulf Samuelsson ● June 8, 2011

navman skrev 2011-06-07 15:16:
> Hi,
> I'm trying to counter some pulses (2-10usec width) using the LPC1768
> Cortex-M3 microcontroller. There are 2 channels on which I have count the
> pulses on. I have to allow only pulses that are>=2us pulse width (so we
> cannot simply use the counter function).
>
> But it is turning out to be an incredibly difficult feat to achieve this on
> a 100MHz Cortex-M3. The problem arises when we try to measure the pulse
> width to allow only pulses lasting 2us or higher. We are trying to use a
> capture pin and the capture interrupt. First we set it for a falling edge
> and capture the timer value, then set it to rising edge and again capture
> the timer value. Then take the difference between two values to see if it
> >2usec. But the processing itself is taking over 6-8usec. We also tried
> simply using a external interrupt & reading timer registers with each edge,
> but with the same results.
>
> We cannot seem to understand how or why the processing is taking so long
> there are hardly 3-4 "C" statements in the interrupt routine (change edge
> of capture, take the difference in captured values and compare if it is
> >=2us). Any ideas how this feat could be accomplished on a LPC1768?

I know how I would implement this on an Atmel AT32UC3C.

You use the pulse input as a gate to the clock of a counter (CNT0).
CNT0 will count up, while the pulse is active.
When the pulse ends, the counter should be reset.

A compare register is used to determine if the signal is > 2 us.
If CNT0 matches the compare register, an "event" is triggered.
The event is used to clock another counter CNT1.
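
A small software model of that scheme, to make the behaviour concrete. This is a host-side simulation only, not AT32UC3C register code; the type and function names are invented for the sketch, and one array element stands for one counter clock tick.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t cnt0;     /* gated counter: runs only while the pulse is active */
    uint32_t cnt1;     /* counts pulses that reached the compare value */
    uint32_t compare;  /* number of clock ticks that make up the threshold */
} gated_counter_t;

/* Advance the model by one clock tick with the given input level. */
void gated_counter_tick(gated_counter_t *gc, bool pulse_active)
{
    if (!pulse_active) {
        gc->cnt0 = 0u;              /* pulse ended: reset the gated counter */
        return;
    }
    gc->cnt0++;
    if (gc->cnt0 == gc->compare) {  /* compare match: fires once per pulse */
        gc->cnt1++;                 /* the "event" clocks the second counter */
    }
}

/* Feed a sampled waveform (one sample per tick) through the model and
 * return how many pulses lasted at least `compare` ticks. */
uint32_t count_long_pulses(const bool *samples, size_t n, uint32_t compare)
{
    gated_counter_t gc = { 0u, 0u, compare };
    for (size_t i = 0; i < n; i++) {
        gated_counter_tick(&gc, samples[i]);
    }
    return gc.cnt1;
}
```

Because CNT0 is reset whenever the input goes inactive, two short pulses close together never add up to one long one, and the CPU is not involved until it wants to read CNT1.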

Best Regards
Ulf Samuelsson

Reply by Leon ● June 9, 2011

On Jun 8, 11:44 am, "navman" <naveen_pn@n_o_s_p_a_m.yahoo.com> wrote:
> Thanks for your valuable inputs. We tried toggling an IO pin inside
> while(1) and see that it only generates pulses of 150ns width. So is there
> something wrong with the clock configuration?
>
> We use the LPCXpresso compiler. I'll try to post the code here for the
> clock init a little later.

ARM chips have surprisingly slow I/O. It's better than it was with the
earlier devices like the LPC2106, but it's still not very good.

Leon

Reply by Mark Borgerson ● June 9, 2011

In article <24a2423f-e48f-4981-88f7-c761120c2e63@
32g2000vbe.googlegroups.com>, leon355@btinternet.com says...
> 
> On Jun 8, 11:44 am, "navman" <naveen_pn@n_o_s_p_a_m.yahoo.com> wrote:
> > Thanks for your valuable inputs. We tried toggling an IO pin inside
> > while(1) and see that it only generates pulses of 150ns width. So is there
> > something wrong with the clock configuration?
> >
> > We use the LPCXpresso compiler. I'll try to post the code here for the
> > clock init a little later.
> 
> ARM chips have surprisingly slow I/O. It's better than it was with the
> earlier devices like the LPC2106, but it's still not very good.
> 
That was particularly true where setting or clearing a bit required
a read-modify-write sequence.  Many of the Cortex M3 systems I'm
working with now have separate  bit set and bit clear registers which
reduces the instruction count.

I just looked at a code sequence that toggles a bit to clock data
from a FIFO to an LCD display.   It is a partially-unrolled loop with
a sequence of 16 bit bit clear and bit set instructions.

In C it is a sequence of: 

     GPIOB->BRR = FIFO_RD_BIT | LCD_WR_BIT;
     GPIOB->BSRR = FIFO_RD_BIT | LCD_WR_BIT;

     GPIOB->BRR = FIFO_RD_BIT | LCD_WR_BIT;
     GPIOB->BSRR = FIFO_RD_BIT | LCD_WR_BIT;

     . . .

The Thumb code generated is

     STR  R3, [R0]
     STR  R3, [R0, #4]

     STR  R3, [R0]
     STR  R3, [R0, #4]

R0 is loaded with the port base address and R3 is loaded with the
bit pattern before the start of the loop.  Each instruction is a
single 16-bit word.

I don't think I'm going to beat that with any assembly-language 
optimizations. ;-)  The code was generated with the IAR compiler
with the optimization level set to high.

When running on an STM32F103 with the main clock set to 64 MHz, the
bit toggles at 16 MHz.  This is consistent with the fact that
the local peripheral bus for the general purpose IO bits is running at 
1/2 the main clock rate (32 MHz here, with two stores per toggle
period), since that bus is rated for a maximum clock rate of 36 MHz.
Updating the whole QVGA display with 2 bytes/pixel
(in RGB(565) format) takes about 14 ms.

It certainly helps that the engineer who designed the board put all
the FIFO and LCD clocking bits on pins from the same peripheral
port.  If they were on different ports, it would take a separate
instruction for each clock bit---doubling the number of instructions.

The Cortex M3 also implements bit banding,  where  each bit in
a peripheral register or RAM word is assigned  a memory location
of  its own.    That means a bit test operation can be reduced
to:
   bitstatus =   UART_RCV_StatusBitBand; // returns either 0x01 or 0x00 

instead of 

   bitstatus =  UART_Status_Register  &  RCV_Status_Mask;
                 // returns either the mask bit or zero

I haven't yet had to optimize an interrupt handler to the
degree that would benefit from  this capability,  but it
could cut some instructions from  a handler that required
you to figure out which of a number of possible bits caused
an interrupt.  Writing the code to use the bit banding
would require that you pre-calculate the proper bit band
address for each port bit that you want to test.
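
A sketch of that pre-calculation, using the standard Cortex-M3 mapping for the peripheral bit-band region (the first 1 MB at 0x40000000 aliased from 0x42000000, one 32-bit word per bit). The macro and function names are made up for the example, and the function only computes the address; it does not touch hardware.

```c
#include <stdint.h>

/* Cortex-M3 peripheral bit-band region and its alias region. */
#define PERIPH_BB_REGION_BASE 0x40000000u
#define PERIPH_BB_ALIAS_BASE  0x42000000u

/* Alias address for one bit of a peripheral register.
 * Each byte of the region maps to 32 bytes of alias space (8 bits x 4 bytes),
 * and each bit within the byte offset maps to one 4-byte word.
 * Only valid for reg_addr in 0x40000000..0x400FFFFF and bit in 0..31. */
uint32_t periph_bitband_alias(uint32_t reg_addr, uint32_t bit)
{
    uint32_t byte_offset = reg_addr - PERIPH_BB_REGION_BASE;
    return PERIPH_BB_ALIAS_BASE + (byte_offset * 32u) + (bit * 4u);
}

/* On the target, a single bit could then be read with something like:
 *   uint32_t bitstatus = *(volatile uint32_t *)periph_bitband_alias(reg, n);
 * which returns 1 or 0 without any masking. */
```

In a real handler the alias addresses would normally be computed once, or written as constants, so that the test costs a single load.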

Mark Borgerson
