Cortex-M3 vs PIC32 divide instruction

Reply by Jon Kirwan ● September 6, 2011

On Tue, 6 Sep 2011 12:40:00 +0000 (UTC),
Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:

>Jon Kirwan <jonk@infinitefactors.org> wrote:
>
>> I still need to find out if the DIV can be interrupted.
>
>Footnote e to table 18-1 in the Cortex-M3 r2p0 TRM states that
>"DIV is interruptible (abandoned/restarted), with worst case latency of
>one cycle."
>
>-a

Thanks!

Jon

Reply by David Brown ● September 6, 2011

On 06/09/2011 13:30, Jon Kirwan wrote:
> On Tue, 06 Sep 2011 12:21:08 +0200, David Brown
> <david@westcontrol.removethisbit.com>  wrote:
> The purpose is due diligence and to illuminate speculations I
> may yet develop.  It's not a crystal clear process that I can
> readily explain.  But I do know _what_ I want to know.
>
> If it helps, imagine that I'd like to develop a cycle-
> accurate simulator.
>
>>> I simply need very detailed information.  I've been having a
>>> little difficulty laying hands on it in the Cortex-M3 case.
>>> I'm hoping someone can point me well.
>>
>> I would be surprised if you can get the detailed information you would
>> like - such implementation details tend to be well hidden from mere mortals.
>
> Appears to be hidden from me, tonight.  So maybe you are
> right.
>
> I _am_ able to garner better information from the M4k.  I
> still need to find out if the DIV can be interrupted.
>

The M4K is an older architecture (or at least it is closer to the older 
MIPS architectures), with a simpler structure and lots more information 
about it.  You'll get better luck there.

>> One thing you might be able to find out about is how the division
>> affects pipelining - but on an M3, with its short pipeline, that won't
>> make a big difference.
>
> Yes, 3 stage vs 5 stage on the M4k.  I also took note that
> Microchip licensed the M14k, too.
>
>> Regarding interrupts, AFAIK instructions on the M3 (and MIPS) are not
>> interruptible (unlike some m68k cpus, for example), so maximum interrupt
>> latency will be affected by division instructions.
>
> Yes, that is one of several considerations I have in mind.
> Only one of them.  But an important one.  I am not yet
> certain about the M4k on this point.
>

The key thing to look for here is the data that is stored on the stack, 
or in dedicated registers, when an interrupt or other exception hits. 
On the m68k, for example, the processor can generate a rather extensive 
stack frame including the state of internal registers that are not 
otherwise accessible, holding partial results for division, progress 
counters for move-multiple instructions, etc.  On RISC architectures you 
don't get a stack frame for exceptions, but critical context data is put 
into dedicated registers that must be preserved if you are going to 
enable nested interrupts.  You should be able to see from the details of 
these registers where things can be interrupted.

> Anyway, thanks for the thoughts.  I will see what I can find
> out there.  It is an omen that you don't know.  So that
> suggests your earlier point about the difficulty here may be
> correct.
>

While I know many things, I don't know everything!  My knowledge of MIPS 
is based on a book on the "MIPS RISC Microarchitecture" I found in a 
second hand bookstore 20 years ago, and read for fun before I had even 
thought of doing embedded programming as a job.

> Jon
>
>>>
>>> But thanks for the time.  It is appreciated.
>>>
>>> Jon

Reply by David Brown ● September 6, 2011

On 06/09/2011 14:40, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
> Jon Kirwan<jonk@infinitefactors.org>  wrote:
>
>> I still need to find out if the DIV can be interrupted.
>
> Footnote e to table 18-1 in the Cortex-M3 r2p0 TRM states that
> "DIV is interruptible (abandoned/restarted), with worst case latency of
> one cycle."
>

OK, it is interruptible in that way - that's good for avoiding long 
interrupt latency.  Some CPUs (such as some m68k devices) can be 
interrupted in the middle of an instruction like divide, and then 
continue where they left off rather than starting anew.
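
To make the trade-off between the two schemes concrete, here is a minimal sketch in C. It is only an illustrative model: `div_cycles` (the length of the divide) and `k` (how many cycles into the divide the interrupt arrives) are hypothetical parameters, the function names are made up, and the one-cycle wait is simply the worst case quoted from the TRM footnote above. The cost of saving and restoring partial state in the "resume" case is ignored, which is an assumption.

```c
#include <assert.h>

/* Illustrative model only: the cycle counts are hypothetical parameters,
 * not figures taken from any TRM. */
typedef struct {
    unsigned irq_wait;      /* cycles the interrupt waits because of the divide */
    unsigned divide_total;  /* cycles the divide costs overall */
} div_cost;

/* The divide cannot be interrupted: it runs to completion first. */
static div_cost run_to_completion(unsigned div_cycles, unsigned k)
{
    assert(k < div_cycles);
    div_cost c = { div_cycles - k, div_cycles };
    return c;
}

/* Abandon/restart: the k cycles already spent are thrown away and the
 * whole divide is executed again after the handler returns. */
static div_cost abandon_restart(unsigned div_cycles, unsigned k)
{
    assert(k < div_cycles);
    div_cost c = { 1u, k + div_cycles };  /* 1 = worst case from the footnote */
    return c;
}

/* Save and resume: partial results are kept, so no work is repeated.
 * The 1-cycle wait and the free state save are assumptions of this model. */
static div_cost save_and_resume(unsigned div_cycles, unsigned k)
{
    assert(k < div_cycles);
    div_cost c = { 1u, div_cycles };
    return c;
}
```

With a hypothetical 12-cycle divide interrupted 3 cycles in, the model gives a 9-cycle wait for the run-to-completion case, and a 1-cycle wait but 15 cycles of total divide work for abandon/restart. Abandon/restart keeps latency low at the cost of repeated work, and needs no extra saved state.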

Reply by Jon Kirwan ● September 6, 2011

On Tue, 06 Sep 2011 15:31:58 +0200, David Brown
<david@westcontrol.removethisbit.com> wrote:

>On 06/09/2011 13:30, Jon Kirwan wrote:
>> On Tue, 06 Sep 2011 12:21:08 +0200, David Brown
>> <david@westcontrol.removethisbit.com>  wrote:
>> The purpose is due diligence and to illuminate speculations I
>> may yet develop.  It's not a crystal clear process that I can
>> readily explain.  But I do know _what_ I want to know.
>>
>> If it helps, imagine that I'd like to develop a cycle-
>> accurate simulator.
>>
>>>> I simply need very detailed information.  I've been having a
>>>> little difficulty laying hands on it in the Cortex-M3 case.
>>>> I'm hoping someone can point me well.
>>>
>>> I would be surprised if you can get the detailed information you would
>>> like - such implementation details tend to be well hidden from mere mortals.
>>
>> Appears to be hidden from me, tonight.  So maybe you are
>> right.
>>
>> I _am_ able to garner better information from the M4k.  I
>> still need to find out if the DIV can be interrupted.
>
>The M4K is an older architecture (or at least it is closer to the older 
>MIPS architectures), with a simpler structure and lots more information 
>about it.  You'll get better luck there.

ARM has been around a LONG time.  But I worked on MIPS R2000
back circa 1986/1987.  Was that before the ARM/Acorn?  I
don't recall when the R4000 came out but it must have been
after the Acorn.  I think trying to decide which is older is
going to be a bunch of quibbling.

>>> One thing you might be able to find out about is how the division
>>> affects pipelining - but on an M3, with its short pipeline, that won't
>>> make a big difference.
>>
>> Yes, 3 stage vs 5 stage on the M4k.  I also took note that
>> Microchip licensed the M14k, too.
>>
>>> Regarding interrupts, AFAIK instructions on the M3 (and MIPS) are not
>>> interruptible (unlike some m68k cpus, for example), so maximum interrupt
>>> latency will be affected by division instructions.
>>
>> Yes, that is one of several considerations I have in mind.
>> Only one of them.  But an important one.  I am not yet
>> certain about the M4k on this point.
>
>The key thing to look for here is the data that is stored on the stack, 
>or in dedicated registers, when an interrupt or other exception hits. 
>On the m68k, for example, the processor can generate a rather extensive 
>stack frame including the state of internal registers that are not 
>otherwise accessible, holding partial results for division, progress 
>counters for move-multiple instructions, etc.  On RISC architectures you 
>don't get a stack frame for exceptions, but critical context data is put 
>into dedicated registers that must be preserved if you are going to 
>enable nested interrupts.  You should be able to see from the details of 
>these registers where things can be interrupted.

There's a point for me to go look up.

>> Anyway, thanks for the thoughts.  I will see what I can find
>> out there.  It is an omen that you don't know.  So that
>> suggests your earlier point about the difficulty here may be
>> correct.
>
>While I know many things, I don't know everything!  My knowledge of MIPS 
>is based on a book on the "MIPS RISC Microarchitecture" I found in a 
>second hand bookstore 20 years ago, and read for fun before I had even 
>thought of doing embedded programming as a job.

Mine all comes from working with the R2000 and a nice, long
lecture for a couple of days from Hennessy when I visited
them back when they first opened up an office near Weitek
(their first office.)  I'm very comfortable with the R2000.

Jon

Reply by Arlet Ottens ● September 6, 2011

On 09/06/2011 03:31 PM, David Brown wrote:

> The key thing to look for here is the data that is stored on the stack,
> or in dedicated registers, when an interrupt or other exception hits. On
> the m68k, for example, the processor can generate a rather extensive
> stack frame including the state of internal registers that are not
> otherwise accessible, holding partial results for division, progress
> counters for move-multiple instructions, etc. On RISC architectures you
> don't get a stack frame for exceptions, but critical context data is put
> into dedicated registers that must be preserved if you are going to
> enable nested interrupts. You should be able to see from the details of
> these registers where things can be interrupted.

Interestingly, the Cortex isn't very pure RISC anymore, and it does have 
a stack frame for exceptions. It doesn't save partial results, but it 
does save a couple of registers, which allow an interrupt handler to be 
written in pure C, and it allows hardware nesting of interrupts. The 
link register which normally contains the return address is set to a 
magic value, so on function return, the core knows to do a return from 
exception instead.
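
As a rough illustration of the stack frame and "magic value" described above, here is a sketch in C of the basic ARMv7-M exception frame (no floating-point extension) and the EXC_RETURN encoding. The type and function names are made up for this example; the register order and the three EXC_RETURN values are as I understand the ARMv7-M Architecture Reference Manual, so check them against the manual before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Basic frame pushed by hardware on exception entry, lowest address first.
 * (An extra alignment word may follow; that case is not modelled here.) */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address */
    uint32_t xpsr;  /* program status register */
} exception_frame;

_Static_assert(sizeof(exception_frame) == 32, "eight 32-bit words");

/* "Magic" values loaded into LR on exception entry (no FP extension). */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u  /* return to Handler mode, main stack */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u  /* return to Thread mode, main stack */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu  /* return to Thread mode, process stack */

/* A PC load of 0xFxxxxxxx while in Handler mode is treated as an
 * exception return rather than an ordinary branch. */
static int is_exc_return(uint32_t target, int in_handler_mode)
{
    return in_handler_mode && (target & 0xF0000000u) == 0xF0000000u;
}

/* Bit 2 selects the stack the frame is popped from (1 = process stack). */
static int exc_return_uses_psp(uint32_t exc_return)
{
    assert((exc_return & 0xF0000000u) == 0xF0000000u);
    return (exc_return & 0x4u) != 0;
}

/* Bit 3 selects the mode returned to (1 = Thread mode). */
static int exc_return_to_thread(uint32_t exc_return)
{
    assert((exc_return & 0xF0000000u) == 0xF0000000u);
    return (exc_return & 0x8u) != 0;
}
```

Because R0-R3, R12 and LR are exactly the registers a C function is allowed to clobber under the ARM procedure call standard, a handler can be an ordinary C function, and its normal return through LR triggers the exception return.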

Reply by David Brown ● September 6, 2011

On 06/09/2011 15:51, Jon Kirwan wrote:
> On Tue, 06 Sep 2011 15:31:58 +0200, David Brown
> <david@westcontrol.removethisbit.com>  wrote:
>
>> On 06/09/2011 13:30, Jon Kirwan wrote:
>>> On Tue, 06 Sep 2011 12:21:08 +0200, David Brown
>>> <david@westcontrol.removethisbit.com>   wrote:
>>> The purpose is due diligence and to illuminate speculations I
>>> may yet develop.  It's not a crystal clear process that I can
>>> readily explain.  But I do know _what_ I want to know.
>>>
>>> If it helps, imagine that I'd like to develop a cycle-
>>> accurate simulator.
>>>
>>>>> I simply need very detailed information.  I've been having a
>>>>> little difficulty laying hands on it in the Cortex-M3 case.
>>>>> I'm hoping someone can point me well.
>>>>
>>>> I would be surprised if you can get the detailed information you would
>>>> like - such implementation details tend to be well hidden from mere mortals.
>>>
>>> Appears to be hidden from me, tonight.  So maybe you are
>>> right.
>>>
>>> I _am_ able to garner better information from the M4k.  I
>>> still need to find out if the DIV can be interrupted.
>>
>> The M4K is an older architecture (or at least it is closer to the older
>> MIPS architectures), with a simpler structure and lots more information
>> about it.  You'll get better luck there.
>
> ARM has been around a LONG time.  But I worked on MIPS R2000
> back circa 1986/1987.  Was that before the ARM/Acorn?  I
> don't recall when the R4000 came out but it must have been
> after the Acorn.  I think trying to decide which is older is
> going to be a bunch of quibbling.
>

Yes, ARM has been around for ages - it was probably around 1988 that I 
first used an ARM (Acorn Risc Machine) on an Archimedes.  But the 
architecture has gone through a great many changes since then - the 
Cortex M3 is significantly different both in programming model and in 
implementation.  MIPS has remained a lot more constant.  So the M3 is 
really only a few years old, while the R4000 is /much/ older, and much 
more studied.

>>>> One thing you might be able to find out about is how the division
>>>> affects pipelining - but on an M3, with its short pipeline, that won't
>>>> make a big difference.
>>>
>>> Yes, 3 stage vs 5 stage on the M4k.  I also took note that
>>> Microchip licensed the M14k, too.
>>>
>>>> Regarding interrupts, AFAIK instructions on the M3 (and MIPS) are not
>>>> interruptible (unlike some m68k cpus, for example), so maximum interrupt
>>>> latency will be affected by division instructions.
>>>
>>> Yes, that is one of several considerations I have in mind.
>>> Only one of them.  But an important one.  I am not yet
>>> certain about the M4k on this point.
>>
>> The key thing to look for here is the data that is stored on the stack,
>> or in dedicated registers, when an interrupt or other exception hits.
>> On the m68k, for example, the processor can generate a rather extensive
>> stack frame including the state of internal registers that are not
>> otherwise accessible, holding partial results for division, progress
>> counters for move-multiple instructions, etc.  On RISC architectures you
>> don't get a stack frame for exceptions, but critical context data is put
>> into dedicated registers that must be preserved if you are going to
>> enable nested interrupts.  You should be able to see from the details of
>> these registers where things can be interrupted.
>
> There's a point for me to go look up.
>
>>> Anyway, thanks for the thoughts.  I will see what I can find
>>> out there.  It is an omen that you don't know.  So that
>>> suggests your earlier point about the difficulty here may be
>>> correct.
>>
>> While I know many things, I don't know everything!  My knowledge of MIPS
>> is based on a book on the "MIPS RISC Microarchitecture" I found in a
>> second hand bookstore 20 years ago, and read for fun before I had even
>> thought of doing embedded programming as a job.
>
> Mine all comes from working with the R2000 and a nice, long
> lecture for a couple of days from Hennessy when I visited
> them back when they first opened up an office near Weitek
> (their first office.)  I'm very comfortable with the R2000.
>
> Jon

Reply by Tim Wescott ● September 6, 2011

On Tue, 06 Sep 2011 09:54:00 +0200, David Brown wrote:

> On 06/09/2011 09:39, Jon Kirwan wrote:

>> snip <<

> If you want to do very fast floating point, get a processor that has
> hardware floating point (Cortex-M4 will be available soon, there are
> real MIPS cpu's available instead of PIC32, there are plenty of
> PPC-based microcontrollers with hardware floating point, etc.).

Sometimes the goal is to write fast-enough floating point in a processor 
that won't otherwise break the system budget, be it power consumption/
dissipation, size, BOM cost, etc.

Jon's asking about _writing_ a floating point library, so I assume he's 
working at a project front-end, counting clock cycles to make sure that 
things will work.

-- 
www.wescottdesign.com

Reply by Jim Granville ● September 7, 2011

On Sep 6, 11:17 pm, Jon Kirwan <j...@infinitefactors.org> wrote:
> On Tue, 6 Sep 2011 10:32:53 +0000 (UTC),
>
> Anders.Monto...@kapsi.spam.stop.fi.invalid wrote:
> >Jon Kirwan <j...@infinitefactors.org> wrote:
> >> Also, it's been a bit of a pain searching for good assembler docs on the
> >> Cortex-M3.  But I've only been at it for about an hour or so,
> >> so it's likely I am just slow and ignorant -- not that there
> >> aren't good caches out there I should have found.
>
> >You want the ARMv7-M Architecture Reference Manual off of ARM's website.
>
> I think I have that for the assembly part of things.  If you
> are referring to the near-end where the Appendices are at,
> then I'm already aware of those sections (B, C, F, G, H.)  I
> did also look at the timing information in Chapter 18-1, for
> example, of DDI0337 on the Cortex-M3 for r1p1, r2p0, and
> r2p1.  Though perhaps I haven't read it well enough.
>
> I think I have been there.  But I may have missed something,
> too, and I appreciate the suggestion
>
> Jon

 If the speed of this matters a lot, you are best to simply get a
device, and try it.
 'Modern data' tends to be more and more superficial, and that is one
reason there are more cheap Eval/Starter kits.
 Note that other  devices are not standing still either - I see both
TI and ADI are now boasting of sub $2 DSPs (tho RAM based)

 TI's strangely lacks Timer capture, (they must want you to buy other
variants there) but does have high speed USB for a small cost adder.
 ADIs has good timers, but no USB.
 Both, of course, have very fast maths support, and quite large ROMS
with Floating point as well.
 -jg

Reply by dp ● September 7, 2011

On Sep 7, 2:42 pm, Jim Granville <j.m.granvi...@gmail.com> wrote:
> On Sep 6, 11:17 pm, Jon Kirwan <j...@infinitefactors.org> wrote:
>
> > On Tue, 6 Sep 2011 10:32:53 +0000 (UTC),
>
> > Anders.Monto...@kapsi.spam.stop.fi.invalid wrote:
> > >Jon Kirwan <j...@infinitefactors.org> wrote:
> > >> Also, it's been a bit of a pain searching for good assembler docs on the
> > >> Cortex-M3.  But I've only been at it for about an hour or so,
> > >> so it's likely I am just slow and ignorant -- not that there
> > >> aren't good caches out there I should have found.
>
> > >You want the ARMv7-M Architecture Reference Manual off of ARM's website.
>
> > I think I have that for the assembly part of things.  If you
> > are referring to the near-end where the Appendices are at,
> > then I'm already aware of those sections (B, C, F, G, H.)  I
> > did also look at the timing information in Chapter 18-1, for
> > example, of DDI0337 on the Cortex-M3 for r1p1, r2p0, and
> > r2p1.  Though perhaps I haven't read it well enough.
>
> > I think I have been there.  But I may have missed something,
> > too, and I appreciate the suggestion
>
> > Jon
>
>  If the speed of this matters a lot, you are best to simply get a
> device, and try it.
>  'Modern data' tends to be more and more superficial, and that is one
> reason there are more cheap Eval/Starter kits.
>  Note that other  devices are not standing still either - I see both
> TI and ADI are now boasting of sub $2 DSPs (tho RAM based)
>
>  TI's strangely lacks Timer capture, (they must want you to buy other
> variants there) but does have high speed USB for a small cost adder.
>  ADIs has good timers, but no USB.
>  Both, of course, have very fast maths support, and quite large ROMS
> with Floating point as well.
>  -jg

Last (only...:) ) time I used a TI DSP was approx. 10 years ago,
the 5420. Their divide was straightforward: use "subtract
conditionally" in a repeat (penalty-free) loop.
I have also wondered - just vaguely, though - how they accelerate
division on various architectures; e.g. the Power core I use now
needs only 14 (or was it 16?) cycles for a 32/32, while older
implementations of that core (the original 603e, that is) needed
30+, 37 IIRC.
I have moaned so many times about having to write yet another
division (I think the only architecture which saved me that was the
68k; PPC didn't, as it does not have the 64/32 the 68k has, not on
32-bit machines) that I will use the chance to ask Jon to share his
findings. I am also really curious about it.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
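
For readers unfamiliar with the "subtract conditionally in a repeat loop" approach mentioned above, here is a minimal sketch in C of a 16-bit unsigned divide built from 16 conditional-subtract steps. It follows the general shape of such a step (compare against the divisor aligned to the top half of the accumulator, then shift in a quotient bit); it is not claimed to be a cycle- or bit-exact model of the 5420's instruction, and the names `div16_result` and `div16_subc` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t quot;
    uint16_t rem;
} div16_result;

/* Unsigned 16/16 divide using 16 conditional-subtract steps.
 * The 32-bit accumulator starts with the numerator in its low half;
 * after 16 steps the quotient is in the low half and the remainder
 * in the high half. */
static div16_result div16_subc(uint16_t num, uint16_t den)
{
    assert(den != 0);

    uint32_t acc = num;
    uint32_t d = (uint32_t)den << 15;   /* divisor aligned for the compare */

    for (int i = 0; i < 16; i++) {
        if (acc >= d)
            acc = ((acc - d) << 1) | 1u; /* subtract fits: quotient bit 1 */
        else
            acc <<= 1;                   /* does not fit: quotient bit 0 */
    }

    div16_result r = { (uint16_t)(acc & 0xFFFFu), (uint16_t)(acc >> 16) };
    return r;
}
```

Each step does fixed work and the iteration count is constant, which is why this style maps well onto a single-instruction hardware repeat: the cost is one step per quotient bit regardless of the operands. Faster dividers typically retire more than one quotient bit per cycle or terminate early on small operands.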

Reply by Jon Kirwan ● September 7, 2011

On Wed, 7 Sep 2011 04:42:13 -0700 (PDT), Jim Granville
<j.m.granville@gmail.com> wrote:

>On Sep 6, 11:17 pm, Jon Kirwan <j...@infinitefactors.org> wrote:
>> On Tue, 6 Sep 2011 10:32:53 +0000 (UTC),
>>
>> Anders.Monto...@kapsi.spam.stop.fi.invalid wrote:
>> >Jon Kirwan <j...@infinitefactors.org> wrote:
>> >> Also, it's been a bit of a pain searching for good assembler docs on the
>> >> Cortex-M3.  But I've only been at it for about an hour or so,
>> >> so it's likely I am just slow and ignorant -- not that there
>> >> aren't good caches out there I should have found.
>>
>> >You want the ARMv7-M Architecture Reference Manual off of ARM's website.
>>
>> I think I have that for the assembly part of things.  If you
>> are referring to the near-end where the Appendices are at,
>> then I'm already aware of those sections (B, C, F, G, H.)  I
>> did also look at the timing information in Chapter 18-1, for
>> example, of DDI0337 on the Cortex-M3 for r1p1, r2p0, and
>> r2p1.  Though perhaps I haven't read it well enough.
>>
>> I think I have been there.  But I may have missed something,
>> too, and I appreciate the suggestion
>>
>> Jon
>
> If the speed of this matters a lot, you are best to simply get a
>device, and try it.

Jim, there is a difference between knowing something through
theory and knowing something only through experimental
result.  Although it is _practical_ and often _sufficient_ to
know through result, it is also true that all I'd learn is
the results for the specific cases I'm able to spend time
testing.  Theory informs a volume.  Results inform specific
points within that volume.  I want both.  Just buying a
device only gives me a few data points.  That's not enough.

In the case of the PIC32, I have the theory.  So I am fully
able to predict just about any situation I'm given.  (Except
that I still don't have the theory about what happens in the
presence of an exception -- but I will get that from
Microchip directly.)

Anyway, I know you are being practical.  But I want to go
beyond knowing only what a few tests may tell me.

> 'Modern data' tends to be more and more superficial, and that is one
>reason there are more cheap Eval/Starter kits.

Yes, but the designers _know_ the theory.  So it is available
somewhere.  And I'm not really wanting to poke out
experimental results and try and develop theories of my own
that match what I observe when it might just be nice to get
the low-down from someone who actually knows what is going
on.  Which is why I decided to just ask here.  (The other
option would be to write ARM, I suppose -- and I will do that
if nothing comes of the details here and simply hope they are
moved to respond to me.  I _know_ Microchip will respond,
from past experience with them.)

> Note that other  devices are not standing still either - I see both
>TI and ADI are now boasting of sub $2 DSPs (tho RAM based)
> TI's strangely lacks Timer capture, (they must want you to buy other
>variants there) but does have high speed USB for a small cost adder.
> ADIs has good timers, but no USB.
> Both, of course, have very fast maths support, and quite large ROMS
>with Floating point as well.

I am familiar with older families from both through coding
applications -- the ADSP-21xx from ADI; the TMS320C30 and C40
from TI.  I'm not completely unaware of newer parts, too.

But like most projects, there are a number of boundary
conditions involved and the DIV details I mentioned are only
one of many.  But DSP processing is decidedly NOT the main
focus nor is floating point.  I merely mentioned FP as a
segue, because I felt that anyone writing assembly coded FP
would possibly know the theory I was looking for.  That
doesn't mean that is my focus.  I also mentioned interrupt
latency issues, later.  There are many considerations.

Jon