> The PIC32's MDU operates "autonomously." So it continues on
> a division _and_ following instructions so long as an IU
> pipeline stall isn't triggered with the use of an MDU op. I
> am curious about how it functions in the presence of
> interrupts.
I couldn't find any information in either the PIC32 docs or the MIPS
architecture docs, but I'm leaning towards the MDU ignoring interrupts
altogether. If the ISR uses the MDU, then the pipeline will stall until
the previous instruction completes.
> I haven't read up on the Cortex-M3 DIV and SDIV. Doc seems to be
> in several different places, too. But the Cortex-M3 may not
> be autonomous. Looking at DDI0337G, page 1-12, Figure 1-2,
> seems to suggest it isn't autonomous and is squarely in the
> "Ex" pathway. So I'd guess a Cortex-M3 must wait for it.
That is my understanding. You could try asking for information regarding
the division implementation on ARM's tech forum[1]. There's a lot of
noise, but also some interesting posts there.
-a
[1] <http://forums.arm.com/index.php?/forum/3-arm-tech/>
Reply by ●September 9, 2011
Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
> I couldn't find any information in either the PIC32 docs or the MIPS
> architecture docs, but I'm leaning towards the MDU ignoring interrupts
> altogether. If the ISR uses the MDU, then the pipeline will stall until
> the previous instruction completes.
Replying to myself, but this is apparently how the MDU worked in
pre-MIPS32/64 days, nowadays it's a bit smarter. See pp. 108-109 in See
MIPS Run 2nd ed. (Seems to be easy enough to find naughty PDF versions if
you don't own a paper copy.)
-a
Reply by dp●September 9, 2011
On Sep 9, 7:29 am, Jon Kirwan <j...@infinitefactors.org> wrote:
> ....
> Interesting you bring up DMA. I was using the SiLabs part to
> gain access to an internal, 1MHz, 16 bit SAR ADC (which, if
> external, would have cost me as much as the chip or more) and
> it required DMA -- (given that this is an 8051 style cpu,
> that won't be surprising to anyone.) Turns out, the
> documentation on the DMA section is poor and if you really
> want to know exactly what you are doing when you copy someone
> else's supposedly working code then the doc is not adequate
> to the task. So I sent off my questions, worked with the
> local disty, they pressed, I pressed, we all pressed. But
> SiLabs (US, anyway) didn't know. Thing is, they had to track
> down the DMA section designer who apparently now is
> offshore; I think in Singapore or something. Took them months.
> He'd designed it 8 years before that time. But I got my
> answer, at least. Took maybe three months to do?
I was much luckier with the SDMA on the 5200B. Got in
touch with the guy who had architected it, he was as
helpful as he could possibly be, sent me some
more data in addition to what was to be found on the web.
Took me a few days to grasp it all and begin using it,
but once I did it was really useful. The final thing
I made for it was the Ethernet engine, the on-board
Ethernet controller is only FIFO-ed both ways, the rest
must be done by the SDMA. Took me only a day or so to
implement the queue of buffers with pointers to packets
etc. (obviously it took me a lot longer to integrate
Ethernet completely, but this was CPU code).
> I am enjoying this.
>
> Jon
Me too.
Dimiter
Reply by Jon Kirwan●September 9, 2011
On Fri, 9 Sep 2011 12:55:47 +0000 (UTC),
Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
>Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
>> I couldn't find any information in either the PIC32 docs or the MIPS
>> architecture docs, but I'm leaning towards the MDU ignoring interrupts
>> altogether. If the ISR uses the MDU, then the pipeline will stall until
>> the previous instruction completes.
>
>Replying to myself, but this is apparently how the MDU worked in
>pre-MIPS32/64 days, nowadays it's a bit smarter. See pp. 108-109 in See
>MIPS Run 2nd ed. (Seems to be easy enough to find naughty PDF versions if
>you don't own a paper copy.)
>
>-a
Well, it wasn't all that easy to find. Lots of Chinese
language versions floating about. But my Chinese vocabulary
is about 30 characters or so. So I'm very limited. I can
count, draw "man" "big" and "too much", and things like
"mouth" "door" and "window." Then I start running out.
I did find this as the only 'good' version easily findable:
http://people.openrays.org/~comcat/godson/doc/See.MIPS.Run.2nd.en.pdf
2007, apparently, for the book. Some years, now.
I read the relevant material. It mostly addresses older
designs. For example, when it brings up the newer MIPS32
architecture at the top of page 109, it says "the
instructions behave themselves," which is not entirely
descriptive for me. Then it goes on to talk about "older
CPUs" for the entire rest of that paragraph, the next one,
and the next one before going on to another section. It
never does talk, in detail, about the very architecture I
want to know about. Microchip and the PIC32 aren't even
mentioned. The R4000 is, but the M4k is mentioned just once
near the top in a long list of names. On page 38 they say
that the multiply takes 4-12 clocks, so I know we are on
"different pages" already. It's an old book, by now.
But I love it, too!!! Thanks.
By the way, it says at the bottom on page 38, "Integer
multiply and divide operations never produce an exception;
not even divide by zero..." Maybe I can take it that the
PIC32 follows this guideline? Also, I read from this book
that pipeline exceptions are delayed (allowed to flush
through the pipeline) before being acted upon. But since by
the earlier note a DIV cannot cause a divide-by-zero error,
they never generate an exception. In fact, since the MDU is
"autonomous" it probably would be very hard for it to
"insert" any kind of exception into that pipeline flow,
anyway. So that makes sense, too.
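If the hardware really does stay silent on a zero divisor, then any check has to live in software. A minimal sketch of what that might look like in C, assuming nothing about the PIC32 toolchain; `checked_div` is a hypothetical helper name, not something from the book or from Microchip's libraries:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: signed 32-bit division that reports failure
 * instead of relying on a hardware divide-by-zero exception.
 * On failure *quot is left untouched. */
bool checked_div(int32_t num, int32_t den, int32_t *quot)
{
    if (den == 0) {
        return false;   /* no exception would be raised for us */
    }
    if (num == INT32_MIN && den == -1) {
        return false;   /* quotient does not fit in int32_t */
    }
    *quot = num / den;  /* C99 division truncates toward zero */
    return true;
}
```

The second test is there because that overflow case is the other integer division that would go equally unreported without an exception.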
Elsewhere, they do talk about the floating point coprocessing
taking multiple clocks. I can probably use their handling of
that, by analogy, to help me understand what would happen in
the MDU case. Here, it says on page 52 that "FP computations
are allowed to proceed in parallel" with the execution of
later instructions, and the CPU is stalled if an instruction
reads a result register before the computation finishes.
Which suggests the answer to a question I'd posed earlier as
a possible difference between the Cortex-M3 and the M4k --
that the Cortex-M3 stalls until complete but the M4k does NOT
stall. But it also suggests that even in the face of an
interrupt event, the division continues and doesn't cause a
stall unless there is a need for that in the new instruction
stream.
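To pin down what I mean by "stalls only if needed," here is a toy model in C. It is not cycle-accurate for the M4k or anything else; the function name, the cycle numbering, and any latency figure fed to it are my own assumptions for illustration:

```c
#include <stdint.h>

/* Toy model of an autonomous multiply/divide unit.
 * issue_cycle: cycle on which the divide is issued
 * latency:     cycles until the result registers are valid (assumed)
 * read_cycle:  cycle on which some later instruction (in the main
 *              line or in an ISR) first reads the result
 * Returns the number of cycles the pipeline would stall. */
uint32_t mdu_stall_cycles(uint32_t issue_cycle, uint32_t latency,
                          uint32_t read_cycle)
{
    uint32_t ready_cycle = issue_cycle + latency;

    /* Reading at or after ready_cycle costs nothing. */
    return (read_cycle < ready_cycle) ? (ready_cycle - read_cycle) : 0u;
}
```

With a made-up latency of 35 cycles, a read 10 cycles after issue would stall 25 cycles, while an ISR that never touches the result registers would see no stall at all in this model.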
Thanks for the book suggestion. It supplements what I
already know about the old R2000 TLBs, too. This is bringing
back lots of memories, now.
Jon
Reply by ●September 9, 2011
Jon Kirwan <jonk@infinitefactors.org> wrote:
> Microchip and the PIC32 aren't even mentioned. The R4000 is, but the
> M4k is mentioned just once near the top in a long list of names. On
> page 38 they say that the multiply takes 4-12 clocks, so I know we are
> on "different pages" already. It's an old book, by now.
It is more concerned with the architecture (i.e. MIPS32/64) than any
particular implementation. The first edition touched more on different
cores as the system level architecture wasn't standardised yet, but that's
at least supposed to be fixed by now.
> By the way, it says at the bottom on page 38, "Integer
> multiply and divide operations never produce an exception;
> not even divide by zero..." Maybe I can take it that the
> PIC32 follows this guideline?
Well, the MIPS32 architecture manual states that they don't, so I would
assume so.
> Which suggests the answer to a question I'd posed earlier as
> a possible difference between the Cortex-M3 and the M4k --
> that the Cortex-M3 stalls until complete but the M4k does NOT
> stall. But it also suggests that even in the face of an
> interrupt event, the division continues and doesn't cause a
> stall unless there is a need for that in the new instruction
> stream.
That was my conclusion as well.
> Thanks for the book suggestion. It supplements what I
> already know about the old R2000 TLBs, too. This is bringing
> back lots of memories, now.
I think it's a good computer architecture book, with a more practical bent
than the usual textbooks. I actually like the 2002 first edition better,
but the second one is definitely more relevant to modern CPUs.
-a
Reply by Jon Kirwan●September 9, 2011
On Sat, 10 Sep 2011 00:55:21 +0000 (UTC),
Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
>Jon Kirwan <jonk@infinitefactors.org> wrote:
>
>><snip>
>> Thanks for the book suggestion. It supplements what I
>> already know about the old R2000 TLBs, too. This is bringing
>> back lots of memories, now.
>
>I think it's a good computer architecture book, with a more practical bent
>than the usual textbooks. I actually like the 2002 first edition better,
>but the second one is definitely more relevant to modern CPUs.
I'll look for the earlier one, as well, then. Knowledge like
this only very gradually fades away, if at all. I also like
paper, especially for something like this where I just lay
out on the floor to read, so I will likely see about getting
genuine editions.
Jon
Reply by Jim Granville●September 10, 2011
On Sep 9, 11:49 am, Jon Kirwan <j...@infinitefactors.org> wrote:
>
> Interestingly, although the PIC32 achieves a fair pace (up to
> 80MHz), it isn't up to the MIPS synthesizable core, claimed
> to be over 400MHz capable at 90nm and over 200MHz capable at
> 130nm.
That's more a flash artifact, than a process one.
Flash based uC seem to be rather stuck, for the last half decade, in
speed at the 80/100/120MHz region, and only the RAM based ones get
into the hundreds of MHz (like the sub $2 DSPs I mentioned above)
Even if the Flash limits the CPU speed, one of my peeves, is very few
parts allow the peripherals to run to the silicon process speed,
instead forcing the peripheral clock to be <= CPU clock.
-jg
Reply by Mark Borgerson●September 11, 2011
In article <c831ff5d-0f43-4707-9788-efc01c5e13ce@n19g2000prh.googlegroups.com>,
j.m.granville@gmail.com says...
>
> On Sep 9, 11:49 am, Jon Kirwan <j...@infinitefactors.org> wrote:
> >
> > Interestingly, although the PIC32 achieves a fair pace (up to
> > 80MHz), it isn't up to the MIPS synthesizable core, claimed
> > to be over 400MHz capable at 90nm and over 200MHz capable at
> > 130nm.
>
> That's more a flash artifact, than a process one.
>
> Flash based uC seem to be rather stuck, for the last half decade, in
> speed at the 80/100/120MHz region, and only the RAM based ones get
> into the hundreds of MHz (like the sub $2 DSPs I mentioned above)
>
> Even if the Flash limits the CPU speed, one of my peeves, is very few
> parts allow the peripherals to run to the silicon process speed,
> instead forcing the peripheral clock to be <= CPU clock.
Peripherals, having to produce and accept signals from the outside
world using traces with larger capacitance and inductance, will
naturally be more limited in speed.
Mark Borgerson
Reply by Jim Granville●September 11, 2011
On Sep 11, 3:32 pm, Mark Borgerson <mborger...@comcast.net> wrote:
> In article <c831ff5d-0f43-4707-9788-efc01c5e1...@n19g2000prh.googlegroups.com>,
> j.m.granvi...@gmail.com says...
>
> > On Sep 9, 11:49 am, Jon Kirwan <j...@infinitefactors.org> wrote:
>
> > > Interestingly, although the PIC32 achieves a fair pace (up to
> > > 80MHz), it isn't up to the MIPS synthesizable core, claimed
> > > to be over 400MHz capable at 90nm and over 200MHz capable at
> > > 130nm.
>
> > That's more a flash artifact, than a process one.
>
> > Flash based uC seem to be rather stuck, for the last half decade, in
> > speed at the 80/100/120MHz region, and only the RAM based ones get
> > into the hundreds of MHz (like the sub $2 DSPs I mentioned above)
>
> > > Even if the Flash limits the CPU speed, one of my peeves, is very few
> > > parts allow the peripherals to run to the silicon process speed,
> > > instead forcing the peripheral clock to be <= CPU clock.
>
> Peripherals, having to produce and accept signals from the outside
> world using traces with larger capacitance and inductance, will
> naturally be more limited in speed.
Perhaps, but that ceiling is rather above the Flash-Speed limit I was
mentioning, and it clearly is not too much of an actual problem, as
SOME uC vendors can manage Peripheral clocks faster than core speeds.
It is a slow trend, I'd like to see become faster...
-jg
Reply by Jon Kirwan●September 11, 2011
On Sat, 10 Sep 2011 16:50:32 -0700 (PDT), Jim Granville
<j.m.granville@gmail.com> wrote:
>On Sep 9, 11:49 am, Jon Kirwan <j...@infinitefactors.org> wrote:
>>
>> Interestingly, although the PIC32 achieves a fair pace (up to
>> 80MHz), it isn't up to the MIPS synthesizable core, claimed
>> to be over 400MHz capable at 90nm and over 200MHz capable at
>> 130nm.
>
>That's more a flash artifact, than a process one.
I'm aware. A wide flash bus is often used these days to
compensate (with a little bit of ram to hold at least one
line of it.) There's some commentary about this regarding
the PIC32 in the architecture overview.
It's also true for the MSP430's new FRAM device, as well. A
wide FRAM bus is used with a little bit of cache ram to help
reads. And on any day, FRAM writes are much faster than
flash writes.
>Flash based uC seem to be rather stuck, for the last half decade, in
>speed at the 80/100/120MHz region, and only the RAM based ones get
>into the hundreds of MHz (like the sub $2 DSPs I mentioned above)
I didn't know which ones from TI and ADI you were referring
to. I had done a quick check on Digikey, selected DSPs and
selected ADI and TI and then sorted on price and that they
actually have one or two in stock. But nothing under $2, or
close, showed up on the first sorted page.
>Even if the Flash limits the CPU speed, one of my peeves, is very few
>parts allow the peripherals to run to the silicon process speed,
>instead forcing the peripheral clock to be <= CPU clock.
>-jg
Flash read cycle times may be slow (and writes so slow it
needn't be mentioned), but anything external to the device
will also be slower than the cpu core can achieve. Within
the chip, outputs have known loads and the transmission gates
and inverters can be sized exactly for that known situation.
Anything leaving the chip must go through oversized drivers
which drive unknown loads and must therefore be sized for the
worst design case. And that must also include the wire
bonds, chip carrier, and leads as well as whatever an end user
might add to that. Traces must also be wide enough, at least
in microcontrollers whose pins these days must often handle
tens of milliamps, to sustain those currents and survive
metal migration for some given lifetime.
I have a hard time imagining that unknown external loading
drivers/inputs can ever (on the same die) achieve speeds that
internal signals can, with known-in-advance loads used to
size them.
But a closer reading of your comment might be that you mean
to talk about cases where the processor itself is built on a
fast, clocked process, let's say 400MHz given worst case
pathways and pipelining limits (M4k at 90nm), but that the
flash and cache (and, I suppose, also the necessity these
days for marketing purposes that a microcontroller include
bullet-proof, class-A crystal drivers rather than specify
some more complex high-speed design) may limit the useful
speed to something less, say 80MHz. (Though I've read
someone say they've clocked the PIC32 at 120MHz; they just
had to set wait states for the flash.) And that in this
case, say, a SAR-style sampling ADC might still be clocked at
400MHz to process a captured sample at whatever resolution,
without regard to the CPU clock rate? Is that it?
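On the wait-state remark: the arithmetic behind it can be sketched roughly as below, assuming the flash has one fixed read access time and the first cycle of a fetch is always "free." The function name and any access time passed in are placeholders of mine, not datasheet values:

```c
#include <stdint.h>

/* Rough estimate of flash wait states for a given CPU clock.
 * cpu_hz:    CPU clock in Hz
 * access_ns: flash read access time in ns (assumed, not from a datasheet)
 * A read needs ceil(cpu_hz * access_ns / 1e9) CPU cycles in total;
 * every cycle beyond the first counts as a wait state. */
uint32_t flash_wait_states(uint32_t cpu_hz, uint32_t access_ns)
{
    const uint64_t ns_per_s = 1000000000ull;
    uint64_t cycles =
        ((uint64_t)cpu_hz * access_ns + ns_per_s - 1u) / ns_per_s;

    return (cycles > 1u) ? (uint32_t)(cycles - 1u) : 0u;
}
```

With a made-up 30 ns access time this gives 2 wait states at 80MHz and 3 at 120MHz, which is the sort of trade the overclocking report implies: the core runs faster, but each flash line fetch costs more cycles.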
Jon