Microchip PIC32MZ Flash Microcontroller is the World's Fastest 32-bit MCU

Reply by ●November 20, 2013

> It is not clear what kind of bandwidth you will get from the flash,
> but most flash memories will not run more than 20 MHz
> so running out of DRAM is typically faster.

If I read it correctly, the PIC32MZ requires 2 wait states at 200 MHz, so the program flash is probably running at around 70 to 80 MHz. BTW, the flash instruction path is 128 bits wide, with a 16K cache.
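
As a back-of-envelope check on those figures, here is a minimal sketch. It assumes each flash access simply costs (1 + wait states) core clocks and ignores the cache and any prefetching; the function names are made up for illustration. Under that simple model the flash array runs at roughly 67 MHz, a little under the 70 to 80 MHz guessed above, and the 128-bit path gives a peak of about 1 GB/s.

```python
def flash_access_mhz(core_mhz, wait_states):
    """Rate at which the flash completes accesses, in MHz.

    Assumption: one access takes (1 + wait_states) core clocks.
    """
    return core_mhz / (1 + wait_states)


def flash_peak_mbytes_per_s(core_mhz, wait_states, path_bits=128):
    """Peak fetch bandwidth in MB/s for a flash path path_bits wide."""
    return flash_access_mhz(core_mhz, wait_states) * path_bits / 8


print(round(flash_access_mhz(200, 2), 1))      # 66.7
print(round(flash_peak_mbytes_per_s(200, 2)))  # 1067
```

Real throughput depends on cache hit rate and branch behaviour, so treat these as upper bounds for straight-line fetches only.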

Reply by dp ●November 20, 2013

On Wednesday, November 20, 2013 6:55:34 PM UTC+2, Tim Wescott wrote:
> On Wed, 20 Nov 2013 00:24:33 -0800, Paul Rubin wrote:
> 
> > Bill Giovino <billgiovino@gmail.com> writes:
> >> http://microcontroller.com/news/Microchip_PIC32MZ.asp The Microchip
> >> PIC32MZ runs at 330MIPS at 200MHz
> > 
> > What does this mean about being the fastest MCU?  Why is it interesting,
> > since there are SOC's running at 1ghz and faster, not to mention vector
> > DSP's and that sort of thing?  Also, the PIC32MZ doesn't appear to have
> > any floating point arithmetic, unlike the M4 which it seems to position
> > itself against.  It would be more interesting if the PIC had IEEE double
> > precision, since the ARM M4F only has single precision.
> 
> IEEE double precision takes a lot more hardware to get really fast 
> operation.
> 
> I sometimes wonder if there wouldn't be a way to implement double-
> precision floating point in hardware that wouldn't take up more space 
> than 'fast' single-precision FP, but at the cost of a few clock ticks.  
> For a lot of algorithms, a double-precision calculation that happened 1/4 
> as fast as an in-hardware single-precision calculation would still be far 
> better than either taking the precision hit of 32-bit, or the speed hit 
> of software synthesized 64 bit.

On the e300 core (and likely on others I am not so intimately familiar
with) Freescale have the FPU doing 2-cycle FMUL, FMADD etc. on 64-bit
operands and 1-cycle on 32-bit ones. It works fine. At a 400 MHz core
clock they talk about 800 MIPS, which, if not 100% usable in practice,
does help: interleaving FPU with integer (and perhaps more importantly,
load/store) instructions does work OK (I have managed a 2.2 cycle total
within a 64-bit FIR loop, load/store included).
How they traded off die size against performance I have no idea;
I am just a user of theirs.
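
To put the 2.2 cycle figure in perspective, here is a rough sketch of the arithmetic. It assumes the 2.2 cycles are per filter tap and that the core runs at 400 MHz; the 48 kHz sample rate and the function names are hypothetical, chosen only for illustration.

```python
def taps_per_second(core_hz, cycles_per_tap):
    """Multiply-accumulate (tap) throughput of the inner loop."""
    return core_hz / cycles_per_tap


def max_fir_taps(core_hz, cycles_per_tap, sample_rate_hz):
    """Longest FIR that fits in real time, ignoring all loop overhead."""
    return int(taps_per_second(core_hz, cycles_per_tap) // sample_rate_hz)


rate = taps_per_second(400e6, 2.2)
print(round(rate / 1e6, 1))              # 181.8 (million taps per second)
print(max_fir_taps(400e6, 2.2, 48_000))  # 3787
```

Interrupts, cache misses and per-sample setup would all reduce the usable filter length.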

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Reply by Tim Wescott ●November 20, 2013

On Wed, 20 Nov 2013 18:42:12 -0800, dp wrote:

> On Wednesday, November 20, 2013 6:55:34 PM UTC+2, Tim Wescott wrote:
>> On Wed, 20 Nov 2013 00:24:33 -0800, Paul Rubin wrote:
>> 
>> > Bill Giovino <billgiovino@gmail.com> writes:
>> >> http://microcontroller.com/news/Microchip_PIC32MZ.asp The Microchip
>> >> PIC32MZ runs at 330MIPS at 200MHz
>> > 
>> > What does this mean about being the fastest MCU?  Why is it
>> > interesting,
>> > since there are SOC's running at 1ghz and faster, not to mention
>> > vector DSP's and that sort of thing?  Also, the PIC32MZ doesn't
>> > appear to have any floating point arithmetic, unlike the M4 which it
>> > seems to position itself against.  It would be more interesting if
>> > the PIC had IEEE double precision, since the ARM M4F only has single
>> > precision.
>> 
>> IEEE double precision takes a lot more hardware to get really fast
>> operation.
>> 
>> I sometimes wonder if there wouldn't be a way to implement double-
>> precision floating point in hardware that wouldn't take up more space
>> than 'fast' single-precision FP, but at the cost of a few clock ticks.
>> For a lot of algorithms, a double-precision calculation that happened
>> 1/4 as fast as an in-hardware single-precision calculation would still
>> be far better than either taking the precision hit of 32-bit, or the
>> speed hit of software synthesized 64 bit.
> 
> On the e300 core (and likely on others I am not so intimately familiar
> with) Freescale have the FPU doing 2 cycle FMUL, FMADD etc. on 64 bit
> operands and 1 cycle on 32 bit ones. Works fine, on 400 MHz core clock
> they talk about 800 MIPS which if not 100% practically usable does help,
> interleaving FPU with integer (and perhaps more importantly,
> load/store) instructions does work OK (I have managed a 2.2 cycle total
> within a 64 bit FIR loop, load/store included).
> Now how did they compromise die size vs. performance I have no idea,
> I am just a user of theirs.

What chips does one find that core in?

Thanks.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

Reply by dp ●November 20, 2013

On Thursday, November 21, 2013 4:53:52 AM UTC+2, Tim Wescott wrote:
> On Wed, 20 Nov 2013 18:42:12 -0800, dp wrote:
> 
> > On Wednesday, November 20, 2013 6:55:34 PM UTC+2, Tim Wescott wrote:
> ...
> >> IEEE double precision takes a lot more hardware to get really fast
> >> operation.
> >> 
> >> I sometimes wonder if there wouldn't be a way to implement double-
> >> precision floating point in hardware that wouldn't take up more space
> >> than 'fast' single-precision FP, but at the cost of a few clock ticks.
> >> For a lot of algorithms, a double-precision calculation that happened
> >> 1/4 as fast as an in-hardware single-precision calculation would still
> >> be far better than either taking the precision hit of 32-bit, or the
> >> speed hit of software synthesized 64 bit.
> > 
> > On the e300 core (and likely on others I am not so intimately familiar
> > with) Freescale have the FPU doing 2 cycle FMUL, FMADD etc. on 64 bit
> > operands and 1 cycle on 32 bit ones. Works fine, on 400 MHz core clock
> > they talk about 800 MIPS which if not 100% practically usable does help,
> > interleaving FPU with integer (and perhaps more importantly,
> > load/store) instructions does work OK (I have managed a 2.2 cycle total
> > within a 64 bit FIR loop, load/store included).
> > Now how did they compromise die size vs. performance I have no idea
> > I am just a user of theirs.
> 
> What chips does one find that core in?

The one I am using is the MPC5200B (watch out for the old 5200, still
available but much buggier etc.). I have also used it on the 8240 (too
old to consider now); they also have the MPC5125 and the MPC5121 (I have
just been eyeing these, never used one).

I should detail the 2.2 cycles: I do these reading/writing 32-bit FP
data; just the MAC loop is done in 64-bit (using FMAD.d, 64*64+64).
Loading a 32-bit FP value automatically converts it to a 64-bit one;
storing as .s does 64 -> 32 (rounding included). I would expect the same
speeds for r/w with 64-bit data (the bus to the cache is 64 bits wide),
but I have not tried that.
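
The benefit of keeping only the accumulator in 64-bit can be sketched in a few lines. This is an illustration, not the e300 code: Python floats are 64-bit doubles, so 32-bit values are emulated by rounding through `struct`, and the multiply and add here are separate operations rather than a fused multiply-add. The function names and test data are made up.

```python
import struct


def to_f32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fir_f64_acc(coeffs, samples):
    """32-bit inputs, 64-bit accumulate, one final rounding to 32 bits."""
    acc = 0.0
    for c, s in zip(coeffs, samples):
        acc += to_f32(c) * to_f32(s)   # load widens 32 -> 64, MAC in double
    return to_f32(acc)                 # store rounds 64 -> 32


def fir_f32_acc(coeffs, samples):
    """Same sum, but the accumulator is rounded to 32 bits at every step."""
    acc = 0.0
    for c, s in zip(coeffs, samples):
        acc = to_f32(acc + to_f32(to_f32(c) * to_f32(s)))
    return acc


coeffs = [1.0] * 1001
samples = [1.0] + [1e-8] * 1000
print(fir_f32_acc(coeffs, samples))   # 1.0: every small term is rounded away
print(fir_f64_acc(coeffs, samples))   # close to 1.00001
```

With a 32-bit accumulator each 1e-8 term is below half an ulp of 1.0 and vanishes; the 64-bit accumulator keeps them all and rounds only once on the final store.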

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Reply by Rocky ●November 21, 2013

On Thursday, November 21, 2013 1:25:06 AM UTC+2, Ulf Samuelsson wrote:
> 2013-11-20 06:34, Bill Giovino wrote:
>
> > http://microcontroller.com/news/Microchip_PIC32MZ.asp
> >
> > The Microchip PIC32MZ runs at 330MIPS at 200MHz and easily competes against the Cortex-M4.
>
> In a real application, you are going to have problems if peripherals
> have to be handled in interrupts, and not with DMA,
> and the PIC32MZ only has 8 channels, which is not a lot.

Actually more. Things like the Ethernet have dedicated DMA, which is not part of the general DMA pool.

Reply by David Brown ●November 21, 2013

On 21/11/13 16:00, Rocky wrote:
> On Thursday, November 21, 2013 1:25:06 AM UTC+2, Ulf Samuelsson wrote:
>> 2013-11-20 06:34, Bill Giovino wrote:
>>
>>> http://microcontroller.com/news/Microchip_PIC32MZ.asp
>>>
>>> The Microchip PIC32MZ runs at 330MIPS at 200MHz and easily competes against the Cortex-M4.
>>
>> In a real application, you are going to have problems if peripherals
>> have to be handled in interrupts, and not with DMA,
>> and the PIC32MZ only has 8 channels, which is not a lot.
>
> Actually more. Things like the Ethernet have dedicated DMA, which is not part of the general DMA pool.
> 

From the picture it looks like there are dedicated DMAs for the
Ethernet, CAN, USB, SQI and crypto engine - which is nice.

One thing that can make a big difference to usability is whether the
data cache is synchronised with these DMA channels (i.e., does the cache
snoop their transfers?).  I've worked with processors where the
dedicated Ethernet DMA was not snooped - you have to make sure your
Ethernet buffers are mapped to non-cached memory areas (assuming the
processor has an MMC or MMU supporting that), or you have to add extra
cache flush and invalidate code for any accesses.
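
A toy model makes the failure mode concrete. This is only a sketch of the general problem, not of the PIC32MZ or any real cache: it tracks single addresses instead of cache lines, and the class and method names are invented for illustration.

```python
class ToyCachedCPU:
    """Write-back data cache in front of RAM, with no DMA snooping."""

    def __init__(self, ram):
        self.ram = ram        # shared list, also accessed by the DMA engine
        self.cache = {}       # addr -> value currently held in the cache
        self.dirty = set()    # addrs modified in cache but not yet in RAM

    def read(self, addr):
        if addr not in self.cache:       # miss: fill from RAM
            self.cache[addr] = self.ram[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value         # write-back: RAM not updated yet
        self.dirty.add(addr)

    def flush(self, addr):
        """Write dirty data back to RAM (before DMA reads a TX buffer)."""
        if addr in self.dirty:
            self.ram[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop cached data so the next read refetches RAM (after DMA RX)."""
        self.cache.pop(addr, None)
        self.dirty.discard(addr)


ram = [0] * 4
cpu = ToyCachedCPU(ram)

cpu.read(0)             # CPU touches the RX buffer, so it is now cached
ram[0] = 0xAB           # DMA deposits a received byte directly in RAM
stale = cpu.read(0)     # still the old value: the cache was not snooped
cpu.invalidate(0)
fresh = cpu.read(0)     # the DMA data is visible after invalidating

cpu.write(1, 0x55)      # CPU prepares a TX byte; it sits in the cache only
before_flush = ram[1]   # DMA would transmit stale data here
cpu.flush(1)
after_flush = ram[1]    # RAM now holds what the CPU wrote

print(stale, fresh, before_flush, after_flush)  # 0 171 0 85
```

Mapping the buffers to non-cached memory avoids both steps, at the cost of slower CPU access to those buffers.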

But given the state of the errata for this chip, it is a toss-up whether
such snooping works or not.

It's a shame that Microchip have released this device in its current
state.  The MIPS microAptiv core is a great cpu, and it would be good
for the market for ARM to get some real competition.  But with these
half-tested devices from Microchip being the best-known general
microAptiv microcontrollers, there is a real danger that people will
assume the /core/ is bad, rather than blaming the incompetence of Microchip's
test engineers combined with over-enthusiastic PHBs and sales folk.
With the current errata - full of modules that simply don't work and
have no fixes or workarounds - this chip should never have been released
for the general public.

Reply by Simon Clubley ●November 21, 2013

On 2013-11-21, David Brown <david@westcontrol.removethisbit.com> wrote:
>
> It's a shame that Microchip have released this device in its current
> state.  The MIPS microAptiv core is a great cpu, and it would be good
> for the market for ARM to get some real competition.  But with these
> half-tested devices from Microchip being the best-known general
> microAptiv microcontrollers, there is a real danger that people will
> assume the /core/ is bad rather than just incompetence of Microchip's
> test engineers combined with over-enthusiastic PHB's and sales folk.
> With the current errata - full of modules that simply don't work and
> have no fixes or workarounds - this chip should never have been released
> for the general public.
>

So the question becomes _why_ have they released it now?

It was pointed out in another message that evaluation boards have been
released which depend on these non-working features.

That means after a few months have elapsed and these problems have been
(hopefully) fixed, some people are still going to start with a negative
impression of the PIC32MZ instead of a neutral one based on its
capabilities at that time.

IOW, I don't see how releasing it now instead of taking the hit caused
by another delay could have ever been considered to be a good idea.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Tim Wescott ●November 21, 2013

On Wed, 20 Nov 2013 19:11:24 -0800, dp wrote:

> On Thursday, November 21, 2013 4:53:52 AM UTC+2, Tim Wescott wrote:
>> On Wed, 20 Nov 2013 18:42:12 -0800, dp wrote:
>> 
>> > On Wednesday, November 20, 2013 6:55:34 PM UTC+2, Tim Wescott wrote:
>> ...
>> >> IEEE double precision takes a lot more hardware to get really fast
>> >> operation.
>> >> 
>> >> I sometimes wonder if there wouldn't be a way to implement double-
>> >> precision floating point in hardware that wouldn't take up more
>> >> space than 'fast' single-precision FP, but at the cost of a few
>> >> clock ticks.
>> >> For a lot of algorithms, a double-precision calculation that
>> >> happened 1/4 as fast as an in-hardware single-precision calculation
>> >> would still be far better than either taking the precision hit of
>> >> 32-bit, or the speed hit of software synthesized 64 bit.
>> > 
>> > On the e300 core (and likely on others I am not so intimately
>> > familiar with) Freescale have the FPU doing 2 cycle FMUL, FMADD etc.
>> > on 64 bit operands and 1 cycle on 32 bit ones. Works fine, on 400 MHz
>> > core clock they talk about 800 MIPS which if not 100% practically
>> > usable does help,
>> > interleaving FPU with integer (and perhaps more importantly,
>> > load/store) instructions does work OK (I have managed a 2.2 cycle
>> > total within a 64 bit FIR loop, load/store included).
>> > Now how did they compromise die size vs. performance I have no idea I
>> > am just a user of theirs.
>> 
>> What chips does one find that core in?
> 
> The one I am using is the MPC5200B (watchout for the old 5200, still
> available but much buggier etc.). I have also used it on the 8240 (too
> old to consider now); they also have the MPC5125 and the MPC5121 (I have
> just been eyeing these, never used one).
> 
> I should detail the 2.2 cycles - I do these reading/writing 32 bit FP
> data, just the MAC loop is done 64-bit (using FMAD.d , 64*64+64 ).
> Loading a 32 bit FP automatically converts it to a 64 bit one,
> storing as .s does 64 ->32 (rounding included). I would expect the same
> speeds for r/w with 64 bit data (the bus to the cache is 64 bit wide)
> but I have not tried that.

Just four days ago I searched through the documentation for that chip, 
and I came to the conclusion that the FPU only supported single-precision 
floating point in hardware.  Aside from having to go tell a customer that 
I had my head buried in my assumptions, I guess I should be pleased to be 
wrong.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Joseph H Allen ●November 21, 2013

In article <l6lh4k$1it$1@dont-email.me>,
Simon Clubley  <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:

>So the question becomes _why_ have they released it now ?

Maybe a competitor is about to release something...

Wish it had floating point like the STM32F4.

-- 
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Reply by John Devereux ●November 21, 2013

jhallen@TheWorld.com (Joseph H Allen) writes:

> In article <l6lh4k$1it$1@dont-email.me>,
> Simon Clubley  <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
>
>>So the question becomes _why_ have they released it now ?
>
> Maybe a competitor is about to release something...

Like ADI CM4xx? 240MHz M4.

<http://www.analog.com/en/processors-dsp/cm4xx/products/index.html>


> Wish it had floating point like the STM32F4.

-- 

John Devereux