
Microchip PIC32MZ Flash Microcontroller is the World's Fastest 32-bit MCU

Started by Bill Giovino November 20, 2013
> It is not clear what kind of bandwidth you will get from the flash, but most flash memories will not run more than 20 MHz, so running out of DRAM is typically faster.

If I read it correctly, the PIC32MZ requires 2 wait states at 200MHz, so the program flash is probably running at around 70 to 80MHz. BTW, the flash instruction path is 128 bits wide, with a 16K cache.
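As a rough check of those figures, here is a minimal sketch. It assumes a simple model in which every flash access costs one core cycle plus the wait states and returns a full 128-bit line; the function name and the model itself are assumptions for illustration, not taken from the datasheet.

```python
def flash_fetch(core_hz, wait_states, bus_bits):
    """Return (access rate in Hz, peak bytes/s) for a 1 + wait_states cycle access model."""
    cycles_per_access = 1 + wait_states      # assumed: one cycle plus the wait states
    access_hz = core_hz / cycles_per_access
    bytes_per_s = access_hz * bus_bits / 8   # one full line delivered per access
    return access_hz, bytes_per_s


# Figures from the post: 200 MHz core, 2 wait states, 128-bit flash path.
access_hz, bytes_per_s = flash_fetch(200e6, 2, 128)
print(f"flash access rate:    {access_hz / 1e6:.1f} MHz")
print(f"peak fetch bandwidth: {bytes_per_s / 1e6:.0f} MB/s")
```

Under this model the flash is accessed at about 67 MHz, a little under the 70 to 80 MHz guess, but each access delivers four 32-bit instructions, so straight-line code could in principle still be fed faster than the core consumes it.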
On Wednesday, November 20, 2013 6:55:34 PM UTC+2, Tim Wescott wrote:
> On Wed, 20 Nov 2013 00:24:33 -0800, Paul Rubin wrote:
>
> > Bill Giovino <billgiovino@gmail.com> writes:
> >> http://microcontroller.com/news/Microchip_PIC32MZ.asp
> >> The Microchip PIC32MZ runs at 330MIPS at 200MHz
> >
> > What does this mean about being the fastest MCU? Why is it interesting, since there are SOC's running at 1ghz and faster, not to mention vector DSP's and that sort of thing? Also, the PIC32MZ doesn't appear to have any floating point arithmetic, unlike the M4 which it seems to position itself against. It would be more interesting if the PIC had IEEE double precision, since the ARM M4F only has single precision.
>
> IEEE double precision takes a lot more hardware to get really fast operation.
>
> I sometimes wonder if there wouldn't be a way to implement double-precision floating point in hardware that wouldn't take up more space than 'fast' single-precision FP, but at the cost of a few clock ticks. For a lot of algorithms, a double-precision calculation that happened 1/4 as fast as an in-hardware single-precision calculation would still be far better than either taking the precision hit of 32-bit, or the speed hit of software synthesized 64 bit.
On the e300 core (and likely on others I am not so intimately familiar with), Freescale have the FPU doing 2-cycle FMUL, FMADD etc. on 64-bit operands and 1-cycle on 32-bit ones. It works fine; on a 400 MHz core clock they talk about 800 MIPS, which, if not 100% usable in practice, does help. Interleaving FPU with integer (and, perhaps more importantly, load/store) instructions does work OK (I have managed a 2.2 cycle total within a 64-bit FIR loop, load/store included).

How they traded off die size against performance I have no idea; I am just a user of theirs.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
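To put the 2.2-cycle figure in perspective, here is a back-of-envelope sketch. It assumes the 2.2 cycles are the cost of one filter tap (the post does not say so explicitly), and the 64-tap filter length is a hypothetical example, not something from the post.

```python
CORE_HZ = 400e6         # core clock quoted in the post
CYCLES_PER_TAP = 2.2    # loop cost quoted in the post, assumed here to be per tap


def taps_per_second(core_hz, cycles_per_tap):
    """Multiply-accumulate steps per second at the given loop cost."""
    return core_hz / cycles_per_tap


def max_sample_rate(core_hz, cycles_per_tap, n_taps):
    """Highest sample rate one n_taps FIR could sustain, ignoring all other overhead."""
    return taps_per_second(core_hz, cycles_per_tap) / n_taps


print(f"{taps_per_second(CORE_HZ, CYCLES_PER_TAP) / 1e6:.1f} M taps/s")
# 64 taps is a hypothetical filter length chosen only for illustration.
print(f"{max_sample_rate(CORE_HZ, CYCLES_PER_TAP, 64) / 1e6:.2f} MHz for a 64-tap filter")
```

That works out to roughly 180 million double-precision taps per second, which is in the territory Tim was asking about: 64-bit arithmetic at a modest fraction of the single-precision rate.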
On Wed, 20 Nov 2013 18:42:12 -0800, dp wrote:

> On the e300 core (and likely on others I am not so intimately familiar with) Freescale have the FPU doing 2 cycle FMUL, FMADD etc. on 64 bit operands and 1 cycle on 32 bit ones. Works fine, on 400 MHz core clock they talk about 800 MIPS which if not 100% practically usable does help, interleaving FPU with integer (and perhaps more importantly, load/store) instructions does work OK (I have managed a 2.2 cycle total within a 64 bit FIR loop, load/store included).
>
> Now how did they compromise die size vs. performance I have no idea, I am just a user of theirs.
What chips does one find that core in?

Thanks.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
On Thursday, November 21, 2013 4:53:52 AM UTC+2, Tim Wescott wrote:
> What chips does one find that core in?
The one I am using is the MPC5200B (watch out for the old 5200, which is still available but much buggier). I have also used it on the 8240 (too old to consider now); they also have the MPC5125 and the MPC5121 (I have just been eyeing these, never used one).

I should detail the 2.2 cycles: I do these reading/writing 32-bit FP data; just the MAC loop is done in 64-bit (using FMAD.d, 64*64+64). Loading a 32-bit FP value automatically converts it to a 64-bit one, and storing as .s does 64 -> 32 (rounding included). I would expect the same speeds for r/w with 64-bit data (the bus to the cache is 64 bits wide), but I have not tried that.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
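The scheme described above (32-bit data in memory, 64-bit multiply-accumulate, one rounding on the final store) can be sketched in software. This is only an illustration of the numerical idea, not the actual loop: the function names are made up, and single precision is emulated by round-tripping through `struct`.

```python
import struct


def to_f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fir_mixed(coeffs, samples):
    """32-bit inputs, 64-bit multiply-accumulate, a single rounding to 32 bits on store."""
    acc = 0.0                                  # 64-bit accumulator
    for c, x in zip(coeffs, samples):
        # Widening a single to a double is exact, and the product of two singles
        # fits in a double, so this rounds the same way a fused multiply-add would.
        acc += to_f32(c) * to_f32(x)
    return to_f32(acc)                         # 64 -> 32 with rounding, as on store


def fir_single(coeffs, samples):
    """Same filter with every product and partial sum rounded to 32 bits."""
    acc = 0.0
    for c, x in zip(coeffs, samples):
        acc = to_f32(acc + to_f32(to_f32(c) * to_f32(x)))
    return acc


# A small term sandwiched between cancelling large ones shows the difference.
coeffs = [1.0, 1e-8, -1.0]
samples = [1.0, 1.0, 1.0]
print("mixed :", fir_mixed(coeffs, samples))
print("single:", fir_single(coeffs, samples))
```

With these values the all-single version loses the small term entirely and returns zero, while the mixed version keeps it; that is the kind of precision loss the 64-bit accumulator avoids.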
On Thursday, November 21, 2013 1:25:06 AM UTC+2, Ulf Samuelsson wrote:
> 2013-11-20 06:34, Bill Giovino wrote:
> > http://microcontroller.com/news/Microchip_PIC32MZ.asp
> >
> > The Microchip PIC32MZ runs at 330MIPS at 200MHz and easily competes against the Cortex-M4.
>
> In a real application, you are going to have problems if peripherals have to be handled in interrupts, and not with DMA, and the PIC32MZ only has 8 channels, which is not a lot.
Actually more. Things like the Ethernet have dedicated DMA which is not part of the general DMA pool.
On 21/11/13 16:00, Rocky wrote:
> Actually more. Things like the Ethernet have dedicated DMA which is not part of the general DMA pool.
From the picture it looks like there are dedicated DMAs for the Ethernet, CAN, USB, SQI and crypto engine, which is nice.

One thing that can make a big difference to usability is whether the data cache is synchronised with these DMA channels (i.e., does the cache snoop their transfers?). I've worked with processors where the dedicated Ethernet DMA was not snooped: you have to make sure your Ethernet buffers are mapped to non-cached memory areas (assuming the processor has an MMC or MMU supporting that), or you have to add extra cache flush and invalidate code for any accesses. But given the state of the errata for this chip, it is a toss-up whether such snooping works or not.

It's a shame that Microchip have released this device in its current state. The MIPS microAptiv core is a great CPU, and it would be good for the market for ARM to get some real competition. But with these half-tested devices from Microchip being the best-known general microAptiv microcontrollers, there is a real danger that people will assume the /core/ is bad, rather than blaming the incompetence of Microchip's test engineers combined with over-enthusiastic PHBs and sales folk. With the current errata, full of modules that simply don't work and have no fixes or workarounds, this chip should never have been released to the general public.
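For readers who have not run into the snooping problem mentioned above, here is a toy model of it. Everything in it (the class, its method names, the one-word "cache lines") is made up for illustration and says nothing about how the PIC32MZ actually behaves; it only shows why flush and invalidate calls are needed when a DMA engine bypasses a write-back cache.

```python
class ToySystem:
    """Toy model: RAM, a write-back CPU data cache, and a DMA engine that is not snooped."""

    def __init__(self, size):
        self.ram = [0] * size
        self.cache = {}      # address -> cached value
        self.dirty = set()   # addresses written by the CPU but not yet in RAM

    def cpu_read(self, addr):
        if addr not in self.cache:           # miss: fill from RAM
            self.cache[addr] = self.ram[addr]
        return self.cache[addr]

    def cpu_write(self, addr, value):
        self.cache[addr] = value             # stays in the cache until flushed
        self.dirty.add(addr)

    def dma_write(self, addr, value):
        self.ram[addr] = value               # e.g. Ethernet receive; the cache is not told

    def dma_read(self, addr):
        return self.ram[addr]                # e.g. Ethernet transmit; sees RAM only

    def flush(self, addr):
        if addr in self.dirty:               # write dirty data back before DMA reads it
            self.ram[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        self.cache.pop(addr, None)           # drop the line so the next read refetches
        self.dirty.discard(addr)


system = ToySystem(16)

# Receive side: the CPU has the buffer cached, then DMA overwrites RAM.
system.cpu_read(3)
system.dma_write(3, 0xAB)
stale = system.cpu_read(3)       # old value still served from the cache
system.invalidate(3)
fresh = system.cpu_read(3)       # now refetched from RAM

# Transmit side: the CPU fills the buffer, but it only reaches RAM after a flush.
system.cpu_write(5, 0x42)
before = system.dma_read(5)
system.flush(5)
after = system.dma_read(5)

print(f"rx: stale={stale:#x} fresh={fresh:#x}")
print(f"tx: before={before:#x} after={after:#x}")
```

Mapping the buffers to a non-cached region, the other option mentioned above, amounts to never using the `cache` dictionary for those addresses at all.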
On 2013-11-21, David Brown <david@westcontrol.removethisbit.com> wrote:
> It's a shame that Microchip have released this device in its current state. The MIPS microAptiv core is a great cpu, and it would be good for the market for ARM to get some real competition. But with these half-tested devices from Microchip being the best-known general microAptiv microcontrollers, there is a real danger that people will assume the /core/ is bad rather than just incompetence of Microchip's test engineers combined with over-enthusiastic PHB's and sales folk. With the current errata - full of modules that simply don't work and have no fixes or workarounds - this chip should never have been released for the general public.
So the question becomes: _why_ have they released it now?

It was pointed out in another message that evaluation boards have been released which depend on these non-working features. That means that after a few months have elapsed and these problems have (hopefully) been fixed, some people are still going to start with a negative impression of the PIC32MZ, instead of a neutral impression based on its capabilities at that time.

IOW, I don't see how releasing it now, instead of taking the hit caused by another delay, could ever have been considered a good idea.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On Wed, 20 Nov 2013 19:11:24 -0800, dp wrote:

> The one I am using is the MPC5200B (watchout for the old 5200, still available but much buggier etc.). I have also used it on the 8240 (too old to consider now); they also have the MPC5125 and the MPC5121 (I have just been eyeing these, never used one).
>
> I should detail the 2.2 cycles - I do these reading/writing 32 bit FP data, just the MAC loop is done 64-bit (using FMAD.d , 64*64+64 ). Loading a 32 bit FP automatically converts it to a 64 bit one, storing as .s does 64 ->32 (rounding included). I would expect the same speeds for r/w with 64 bit data (the bus to the cache is 64 bit wide) but I have not tried that.
Just four days ago I searched through the documentation for that chip, and I came to the conclusion that the FPU only supported single-precision floating point in hardware.

Aside from having to go tell a customer that I had my head buried in my assumptions, I guess I should be pleased to be wrong.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
In article <l6lh4k$1it$1@dont-email.me>,
Simon Clubley  <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:

>So the question becomes _why_ have they released it now ?
Maybe a competitor is about to release something...

Wish it had floating point like the STM32F4.

--
/* jhallen@world.std.com AB1GO */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
jhallen@TheWorld.com (Joseph H Allen) writes:

> In article <l6lh4k$1it$1@dont-email.me>, Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
>
> > So the question becomes _why_ have they released it now ?
>
> Maybe a competitor is about to release something...
Like ADI CM4xx? 240MHz M4. <http://www.analog.com/en/processors-dsp/cm4xx/products/index.html>
> Wish it had floating point like stm32f4.
-- John Devereux
