EmbeddedRelated.com
Forums

Compare ARM MCU Vendors

Started by Dave Graffio September 1, 2010
Hi Dave,

Dave Graffio wrote:
> How do you compare ARM MCU manufacturers for a project in the USA?
Like you would any other vendor! See who has what you want/need. How much they want for it. What their reputation is. etc. Then, see who *else* has "something that you can *tweak*" to do the same job -- possibly better/worse -- and repeat the process. Finally, make a "value judgement" on all of the candidates that fall through the above process.
> I see Atmel, St micro, nxp, Texas instr, Freescale, Marvell - are they all selling the same stuff or is there any real difference? I see St has faster parts but Atmel has more of them. Is price and support all the same?
I don't think you will find "the same part" from any two vendors. The ARM world is like the "stereo" (HiFi) business of ages past (modern parallel would be multimedia): you bought a turntable from vendor A, the *stylus* for that turntable from vendor B, the (phono) preamp from vendor C, amplifier from vendor D, speakers from vendor E, etc. Until you got the "system" that fit your price/performance/ego. With ARM, each vendor *packages* various "components" (referencing the above analogy) into an MCU. So, the processing power of the "core", amount of memory (and flavors thereof) included/supported, other peripherals onboard, etc. varies. In theory, you can find The Ideal MCU for your application -- but, chances are, it is only sold by *one* vendor (though the various components inside it may appear in a smattering of offerings from other vendors... though not in the exact same configuration).
> Google doesn't seem to show any information anywhere on this, which is really shocking.
I dunno... google doesn't tell me which *car* is "right" for me, either! Amazing!
> I am wondering if I should move between them or standardize on one company.
That's a value judgement. Do you want to establish a relationship with *one* company? (there are pros and cons, of course) Do you want to tailor your solutions to your problems (or pick the closest fit from the offerings of that *one* company)? That's why they call it "Engineering" instead of "shopping for shoes"...

--don
On Sep 16, 11:08 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Dave,
>
> Dave Graffio wrote:
> > Google doesn't seem to show any information anywhere on this, which is really shocking.
>
> I dunno... google doesn't tell me which *car* is "right" for me,
> either! Amazing!
Google is just a way to search for info that others provide. There are some comparisons of ARM devices; at one time I put up a comparison of ARM7 devices myself. But it is very hard to keep updated. Now ARM7 is on the down slope and the Cortex architectures are the hot, new thing. Good luck trying to keep up with all the new product introductions there!

I have a lot of things on my plate before I could do this, but I may take a stab at another comparison chart for ARM CM devices over the winter. The last one was done when I needed the info for myself and I may need to evaluate ARM cores again soon.
> > I am wondering if I should move between them or standardize on one company.
>
> That's a value judgement. Do you want to establish a relationship with
> *one* company? (there are pros and cons, of course) Do you want to
> tailor your solutions to your problems (or pick the closest fit from
> the offerings of that *one* company)?
The reasons for standardizing on one company in the (distant) past had to do with the differences in CPU architectures. Once you learned the PIC12 devices you didn't want to restart with the MSP430 parts, learning about both the CPU and the tools.

Now that ARM has provided a more complete MCU core in the CM3/1/0, the CPUs are nearly all the same, eliminating this issue. But the peripherals are very different between brands. The tools have a lot more support for the peripherals. So there is still reason for developing a brand loyalty. At some point I expect the tool vendors may have reason to help mitigate this issue and it will be much easier to port between brands. But that may be a long time off, if ever.

If you think you will have many designs that need a variety of MCU parts with different capabilities, I would suggest that you consider the major players with broad product lines. This can save cost on tools as well as relearning how to use the peripherals. It should cost you little in terms of recurring part costs or a match to your application. In other words, don't switch vendors unless you have a reason.

Rick
David Brown wrote:
> On 13/09/2010 17:05, Ulf Samuelsson wrote:
>> linnix wrote:
>>>> As for speed, most current flash based ARMs/CM3s are limited by
>>>> flash waitstates and there is very little performance increase
>>>> once you reach those 60-70 MHz.
>>>>
>>>> While a CM3 at zero waitstates is 1.25 Dhrystone MIPS/MHz,
>>>> the performance at 100 MHz with 4 waitstates is more like
>>>> 0.85 Dhrystone MIPS/MHz.
>>>>
>>>> Using a Keil compiler, I measured the difference between
>>>> running at 84 MHz and 96 MHz to be less than 1%.
>>>> This was more than one year ago, but I doubt that this will change
>>>> until people start to put faster flash memories into the products.
>>>
>>> Or putting instruction cache there.
>>
>> That might work but I am not aware of this solution being implemented
>> anywhere.
>>
>> It was not necessarily a good idea with the ARM7.
>> If you added an instruction cache you
>> added 1 waitstate to all accesses.
>>
>> Good for top performance on some apps, but certainly
>> reduced the worst case performance, which sometimes
>> is more important.
>
> A better solution for micros like that is a wider flash design with an
> sram buffer in the flash module - that is certainly how some
> manufacturers handle the problem. It is a simpler solution than a full
> instruction cache because you have only a single "tag" (or perhaps two,
> if you have two such buffers), and there are no issues with coherence or
> anything else. The buffer of perhaps 256 bytes gets filled whenever you
> access a new "page" in the flash, so that the processor then reads from
> the buffer rather than directly from the flash. And if space/economics
> allow, you can have a wider flash-to-buffer bus to keep up a high
> bandwidth even with slow flash and a fast processor.
The disadvantage of having a 256 byte wide memory is power consumption. You will have 2048 active sense amplifiers. I don't see that coming soon.

--
Best Regards
Ulf Samuelsson
These are my own personal opinions, which may or may not be shared by my employer Atmel Nordic AB
On 20/09/2010 10:30, Ulf Samuelsson wrote:
> The disadvantage of having a 256 byte wide memory, is power consumption.
> You will have 2048 active sense amplifiers.
> I dont see that coming soon.
You don't need 256 byte wide memory - you need a 256 byte sram buffer on the flash. If we assume that the processor ideally wants to read 32-bit wide data from the flash at 100 MHz, and the flash itself is capable of providing data once per cycle at 50 MHz (perhaps with a couple of cycles delay for initial access to a page), then the flash-to-buffer width should be 64 bits. Then there is a brief stall when accessing a new page, but otherwise the processor gets its instructions at full speed.

Yes, those 64 bits means 64 sense amplifiers, compared to 16 amplifiers that might be used on a slower flash setup. But apart from a small leakage current, the amplifiers only take power when they are used, so the number of amplifiers doesn't affect the power much - the total power is proportional to the bits read from the flash. With a buffer arrangement, you'll get some unnecessary reads to fill the buffer, but you'll avoid duplicate reads on many loops - my guess is you'd reduce the total number of reads.
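As a rough check of the width arithmetic in this post, here is a minimal Python sketch. It assumes flash read widths come in powers of two and that the only goal is to match sustained fetch bandwidth; the initial page-access delay is ignored. The function name and parameters are made up for illustration and do not come from any vendor documentation.

```python
def required_flash_width_bits(cpu_mhz, cpu_bus_bits, flash_mhz):
    """Smallest power-of-two multiple of the CPU bus width whose flash
    bandwidth (width * flash clock) covers the CPU fetch bandwidth."""
    width = cpu_bus_bits
    # Double the flash read width until the flash side keeps up.
    while width * flash_mhz < cpu_bus_bits * cpu_mhz:
        width *= 2
    return width


# Figures from the post: 32-bit fetches at 100 MHz, flash cycling at 50 MHz.
width = required_flash_width_bits(cpu_mhz=100, cpu_bus_bits=32, flash_mhz=50)
print(width)  # 64
```

With the same assumptions, a 150 MHz core fed from 50 MHz flash would come out at 128 bits, which shows how quickly the width grows once the flash clock stays fixed.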
On Sep 20, 5:10 am, David Brown <da...@westcontrol.removethisbit.com> wrote:
> You don't need 256 byte wide memory - you need a 256 byte sram buffer on
> the flash. If we assume that the processor ideally wants to read 32-bit
> wide data from the flash at 100 MHz, and the flash itself is capable of
> providing data once per cycle at 50 MHz (perhaps with a couple of cycles
> delay for initial access to a page), then the flash-to-buffer width
> should be 64 bits.
LPC1800... can operate at 150MHz straight from its 1Mbyte flash memory, or from RAM... The flexible dual-bank 256bit wide flash memories...

Dual-bank seems to be not for performance - doesn't get the benefit of 512bit width as they aren't interleaved.

See: http://www.electronicsweekly.com/Articles/2010/09/20/49475/NXP-reveals-150MHz-ARM-Cortex-M3.htm

Interesting trade-off!

Best Regards, Dave
2010-09-20 21:04, Dave Nadler wrote:
> LPC1800... can operate at 150MHz straight from its 1Mbyte flash
> memory, or from RAM... The flexible dual-bank 256bit wide flash
> memories...
>
> Dual-bank seems to be not for performance - doesn't get the benefit of
> 512bit width as they aren't interleaved.
In practice you see that the 128-bit flash LPC2xxx draws a lot more current than the 32-bit SAM7. In Thumb mode, the SAM7 is faster than the LPC (at the same clock frequency) due to the faster flash.

The wide flash memories will give you some extra boost at the top performance level. The programmable nature of the SAM3 allowed me to test the difference between 64 & 128 bit and it is ~5%.

Normally it is better to increase the clock than it is to increase the width of the flash. Same performance, but less power.

--
Best Regards
Ulf Samuelsson
These are my own personal opinions, which may (or may not) be shared by my employer Atmel Nordic AB
On Sep 3, 12:08 am, An Schwob in the USA <schwo...@aol.com> wrote:
> On Sep 2, 7:37 pm, "Dave Graffio" <wscra...@yahoo.com> wrote: > > > "antedeluvian" wrote... > > >A really great part is the Cypress PSOC5 which gives a great deal of > > > flexibilty because of its configurabilty. > > > > Unfortunately it appears to be made of pure unobtanium. > > > Not true. I've heard it's being designed by the engineering firm of Tuttle and Dunsel. > > (Capt, Retired)
>
> Dave,
>
> you heard strange things such as Luminary (TI) being low quality. They
> manufacture on one of the highest quality production lines in the
> world TSMC. Marvell does not design and manufacture MCUs, they do high
> end application processors, no flash but lots of MHz. Atmel started
> strong with ARM7 and ARM9 but is weak in Cortex-M3, their focus
> shifted very much towards AVR32. NXP offers the fastest Cortex-M3 with
> Flash, btw. did you know that Toshiba has the fastest M3 running from
> internal SDRAM? Did you know that Energy Micro achieve better power
> numbers using the Cortex-M3 then any other vendor even those using
> Cortex-M0?
Not sure what you mean by "Atmel is weak in Cortex-M3". The CM3 is new enough that not everyone has their products out yet. I think Atmel dilly dallied too long with the CM3, but I expect this was due to company goal issues and not because of "weakness" of any kind. They have a competing 32 bit MCU product and I expect they could only throw so many resources at bringing out a totally new MCU line. Give them a few more months and I think they will not disappoint.

"Fastest" is always a short lived title. Clock speed is seldom a determining criterion in selecting an MCU and I expect it is often given too much weight by engineers when initially winnowing their MCU choices. It is a simple number that is easy to verify. CPU speed is a much more complex measurement that is very hard to verify for your application, but this is the one that may actually make a difference in your design.
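To put rough numbers on the clock speed versus CPU speed point, here is a small Python sketch using the Dhrystone figures quoted elsewhere in this thread (1.25 DMIPS/MHz at zero wait states, about 0.85 DMIPS/MHz at 100 MHz with 4 wait states). The 60 MHz zero-wait-state operating point is an assumption made for illustration; the thread does not state where zero-wait-state operation ends on any particular part.

```python
def dmips(clock_mhz, dmips_per_mhz):
    """Throughput in Dhrystone MIPS for a given clock and efficiency."""
    return clock_mhz * dmips_per_mhz


# Assumed operating points (the 60 MHz / zero wait state pairing is hypothetical).
slow = dmips(60, 1.25)    # zero wait states
fast = dmips(100, 0.85)   # 4 wait states, figure quoted in the thread

clock_gain = 100 / 60 - 1        # fractional increase in clock
throughput_gain = fast / slow - 1  # fractional increase in DMIPS
print(f"clock +{clock_gain:.0%}, throughput +{throughput_gain:.0%}")
```

Under these assumptions a clock increase of about two thirds buys only about 13% more Dhrystone throughput, which is why the headline MHz figure says little on its own.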
> PSoC5 is a great product and if your volume production does not start before 2011, you might want to order a FirstTouch for PSoC 5, just $49 free tools, several sensors for acceleration, temperature, capacitive touch and readily available. Got one on my desk, like it. http://www.cypress.com/psoc5 is a good place to start.

How can you plan to use a part, even if you can wait six months for production, if you don't know the price? Has anyone heard a number for production pricing on the PSOC5?
> I could write a lot more about ARM / Cortex-MCUs because that's what I
> have been dealing with since the first ARM7 MCUs hit the market. If
> you need professional help, with the selection write an email to
> microcontroller (skip this at gmail) -dod comm
> It would go a long way if you would list your requirements, you get
> better answers.
>
> For a list with many articles about Cortex based MCUs check out this
> one: http://mcu-related.com/architectures/35-cortex-m3
Some three or four years ago I put together a list of ARM7 devices available. By the time Luminary came on the scene it got to be too much work to update. Now with all the CMx devices out there it would be a major effort to keep this updated. Does anyone have a comprehensive comparison of features and capabilities of the CMx MCUs available?

Rick
On Sep 20, 4:30 am, Ulf Samuelsson <u...@a-t-m-e-l.com> wrote:
> David Brown wrote:
> > A better solution for micros like that is a wider flash design with an
> > sram buffer in the flash module - that is certainly how some
> > manufacturers handle the problem. It is a simpler solution than a full
> > instruction cache because you have only a single "tag" (or perhaps two,
> > if you have two such buffers), and there are no issues with coherence or
> > anything else. The buffer of perhaps 256 bytes gets filled whenever you
> > access a new "page" in the flash, so that the processor then reads from
> > the buffer rather than directly from the flash. And if space/economics
> > allow, you have have a wider flash-to-buffer bus to keep up a high
> > bandwidth even with slow flash and a fast processor.
>
> The disadvantage of having a 256 byte wide memory, is power consumption.
> You will have 2048 active sense amplifiers.
> I dont see that coming soon.
I hope you aren't involved in architecting new MCU designs. I don't think anyone said they wanted 2048 sense amplifiers. I would either interpret the above to be "256 bits" or I would consider an implementation that used a 256 byte cache of some sort. What would be the utility of a 256 byte wide interface to the Flash? Even the fastest CM3 CPUs can't run at nearly that speed.

Rick
On 21/09/2010 13:16, rickman wrote:
> I hope you aren't involved in architecting new MCU designs. I don't
> think anyone said they wanted 2048 sense amplifiers. I would either
> interpret the above to be "256 bits" or I would consider an
> implementation that used a 256 byte cache of some sort. What would be
> the utility of a 256 byte wide interface to the Flash? Even the
> fastest CM3 CPUs can't run at nearly that speed.
I was referring to a 256 byte cache, but perhaps I wasn't clear in my description. Such a page cache will be filled from the flash at a speed that suits the flash, with a width that matches the flash (perhaps something like 64-bit or even 128-bit for performance-optimised parts, and maybe as small as 16-bit for price or power optimised parts). On the other side of the cache, the processor will read out with a speed and width that matches its instruction bus - typically 32-bit.

It is effectively a specialised type of instruction cache - less flexible, but much simpler to implement.

I've read about such a cache, but I can't remember which chip used it - it may not even have been an ARM device (perhaps it was a ColdFire v2 microcontroller). And many parts have some sort of "flash accelerator" in their feature list, which are probably a similar idea.
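The single-tag page buffer described in this post can be modelled in a few lines of Python. This is only a behavioural sketch under simple assumptions: one 256-byte buffer, one tag, a full refill on every miss, and no timing. The class and method names are invented for illustration and do not describe any particular chip.

```python
class PageBuffer:
    """Single-tag flash page buffer: holds one whole flash page at a time."""

    def __init__(self, page_bytes=256):
        self.page_bytes = page_bytes
        self.tag = None   # page number currently held, None when empty
        self.hits = 0
        self.fills = 0

    def fetch(self, address):
        """Record an instruction fetch; return True on a buffer hit."""
        page = address // self.page_bytes
        if page == self.tag:
            self.hits += 1
            return True
        self.tag = page   # miss: refill the whole buffer from flash
        self.fills += 1
        return False


buf = PageBuffer()
# A 64-byte loop at 0x1000 executed 10 times, one 4-byte fetch per step.
for _ in range(10):
    for addr in range(0x1000, 0x1040, 4):
        buf.fetch(addr)
print(buf.fills, buf.hits)  # 1 159
```

A loop that fits inside one page costs a single fill, which is the "avoid duplicate reads on many loops" effect mentioned earlier in the thread. A loop that straddles a page boundary would refill twice per iteration with a single tag, which is the case a second buffer is meant to cover.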
On Sep 21, 7:41 am, David Brown <da...@westcontrol.removethisbit.com> wrote:
> I was referring to a 256 byte cache, but perhaps I wasn't clear in my
> description. Such a page cache will be filled from the flash at a speed
> that suits the flash, with a width that matches the flash (perhaps
> something like 64-bit or even 128-bit for performance-optimised parts,
> and maybe as small as 16-bit for price or power optimised parts). On
> the other side of the cache, the processor will read out with a speed
> and width that matches its instruction bus - typically 32-bit.
>
> It is effectively a specialised type of instruction cache - less
> flexible, but much simpler to implement.
Yes, simpler to implement, but definitely less effective. For example, let's assume the flash reads out 32 bytes (256 bits) at a rate of 50 MHz. That's 1600 MB/s. It would take 160 ns (8 reads) to fill a 256 byte buffer on a jump. If the destination instruction was in the last line read, that would be a long stall of the processor. Of course, you could make the fill a bit smarter, reading the needed line first, but if the second instruction word was in the next line the processor would still have to wait for both reads to complete, rather slow in that case.

So yes, there are tradeoffs, and the fact that this sort of cache is seldom seen makes me think the bottom line is either work with no cache (meaning a very minimal cache like a single line cache) or design an associative cache that doesn't need to refill the whole cache. There are volumes of material written on cache memory designs and yet we keep seeing the same basic ones used in practice... for the most part.

Rick
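The fill-time arithmetic in this post can be reproduced with a short Python sketch. The 256-byte buffer, 32-byte flash line and 50 MHz flash rate are the figures from the thread; the 100 MHz CPU clock used to express the stall in CPU cycles is borrowed from an earlier example and is an assumption here. The function name is illustrative only.

```python
def buffer_fill(buffer_bytes, line_bytes, flash_mhz):
    """Return (number of flash reads, fill time in ns, flash bandwidth in MB/s)."""
    reads = buffer_bytes // line_bytes
    cycle_ns = 1000 / flash_mhz          # one flash read per flash cycle
    bandwidth_mb_s = line_bytes * flash_mhz
    return reads, reads * cycle_ns, bandwidth_mb_s


reads, fill_ns, bandwidth = buffer_fill(buffer_bytes=256, line_bytes=32, flash_mhz=50)

cpu_mhz = 100  # assumed CPU clock, not stated in the post above
stall_cycles = fill_ns * cpu_mhz / 1000  # worst case: target is in the last line read

print(reads, fill_ns, bandwidth, stall_cycles)  # 8 160.0 1600 16.0
```

So a jump whose target lands in the last line fetched would cost on the order of 16 CPU cycles at that clock if the buffer is always filled front to back, which is the worst case being argued against here.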