
New ARM Cortex Microcontroller Product Family from STMicroelectronics

Started by Bill Giovino June 18, 2007
"rickman" <gnuarm@gmail.com> skrev i meddelandet 
news:1183843932.123792.195400@n60g2000hse.googlegroups.com...
> On Jul 6, 10:51 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
>> The FIFO is implemented using Flip-Flops and you had a
>> simple three stage pipeline (fetch, decode, execute) so
>> your latency was not dramatic.
>
> That is not the point. By prefetching the instructions, you are
> setting up for a bigger dump and subsequent loss of instruction memory
> bandwidth when you branch. FIFOs or instruction prefetching are not a
> perfect solution. It is much better to just have single cycle
> memory.
Actually it is not, because if you try to decode your instruction in the same stage as the fetch, your clock frequency will go down significantly. The prefetching will work with single cycle memory and with memory having waitstates. Prefetching, decoding and execution will all take one clock.

If you execute at 66 MHz with a three stage pipeline, then you will probably execute at around ~40 MHz with a two stage pipeline (just a guess). If you execute blocks of 5 instructions including one jump, each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 MHz in a three stage pipeline, for ~10 blocks / us. In a two stage pipeline, you could use 2 clocks for a jump, so you execute (2 + 1 + 1 + 1 + 1) @ 40 MHz, which is 6,5 blocks / us, clearly slower.
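[As a back-of-the-envelope check of the block arithmetic above - a sketch only, in C, using the figures quoted in the post (5-instruction blocks, a 3- or 2-cycle branch cost, 66 MHz vs 40 MHz), not a model of any real core:]

    #include <stdio.h>

    /* A "block" is 5 instructions, one of which is a taken jump.
     * The jump costs the branch/refill cycles, the other four
     * instructions cost one cycle each. */
    static double blocks_per_us(double clk_mhz, int branch_cycles)
    {
        int cycles_per_block = branch_cycles + 4;  /* e.g. 3+1+1+1+1 = 7 */
        return clk_mhz / cycles_per_block;         /* MHz = cycles per us */
    }

    int main(void)
    {
        printf("3-stage @ 66 MHz: %.1f blocks/us\n", blocks_per_us(66.0, 3)); /* ~9.4  */
        printf("2-stage @ 40 MHz: %.1f blocks/us\n", blocks_per_us(40.0, 2)); /* ~6.7  */
        return 0;
    }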
> > >> >> If you have one waitstate, you will see that the bandwidth is still >> >> high >> Yes, but if the jumps are probably only 10-20% of all instructions >> so you lose only between 10-20% of the performance instead of 50%. >> The AVR32 loses less than 10% in average. > > But you are comparing apples and oranges. A processor that has no > wait states doesn't have to deal with this no matter what the > instruction mix is. It is just much simpler to not have to consider > memory latencies. >
A processor running from flash without waitstates will be limited in performance by the memory. A processor which reads multiple instructions per (wait-stated) access will be able to execute faster, due to its higher bandwidth to memory.
> >> >> I have run the SAM7 at 48 MHz, zero waitstate. Does not work over the >> >> full >> >> temp range though. >> >> The AVR32 will support 1.2 MIPS/MHz @ 1 waitstate operation @ 66 MHz >> >> due to its 33 MHz 2 way interleaved flash memory. >> >> (1st access after jump is two clocks, subsucquent accesses are 1 >> >> clock) >> >> > How does that compare to the Cortex M3 running at 50 MHz with no >> > waitstates and no branch penalty? >> >> The UC3000 is claimed as 80 MIPS at 66 MHz. >> For the Cortex M3 to reach 80 MIPS at 50 MHz, >> you have to have 80/50 = 1,6 MIPS per MHz. >> I think that ARM does not claim that the Cortex is close to 1,6 MIPS per >> MHz. > > Oh, this is marketing stuff. I thought you might have run some real > benchmarks or someone else at Atmel might have.
They have run benchmarks on the AVR32, but I think people are relying on official figures for the Cortex.
> Certainly they have > looked hard at the Cortex. But if it competes too well against the > AVR32, I can see why it would not be pushed at Atmel. > Certainly there > will be a lot of sockets that will be won by an ARM device over a sole > source part like the AVR32.
And hopefully ARM device from Atmel :-)
> At this point I don't think anyone can > say whether the AVR32 has legs and will be around in 5 years. It has > been out for what, a year or so? >
Fortunately there are plenty of sockets around, and some will go AVR32.
> >> The AVR32 is decidedly better on DSP algorithms due to its >> single cycle MAC and also it has faster access to SRAM. >> Reading internal SRAM is a one clock cycle operation on the AVR32. >> Bit banging will be one of the strengths of the UC3000. > > Isn't reading internal SRAM a single cycle on *all* processors? I > can't think of any that require wait states. In fact, most processors > try to cram as much SRAM onto the chip as possible because it is so > fast. Did you say what you meant to say? >
On the UC3000 family, loading from internal SRAM will take one clock in the execution stage. Using single cycle SRAM does not mean that the load instruction is 1 clock.

--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB
rickman wrote:
> On Jul 6, 10:51 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote: > >>The FIFO is implemented using Flip-Flops and you had a >>simple three stage pipeline (fetch, decode,execute) so >>your latency was not dramatic. > > > That is not the point. By prefetching the instructions, you are > setting up for a bigger dump and subsequent loss of instruction memory > bandwidth when you branch. FIFOs or instruction prefetching are not a > perfect solution. It is much better to just have single cycle > memory. > > > >>>>If you have one waitstate, you will see that the bandwidth is still high >> >>Yes, but if the jumps are probably only 10-20% of all instructions >>so you lose only between 10-20% of the performance instead of 50%. >>The AVR32 loses less than 10% in average. > > > But you are comparing apples and oranges. A processor that has no > wait states doesn't have to deal with this no matter what the > instruction mix is. It is just much simpler to not have to consider > memory latencies.
Of course, yes, it "is much better to just have single cycle memory" - but in the real world, chip designers have to settle for what they can get, and right now FLASH access speed is a real bottleneck on uC performance. The width of the FLASH access (or interleave) can have MORE impact on final speed than any subtlety in the core itself.
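[To put rough, purely generic numbers on the width point - these are illustrative figures, not taken from any particular datasheet - fetch bandwidth scales linearly with flash access width for a fixed flash access rate:]

    #include <stdio.h>

    /* Instruction fetch bandwidth vs flash access width, assuming a
     * fixed raw flash access rate and 16-bit (Thumb-style) opcodes.
     * Numbers are generic and for illustration only. */
    int main(void)
    {
        double flash_mhz = 33.0;            /* raw flash access rate   */
        int widths_bits[] = { 32, 64, 128 };
        int insn_bits = 16;                 /* 16-bit instructions     */

        for (int i = 0; i < 3; i++) {
            double insn_per_s = flash_mhz * 1e6 * widths_bits[i] / insn_bits;
            printf("%3d-bit flash @ %.0f MHz: %.0f M instructions/s of fetch bandwidth\n",
                   widths_bits[i], flash_mhz, insn_per_s / 1e6);
        }
        return 0;
    }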
> >>>>I have run the SAM7 at 48 MHz, zero waitstate. Does not work over the >>>>full >>>>temp range though. >>>>The AVR32 will support 1.2 MIPS/MHz @ 1 waitstate operation @ 66 MHz >>>>due to its 33 MHz 2 way interleaved flash memory. >>>>(1st access after jump is two clocks, subsucquent accesses are 1 clock) >> >>>How does that compare to the Cortex M3 running at 50 MHz with no >>>waitstates and no branch penalty? >> >>The UC3000 is claimed as 80 MIPS at 66 MHz. >>For the Cortex M3 to reach 80 MIPS at 50 MHz, >>you have to have 80/50 = 1,6 MIPS per MHz. >>I think that ARM does not claim that the Cortex is close to 1,6 MIPS per >>MHz. > > > Oh, this is marketing stuff. I thought you might have run some real > benchmarks or someone else at Atmel might have. Certainly they have > looked hard at the Cortex. But if it competes too well against the > AVR32, I can see why it would not be pushed at Atmel. Certainly there > will be a lot of sockets that will be won by an ARM device over a sole > source part like the AVR32. At this point I don't think anyone can > say whether the AVR32 has legs and will be around in 5 years. It has > been out for what, a year or so?
You can say (almost) the same for the Cortex M3? It too is quite new, and I've not seen any multi-sourced (pin/peripheral compatible) offerings. Will it hit 'critical mass'?

From a porting viewpoint, an Atmel ARM7 user could find it less of a jump to go to AVR32 (or the coming Atmel Flash ARM9's) than to Cortex M3, as the Atmel peripherals are very similar. The AVR32 I see as having a long life; it seems to have low cost tool flows and good debug support. (Don't underestimate the importance of good debug support.)

The actual uC cores matter less and less: package and peripherals have determined our shortlists in the latest projects - and the ST Cortex even made it onto the list, on that basis, until we found their serious oops, that CAN and USB were mutually exclusive ?!?

Then, there is the new Coldfire V1 core from Freescale. Choices, choices....

-jg
Ulf Samuelsson wrote:
> "rickman" <gnuarm@gmail.com> skrev i meddelandet > > That is not the point. By prefetching the instructions, you are > > setting up for a bigger dump and subsequent loss of instruction memory > > bandwidth when you branch. FIFOs or instruction prefetching are not a > > perfect solution. It is much better to just have single cycle > > memory. > > Actually it is not, because if you try to decode your instruction > in the same stage as the decoding, your clock frequency will > go down significantly. > The prefetching will work with single cycle memory and with > memory having waitstates.
What are you talking about??? How is slow memory faster than fast memory???
> Prefetching, decoding and execution, all will take one clock. > If you execute at 66 MHz with a three stage pipeline > then you probably will execute around ~40 MHz with > a two stage pipeline (Just a guess). > > If you execute blocks of 5 instruction including one jump, > each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 Mhz > in a three stage pipeline for ~ 10 blocks / us. > > In a two stage pipeline, you could use 2 clocks for a jump > so you execute (2 + 1 + 1 + 1 + 1) @ 40 MHz > which is 6,5 blocks / us, clearly slower.
Since when do I get to design my own processor??? Everything you have just written is based on your own assumptions. This is a pointless discussion since everything you say is based on *your* assumptions! In addition, you only consider the parts of the issue that you choose to include. You did a timing analysis on paper that does not include the effect of branches. Clearly not accurate regardless of your assumptions!
> > But you are comparing apples and oranges. A processor that has no > > wait states doesn't have to deal with this no matter what the > > instruction mix is. It is just much simpler to not have to consider > > memory latencies. > > > > A processor running from flash without wait states will be limited > in performance by the memory. > A processor which reads multiple instructions with wait state > will be able to execute faster due to its higher bandwidth to memory.
Again you are assuming facts that are not in evidence. Where do you get the higher bandwidth from memory if it is running with wait states? Oh, right, you are *assuming* that there is something different in the design that will make that one faster. Something that is not part of a slower Flash that requires wait states.
> >> The UC3000 is claimed as 80 MIPS at 66 MHz. > >> For the Cortex M3 to reach 80 MIPS at 50 MHz, > >> you have to have 80/50 = 1,6 MIPS per MHz. > >> I think that ARM does not claim that the Cortex is close to 1,6 MIPS per > >> MHz. > > > > Oh, this is marketing stuff. I thought you might have run some real > > benchmarks or someone else at Atmel might have. > > They have run benchmarks on the AVR32, but I think people are relying > on official figures for the Cortex.
"People" being "you"?
> > Certainly they have > > looked hard at the Cortex. But if it competes too well against the > > AVR32, I can see why it would not be pushed at Atmel. > > Certainly there > > will be a lot of sockets that will be won by an ARM device over a sole > > source part like the AVR32. > > And hopefully ARM device from Atmel :-)
There are a number of sockets that Atmel won't win if they don't have a CM3 device. There are two companies with the new core in production and a third on their heels. I am sure sales of the ARM7 devices won't drop off a cliff. But this business is all about design wins and I stand by my earlier post in another thread that the CM3 will start to steal significant numbers of design wins by the end of this year and by the end of next year they will overshadow the ARM7 design wins in the off the shelf MCU market.
> > At this point I don't think anyone can > > say whether the AVR32 has legs and will be around in 5 years. It has > > been out for what, a year or so? > > > > Fortunately there are plenty of sockets around, and some will go AVR32.
Is that the plan for the AVR32, to take *some* sockets? You know as well as I do that if the AVR32 does not get significant market penetration within two years from now, it will be put on the back burner and eventually discontinued. Atmel has no reason to keep making a part that consumes significant resources and does not make significant profit. Look at what happened to Atmel programmable logic. When was the last time they added a new FPGA to the product line? How many FPSLICs have been designed into new sockets?
> >> The AVR32 is decidedly better on DSP algorithms due to its > >> single cycle MAC and also it has faster access to SRAM. > >> Reading internal SRAM is a one clock cycle operation on the AVR32. > >> Bit banging will be one of the strengths of the UC3000. > > > > Isn't reading internal SRAM a single cycle on *all* processors? I > > can't think of any that require wait states. In fact, most processors > > try to cram as much SRAM onto the chip as possible because it is so > > fast. Did you say what you meant to say? > > > > On the UC3000 family, loading from internal SRAM will take one clock > in the execution stage. > Using single cycle SRAM does not mean that the load instruction is 1 clock.
Like I said, aren't all internal SRAMs in all processors single cycle???
"rickman" <gnuarm@gmail.com> skrev i meddelandet
news:1183995592.678499.34860@n2g2000hse.googlegroups.com...
> Ulf Samuelsson wrote: >> "rickman" <gnuarm@gmail.com> skrev i meddelandet >> > That is not the point. By prefetching the instructions, you are >> > setting up for a bigger dump and subsequent loss of instruction memory >> > bandwidth when you branch. FIFOs or instruction prefetching are not a >> > perfect solution. It is much better to just have single cycle >> > memory. >> >> Actually it is not, because if you try to decode your instruction >> in the same stage as the decoding, your clock frequency will >> go down significantly. >> The prefetching will work with single cycle memory and with >> memory having waitstates. > > What are you talking about??? How is slow memory faster than fast > memory??? >
If you have a memory capable of running at 50 MHz and you put it in a CPU capable of running at 25 MHz, then you will run slower.

In a two stage pipeline, you do "fetch-decode" and "execute". If memory access, decoding and execution each take 20 ns, then it will take 20 + 20 = 40 ns to handle the "fetch-decode" stage, so the CPU can only run at 25 MHz.

In a three stage pipeline, you do "fetch", "decode", "execute". If all three stages take 20 ns, then you will be able to run at 50 MHz.
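[A small sketch of the stage-timing argument, using the 20 ns figures from the post (illustrative only): the clock period is set by the slowest pipeline stage, so folding fetch and decode into one stage halves the achievable clock.]

    #include <stdio.h>

    int main(void)
    {
        double t_fetch = 20e-9, t_decode = 20e-9, t_execute = 20e-9;

        /* Two-stage pipeline: fetch and decode share one stage. */
        double stage_fd = t_fetch + t_decode;
        double period2  = stage_fd > t_execute ? stage_fd : t_execute;

        /* Three-stage pipeline: each step gets its own stage. */
        double period3 = t_fetch;
        if (t_decode  > period3) period3 = t_decode;
        if (t_execute > period3) period3 = t_execute;

        printf("2-stage fmax: %.0f MHz\n", 1.0 / period2 / 1e6);  /* 25 MHz */
        printf("3-stage fmax: %.0f MHz\n", 1.0 / period3 / 1e6);  /* 50 MHz */
        return 0;
    }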
> >> Prefetching, decoding and execution, all will take one clock. >> If you execute at 66 MHz with a three stage pipeline >> then you probably will execute around ~40 MHz with >> a two stage pipeline (Just a guess). >> >> If you execute blocks of 5 instruction including one jump, >> each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 Mhz >> in a three stage pipeline for ~ 10 blocks / us. >> >> In a two stage pipeline, you could use 2 clocks for a jump >> so you execute (2 + 1 + 1 + 1 + 1) @ 40 MHz >> which is 6,5 blocks / us, clearly slower. > > Since when do I get to design my own processor??? Everything you have > just written is based on your own assumptions. This is a pointless > discussion since everything you say is based on *your* assumptions! > In addition, you only consider the parts of the issue that you choose > to include. You did a timing analysis on paper that does not include > the effect of branches. Clearly not accurate regardless of your > assumptions!
Statistics are likely to show that branches are normally not frequent enough for you to gain speed by having a shorter pipeline.
> > >> > But you are comparing apples and oranges. A processor that has no >> > wait states doesn't have to deal with this no matter what the >> > instruction mix is. It is just much simpler to not have to consider >> > memory latencies. >> > >> >> A processor running from flash without wait states will be limited >> in performance by the memory. >> A processor which reads multiple instructions with wait state >> will be able to execute faster due to its higher bandwidth to memory. > > Again you are assuming facts that are not in evidence. Where do you > get the higher bandwidth from memory if it is running with wait > states? Oh, right, you are *assuming* that there is something > different in the design that will make that one faster. Something > that is not part of a slower Flash that requires wait states. >
By making it wider.
> >> >> The UC3000 is claimed as 80 MIPS at 66 MHz. >> >> For the Cortex M3 to reach 80 MIPS at 50 MHz, >> >> you have to have 80/50 = 1,6 MIPS per MHz. >> >> I think that ARM does not claim that the Cortex is close to 1,6 MIPS >> >> per >> >> MHz. >> > >> > Oh, this is marketing stuff. I thought you might have run some real >> > benchmarks or someone else at Atmel might have. >> >> They have run benchmarks on the AVR32, but I think people are relying >> on official figures for the Cortex. > > "People" being "you"?
No, Atmel marketing.
> > >> > Certainly they have >> > looked hard at the Cortex. But if it competes too well against the >> > AVR32, I can see why it would not be pushed at Atmel. >> > Certainly there >> > will be a lot of sockets that will be won by an ARM device over a sole >> > source part like the AVR32. >> >> And hopefully ARM device from Atmel :-) > > There are a number of sockets that Atmel won't win if they don't have > a CM3 device. There are two companies with the new core in production > and a third on their heels. I am sure sales of the ARM7 devices won't > drop off a cliff. But this business is all about design wins and I > stand by my earlier post in another thread that the CM3 will start to > steal significant numbers of design wins by the end of this year and > by the end of next year they will overshadow the ARM7 design wins in > the off the shelf MCU market.
And maybe the ARM9 designs overshadow the ARM7 and CM3 as well. I see most high volume designs nowadays requiring 200 MHz+ operation. The large customers (1M+) requiring low power seem to focus on 1,8V SAM7s or AVR32s. This is of course only 5% of the total MCU market normally, so things could be different in your region. A company selecting a binary compatible family will still be better off with ARM than with Cortex, due to the larger performance span.
>> > At this point I don't think anyone can >> > say whether the AVR32 has legs and will be around in 5 years. It has >> > been out for what, a year or so? >> > >> >> Fortunately there are plenty of sockets around, and some will go AVR32. > > Is that the plan for the AVR32, to take *some* sockets? You know as > well as I do that if the AVR32 does not get significant market > penetration within a two years from now, it will be put on the back > burner and eventually discontinued. Atmel has no reason to keep making > a part that consumes significant resources and does not make > significant profit. Look at what happened to Atmel programmable > logic. When was the last time they added a new FPGA to the product > line? How many FPSLICs have been designed into new sockets? >
> >> >> The AVR32 is decidedly better on DSP algorithms due to its >> >> single cycle MAC and also it has faster access to SRAM. >> >> Reading internal SRAM is a one clock cycle operation on the AVR32. >> >> Bit banging will be one of the strengths of the UC3000. >> > >> > Isn't reading internal SRAM a single cycle on *all* processors? I >> > can't think of any that require wait states. In fact, most processors >> > try to cram as much SRAM onto the chip as possible because it is so >> > fast. Did you say what you meant to say? >> > >> >> On the UC3000 family, loading from internal SRAM will take one clock >> in the execution stage. >> Using single cycle SRAM does not mean that the load instruction is 1 >> clock. > > Like I said, aren't all internal SRAMs in all processors single > cycle??? >
Maybe so, but from a performance point of view you are more interested in how many cycles it takes to load from SRAM into a register. If this takes 1 clock cycle due to a 1 clock load instruction, or 3 clock cycles due to a 3 clock load instruction (from a 1 clock cycle SRAM), then you do see a performance difference.

--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB
On Sat, 14 Jul 2007 10:04:25 +0200, "Ulf Samuelsson"
<ulf@a-t-m-e-l.com> wrote:

>Statistics is likely to show that branches are normally not that frequent
>that you gain speed by having a shorter pipeline.
Branch frequency is highly dependent on the application domain and coding style. However, it has been reported that in control-type applications branch instructions can be 20% to 30% of the total.

Stephen
--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
On Jul 14, 4:04 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> "rickman" <gnu...@gmail.com> skrev i meddelandetnews:1183995592.678499.34860@n2g2000hse.googlegroups.com... > > > > > Ulf Samuelsson wrote: > >> "rickman" <gnu...@gmail.com> skrev i meddelandet > >> > That is not the point. By prefetching the instructions, you are > >> > setting up for a bigger dump and subsequent loss of instruction memory > >> > bandwidth when you branch. FIFOs or instruction prefetching are not a > >> > perfect solution. It is much better to just have single cycle > >> > memory. > > >> Actually it is not, because if you try to decode your instruction > >> in the same stage as the decoding, your clock frequency will > >> go down significantly. > >> The prefetching will work with single cycle memory and with > >> memory having waitstates. > > > What are you talking about??? How is slow memory faster than fast > > memory??? > > If you have a memory capable of running at 50 MHz and you > put that in a CPU capable of running at 25 MHz, then you > will run slower. > > In a two stage pipeline, you do "fetch-decode" and "execute". > If memory access, decoding and execution takes 20 ns, > then it will take 20 + 20 = 40 ns to handle the "fetch-decode" stage, > so the CPU can run at 25 MHz. > > In a three stage pipeline, you do "fetch", "decode", "execute". > If all three stages take 20 ns, then you will be able to run at 50 MHz.
This conversation has become pointless. It started discussing the loss of performance in processors that use slow Flash memory, and you have turned it into a discussion of processor design. You are way off topic and your comments are irrelevant to the original point.

The bottom line is that if all other things are equal, a processor with faster Flash memory will run faster. The Stellaris CM3 running at 50 MHz with no wait states from Flash will be faster for most apps than a processor running at 70 MHz with one or two wait states like the STM parts we were discussing. It may also be faster in many apps than a processor running at 70 MHz using a wide flash bus interface to overcome the wait states, because the lookahead fetch is often wasted when the instruction flow changes.

You can dance around that, but those are the facts.
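[As a rough illustration of this argument - deliberately pessimistic, assuming every fetch pays the stated wait states with no wide bus or prefetch to hide them - effective fetch rate drops quickly with wait states:]

    #include <stdio.h>

    /* Effective instruction fetch rate if every fetch pays the stated
     * wait states; wide buses and prefetching are ignored on purpose. */
    static double eff_mips(double clk_mhz, double ws_per_fetch)
    {
        return clk_mhz / (1.0 + ws_per_fetch);
    }

    int main(void)
    {
        printf("50 MHz, 0 WS : %.0f MIPS\n", eff_mips(50.0, 0.0)); /* 50   */
        printf("70 MHz, 1 WS : %.0f MIPS\n", eff_mips(70.0, 1.0)); /* 35   */
        printf("70 MHz, 2 WS : %.1f MIPS\n", eff_mips(70.0, 2.0)); /* 23.3 */
        return 0;
    }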
> >> Prefetching, decoding and execution, all will take one clock. > >> If you execute at 66 MHz with a three stage pipeline > >> then you probably will execute around ~40 MHz with > >> a two stage pipeline (Just a guess). > > >> If you execute blocks of 5 instruction including one jump, > >> each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 Mhz > >> in a three stage pipeline for ~ 10 blocks / us. > > >> In a two stage pipeline, you could use 2 clocks for a jump > >> so you execute (2 + 1 + 1 + 1 + 1) @ 40 MHz > >> which is 6,5 blocks / us, clearly slower. > > > Since when do I get to design my own processor??? Everything you have > > just written is based on your own assumptions. This is a pointless > > discussion since everything you say is based on *your* assumptions! > > In addition, you only consider the parts of the issue that you choose > > to include. You did a timing analysis on paper that does not include > > the effect of branches. Clearly not accurate regardless of your > > assumptions! > > Statistics is likely to show that branches are normally not that frequent > that you > gain speed by having a shorter pipeline.
Funny, you are bringing in both statistics *and* probability. That is the type of language I hear all the time in commercials where they want you to think they have just told you a fact when in fact they have said pretty close to nothing.
> >> > But you are comparing apples and oranges. A processor that has no > >> > wait states doesn't have to deal with this no matter what the > >> > instruction mix is. It is just much simpler to not have to consider > >> > memory latencies. > > >> A processor running from flash without wait states will be limited > >> in performance by the memory. > >> A processor which reads multiple instructions with wait state > >> will be able to execute faster due to its higher bandwidth to memory. > > > Again you are assuming facts that are not in evidence. Where do you > > get the higher bandwidth from memory if it is running with wait > > states? Oh, right, you are *assuming* that there is something > > different in the design that will make that one faster. Something > > that is not part of a slower Flash that requires wait states. > > By making it wider. > > > > >> >> The UC3000 is claimed as 80 MIPS at 66 MHz. > >> >> For the Cortex M3 to reach 80 MIPS at 50 MHz, > >> >> you have to have 80/50 = 1,6 MIPS per MHz. > >> >> I think that ARM does not claim that the Cortex is close to 1,6 MIPS > >> >> per > >> >> MHz. > > >> > Oh, this is marketing stuff. I thought you might have run some real > >> > benchmarks or someone else at Atmel might have. > > >> They have run benchmarks on the AVR32, but I think people are relying > >> on official figures for the Cortex. > > > "People" being "you"? > > No, Atmel marketing.
Ahhh, *marketing*! That makes it very clear now. We can all have complete trust in benchmark figures from *marketing*!
> >> > Certainly they have > >> > looked hard at the Cortex. But if it competes too well against the > >> > AVR32, I can see why it would not be pushed at Atmel. > >> > Certainly there > >> > will be a lot of sockets that will be won by an ARM device over a sole > >> > source part like the AVR32. > > >> And hopefully ARM device from Atmel :-) > > > There are a number of sockets that Atmel won't win if they don't have > > a CM3 device. There are two companies with the new core in production > > and a third on their heels. I am sure sales of the ARM7 devices won't > > drop off a cliff. But this business is all about design wins and I > > stand by my earlier post in another thread that the CM3 will start to > > steal significant numbers of design wins by the end of this year and > > by the end of next year they will overshadow the ARM7 design wins in > > the off the shelf MCU market. > > And maybe the ARM9 designs overshadows the ARM7 and CM3 as well. > I see most high volume designs nowadays require 200 MHz + operation. > The large customers (1M+) requiring low power, seems to focus > on 1,8V SAM7s or AVR32s. > This is of course only 5% of the total MCU market normally > so things could be different in your region.
Yes, the swan song of the truly desperate. If anyone connected to the ARM7 feels threatened by the CM3, they simply bring in the ARM9, which is a totally unsuited processor for most of the apps that the ARM7 and CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3 fill. However, the CM3 fills most of those sockets much better than the ARM7, and that is my point.
> A company selecting a binary compatible family, will still be better off > with ARM > than with Cortex, due to larger performance span.
If they can shoe horn it onto their board! An ARM9 may be the right choice for a router, but not for a controller. The CM3 is targeted to the lower end bumping up against the 8 bit devices and eating into their market segment. The ARM9 will never compete in that area. It is too large of a chip and will always be uncompetitive at the low end.
> >> > At this point I don't think anyone can > >> > say whether the AVR32 has legs and will be around in 5 years. It has > >> > been out for what, a year or so? > > >> Fortunately there are plenty of sockets around, and some will go AVR32. > > > Is that the plan for the AVR32, to take *some* sockets? You know as > > well as I do that if the AVR32 does not get significant market > > penetration within a two years from now, it will be put on the back > > burner and eventually discontinued. Atmel has no reason to keep making > > a part that consumes significant resources and does not make > > significant profit. Look at what happened to Atmel programmable > > logic. When was the last time they added a new FPGA to the product > > line? How many FPSLICs have been designed into new sockets?
I see you ignored this comment. There are any number of "good ideas" that have totally failed in the market place. It is very possible that the ARM32 will be one of them.
> >> >> The AVR32 is decidedly better on DSP algorithms due to its > >> >> single cycle MAC and also it has faster access to SRAM. > >> >> Reading internal SRAM is a one clock cycle operation on the AVR32. > >> >> Bit banging will be one of the strengths of the UC3000. > > >> > Isn't reading internal SRAM a single cycle on *all* processors? I > >> > can't think of any that require wait states. In fact, most processors > >> > try to cram as much SRAM onto the chip as possible because it is so > >> > fast. Did you say what you meant to say? > > >> On the UC3000 family, loading from internal SRAM will take one clock > >> in the execution stage. > >> Using single cycle SRAM does not mean that the load instruction is 1 > >> clock. > > > Like I said, aren't all internal SRAMs in all processors single > > cycle??? > > Maybe so, but from a performance point of view, you are more > interested in how many cycles it takes to load from SRAM into a > register, and if this takes 1 clock cycle due to a 1 clock load > instruction, or 3 clock cycles due to a 3 clock load instruction > (from a 1 clock cycle SRAM), then you do see a performance differnence.
What processor only uses 3 clock instructions to access 1 clock memory? My understanding is that many processors not only use faster instructions to load, but can use memory in other instructions which allow single cycle back to back memory accesses. Besides, no one feature ever makes or breaks a processor chip. There are literally dozens of distinguishing points between different processors and only marketing and salesmen try to narrow an engineer's focus to a small number of features. I care about the overall utility of a processor and one of the big selling points to me is the ubiquitousness of the ARM chips. Very soon that will include the CM3 devices which will take over the low end squeezing the ARM7 between the CM3 and the ARM9.
rickman wrote:
 > On Jul 14, 4:04 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
>>And maybe the ARM9 designs overshadows the ARM7 and CM3 as well. >>I see most high volume designs nowadays require 200 MHz + operation. >>The large customers (1M+) requiring low power, seems to focus >>on 1,8V SAM7s or AVR32s. >>This is of course only 5% of the total MCU market normally >>so things could be different in your region. > > > Yes, the swan song of the truly desperate. If anyone connected to the > ARM7 feels threatened by the CM3, they simply bring in the ARM9 which > is a totally unsuited processor for most of the apps that the ARM7 and > CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3 > fill. However, the CM3 fill most of those sockets much better than > the ARM7 and that is my point.
Couple of teensy weeny problems with that sweeping statement: for something to hope to "fill most of those sockets", it needs to be pin and code compatible. Alas, the M3 is neither.

I note that NXP has licensed the Cortex A8, but simply not bothered with the M3. [Likely their 128 bit fetch ARM7 makes the M3 too small a change.] Many designers will think the same. I don't see many taking an ARM7 out of a released product, just for the fun of dropping in a M3.

So, the M3 competes for new designs, and Ulf is right that the leading edge will want a bigger new-design jump than ARM7->M3, so that leaves the M3 chasing a narrow aperture of design wins. There, it competes against all the other 32bit offerings, and it competes on peripherals, package and power, as much as core.

We looked at the new ST M3's: Great, I thought, a small MCU with USB and CAN (notice the actual core is not even on this selection list!) - oops, it seems ST have designed a part that is USB _or_ CAN. Even a good 8 bit core would run USB & CAN, so we don't actually care about a 25% performance window.
>>> Look at what happened to Atmel programmable >>>logic. When was the last time they added a new FPGA to the product >>>line?
Atmel are adding new CPLDs (but their FPGAs are in stable design mode). They have the new CAP series, with ARM7 and ARM9. The new family looks well placed to pick up 'cost down design passes' on products that started commercial life in FPGAs but, as volume (and competition) ramps, need more efficient silicon.
>>> How many FPSLICs have been designed into new sockets? > > I see you ignored this comment. There are any number of "good ideas" > that have totally failed in the market place. It is very possible > that the ARM32 will be one of them.
I'm guessing you actually meant to say AVR32 here ;)

I see AVR32 and FpSLIC as very different animals.

FpSLIC: a "good idea"? Hmm... It was clear (to me, at least) even from release that the FpSLIC had problems, namely that it LOOKED very flexible to someone in marketing, but to a designer was actually very constraining: you had to KNOW your code was NEVER going to go above the (16K?) chip limit, and you had to have an application too big for a CPLD, yet small enough to use the FPGA portion (but never exceed it). Then you notice that an application small enough to fit in 16K, but that ALSO needs a small-to-moderate FPGA, is becoming a tiny segment indeed.

AVR32: This is a much simpler design choice. High end uC design choice is based mainly on the 4 P's: Peripherals, Power, Package & Price. Success is helped a lot by low cost tools, and good on chip debug will be important, as will a strong eco-system. Atmel's road map on this is looking pretty good. [So do Freescale's, and Infineon's, and none of these use M3...]

-jg

"rickman" <gnuarm@gmail.com> skrev i meddelandet
news:1184594668.666542.195070@57g2000hsv.googlegroups.com...
> On Jul 14, 4:04 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote: >> "rickman" <gnu...@gmail.com> skrev i >> meddelandetnews:1183995592.678499.34860@n2g2000hse.googlegroups.com... >> >> >> >> > Ulf Samuelsson wrote: >> >> "rickman" <gnu...@gmail.com> skrev i meddelandet >> >> > That is not the point. By prefetching the instructions, you are >> >> > setting up for a bigger dump and subsequent loss of instruction >> >> > memory >> >> > bandwidth when you branch. FIFOs or instruction prefetching are not >> >> > a >> >> > perfect solution. It is much better to just have single cycle >> >> > memory. >> >> >> Actually it is not, because if you try to decode your instruction >> >> in the same stage as the decoding, your clock frequency will >> >> go down significantly. >> >> The prefetching will work with single cycle memory and with >> >> memory having waitstates. >> >> > What are you talking about??? How is slow memory faster than fast >> > memory??? >> >> If you have a memory capable of running at 50 MHz and you >> put that in a CPU capable of running at 25 MHz, then you >> will run slower. >> >> In a two stage pipeline, you do "fetch-decode" and "execute". >> If memory access, decoding and execution takes 20 ns, >> then it will take 20 + 20 = 40 ns to handle the "fetch-decode" stage, >> so the CPU can run at 25 MHz. >> >> In a three stage pipeline, you do "fetch", "decode", "execute". >> If all three stages take 20 ns, then you will be able to run at 50 MHz. > > This conversation has become pointless. It started discussing the > loss of performance in processors that use slow Flash memory and you > have turned it into a discussion of processor design. You are way off > topic and your comments are irrelevant to the original point. The > bottom line is that if all other things are equal, a processor with > faster Flash memory will run faster. The Stellaris CM3 running at 50 > MHz with no wait states from Flash will be faster for most apps than a > processor running at 70 MHz with 1 or two wait states like the STM > parts we were discussing. It may also be faster in many apps than a > processor running at 70 MHz using a wide flash bus interface to > overcome the wait states required because the lookahead fetch is often > wasted when the instruction flow changes. > > You can dance around that, but those are the facts. >
Nope it isn't; the AVR32 running at 66 MHz will run mostly at zero waitstates due to its interleaved flash controller design. Each flash access done by the memory controller will have 1 waitstate, but since the memory controller can do two accesses in parallel, the CPU will only see waitstates during jumps, and no waitstates during non-jump instructions. If you do jumps 20% of the time, then the average number of waitstates is 0,2. On top of that, you will be able to perform data accesses to the flash while eating from the instruction queue without any performance penalty.
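[A quick sketch of the arithmetic behind this claim, under the stated assumptions (sequential fetches fully hidden by the interleave, one exposed waitstate per taken jump, 20% jumps); the figures are the ones quoted in the post, not from a datasheet:]

    #include <stdio.h>

    /* Average exposed waitstates per instruction when only taken jumps
     * see the flash waitstate. */
    static double avg_waitstates(double jump_fraction, int ws_per_jump)
    {
        return jump_fraction * ws_per_jump;
    }

    int main(void)
    {
        double ws = avg_waitstates(0.20, 1);        /* 0.2 waitstates/instruction */
        printf("avg waitstates: %.1f\n", ws);

        double cpi = 1.0 + ws;                      /* cycles per instruction     */
        printf("effective throughput: %.0f MIPS at 66 MHz\n", 66.0 / cpi); /* ~55 */
        return 0;
    }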
>> And maybe the ARM9 designs overshadows the ARM7 and CM3 as well. >> I see most high volume designs nowadays require 200 MHz + operation. >> The large customers (1M+) requiring low power, seems to focus >> on 1,8V SAM7s or AVR32s. >> This is of course only 5% of the total MCU market normally >> so things could be different in your region. > > Yes, the swan song of the truly desperate. If anyone connected to the > ARM7 feels threatened by the CM3, they simply bring in the ARM9 which > is a totally unsuited processor for most of the apps that the ARM7 and > CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3 > fill. However, the CM3 fill most of those sockets much better than > the ARM7 and that is my point.
The ARM9 will fit almost any socket where the user requires an external bus.
> > >> A company selecting a binary compatible family, will still be better off >> with ARM >> than with Cortex, due to larger performance span. > > If they can shoe horn it onto their board! An ARM9 may be the right > choice for a router, but not for a controller. The CM3 is targeted to > the lower end bumping up against the 8 bit devices and eating into > their market segment. The ARM9 will never compete in that area. It > is too large of a chip and will always be uncompetitive at the low > end.
You'd be surprised how often ARM9 fits the bill.
>> >> > At this point I don't think anyone can >> >> > say whether the AVR32 has legs and will be around in 5 years. It >> >> > has >> >> > been out for what, a year or so? >> >> >> Fortunately there are plenty of sockets around, and some will go >> >> AVR32. >> >> > Is that the plan for the AVR32, to take *some* sockets? You know as >> > well as I do that if the AVR32 does not get significant market >> > penetration within a two years from now, it will be put on the back >> > burner and eventually discontinued. Atmel has no reason to keep making >> > a part that consumes significant resources and does not make >> > significant profit. Look at what happened to Atmel programmable >> > logic. When was the last time they added a new FPGA to the product >> > line? How many FPSLICs have been designed into new sockets? > > I see you ignored this comment. There are any number of "good ideas" > that have totally failed in the market place. It is very possible > that the ARM32 will be one of them. > > >> >> >> The AVR32 is decidedly better on DSP algorithms due to its >> >> >> single cycle MAC and also it has faster access to SRAM. >> >> >> Reading internal SRAM is a one clock cycle operation on the AVR32. >> >> >> Bit banging will be one of the strengths of the UC3000. >> >> >> > Isn't reading internal SRAM a single cycle on *all* processors? I >> >> > can't think of any that require wait states. In fact, most >> >> > processors >> >> > try to cram as much SRAM onto the chip as possible because it is so >> >> > fast. Did you say what you meant to say? >> >> >> On the UC3000 family, loading from internal SRAM will take one clock >> >> in the execution stage. >> >> Using single cycle SRAM does not mean that the load instruction is 1 >> >> clock. >> >> > Like I said, aren't all internal SRAMs in all processors single >> > cycle??? >> >> Maybe so, but from a performance point of view, you are more >> interested in how many cycles it takes to load from SRAM into a >> register, and if this takes 1 clock cycle due to a 1 clock load >> instruction, or 3 clock cycles due to a 3 clock load instruction >> (from a 1 clock cycle SRAM), then you do see a performance differnence. > > What processor only uses 3 clock instructions to access 1 clock > memory? My understanding is that many processors not only use faster > instructions to load, but can use memory in other instructions which > allow single cycle back to back memory accesses.
The simple three stage pipeline processors (and the CM3) normally use a few clocks in the execution stage to load data, but the uC3 family does not.
> Besides, no one feature ever makes or breaks a processor chip. There > are literally dozens of distinguishing points between different > processors and only marketing and salesmen try to narrow an engineer's > focus to a small number of features. I care about the overall utility > of a processor and one of the big selling points to me is the > ubiquitousness of the ARM chips. Very soon that will include the CM3 > devices which will take over the low end squeezing the ARM7 between > the CM3 and the ARM9.
--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB
On Jul 16, 5:30 pm, Jim Granville <no.s...@designtools.maps.co.nz>
wrote:
> rickman wrote: > > > On Jul 14, 4:04 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote: > > >>And maybe the ARM9 designs overshadows the ARM7 and CM3 as well. > >>I see most high volume designs nowadays require 200 MHz + operation. > >>The large customers (1M+) requiring low power, seems to focus > >>on 1,8V SAM7s or AVR32s. > >>This is of course only 5% of the total MCU market normally > >>so things could be different in your region. > > > Yes, the swan song of the truly desperate. If anyone connected to the > > ARM7 feels threatened by the CM3, they simply bring in the ARM9 which > > is a totally unsuited processor for most of the apps that the ARM7 and > > CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3 > > fill. However, the CM3 fill most of those sockets much better than > > the ARM7 and that is my point. > > Couple of teensy weeny problems to that sweeping statement: > For something to hope to "fill most of those sockets", it > needs to be Pin and code compatible, Alas, the M3 is neither.
No, when I say "fill the sockets" I am not talking about new chips being used in old designs, I am talking about the new chips being used in new designs that would otherwise make use of the other MCUs. So when new designs are started, a designer who considers the CM3 will see that it is a better choice for most designs where he would otherwise use an ARM7. Likewise, designs that would otherwise use an ARM9 will mostly continue to use the ARM9. I see the ARM7/CM3 as fitting different sockets than the ARM9, with little overlap.

So please try to read my words carefully. I know you can figure out what I mean since we have discussed this before and I am saying the same things I have said before. I guess I should reconsider my purpose in continuing to discuss this with you, since you don't seem to pick up on what I am saying and the meaning seems to get twisted a lot.
> I note that NXP has licensed the Cortex A8, but simply not bothered > with the M3. > [Likely their 128 bit fetch ARM7, makes the M3 too small a change] > > Many designers will think the same. > I don't see many taking an ARM7 out of a released product, just > for the fun of dropping in a M3.
I agree, it would be silly to pull back a released product just to change the MCU when it is working just fine.
> So, the M3 competes for new designs, and Ulf is right that the leading > edge will want a bigger new-design jump than ARM7->M3, so that leaves > the M3 chasing a narrow aperture of design wins. > There, it competes against all the other 32bit offerings, and > it competes on Peripherals package and power, as much as Core.
I don't know what you mean by "leading edge". New designs cover a wide range of requirements for the MCU, from tiny 8 bit devices that give the lowest cost to huge 32 bit processors that nearly keep up with x86 CPUs. The application range of the ARM7/CM3 has little overlap with the ARM9. The most significant separator is cost. Most ARM9s do not include program storage, requiring external Flash. The one ARM9 family that includes Flash runs much slower than the other ARM9s and is only a slight speed (or any other) improvement over the ARM7 or CM3.

The CM3 has several advantages over both the ARM7 and ARM9, which you seem to want to dismiss while focusing on how the ARM9 is a very different processor with more advanced capabilities targeting a different market. Using an ARM9 in many applications is like using a mortar to hunt rabbits. There may be more features in the ARM9s than the CM3, but if you don't need them, why pay for them? Why do you continue to try to compare the ARM9 to the CM3? They address different markets and there is very little overlap.
> We looked at the new ST M3's : Great I thought, a Small MCU, with USB > and CAN (notice the actual core is not even on this selection list! ) > -Oops, seems ST have designed a part that is USB _or_ CAN. > Even a good 8 bit core would run USB & CAN, so we don't actually care > about a 25% performance window. > > >>> Look at what happened to Atmel programmable > >>>logic. When was the last time they added a new FPGA to the product > >>>line? > > Atmel are adding new CPLDs, (but their FPGAs are in stable design mode). > They have the new CAP series, with ARM7 and ARM9. > The new family looks well placed, to pick up 'Cost Down Design Passes' > on products that started commercial life in FPGAs, but as volume > (and competition) ramps, they need more efficent silicon.
Now you are going off into left field. My point was to compare the single source AVR32 to other single source products such as the FPSLIC which has failed in the market and will leave someone high and dry when it is discontinued. You bring an ASIC into the discussion as if it were somehow relevant. What was your point???
> >>> How many FPSLICs have been designed into new sockets? > > > I see you ignored this comment. There are any number of "good ideas" > > that have totally failed in the market place. It is very possible > > that the ARM32 will be one of them. > > I'm guessing you actually meant to say AVR32 here ;)
Yes, my slip...
> I see AVR32 and FpSLIC as very different animals.
Yes, they are different, but they have a significant common point: they are both single source with very stiff competition. It will be very easy for the AVR32 to slowly die just like the FPSLIC, the Transcend processors and many other products that just could not compete in the market.

It is especially interesting that Atmel continues to introduce new ARM processors alongside the AVR32. I seem to recall Intel doing that with various processors like the 860, 960 and others, all of which died off and left users high and dry. I believe the 860 was a popular product in the military camp and was designed into a number of systems with 10 to 20 year lifespans. Then 3 years in, the family was discontinued, so customers didn't even have similar chips to upgrade to. I can see the AVR32 going this same route.
> FpSLIC: - a "good idea" ? Hmm... > It was clear (to me, at least) even from release, the FpSLIC had > problems, which was that it LOOKED to be very flexible to someone in > marketing, but to a designer was actually very constraining: > > You had to KNOW you code was NEVER going to go above the (16K?) > chip limit, and you had to have an application too big for a CPLD, > and small enough to use the FPGA portion (but never exceed it) > Then you notice that an application small enough to fit in 16K, > but that ALSO needs a Small-Moderate FPGA, is becomming a tiny segment > indeed.
This sounds like a specious argument. *EVERY* CPU has limitations which you have to accept when you use it. At the time the FPSLIC was introduced, some 10 years or more ago, 16kB was a generous amount of RAM for an 8 bit MCU. This memory is RAM, not Flash; the Flash was stored off chip on the FPSLIC. Regardless, it does not matter what flaws the product had; the point is that this type of product was sole sourced, which had a lot to do with the product failure. It is not just a matter of pin compatibility; there was no one else making devices remotely like FPSLICs. That was actually the reason I did not use it in a design it was perfectly suited to. Likewise, switching from an AVR32 to another processor will require a lot more work than just switching between ARMs.
> AVR32: This is a much simpler design choice. high end uC design choice > is based mainly on the 4 P's : Peripherals, Power, Package & Price. > Success is helped a lot by low cost tools, and good on chip debug > will be important, as will a strong eco-system.
That rolls off the tongue well, but there are significant differences between CPUs. You seem to point that out in spades when you compare the ARM7 to its sibling the CM3, but completely dismiss it when you compare the AVR32 to all the other 32 bit processors. Staying within a family saves a lot of work. The ARM family has a great deal of commonality between all of its members, with a wide target range, while the AVR32 has a limited target range and requires switching families to go outside it. The bottom line is that the ARM chips have legs that other, proprietary products don't. Even ignoring the technical issues, the ARM has momentum which will capture a lot of design wins in close races.
> Atmel's road map on this is looking pretty good. > [So do Freescale's, and Infineon's, and none of these use M3...]
I seem to recall that the ARMs are a big part of Atmel's road map. That is my point: the CM3 is a better ARM than the ARM7 is. Everything the ARM7 does, the CM3 does better. The designs they target are not a good match to the ARM9 because of higher power consumption, larger physical size or higher cost. The CM3 out-competes the ARM7 in every area except for the number of implementations, which I am saying will be changing over the next few years.

Finally, I don't see the AVR32 having any real advantages over the ARM processors unless there is an app which just happens to fit the AVR32 details better than any of the ARMs. The number of apps for which this is true will be very small indeed. So with more makers announcing new CM3 chips, I see the crossover point (more design wins of off the shelf MCUs) for the CM3 vs the ARM7 coming within the next year, and maybe by the end of this year.
On Jul 20, 6:37 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> "rickman" <gnu...@gmail.com> skrev i meddelandetnews:1184594668.666542.195070@57g2000hsv.googlegroups.com... > > > > > On Jul 14, 4:04 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote: > >> "rickman" <gnu...@gmail.com> skrev i > >> meddelandetnews:1183995592.678499.34860@n2g2000hse.googlegroups.com... > > >> > Ulf Samuelsson wrote: > >> >> "rickman" <gnu...@gmail.com> skrev i meddelandet > >> >> > That is not the point. By prefetching the instructions, you are > >> >> > setting up for a bigger dump and subsequent loss of instruction > >> >> > memory > >> >> > bandwidth when you branch. FIFOs or instruction prefetching are not > >> >> > a > >> >> > perfect solution. It is much better to just have single cycle > >> >> > memory. > > >> >> Actually it is not, because if you try to decode your instruction > >> >> in the same stage as the decoding, your clock frequency will > >> >> go down significantly. > >> >> The prefetching will work with single cycle memory and with > >> >> memory having waitstates. > > >> > What are you talking about??? How is slow memory faster than fast > >> > memory??? > > >> If you have a memory capable of running at 50 MHz and you > >> put that in a CPU capable of running at 25 MHz, then you > >> will run slower. > > >> In a two stage pipeline, you do "fetch-decode" and "execute". > >> If memory access, decoding and execution takes 20 ns, > >> then it will take 20 + 20 = 40 ns to handle the "fetch-decode" stage, > >> so the CPU can run at 25 MHz. > > >> In a three stage pipeline, you do "fetch", "decode", "execute". > >> If all three stages take 20 ns, then you will be able to run at 50 MHz. > > > This conversation has become pointless. It started discussing the > > loss of performance in processors that use slow Flash memory and you > > have turned it into a discussion of processor design. You are way off > > topic and your comments are irrelevant to the original point. The > > bottom line is that if all other things are equal, a processor with > > faster Flash memory will run faster. The Stellaris CM3 running at 50 > > MHz with no wait states from Flash will be faster for most apps than a > > processor running at 70 MHz with 1 or two wait states like the STM > > parts we were discussing. It may also be faster in many apps than a > > processor running at 70 MHz using a wide flash bus interface to > > overcome the wait states required because the lookahead fetch is often > > wasted when the instruction flow changes. > > > You can dance around that, but those are the facts. > > Nope it isn't, the AVR32 running at 66 MHz will run mostly > at zero waitstates due to its interleaved flash controller design. > Each flash access done by the memory controller > will have 1 waitstate, but since the memory controller can do > two accesses in parallel, the CPU will only see waitstates > during jumps, and no waitstates during non jump instructions. > If you do jumps 20% of the time, then the average number of waitstates is > 0,2. > On top of that you will be able to perform dataaccesses to the flash > while eating from the instruction queue wihout any performance penalty.
That is pointless. It does not matter how large the FIFO is: if you are pulling data out at a given rate and you can only put data in at that same rate, then as soon as you have to stop instruction reads to do a data read, you will not be filling the FIFO as fast as it is being emptied, and performance will suffer. Run through a simulation and see if that is not true. Based on the info you provided, this is the result.
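[A toy simulation of this point - a sketch only, with made-up rates (refill one instruction per cycle when not stolen, consume one per cycle, 20% of cycles doing a data access): once the FIFO drains, the stalls track the stolen fetch slots regardless of FIFO depth.]

    #include <stdio.h>

    int main(void)
    {
        int depth = 4, fifo = depth;   /* start with a full 4-entry FIFO     */
        int stalls = 0;

        for (int cycle = 0; cycle < 1000; cycle++) {
            int data_access = (cycle % 5 == 4);  /* 1 in 5 cycles is a load/store */

            if (!data_access && fifo < depth)
                fifo++;                /* fetch side refills one slot        */

            if (fifo > 0)
                fifo--;                /* core consumes one instruction      */
            else
                stalls++;              /* FIFO empty: the core stalls        */
        }
        printf("stall cycles per 1000: %d\n", stalls); /* ~1 in 5 after warm-up */
        return 0;
    }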
> >> And maybe the ARM9 designs overshadows the ARM7 and CM3 as well. > >> I see most high volume designs nowadays require 200 MHz + operation. > >> The large customers (1M+) requiring low power, seems to focus > >> on 1,8V SAM7s or AVR32s. > >> This is of course only 5% of the total MCU market normally > >> so things could be different in your region. > > > Yes, the swan song of the truly desperate. If anyone connected to the > > ARM7 feels threatened by the CM3, they simply bring in the ARM9 which > > is a totally unsuited processor for most of the apps that the ARM7 and > > CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3 > > fill. However, the CM3 fill most of those sockets much better than > > the ARM7 and that is my point. > > The ARM9 will fit almost any sockets where the user require an external bus.
So you are agreeing with me that the ARM9 is not a good match for most ARM7 or CM3 designs? The ARM9 may "fit" the design, but it will not be as good a fit if the ARM7 or CM3 can do the job. If nothing else, the cost and power consumption will be higher with the ARM9. In most cases the package size will be larger for the ARM9. Why use a shotgun when a slingshot will do the job?
> >> A company selecting a binary compatible family, will still be better off > >> with ARM > >> than with Cortex, due to larger performance span. > > > If they can shoe horn it onto their board! An ARM9 may be the right > > choice for a router, but not for a controller. The CM3 is targeted to > > the lower end bumping up against the 8 bit devices and eating into > > their market segment. The ARM9 will never compete in that area. It > > is too large of a chip and will always be uncompetitive at the low > > end. > > You'd be surprised how often ARM9 fits the bill.
No, I think I have a pretty good handle on the differences between Atmel's ARM9 processors and the CM3 product line. They are similar CPUs with very different interfaces to the outside world for two very different target ranges. Anyone who thinks there is much overlap is kidding themselves.
> >> >> > At this point I don't think anyone can > >> >> > say whether the AVR32 has legs and will be around in 5 years. It > >> >> > has > >> >> > been out for what, a year or so? > > >> >> Fortunately there are plenty of sockets around, and some will go > >> >> AVR32. > > >> > Is that the plan for the AVR32, to take *some* sockets? You know as > >> > well as I do that if the AVR32 does not get significant market > >> > penetration within a two years from now, it will be put on the back > >> > burner and eventually discontinued. Atmel has no reason to keep making > >> > a part that consumes significant resources and does not make > >> > significant profit. Look at what happened to Atmel programmable > >> > logic. When was the last time they added a new FPGA to the product > >> > line? How many FPSLICs have been designed into new sockets? > > > I see you ignored this comment. There are any number of "good ideas" > > that have totally failed in the market place. It is very possible > > that the ARM32 will be one of them. > > >> >> >> The AVR32 is decidedly better on DSP algorithms due to its > >> >> >> single cycle MAC and also it has faster access to SRAM. > >> >> >> Reading internal SRAM is a one clock cycle operation on the AVR32. > >> >> >> Bit banging will be one of the strengths of the UC3000. > > >> >> > Isn't reading internal SRAM a single cycle on *all* processors? I > >> >> > can't think of any that require wait states. In fact, most > >> >> > processors > >> >> > try to cram as much SRAM onto the chip as possible because it is so > >> >> > fast. Did you say what you meant to say? > > >> >> On the UC3000 family, loading from internal SRAM will take one clock > >> >> in the execution stage. > >> >> Using single cycle SRAM does not mean that the load instruction is 1 > >> >> clock. > > >> > Like I said, aren't all internal SRAMs in all processors single > >> > cycle??? > > >> Maybe so, but from a performance point of view, you are more > >> interested in how many cycles it takes to load from SRAM into a > >> register, and if this takes 1 clock cycle due to a 1 clock load > >> instruction, or 3 clock cycles due to a 3 clock load instruction > >> (from a 1 clock cycle SRAM), then you do see a performance differnence. > > > What processor only uses 3 clock instructions to access 1 clock > > memory? My understanding is that many processors not only use faster > > instructions to load, but can use memory in other instructions which > > allow single cycle back to back memory accesses. > > The simple three stage pipeline processors (and the CM3) normally use a few > clocks > in the execution stage to load data, but the uC3 family does not.
Ok, I have to assume that you don't have any examples. Regardless, this seems like a red herring in this discussion anyway.
> > Besides, no one feature ever makes or breaks a processor chip. There > > are literally dozens of distinguishing points between different > > processors and only marketing and salesmen try to narrow an engineer's > > focus to a small number of features. I care about the overall utility > > of a processor and one of the big selling points to me is the > > ubiquitousness of the ARM chips. Very soon that will include the CM3 > > devices which will take over the low end squeezing the ARM7 between > > the CM3 and the ARM9.
I stand by my analysis of the competitiveness of the CM3.