ARM Cortex M3 - Who's utilizing it?

Reply by Jim Granville, February 18, 2006

Wilco Dijkstra wrote:
> A quick scan of the AVR8 CPUs revealed that the ATtiny2313/V
> seems to be the lowest power AVR at 0.41 mW/MHz.
> Cortex-M3 uses 3.5 times less power...

  That is the classic error of comparing a new (future) core-only figure,
with an existing full-system product. A grapes to pomegranate comparison.

  There is a _lot_ more than just the core that determines the
system Icc values.
  ASIC core vendors tend to overlook that, as that's not what they
sell.

  Even the IC vendors nudge the goalposts by specifying their
uC data with external square-wave clocks. A nice way to ignore the
XTAL oscillator amplifier & buffer current effects...

> Additionally a 32-bit CPU can do a lot more work per cycle, so they
> run at a lower frequency or sleep for longer. So a higher performance
> CPU that uses more power may actually use less *energy* to do a
> specific task.

  Perhaps we should compare a raw core with a raw core?

So let's see: how does the Cortex compare with the new ARM async core?

Cortex-M3                                         = approx. 90 uW/MHz
ARM996HS [New Clockless, Async technology core]   = 45 uW/MHz

[These numbers come from the same company, so they should be free of
inter-company skew effects...?]

Hmmm - I wonder how those (two?) Cortex licensees feel about that?

Specifying energy per task is a very good idea, and overdue on uC
designs.
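To make the energy-per-task point concrete: if dynamic power scales linearly with clock frequency (a P = k * f model, with k quoted in uW/MHz as in this thread) and static current is ignored, the frequency cancels out and the energy for a task is simply k times the cycle count. The Python sketch below assumes that model; the two cores and their cycle counts are hypothetical numbers chosen for illustration, not measurements of any real part.

```python
def energy_per_task_uj(power_uw_per_mhz: float, cycles: float) -> float:
    """Dynamic energy for one task, in microjoules.

    With P = k * f (k in uW/MHz, f in MHz) and t = cycles / (f * 1e6) s,
    E = P * t = k * cycles / 1e6 uJ: the clock frequency cancels out.
    Static (leakage) current is ignored in this model.
    """
    return power_uw_per_mhz * cycles / 1e6


def energy_from_power_and_time_uj(power_uw_per_mhz: float, cycles: float,
                                  f_mhz: float) -> float:
    """The same quantity computed the long way, as power times run time."""
    power_uw = power_uw_per_mhz * f_mhz
    time_s = cycles / (f_mhz * 1e6)
    return power_uw * time_s  # uW * s = uJ


if __name__ == "__main__":
    # Hypothetical cores: the "wide" one draws twice the power per MHz but
    # is assumed to need only 30% of the cycles for the same task.
    narrow_core = energy_per_task_uj(power_uw_per_mhz=200, cycles=10e6)
    wide_core = energy_per_task_uj(power_uw_per_mhz=400, cycles=3e6)
    print(f"narrow core: {narrow_core:.0f} uJ, wide core: {wide_core:.0f} uJ")
```

With these made-up numbers the wide core spends 1200 uJ against 2000 uJ despite drawing twice the power per MHz. For real parts, static current, oscillator current and wake-up costs would have to be added on top.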

-jg

Reply by Wilco Dijkstra, February 18, 2006

"Jim Granville" <no.spam@designtools.co.nz> wrote in message 
news:43f7a49d$1@clear.net.nz...
> Wilco Dijkstra wrote:
>> A quick scan of the AVR8 CPUs revealed that the ATtiny2313/V
>> seems to be the lowest power AVR at 0.41 mW/MHz.
>> Cortex-M3 uses 3.5 times less power...
>
>  That is the classic error of comparing a new (future) core-only figure,
> with an existing full-system product. A grapes to pomegranate comparison.

Nope. The ATtiny2313/V number above is based on simulation like the
M3 figure. The Cortex-M3 figure includes the standard peripherals that are
part of the core.

In most cases power consumption is measured while running a benchmark
such as Dhrystone, so no peripherals are used. With a process tuned for
power, the leakage current of the peripherals would be minimal.

>  There is a _lot_ more than just the core that determines the
> system Icc values.
>  ASIC core vendors tend to overlook that, as that's not what they
> sell.

Peripherals only consume power if you enable and use them. But even
then most don't use much power; e.g. a UART running at 100K baud uses
only a fraction of the power of a core running at 10MHz.

>> Additionally a 32-bit CPU can do a lot more work per cycle, so they
>> run at a lower frequency or sleep for longer. So a higher performance
>> CPU that uses more power may actually use less *energy* to do a
>> specific task.
>
>  Perhaps we should compare a raw core with a raw core?
>
> So let's see: how does the Cortex compare with the new ARM async core?
>
> Cortex-M3                                         = approx. 90 uW/MHz
> ARM996HS [New Clockless, Async technology core]   = 45 uW/MHz
>
> [These numbers come from the same company, so they should be free of
> inter-company skew effects...?]

No. You forgot to take into account the process geometry. The Cortex-M3
number is for 180nm, the 996HS for 130nm. According to datapoints for
the similar ARM946E-S, power consumption is 3 to 3.5 times higher on a
180nm process than on 130nm. So the Cortex-M3 would still win by a good
margin. Maybe we will get a Cortex-M3HS too?
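The process-scaling argument is simple arithmetic. The sketch below takes the three figures quoted in this thread (Cortex-M3 at about 90 uW/MHz on 180nm, ARM996HS at 45 uW/MHz on 130nm, and a 3 to 3.5x 180nm-versus-130nm power ratio from ARM946E-S data points) and assumes, as the argument does, that the 946E-S ratio carries over unchanged to the 996HS. It ignores leakage and any difference an asynchronous design might make to how power scales.

```python
# Figures as quoted in the thread (uW/MHz); not independently verified.
CORTEX_M3_180NM = 90.0
ARM996HS_130NM = 45.0
# 180nm/130nm power ratio quoted for the ARM946E-S, assumed to carry over.
RATIO_180_OVER_130 = (3.0, 3.5)


def estimate_at_180nm(uw_per_mhz_at_130nm: float, ratio: float) -> float:
    """Scale a 130nm uW/MHz figure to a 180nm estimate by a fixed ratio."""
    return uw_per_mhz_at_130nm * ratio


if __name__ == "__main__":
    for ratio in RATIO_180_OVER_130:
        est = estimate_at_180nm(ARM996HS_130NM, ratio)
        print(f"ratio {ratio}: 996HS ~{est:.1f} uW/MHz at 180nm, "
              f"{est / CORTEX_M3_180NM:.2f}x the Cortex-M3 figure")
```

Under these assumptions the 996HS would land at roughly 135 to 157.5 uW/MHz on 180nm, i.e. 1.5 to 1.75 times the quoted Cortex-M3 figure.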

> Hmmm - I wonder how those (two?) Cortex licensees feel about that?

There are 4 of them, btw. I'm sure they are still happy - there are lots of
reasons for using the M3.

> Specifying energy per task is a very good idea, and overdue on uC
> designs.

Indeed.

Wilco

Reply by Jim Granville, February 18, 2006

Wilco Dijkstra wrote:

>>> Additionally a 32-bit CPU can do a lot more work per cycle, so they
>>> run at a lower frequency or sleep for longer. So a higher performance
>>> CPU that uses more power may actually use less *energy* to do a
>>> specific task.
>>
>> Perhaps we should compare a raw core with a raw core?
>>
>> So let's see: how does the Cortex compare with the new ARM async core?
>>
>> Cortex-M3                                         = approx. 90 uW/MHz
>> ARM996HS [New Clockless, Async technology core]   = 45 uW/MHz
>>
>> [These numbers come from the same company, so they should be free of
>> inter-company skew effects...?]
>
> No. You forgot to take into account the process geometry.


...but no more than you did in your Tiny2313 <-> Cortex comparison.

> The Cortex-M3 number is for 180nm, the 996HS for 130nm. According to
> datapoints for the similar ARM946E-S, power consumption is 3 to 3.5
> times higher on a 180nm process than on 130nm.


mA/MHz can improve, but static Icc effects are starting to bite at
those geometries, so the focus often has to shift from scaled speed to
clawing back some of the precious static uA lost along the way...

> So the Cortex-M3 would still win by a good
> margin. Maybe we will get a Cortex-M3HS too?


Yes, a Cortex-M3HS would be an interesting device,
especially with the right Flash speed and peripheral mix.

(Though it might confuse the market, with two M3 variants...)

-jg

Reply by Ulf Samuelsson, February 19, 2006

Wilco Dijkstra wrote:
>> I don't think anyone plans to put a Cortex-A8 in a smart card
>> which is one very obvious application for the AVR32...
>
> I don't think anyone sane is going to put the AVR32 there. As I said,
> it is an ARM11 class core, so totally unsuitable for smartcards
> (it doesn't even have a rotate instruction which is essential for
> cryptography). Maybe there will be a smaller low power version
> eventually but that wasn't mentioned.
>
There are ARM7s in current high-end smartcards.
I believe the AVR32 has its Java functionality precisely for this
application.
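For context on the rotate point quoted above: hash functions such as SHA-1 (FIPS 180-4) apply 32-bit left rotates in every one of their 80 rounds (ROTL^5 and ROTL^30), plus ROTL^1 in the message schedule. On a CPU without a rotate instruction, each rotate has to be built from two shifts and an OR. The Python sketch below shows that construction; it illustrates the operation only and makes no claim about any particular instruction set.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits.

    Without a rotate instruction this becomes a left shift, a right shift
    and an OR. The final mask is needed here only because Python ints are
    unbounded; a 32-bit register would drop the high bits by itself.
    """
    n &= 31
    x &= MASK32
    if n == 0:  # avoid a shift by 32, which is undefined in C
        return x
    return ((x << n) | (x >> (32 - n))) & MASK32


if __name__ == "__main__":
    print(hex(rotl32(0x80000001, 1)))  # 0x3
    print(hex(rotl32(0x12345678, 8)))  # 0x34567812
```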


-- 
Best Regards,
Ulf Samuelsson
ulf@a-t-m-e-l.com
This message is intended to be my own personal view and it
may or may not be shared by my employer Atmel Nordic AB

Reply by Ulf Samuelsson, February 19, 2006

Jim Granville wrote:
> Wilco Dijkstra wrote:
>> A quick scan of the AVR8 CPUs revealed that the ATtiny2313/V
>> seems to be the lowest power AVR at 0.41 mW/MHz.
>> Cortex-M3 uses 3.5 times less power...
>
>   That is the classic error of comparing a new (future) core-only
> figure, with an existing full-system product. A grapes to pomegranate
> comparison.
>

Well stated!

>
> -jg

-- 
Best Regards,
Ulf Samuelsson
ulf@a-t-m-e-l.com
This message is intended to be my own personal view and it
may or may not be shared by my employer Atmel Nordic AB

Reply by Wilco Dijkstra, February 19, 2006

"Jim Granville" <no.spam@designtools.co.nz> wrote in message 
news:43f7d549@clear.net.nz...
> Wilco Dijkstra wrote:
>
> > No. You forgot to take into account the process geometry.
>
> ..but no more than your Tiny2313 <-> Cortex comparison

The datasheets didn't give the process; however, this page gives some
hints: http://www.atmel.com/dyn/products/ip_param_table.asp?family_id=615
All 180nm libraries use 1.8V, like the Tiny2313, so it is likely 180nm.

> > The Cortex-M3 number is for 180nm, the 996HS for 130nm. According to
> > datapoints for the similar ARM946E-S, power consumption is 3 to 3.5
> > times higher on a 180nm process than on 130nm.
>
> mA/MHz can improve, but static Icc effects are starting to bite at
> those geometries, so the focus often has to shift from scaled speed to
> clawing back some of the precious static uA lost along the way...

180nm isn't nearly as bad as 90nm... But static current matters mostly when
sleeping, which is why there are various sleep states that power down large
parts of the chip (at the cost of slower wakeup). Voltage scaling may be
affected too: it is better to run at a slightly higher frequency than to run
at a lower voltage/frequency for longer (and thus burn static current for
longer).
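A toy model makes the voltage-scaling trade-off explicit. It assumes dynamic energy of C*V^2 per cycle, leakage modelled crudely as a constant current times supply voltage while the core is running, and a small fixed power in a power-gated sleep state for the rest of a fixed time window. All parameter values (50 pF effective switched capacitance, the voltages, frequencies and leakage currents) are hypothetical; real leakage depends strongly on voltage, temperature and process.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    volts: float
    f_hz: float


def task_energy_j(op: OperatingPoint, cycles: float, c_eff_f: float,
                  i_leak_a: float, window_s: float, p_sleep_w: float) -> float:
    """Energy (J) to run `cycles` at `op`, then sleep until the window ends.

    Toy model: dynamic energy C*V^2 per cycle, leakage I_leak*V while
    running, constant p_sleep_w while in a power-gated sleep state.
    """
    t_run = cycles / op.f_hz
    if t_run > window_s * (1 + 1e-9):
        raise ValueError("operating point misses the deadline")
    e_dynamic = c_eff_f * op.volts ** 2 * cycles
    e_leak = i_leak_a * op.volts * t_run
    e_sleep = p_sleep_w * max(0.0, window_s - t_run)
    return e_dynamic + e_leak + e_sleep


FAST = OperatingPoint(volts=1.8, f_hz=200e6)  # finish early, then sleep
SLOW = OperatingPoint(volts=1.6, f_hz=100e6)  # scaled down, runs whole window


def compare(i_leak_a: float) -> tuple[float, float]:
    """Return (fast, slow) task energy in joules for one leakage current."""
    common = dict(cycles=1e6, c_eff_f=50e-12, i_leak_a=i_leak_a,
                  window_s=10e-3, p_sleep_w=1e-6)
    return task_energy_j(FAST, **common), task_energy_j(SLOW, **common)


if __name__ == "__main__":
    for i_leak in (0.1e-3, 10e-3):
        fast, slow = compare(i_leak)
        print(f"I_leak={i_leak * 1e3:.1f} mA: fast {fast * 1e6:.1f} uJ, "
              f"slow {slow * 1e6:.1f} uJ")
```

In this model, with low leakage the scaled-down point wins (roughly 130 uJ against 163 uJ), while with high leakage running fast and then sleeping wins (roughly 252 uJ against 288 uJ). Which way a real chip falls depends on its actual leakage and sleep-state figures.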

> > So Cortex-M3 would still win by a good
> > margin. Maybe we will get a Cortex-M3HS too?
>
> Yes, a Cortex-M3HS would be an interesting device,
> especially with the right Flash speed and peripheral mix.
>
> (Though it might confuse the market, with two M3 variants...)

True. It might be possible to take advantage of asynchronous logic,
such as optimizing for the average rather than the worst case (e.g. using
ripple-carry adders instead of carry-lookahead). This would allow for even
smaller die sizes without a large performance penalty.
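The average-case point can be checked with a quick simulation. In a self-timed ripple-carry adder that detects carry completion, the delay for a given addition follows the longest carry-propagation chain for those particular operands, whereas a clocked design has to budget for the full word width. The hypothetical Python sketch below measures that chain length; it models only the carry logic, not real gate delays.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of bit positions a carry ripples through in a + b.

    A carry starts where both bits are 1 (generate), keeps moving through
    positions where exactly one bit is 1 (propagate), and stops where both
    bits are 0 (kill).
    """
    longest = run = carry = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:                # generate: a new chain starts here
            carry, run = 1, 1
        elif (ai ^ bi) and carry:    # propagate an incoming carry
            run += 1
        else:                        # kill, or propagate with no carry in
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def average_chain(width: int = 32, trials: int = 10_000, seed: int = 1) -> float:
    """Mean longest carry chain over uniformly random operand pairs."""
    rng = random.Random(seed)
    total = sum(longest_carry_chain(rng.getrandbits(width),
                                    rng.getrandbits(width), width)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print("worst case, 32 bits:", longest_carry_chain(0xFFFFFFFF, 1, 32))
    print("average over random operands:", average_chain())
```

For uniformly random 32-bit operands the average chain comes out at a handful of bits, far below the 32-bit worst case, which is the property an asynchronous design can exploit. Real operand distributions (e.g. small negative numbers in two's complement) can produce long chains more often.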

Wilco

Reply by Jim Granville, February 19, 2006

Wilco Dijkstra wrote:
> "Jim Granville" <no.spam@designtools.co.nz> wrote in message 
> news:43f7d549@clear.net.nz...
> 
>>Wilco Dijkstra wrote:
>>
>>>No. You forgot to take into account the process geometry.
>>
>>...but no more than you did in your Tiny2313 <-> Cortex comparison.
>
> The datasheets didn't give the process; however, this page gives some
> hints: http://www.atmel.com/dyn/products/ip_param_table.asp?family_id=615
> All 180nm libraries use 1.8V, like the Tiny2313, so it is likely 180nm.

The Tiny2313 is a 1.8-5.5V part, so I very much doubt it is 180nm;
0.35um is more likely. Ulf will know? :)

-jg

Reply by Chris Hills, February 19, 2006

In article <dt6s9g$ao4$1@nntp.aioe.org>, Ulf Samuelsson <ulf@a-t-m-e-l.com> writes
>Wilco Dijkstra wrote:
>> "Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message
>> news:dt5q6v$g3$1@nntp.aioe.org...
>>> D. wrote:
>>> T.I. has licensed the Cortex-A8, but I think this is for Nokia and
>>> the like... The AVR32 seems to run at a higher frequency and has an MMU,
>>> so they may be focusing on different markets.
>>
>> Yes, the AVR32 is definitely not in the same market as the M3. It is
>> an ARM11 + Jazelle + Thumb-2 clone, but because it is late (MIPS did
>> it a few years ago), it now will have to compete with Cortex-A8. Ouch...
>
>I don't think anyone plans to put a Cortex-A8 in a smart card

Given the nature of the smart card business (paranoid), you are hardly
likely to know yet...

Besides, I think the Cortex is aimed at a different market.


-- 
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills  Staffs  England     /\/\/\/\/
/\/\/ chris@phaedsys.org      www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

Reply by Wilco Dijkstra, February 19, 2006

"Jim Granville" <no.spam@designtools.co.nz> wrote in message 
news:43f79040$1@clear.net.nz...
> Wilco Dijkstra wrote:
>> Interestingly it turns out Atmel's marketing department has been
>> working overtime - their benchmarking figures are obviously bogus.
>>
>> They chose to compare against the i.MX21/i.MX31 numbers using
>> GCC (not the fastest compiler around by a large margin) and present
>> them as official ARM926 and ARM1136 numbers. For the codesize
>> results they chose EEMBC figures, however the EEMBC codesize
>> figures optimized for performance are totally meaningless.
>
>  Are you surprised ?

Yes, because it's quite brazen and they don't even try to hide it. Do they
really think anyone would take them seriously with such wild claims?

The 3x speedup over ARM is complete nonsense. The benchmarks
in question are floating point, and it is hardly surprising that a CPU with
an FPU outperforms one that uses software emulation. With an FPU, the ARM
part becomes 4x faster. Where is that revolutionary performance lead
now?

The documents don't mention floating point anywhere (no FP instructions
either; the only mention is of an optional FPU in the user guide), so it
looks like they are misleading on purpose.

> All marketing departments are desperate to make their offerings
> look good, so they choose their leading edge against the others'
> trailing edge, and then are selective as well.

They must be very desperate then :-)

>  I use a general nudge factor of 2:1 in filtering market droid fluff.
>
>  If they cannot claim a difference of more than one generation in
> performance, then it is not revolutionary, and probably merely
> comparable with the 'other guys' next release anyway...

Almost everything has been done before; there is not much chance of skipping
a generation. You would need to build a 1+ GHz 2-way out-of-order
chip and compete head-on with PowerPC / x86 Geode.

Even the Itanium didn't turn out to be much of a revolution...

Wilco

Reply by Jim Garside, February 20, 2006

"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:

> Since Thumb-2 is a reencoding of ARM instructions, any existing ARM
> assembler can be assembled to Thumb-2 with minimal effort.

Almost, but not quite. As well as a number of additions, there are
a few omissions.

-- Jim Garside