High-end AVR vs. low-end ARM?

Reply by ●November 8, 2008

Jim Granville wrote:
> Ulf Samuelsson wrote:
> 
>>>> As for AVR32, in case you were thinking about that one, there is no
>>>> real reason I would know why to start with that device. Use a Cortex-
>>>> M3 device instead the upcoming standard.
>>
>>
>> Let's see,
>>
>> Where do I get the Cortex-M3 flash chip with
>>
>> * Lower power consumption than any existing Cortex-M3 chip
>> * Single 1,8V +/- 10% power-supply for CORE *AND* I/O?
>> * 5V VCC , desirable for motor control?
>> * debug support allowing you to read/write internal registers without 
>> stopping the MCU.
>> * High Speed USB
>> * Free Eclipse/GCC tool directly supported by the silicon vendor
>> * Sustained 33 DSP MIPS when doing vector sums
>>     for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
>> * Migration path to low cost versions supporting Linux.
>> * Same H/W tools as the AVR (JTAG-ICE Mk II & STK600)
>> * Trace capable emulator at below $600 (AVRONE)
> 
> How much flash, with the above combinations ?
> 
> -jg

You can compare Cortex-M3 to the AVR32 UC3A and UC3B series, but not to
the AP7 (Hi-Speed USB, MMU, Linux); that is a different class of device.
We also don't compare an Intel Core 2 Duo to an AVR ;)

-- 
voices (at) zrgnyyvpenva (dot) pbz [ROT13]

Reply by steve ●November 9, 2008

On Nov 7, 2:17 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> >>As for AVR32, in case you were thinking about that one, there is no
> >>real reason I would know why to start with that device. Use a Cortex-
> >>M3 device instead the upcoming standard.
>
> Let's see,
>
> Where do I get the Cortex-M3 flash chip with
>
> * Lower power consumption than any existing Cortex-M3 chip
> * Single 1,8V +/- 10% power-supply for CORE *AND* I/O?
> * 5V VCC , desirable for motor control?
> * debug support allowing you to read/write internal registers without
> stopping the MCU.
> * High Speed USB
> * Free Eclipse/GCC tool directly supported by the silicon vendor
> * Sustained 33 DSP MIPS when doing vector sums
>     for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
> * Migration path to low cost versions supporting Linux.
> * Same H/W tools as the AVR (JTAG-ICE Mk II & STK600)
> * Trace capable emulator at below $600 (AVRONE)
>
> Googling does not give any clue...
>

googling doesn't give you a clue for 1.8V, 5V AVR32s either....

Reply by steve ●November 9, 2008

On Nov 6, 4:42 pm, "Bresco" <bre...@mixmaster.org> wrote:
> In terms of pricing, how do high-end AVRs (Mega-128) compare to low-end ARM
> processors? The ARMs are much more powerful and have large RAM memories on
> them.
>
> Anyone ever compare them? I heard that ARMs are cheaper than AVRs these
> days. Is this true?

If you need really cheap and you're watching every penny, then ARMs are
still priced higher than low-end AVRs. Cortex has low power similar to
AVR and MSP430s, both running and in standby, and operates down to 2 V. ARMs
tend to come in bigger packages and require more external parts
(caps), in general. As a wild guess I would say 90% of high-end AVR
applications could switch to an ARM. There are some ultra-low-power
applications where AVR and MSP430 are still king and there is no ARM
substitute.

Reply by Ulf Samuelsson ●November 10, 2008

"steve" <bungalow_steve@yahoo.com> wrote in message
news:95bf218c-bc04-421b-bde3-4a857909ff31@u18g2000pro.googlegroups.com...
On Nov 7, 2:17 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> >>As for AVR32, in case you were thinking about that one, there is no
> >>real reason I would know why to start with that device. Use a Cortex-
> >>M3 device instead the upcoming standard.
>
>> Let's see,
>>
>> Where do I get the Cortex-M3 flash chip with
>>
>> * Lower power consumption than any existing Cortex-M3 chip
>> * Single 1,8V +/- 10% power-supply for CORE *AND* I/O?
>> * 5V VCC , desirable for motor control?
>> * debug support allowing you to read/write internal registers without
>> stopping the MCU.
>> * High Speed USB
>> * Free Eclipse/GCC tool directly supported by the silicon vendor
>> * Sustained 33 DSP MIPS when doing vector sums
>> for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
>> * Migration path to low cost versions supporting Linux.
>> * Same H/W tools as the AVR (JTAG-ICE Mk II & STK600)
>> * Trace capable emulator at below $600 (AVRONE)
>>
>> Googling does not give any clue...
>>

> googling doesn't give you a clue for 1.8V, 5V AVR32s either....

Well that proves that google doesn't know everything :-)

Everything mentioned above is in the official UC3 presentation.
The average Joe won't see UC3L/UC3C/UC3A3 until the beginning of next year.

The technology behind the 1.8 V devices is already available in the AT91SAM7L.
The SAM7L runs the flash down to 1.55 V.

-- 
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB

Reply by steve ●November 11, 2008

On Nov 10, 6:49 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> "steve" <bungalow_st...@yahoo.com> wrote in message news:95bf218c-bc04-421b-bde3-4a857909ff31@u18g2000pro.googlegroups.com...
> On Nov 7, 2:17 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
>
> > >>As for AVR32, in case you were thinking about that one, there is no
> > >>real reason I would know why to start with that device. Use a Cortex-
> > >>M3 device instead the upcoming standard.
>
> >> Let's see,
>
> >> Where do I get the Cortex-M3 flash chip with
>
> >> * Lower power consumption than any existing Cortex-M3 chip
> >> * Single 1,8V +/- 10% power-supply for CORE *AND* I/O?
> >> * 5V VCC , desirable for motor control?
> >> * debug support allowing you to read/write internal registers without
> >> stopping the MCU.
> >> * High Speed USB
> >> * Free Eclipse/GCC tool directly supported by the silicon vendor
> >> * Sustained 33 DSP MIPS when doing vector sums
> >> for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
> >> * Migration path to low cost versions supporting Linux.
> >> * Same H/W tools as the AVR (JTAG-ICE Mk II & STK600)
> >> * Trace capable emulator at below $600 (AVRONE)
>
> >> Googling does not give any clue...
>
> > googling doesn't give you a clue for 1.8V, 5V AVR32s either....
>
> Well that proves that google doesn't know everything :-)
>
> All things above mentioned in the offical UC3 presentation,
> The average Joe won't see UC3L/UC3C/UC3A3 until beginning of next year.
>
> The technology behind the 1.8V devices is already available in AT91SAM7L.
> The SAM7L runs the flash down to 1,55V.
>
> --
> Best Regards,
> Ulf Samuelsson
> This is intended to be my personal opinion which may,
> or may not be shared by my employer Atmel Nordic AB

OK, the 7L parts are nice, though I wish they would expand the family.

I've noticed that the Atmel slide packages say a FIR filter is 11
times faster than on a Cortex-M3. That is hard to believe, and I'm not sure
why: Cortex is a 2-cycle MAC, AVR32 is single cycle. Maybe with the 2
wait states on Cortex flash they came up with that number?


> * Sustained 33 DSP MIPS when doing vector sums
>     for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];

The 33 MIPS is at what clock speed?

Reply by Ulf Samuelsson ●November 12, 2008

"steve" <bungalow_steve@yahoo.com> wrote in message
news:6311ff24-5d99-4f61-a440-57098c7bedc6@a3g2000prm.googlegroups.com...
On Nov 10, 6:49 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> "steve" <bungalow_st...@yahoo.com> wrote in message
> news:95bf218c-bc04-421b-bde3-4a857909ff31@u18g2000pro.googlegroups.com...
> On Nov 7, 2:17 pm, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
>
> > >>As for AVR32, in case you were thinking about that one, there is no
> > >>real reason I would know why to start with that device. Use a Cortex-
> > >>M3 device instead the upcoming standard.
>
> >> Let's see,
>
> >> Where do I get the Cortex-M3 flash chip with
>
> >> * Lower power consumption than any existing Cortex-M3 chip
> >> * Single 1,8V +/- 10% power-supply for CORE *AND* I/O?
> >> * 5V VCC , desirable for motor control?
> >> * debug support allowing you to read/write internal registers without
> >> stopping the MCU.
> >> * High Speed USB
> >> * Free Eclipse/GCC tool directly supported by the silicon vendor
> >> * Sustained 33 DSP MIPS when doing vector sums
> >> for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
> >> * Migration path to low cost versions supporting Linux.
> >> * Same H/W tools as the AVR (JTAG-ICE Mk II & STK600)
> >> * Trace capable emulator at below $600 (AVRONE)
>
> >> Googling does not give any clue...
>
> > googling doesn't give you a clue for 1.8V, 5V AVR32s either....
>
> Well that proves that google doesn't know everything :-)
>
> All things above mentioned in the offical UC3 presentation,
> The average Joe won't see UC3L/UC3C/UC3A3 until beginning of next year.
>
> The technology behind the 1.8V devices is already available in AT91SAM7L.
> The SAM7L runs the flash down to 1,55V.
>
> --
> Best Regards,
> Ulf Samuelsson
> This is intended to be my personal opinion which may,
> or may not be shared by my employer Atmel Nordic AB

Ok, the 7L are nice, though wish they expand the family.

==> There is a new family in the works with more SRAM.

I've noticed in the Atmel slides packages they say FIR filter is 11
times faster then on a CortexM3. That is hard to believe, not sure
why, Cortex is 2 cycle MAC, AVR32 is single cycle, maybe with the 2
wait states on Cortex FLASH they came up with that number?

==> Not only that.
        I am not sure about 11 times though.

        You win by having
        * 1 clock cycle load instructions.
            Cortex-M3 implementations are at least 2, maybe more
            If running from flash, then there will be plenty of clocks.
            The AVR32 with the AHB will probably use two clocks
            to read from the flash at 66 MHz.
            Furthermore, this is non-blocking in some cases
            since the core can read instructions from the instruction
            queue instead of from the flash.
        * The ability to use the upper part of the 32 bit register
            for MAC instructions, so you load TWO samples/coefficients
            in a single clock cycle.

            The unrolled loop then becomes:

            LOAD            1 clock
            LOAD            1 clock
            MAC              1 clock
            MAC              1 clock

        * The hidden Accumulator
            The register file on a low-end RISC processor normally
            only has two read ports.
            You cannot do A = A + C*X in a single clock
            because you need to read A,C and X in the same clock cycle.

            The AVR32 has a "hidden" accumulator (patented) which
            allows you to use the two read ports for C and X

            After the last MAC, you write the accumulator back to the
            register file, adding one clock latency

        * The AVR32 runs with 1 waitstate, while the STM32 runs with 2.

* Sustained 33 DSP MIPS when doing vector sums
    for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];

* The last feature is instructions which handle saturation
    the way a DSP should, and this has to be handled
    manually in other RISCs like CM3
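
To make "handled manually" concrete, here is a minimal sketch in portable C of the same vector sum with a clamping accumulator. It assumes 16-bit samples and coefficients and a 32-bit accumulator; the names `sat_add32` and `dot_q15_sat` are hypothetical and do not come from any Atmel or ARM library, and the sketch says nothing about what a particular core can do in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit values, clamping to INT32_MIN/INT32_MAX instead of wrapping.
   Hypothetical helper, for illustration only. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* widen so the sum cannot overflow */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}

/* The vector sum from the thread, sum = sum + C[i] * X[i],
   with the accumulate step saturated in software. */
int32_t dot_q15_sat(const int16_t *c, const int16_t *x, size_t n)
{
    assert(c != NULL && x != NULL);
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 16x16 product always fits in 32 bits. */
        int32_t prod = (int32_t)c[i] * (int32_t)x[i];
        sum = sat_add32(sum, prod);
    }
    return sum;
}
```

The compare-and-clamp in `sat_add32` is the extra per-sample work that a saturating MAC instruction would remove.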

 the 33 MIPS is at what clock speed?

==> 66 MHz (with a 100% unrolled loop)
    I.E:        n = 6 =>

            LOAD            1 clock
            LOAD            1 clock
            MAC              1 clock
            MAC              1 clock
            LOAD            1 clock
            LOAD            1 clock
            MAC              1 clock
            MAC              1 clock
            LOAD            1 clock
            LOAD            1 clock
            MAC              1 clock
            MAC              1 clock
            ; Hidden writeback: 1 clock
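
The listing above can be turned into a small counting model. This is only a sketch that encodes the numbers given in this post (one clock per LOAD, one per MAC, two 16-bit values per LOAD, one clock for the final writeback); it is not a simulation of the real pipeline, and the function names are hypothetical.

```c
#include <assert.h>

/* Clocks for n MACs in the fully unrolled loop described above.
   Each pair of MACs costs 2 LOADs plus 2 MACs (4 clocks), and the
   accumulator writeback adds 1 clock at the very end.
   Assumes n is even, as in the n = 6 example. */
unsigned avr32_unrolled_clocks(unsigned n_macs)
{
    assert(n_macs % 2 == 0);
    return (n_macs / 2) * 4 + 1;
}

/* MACs per second, in millions, for a given clock count and core clock in MHz. */
double macs_per_second_millions(unsigned n_macs, unsigned clocks, double clock_mhz)
{
    assert(clocks > 0);
    return clock_mhz * (double)n_macs / (double)clocks;
}
```

With n = 6 the model gives 13 clocks, which is about 30.5 million MACs per second at 66 MHz. The figure approaches 33 million as n grows and the single writeback clock is amortized.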

-- 
Best Regards,
Ulf Samuelsson
ulf@a-t-m-e-l.com
This message is intended to be my own personal view and it
may or may not be shared by my employer Atmel Nordic AB

Reply by Wilco Dijkstra ●November 12, 2008

"Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:gfe7ts$cu6$1@aioe.org...
> "steve" <bungalow_steve@yahoo.com> wrote in message

> I've noticed in the Atmel slides packages they say FIR filter is 11
> times faster then on a CortexM3. That is hard to believe, not sure
> why, Cortex is 2 cycle MAC, AVR32 is single cycle, maybe with the 2
> wait states on Cortex FLASH they came up with that number?
>
> ==> Not only that.
>        I am not sure about 11 times though.

Indeed, people are still spreading lies about Cortex-M3 as usual.

>        You win by having
>        * 1 clock cycle load instructions.
>            Cortex-M3 implementations are at least 2, maybe more

Cortex-M3 loads are 2 cycles unless the next instruction is a load or
store, in which case it is 1 cycle. So a sequence of N loads takes
N+1 cycles.

>        * The ability to use the upper part of the 32 bit register
>            for MAC instructions, so you load TWO samples/coefficients
>            in a single clock cycle.
>
>            The unroled loop then becomes:
>
>            LOAD            1 clock
>            LOAD            1 clock
>            MAC              1 clock
>            MAC              1 clock

This is the same trick as the ARM9E introduced a long time ago.

>        * The AVR32 runs with 1 waitstate, while the STM32 runs with 2.

The Luminary Cortex-M3 cores run with 0 wait states. But even with a
wait state you don't necessarily see a slowdown if the fetch width is at
least 64 bits (3-4 Thumb-2 instructions). Wait states primarily slow down
branches.

> * Sustained 33 DSP MIPS when doing vector sums
>    for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
>
> * The last feature is instructions which handle saturation
>    the way a DSP should, and this has to be handled
>    manually in other RISCs like CM3

Actually Cortex-M3 has a saturate instruction.

> the 33 MIPS is at what clock speed?
>
> ==> 66 MHz (with a 100% unrolled loop)
>    I.E:        n = 6 =>
>
>            LOAD            1 clock
>            LOAD            1 clock
>            MAC              1 clock
>            MAC              1 clock
>            LOAD            1 clock
>            LOAD            1 clock
>            MAC              1 clock
>            MAC              1 clock
>            LOAD            1 clock
>            LOAD            1 clock
>            MAC              1 clock
>            MAC              1 clock
>            ; Hidden writeback: 1 clock

On Cortex-M3 this would take the following sequence:

LDRH r2, [r0,#0]
LDRH r3, [r0,#2]
LDRH r4, [r0,#4]
LDRH r5, [r1,#0]
LDRH r6, [r1,#2]
LDRH r7, [r1,#4]
MLA r8,r2,r5,r8
MLA r8,r3,r6,r8
MLA r8,r4,r7,r8

The LDRHs take 7 cycles (6 + 1), the MLAs take 6 cycles, or in total 26 cycles.
That is exactly twice as slow as AVR32 on the above code. So the claim of 11
times slower is a total lie. Those Atmel marketeers should be ashamed of
themselves.
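
The cycle arithmetic can be spelled out as a short sketch. It encodes only the timing rules stated in this post (a back-to-back run of N loads takes N + 1 cycles, and each MLA takes 2 cycles) and ignores wait states and fetch effects; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles for a back-to-back run of n loads: N + 1, per the rule stated above. */
unsigned cm3_load_run_cycles(unsigned n_loads)
{
    return n_loads == 0 ? 0 : n_loads + 1;
}

/* Cycles for one block shaped like the listing: two LDRH per MAC issued
   back to back, followed by one 2-cycle MLA per MAC. */
unsigned cm3_block_cycles(unsigned n_macs)
{
    assert(n_macs > 0);
    return cm3_load_run_cycles(2 * n_macs) + 2 * n_macs;
}
```

For the 3-MAC block listed above this gives 7 + 6 = 13 cycles, so two such blocks (6 MACs) cost 26 cycles against 13 clocks for the AVR32 sequence, a factor of two under these assumptions.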

Wilco

Reply by steve ●November 12, 2008

On Nov 12, 6:15 am, "Wilco Dijkstra"
<Wilco.removethisDijks...@ntlworld.com> wrote:
> "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote in message news:gfe7ts$cu6$1@aioe.org...
> > "steve" <bungalow_st...@yahoo.com> wrote in message
> > I've noticed in the Atmel slides packages they say FIR filter is 11
> > times faster then on a CortexM3. That is hard to believe, not sure
> > why, Cortex is 2 cycle MAC, AVR32 is single cycle, maybe with the 2
> > wait states on Cortex FLASH they came up with that number?
>
> > ==> Not only that.
> >        I am not sure about 11 times though.
>
> Indeed, people are still spreading lies about Cortex-M3 as usual.
>
> >        You win by having
> >        * 1 clock cycle load instructions.
> >            Cortex-M3 implementations are at least 2, maybe more
>
> Cortex-M3 loads are 2 cycles unless the next instruction is a load or
> store, in which case it is 1 cycle. So a sequence of N loads takes
> N+1 cycles.
>
> >        * The ability to use the upper part of the 32 bit register
> >            for MAC instructions, so you load TWO samples/coefficients
> >            in a single clock cycle.
>
> >            The unroled loop then becomes:
>
> >            LOAD            1 clock
> >            LOAD            1 clock
> >            MAC              1 clock
> >            MAC              1 clock
>
> This is the same trick as the ARM9E introduced a long time ago.
>
> >        * The AVR32 runs with 1 waitstate, while the STM32 runs with 2.
>
> The Luminary Cortex-M3 cores run with 0 wait states. But even with a
> wait state you don't necessary see a slowdown if the fetch width is at
> least 64 bits (3-4 Thumb-2 instructions). Waitstates primarily slowdown
> branches.
>
> > * Sustained 33 DSP MIPS when doing vector sums
> >    for (sum = 0, i = 0; i < n; i++) sum = sum + C[i] * X[i];
>
> > * The last feature is instructions which handle saturation
> >    the way a DSP should, and this has to be handled
> >    manually in other RISCs like CM3
>
> Actually Cortex-M3 has a saturate instruction.
>
> > the 33 MIPS is at what clock speed?
>
> > ==> 66 MHz (with a 100% unrolled loop)
> >    I.E:        n = 6 =>
>
> >            LOAD            1 clock
> >            LOAD            1 clock
> >            MAC              1 clock
> >            MAC              1 clock
> >            LOAD            1 clock
> >            LOAD            1 clock
> >            MAC              1 clock
> >            MAC              1 clock
> >            LOAD            1 clock
> >            LOAD            1 clock
> >            MAC              1 clock
> >            MAC              1 clock
> >            ; Hidden writeback: 1 clock
>
> On Cortex-M3 this would take the following sequence:
>
> LDRH r2, [r0,#0]
> LDRH r3, [r0,#2]
> LDRH r4, [r0,#4]
> LDRH r5, [r1,#0]
> LDRH r6, [r1,#2]
> LDRH r7, [r1,#4]
> MLA r8,r2,r5,r8
> MLA r8,r3,r6,r8
> MLA r8,r4,r7,r8
>
> The LDRHs take 7 cycles (6 + 1), the MLAs take 6 cycles, or in total 26 cycles.
> That is exactly twice as slow as AVR32 on the above code. So the claim of 11
> times slower is a total lie. Those Atmel marketeers should be ashamed of
> themselves.
>
> Wilco

OK, I took the Atmel published FIR filter cycle count and the STM FIR
filter cycle count, both from their websites (using their optimized
in-house DSP packages):

http://www.atmel.com/dyn/resources/prod_documents/doc32076.pdf

http://www.st.com/stonline/products/literature/um/14988.pdf

Of course the two don't give data for the same size of FIR filter, so I have
to normalize...

For Atmel, a 64-point, 24-tap, 41-output FIR takes 2,439 cycles, which
is 41*24 = 984 MACs, for a ratio of 2.478 cycles/MAC.

For the STM Cortex at full speed with 2 wait states, a 63-point, 32-tap,
32-output FIR takes 3,929 cycles, which is 32*32 = 1024 MACs,
for a ratio of 3.83 cycles/MAC (2 wait states),

a difference of 1.54X.

At zero wait states (below 24 MHz) STM reports 3,478 cycles,

so 3.396 cycles/MAC (0 wait states), a difference of 1.37 times.
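
The normalization above is easy to re-check with a few lines of C. This is just a sketch of the arithmetic, using the cycle counts quoted in this post and assuming that MACs = taps * outputs for both benchmarks; the function name is hypothetical.

```c
#include <assert.h>

/* Cycles per MAC for a FIR benchmark that produces `outputs` samples
   with `taps` taps each, i.e. taps * outputs MACs in total. */
double cycles_per_mac(unsigned cycles, unsigned taps, unsigned outputs)
{
    assert(taps > 0 && outputs > 0);
    return (double)cycles / ((double)taps * (double)outputs);
}
```

Calling `cycles_per_mac(2439, 24, 41)` for the Atmel figure, and `cycles_per_mac(3929, 32, 32)` and `cycles_per_mac(3478, 32, 32)` for the STM figures, then dividing the STM results by the Atmel one, gives ratios of roughly 1.55 and 1.37.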

Reply by Ulf Samuelsson ●November 13, 2008

>> ==> 66 MHz (with a 100% unrolled loop)
>>    I.E:        n = 6 =>
>>
>>            LOAD            1 clock
>>            LOAD            1 clock
>>            MAC              1 clock
>>            MAC              1 clock
>>            LOAD            1 clock
>>            LOAD            1 clock
>>            MAC              1 clock
>>            MAC              1 clock
>>            LOAD            1 clock
>>            LOAD            1 clock
>>            MAC              1 clock
>>            MAC              1 clock
>>            ; Hidden writeback: 1 clock
>
> On Cortex-M3 this would take the following sequence:
>
> LDRH r2, [r0,#0]
> LDRH r3, [r0,#2]
> LDRH r4, [r0,#4]
> LDRH r5, [r1,#0]
> LDRH r6, [r1,#2]
> LDRH r7, [r1,#4]
> MLA r8,r2,r5,r8
> MLA r8,r3,r6,r8
> MLA r8,r4,r7,r8
>
> The LDRHs take 7 cycles (6 + 1), the MLAs take 6 cycles, or in total 26 cycles.
> That is exactly twice as slow as AVR32 on the above code. So the claim of 11
> times slower is a total lie. Those Atmel marketeers should be ashamed of
> themselves.
>


And you are comparing 3 MACs with 6 MACs.

6 MACs from memory using AVR32 = 13 clocks.
6 MACs from memory using CM3 = 52 clocks or 4 x difference.


> Wilco
-- 
Best Regards,
Ulf Samuelsson
ulf@a-t-m-e-l.com
This message is intended to be my own personal view and it
may or may not be shared by my employer Atmel Nordic AB

Reply by Wilco Dijkstra ●November 13, 2008

"Ulf Samuelsson" <ulf@a-t-m-e-l.com> wrote in message news:gfhl33$h59$1@aioe.org...
>>> ==> 66 MHz (with a 100% unrolled loop)
>>>    I.E:        n = 6 =>
>>>
>>>            LOAD            1 clock
>>>            LOAD            1 clock
>>>            MAC              1 clock
>>>            MAC              1 clock
>>>            LOAD            1 clock
>>>            LOAD            1 clock
>>>            MAC              1 clock
>>>            MAC              1 clock
>>>            LOAD            1 clock
>>>            LOAD            1 clock
>>>            MAC              1 clock
>>>            MAC              1 clock
>>>            ; Hidden writeback: 1 clock
>>
>> On Cortex-M3 this would take the following sequence:
>>
>> LDRH r2, [r0,#0]
>> LDRH r3, [r0,#2]
>> LDRH r4, [r0,#4]
>> LDRH r5, [r1,#0]
>> LDRH r6, [r1,#2]
>> LDRH r7, [r1,#4]
>> MLA r8,r2,r5,r8
>> MLA r8,r3,r6,r8
>> MLA r8,r4,r7,r8
>>
>> The LDRHs take 7 cycles (6 + 1), the MLAs take 6 cycles, or in total 26 cycles.
>> That is exactly twice as slow as AVR32 on the above code. So the claim of 11
>> times slower is a total lie. Those Atmel marketeers should be ashamed of
>> themselves.
>>
>
>
> And you are comparing 3 MACs with 6 MACs.
>
> 6 MACs from memory using AVR32 = 13 clocks.
> 6 MACs from memory using CM3 = 52 clocks or 4 x difference.

No, read again. It's 13 cycles to do 3 MACs, so 26 to do 6 MACs.

Wilco
