EmbeddedRelated.com
Forums

Any ARMs with hardware divide?

Started by Michael Noone May 1, 2005
"Jim Granville" <no.spam@designtools.co.nz> wrote in message
news:427ff711$1@clear.net.nz...
> Wilco Dijkstra wrote:
> Yes, the ARM info is sparse, and poorly detailed, but what they have > published shows Thumb2 to have LOWER peformance than ARM, but better > code density.
Those are ARM1156T2-S benchmarks - not M3. The first Thumb-2 compiler indeed generates code that is almost 1% larger than Thumb (still 34% smaller than ARM!). The difference in performance is less than 3% on the ARM1156 (the first Thumb-2 CPU). So it is pretty close to the marketing statement "ARM performance at Thumb codesize". The next compiler release will without a doubt improve upon this and close the gap if not bridge it.
> > So moving *within* a Cortex family is generally trivial - you'll get
> > full binary compatibility. Moving *between* Cortex families may
> > require some porting and care to get full binary compatibility.
>
> These verbal gymnastics aptly demonstrate my point that calling M3
> something clearly different would have helped. When you have to
> underline the difference between 'within', and 'between', then
> perhaps a clearer name scheme would have been smarter.
I'd say Cortex-M and Cortex-R are clearly different names. They are Cortex because they all support the same base instruction set (Thumb-2).
> and this seems to be the crux of the problem. ARM seem to think they can
> replace the 8051/8 bit sector with this new variant. Instead, they have
> lost focus on what attracts users to ARM ( see Ulf's comments ).
> Atmel, Philips et al _already_ have sub $3 offerings, so there is
> substantial overlap into the 8/16 bit arena now. And this with an
> ARM/Thumb offering.
Yes, these are nice parts, but they don't compete with many of the cheap MCUs like the 8051. The M3 can compete much better and maybe get down to the $1 price range.
> Mostly, the uC selection decisions I see made, hinge on Peripherals &
> FLASH/RAM, NOT the core itself. As Ulf says, they choose ARM's
> _because_ they are binary [opcode] compatible.
Yes that is true. Anyone moving to ARM from the 8051 or similar simply won't care whether the M3 supports the ARM instruction set or not as long as it doesn't make porting harder. The resulting code is of course binary compatible with any other Cortex CPU as I explained.
> Philips seem to have a HW solution that simply and effectively
> reduces the ARM/Thumb step effect. Thus any "new core" benchmarks that
> exclude this solution, lack credibility.
There are no benchmarks on narrow flash for the M3 AFAIK. When running Thumb-2 on a wide flash it will run faster than ARM because of its smaller codesize. If the performance penalty of running from flash is 15% for ARM, it would be 10% for Thumb-2. If we use the current figure of Thumb-2 being 3% slower than ARM using perfect memory, it would be 2% faster on flash.
> > I'd expect tools to automatically detect incompatibilities:
...
> Key words here are 'expect' and 'could'. We are talking about existing,
> proven tools in use right now, not horizonware.
ARM's tools have had this feature for over 5 years now (since ADS): any potential incompatibilities are immediately fed back to the user. It is not the incompatibilities themselves that cause the trouble; the real issue is the hours wasted on trivial mistakes that aren't spotted by tools. Loading a big endian image on a CPU configured for little endian is something I've done many times, but it never took me more than a second to correct the mistake as the debugger simply refused to run the image...
> > Given that M3 outperforms the good old ARM7tdmi by such a large
> > margin on all aspects and Cortex has Thumb-2 written all over it, what do
> > you think may quietly get "de-emphasised"? :-)
>
> That's easy : The lack of binary [opcode] compatibility.
Or rather your perceived lack thereof. I don't understand how the lack of ARM instruction set support can be crucial while differences in peripherals are somehow excluded from binary compatibility issues... In the real world both stop you from running the same binary on different cores.
> I simply don't see the 'such a large margin on all aspects' in ARM's
> published information at all ?
You're looking at the wrong information. On an ARM7tdmi with perfect memory, Thumb gives about 0.74 MIPS/MHz, ARM does 0.9. The M3 gives 1.2 - about as fast as ARM code running on an ARM9. That's about 60% performance improvement over the 7tdmi using Thumb (at Thumb codesize) or 30% when using ARM (with a 35% codesize gain). Then there is the power consumption and die size, both less than half that of the ARM7tdmi, the much better interrupt latency and multiply/divide performance, unaligned access, simplified OS model etc.
> Their example claim of a system Size saving of a (mere) 9%, also
> avoids any comments on Speed. Hmmmm... ?
You mean the gatecount here? The saving over ARM7tdmi with the same set of peripherals is about 37K gates (70K - 33K). Assuming a gate is equivalent to 16 bits of flash (probably too conservative), that is an extra 74KBytes of flash for free. You'd need 820KBytes of flash before this becomes a mere 9% saving, and that is definitely not a low-end MCU. You could build an M3 with 1K SRAM and 16KBytes of flash and _still_ be smaller than a bare ARM7tdmi!
> To me, Thumb2 is a sensible, middle ground between ARM and Thumb,
> ( fixes some of the older core's shortcomings ) but the removal of ARM
> binary compatibility on the M3, and apparent pitch into a space users
> are leaving void, is poorly researched.
Thumb-2 is not "middle" ground - it combines the best features of ARM with the best features of Thumb, effectively superseding both. Why do you think Cortex is based around Thumb-2?
> Time will show who is right :)
Sure - I bet there are many people working hard to try to prove you wrong :-)

Wilco
Wilco Dijkstra wrote:
> "Jim Granville" <no.spam@designtools.co.nz> wrote in message
> news:427ff711$1@clear.net.nz...
>
>>Wilco Dijkstra wrote:
>
>>Yes, the ARM info is sparse, and poorly detailed, but what they have
>>published shows Thumb2 to have LOWER performance than ARM, but better
>>code density.
>
> Those are ARM1156T2-S benchmarks - not M3. The first Thumb-2 compiler
> indeed generates code that is almost 1% larger than Thumb (still 34% smaller
> than ARM!). The difference in performance is less than 3% on the ARM1156
> (the first Thumb-2 CPU). So it is pretty close to the marketing statement
> "ARM performance at Thumb codesize". The next compiler release will
> without a doubt improve upon this and close the gap if not bridge it.
?! - but the M3 is Thumb-2, and you have just confirmed "not quite ARM performance yet..."

<snip>
>> Mostly, the uC selection decisions I see made, hinge on Peripherals &
>> FLASH/RAM, NOT the core itself. As Ulf says, they choose ARM's
>> _because_ they are binary [opcode] compatible.
>
> Yes that is true.
>
> Anyone moving to ARM from the 8051 or similar simply won't care
> whether the M3 supports the ARM instruction set or not as long as it
> doesn't make porting harder.
Not the users I talk with. Binary compatible is near the top of their lists, _especially_ 80C51 users. With Cortex-M, as Ulf says, they may as well also look at the raft of other 'new core' alternatives. Like CyanTech, MAXQ, & the many new Flash DSPs.....

Gamble: Choose which ones will not hit critical mass, and survive only one generation.

> The resulting code is of course binary
> compatible with any other Cortex CPU as I explained.
Well, we'll agree to differ on our definition of Binary compatible. Could one write code that ran fine on a Cortex-R, but choked a Cortex-M? I call that NOT binary [opcode] compatible. Other users are free to apply their own definitions.

<snip>
>>>Given that M3 outperforms the good old ARM7tdmi by such a large
>>>margin on all aspects and Cortex has Thumb-2 written all over it, what do
>>>you think may quietly get "de-emphasised"? :-)
>>
>>That's easy : The lack of binary [opcode] compatibility.
>
> Or rather your perceived lack thereof. I don't understand how the lack of ARM
> instruction set support can be crucial while differences in peripherals are
> somehow excluded from binary compatibility issues... In the real world both
> stop you from running the same binary on different cores.
To help you with that distinction, I stated binary [opcode] compatibility. 80C51 designers are fully versed in peripheral porting, but they also expect [even demand?] to have one stable/proven/mature tool chain.
>> I simply don't see the 'such a large margin on all aspects' in ARMs
>> published information at all ?
>
> You're looking at the wrong information.
I was looking at ARM's own web data, on Thumb-2. If that is wrong, then we'll wait for it to be corrected. Your own numbers above agree that Cortex is struggling to match ARM performance on Speed - [real soon now... just need another compiler pass...]
>> Their example claim of a system Size saving of a (mere) 9%, also
>> avoids any comments on Speed. Hmmmm... ?
>
> You mean the gatecount here?
No, Code Size. They somehow 'missed' mention of the speed numbers?
> The saving over ARM7tdmi with the same set
> of peripherals is about 37K gates (70K - 33K).
<snip>

The more important comparison is -M and -R, -A gate counts, then you compare 'same design generation'.

Better still, give us the incremental cost of adding size-optimised ARM compatible execution to M3 [ie: can be a little slower, NO-choke is the design brief]?

Summary: Thumb-2 has performance merits, but the -M variant risks 'falling between two stools' - instead of building on their strengths, they seem to be trying to be all things to all users. That's a pity, as the talent and resource could be better applied.

Probably time to end this thread, and wait 18 months for the users to vote.. :)

-jg
In article <Xns9649A3DE02BDmnooneuiucedu127001@204.127.199.17>, 
mnoone.uiuc.edu@127.0.0.1 says...
> Hi - does anybody know of any ARMs with a built in hardware divide? I heard
> that a new core coming out would have a built-in hardware divide, but I've
> been unsuccessful in finding out which core that is and if any chips have
> that core. Thanks,
>
> -Michael J. Noone
Not sure if you ever got an actual answer to your question.

The Philips/NXP LPC3180 is based on the ARM926EJ-S, and has a vector floating point co-processor (not in the core, but at least it's on the same chip...)

"This CPU coprocessor provides full support for single-precision and double-precision add, subtract, multiply, divide, and multiply-accumulate operations at CPU clock speeds. It is compliant with the IEEE 754 standard, and enables advanced Motor control and DSP applications. The VFP has three separate pipelines for floating-point MAC operations, divide or square root operations, and load/store operations. These pipelines can operate in parallel and can complete execution out of order. All single-precision instructions, except divide and square root, take one cycle and double-precision multiply and multiply-accumulate instructions take two cycles. The VFP also provides format conversions between floating-point and integer word formats."

--Gene
In comp.sys.arm Gene S. Berkowitz <first.last@comcast.net> wrote:
> > Hi - does anybody know of any ARMs with a built in hardware divide? I heard
> > that a new core coming out would have a built-in hardware divide, but I've
> > been unsuccessful in finding out which core that is and if any chips have
> > that core. Thanks,
> >
> > -Michael J. Noone
> Not sure if you ever got an actual answer to your question.
The answer is that ARMv7-M and ARMv7-R architecture processors have hardware divide. The Cortex-M3 implements ARMv7-M and the Cortex-R4 implements ARMv7-R.

-p

--
"Unix is user friendly, it's just picky about who its friends are." - Anonymous