Wilco Dijkstra wrote:
> "Jim Granville" <no.spam@designtools.co.nz> wrote in message
> news:427b5a17$1@clear.net.nz...
>
>
>>>> I did note that ARMs 'benchmarks' to justify the Cortex,
>>>> focus on
>>>>narrow bus systems, but there ARE very small uC shipping,
>>>>with wide busses...
>
>
> Do you have a reference? The only benchmark source I can find
> on Cortex-M3 is the comparison with other MCUs (1MByte):
> http://www.arm.com/Multimedia/DevCon2004_presentation.pdf
>
> The reason I find this statement surprising is that in fact Thumb-2 works
> best on wider interfaces (>= 32-bits), as it uses 32-bit instructions. It is
> faster than ARM when using the same flash interface since it fetches less
> code.
Yes, the ARM info is sparse and poorly detailed, but what they have
published shows Thumb-2 to have LOWER performance than ARM, but better
code density. Thumb-2 _does_ reduce the step effect between ARM and
Thumb, and adds smarter embedded opcodes.
They state it is a mix of 16-bit and 32-bit opcodes.
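For reference, the published Thumb-2 encoding rule makes that 16/32-bit mix cheap to decode: the top five bits of an instruction's first halfword decide its width. A minimal Python sketch:

```python
def thumb2_insn_width(first_halfword: int) -> int:
    """Size in bytes of a Thumb-2 instruction, given its first 16-bit
    halfword. Per the published Thumb-2 encoding, a halfword whose top
    five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit encoding;
    every other halfword is a complete 16-bit instruction."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

# MOVS r0, #1 encodes as 0x2001, a 16-bit opcode;
# BL starts with a halfword in the 0xF000 range, a 32-bit opcode.
print(thumb2_insn_width(0x2001))   # 2
print(thumb2_insn_width(0xF000))   # 4
```

So a decoder (or disassembler) can walk a Thumb-2 stream halfword by halfword, which is also part of why Thumb-2 code can be fetched over either a 16-bit or a 32-bit interface.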
>
>>>In a wide bus uncached system the performance benefits are
>>>going to be very slight over straight ARM code, but the code is
>>>considerably smaller.
>>
>>You did read their numbers ?
>
>
> Yes, it looks promising, but without further details it is difficult
> to figure out why those numbers look suspiciously good. It's obvious
> that a prefetch buffer can hide the fetch latency in straight-line code.
> However typical code branches a lot and the latency of non-sequential
> accesses can only be hidden by a cache. Maybe that is what it does...
>
> Note a wide interface will not only speedup ARM, but also Thumb-2.
Yes, but the biggest effect is to remove the hit a normal 32-bit opcode
fetch encounters. It is a question of opcode bandwidth, and of matching
that to memory bandwidth.
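A back-of-envelope sketch of that bandwidth-matching point. The instruction-size mix used here is an assumption for illustration, not a measurement:

```python
# ARM instructions are 4 bytes, Thumb 2 bytes, and the Thumb-2 stream
# is taken (as an illustrative guess) to be 70% 16-bit / 30% 32-bit.

def fetches(code_bytes: float, bus_width_bits: int) -> float:
    """Bus transactions needed to stream `code_bytes` of straight-line code."""
    return code_bytes / (bus_width_bits // 8)

insns = 100
arm_bytes = insns * 4
thumb_bytes = insns * 2
thumb2_bytes = insns * (0.7 * 2 + 0.3 * 4)   # about 260 bytes

for width in (16, 32):
    print(f"{width}-bit bus: "
          f"ARM {fetches(arm_bytes, width):.0f} fetches, "
          f"Thumb {fetches(thumb_bytes, width):.0f}, "
          f"Thumb-2 {fetches(thumb2_bytes, width):.0f}")
```

On these assumed figures a 32-bit bus halves the fetch count for every instruction set, which is why a wide interface speeds up ARM and Thumb-2 alike; Thumb-2's advantage is simply that it needs fewer fetch slots than ARM for the same instruction count.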
>
>
>><snip> Wilco Dijkstra wrote:
>>
>>>>>The Cortex family is very similar: there will be multiple
>>>>>CPUs at different performance levels within each of the A,
>>>>>R and M strands, and these will be binary compatible (ie.
>>>>>no recompile needed).
>>>>
>>>>Err What ?! [Who is confused here ?]
>>>>
>>>>Earlier in this thread, you stated
>>>>" ... so existing compilers and objects will continue to
>>>>work (as long as they don't contain ARM code). "
>
>
> I said that *within* families CPUs are 100% compatible. The
> paragraphs are consistent. To clarify with a detailed example:
>
> Suppose we have 2 different Cortex-M cores: M3 and M4. These
> are fully binary compatible in that you should be able to run an M3
> binary on the M4 and visa versa [1][2]. If newer versions provide
> the performance and features you want then you'll never want to move
> to another Cortex family (ie. binary compatibility is a non-issue).
>
> However say we also have an R5 core. You should be able to
> run M3 and M4 binaries on the R5 [1]. However you will need
> to do some more porting and recompilation to get the best out of the
> new CPU [3]. The same is true today when you move from an ARM7
> to an ARM11.
>
> Alternatively you can also run R5 binaries that have been compiled
> with downwards compatibility in mind (ie. no ARM code, no
> R5-specific features etc) on the M3 and M4. Doing this requires
> a bit of care of course, but no more than you need today for code
> that is designed to run on many architectures (eg. C libraries).
>
> So moving *within* a Cortex family is generally trivial - you'll get
> full binary compatibility. Moving *between* Cortex families may
> require some porting and care to get full binary compatibility.
> In all cases a recompilation is highly desirable as the compiler can
> then optimise for that particular CPU.
These verbal gymnastics aptly demonstrate my point that calling the M3
something clearly different would have helped. When you have to
underline the difference between 'within' and 'between', then
perhaps a clearer naming scheme would have been smarter.
>
> So... where do you want to migrate to today? (tm)
>
>
> [1] Of course this level of compatibility only applies to the instruction
> set - most MCUs have lots of peripherals which cause another level of
> incompatibility. For example any code that runs on the AT91 series can't
> run on the LPC2000 series (or visa versa). Even with identical interfaces
> one chip may have 2 timers and another 8. So if you use a purist definition
> of "binary compatible" no 2 chips are compatible.
Binary compatible means what it does on the 80C51: NO opcode choking.
Very simple. SFR and peripheral compatibility are easier to manage.
>
> [2] Of course while your M3 code runs fine on newer versions, the
> pipeline may be a little different, and so your code doesn't run as
> fast as it could (unless you recompile it - you may not care, but your
> competitor might).
>
> [3] You'll end up running with the caches disabled as the M3 doesn't
> have a cache and thus has no code to enable it. So you're not getting
> full use of the new features - and the difference between potential
> and actual performance is likely much larger than [2].
>
>
>
>>>> So, exactly what DOES happen when a Cortex M3 encounters
>>>> an ARM (not thumb) opcode ?
>>>>
>>>> If it chokes, it is not binary compatible. Very simple.
>>>
>>>Yes, it chokes.
>>
>>Good, so we have established it is NOT binary compatible.
>
>
> We already knew that the M3 does not run ARM code natively, however
> it does run existing Thumb and Thumb-2 code, so it is binary compatible
> with that. So you could port your OS to the M3 (which is something you
> would have to do even if the M3 supported ARM), then relink your existing
> Thumb objects/libraries. If you did have any ARM objects without source
> you could disassemble them and reassemble for Thumb-2 without too much
> effort. Not 100% compatible, but close enough.
'Close enough' for who ?
ARM users will make that call, not ARM marketing.
>
> Also a key goal of the M3 is to aid migration of non-ARM 8/16-bit MCU to
> the ARM world. The ARM world is totally incompatible of course, but if the
> gain is worth more than the cost, people will move. The M3 tries to lower
> the entry barrier as much as possible by removing features that cause new
> users trouble (like ARM/Thumb interworking, the OS model), and introducing
> features that make things easier (Thumb-2, DIV, faster interrupts, more flash
> for a given die size).
and this seems to be the crux of the problem. ARM seem to think they can
replace the 8051/8-bit sector with this new variant. Instead, they have
lost focus on what attracts users to ARM (see Ulf's comments).
Atmel, Philips et al _already_ have sub-$3 offerings, so there is
substantial overlap into the 8/16-bit arena now. And this with an
ARM/Thumb offering.
Mostly, the uC selection decisions I see made hinge on peripherals &
FLASH/RAM, NOT the core itself. As Ulf says, they choose ARMs
_because_ they are binary [opcode] compatible.
Philips seem to have a HW solution that simply and effectively
reduces the ARM/Thumb step effect. Thus any "new core" benchmarks that
exclude this solution lack credibility.
It is better to talk about the better embedded opcodes/features in
Cortex. - and the A and R variants _include_ ARM opcodes.
After all, code is steadily getting larger while FLASH gets cheaper,
with FLASH ARMs now well clear of 8/16-bit models in FLASH resource.
>> So far, history proves to be quite intolerant to
>>not-binary-compatible options, that cause admin and version control
>>grief, and force users to carefully check
>>"Now, _which_ ARM did we use in that model ? -was it that Cortex-M?"
>
>
> I'd expect tools to automatically detect incompatibilities:
>
> (a) when linking (automatically select compatible libs, error if incompatible)
> (b) when simulating/debugging an image
> (c) when burning an image into flash
> (d) when running on hardware (trap when executing an incompatible instruction)
>
> This is basic stuff. You could even emulate unsupported instructions if you
> absolutely needed it.
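The link-time check in (a) could, illustratively, look like the minimal sketch below; the core names and instruction-set tags are hypothetical stand-ins for the per-object build attributes a real toolchain records:

```python
# Illustrative only: a "linker" refusing to mix objects whose
# instruction set the target CPU cannot execute.

COMPATIBLE = {
    "cortex-m3": {"thumb", "thumb2"},          # no ARM state
    "arm7tdmi":  {"arm", "thumb"},
    "cortex-a8": {"arm", "thumb", "thumb2"},
}

def check_link(target: str, objects: dict) -> None:
    """Raise if any object uses an instruction set the target lacks."""
    allowed = COMPATIBLE[target]
    for name, isa in objects.items():
        if isa not in allowed:
            raise ValueError(f"{name}: {isa} code not supported on {target}")

check_link("cortex-m3", {"os.o": "thumb2", "lib.o": "thumb"})   # OK
try:
    check_link("cortex-m3", {"legacy.o": "arm"})
except ValueError as e:
    print("link error:", e)
```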
Key words here are 'expect' and 'could'. We are talking about existing,
proven tools in use right now, not horizonware.
>
>>Many ideas in Cortex are very good, and fix the shortfalls in the ARM
>>for embedded control, but I fear ARM looks to be repeating the mistakes
>>of history, by not learning from it....
>>
>>Will we find that the Cortex-M quietly gets 'de-emphasised' ?
>
>
> Given that M3 outperforms the good old ARM7tdmi by such a large
> margin on all aspects and Cortex has Thumb-2 written all over it, what do
> you think may quietly get "de-emphasised"? :-)
That's easy: the lack of binary [opcode] compatibility.
Will Ulf be pushing Atmel to release an -M3 microcontroller? I doubt it!
I simply don't see the 'such a large margin on all aspects' in ARM's
published information at all.
These graphs show Thumb-2 as being LARGER than Thumb, and SLOWER than
ARM ?! [but also smaller than ARM, and faster than Thumb]
Their example claim of a (mere) 9% system size saving also
avoids any comment on speed. Hmmmm... ?
To me, Thumb-2 is a sensible middle ground between ARM and Thumb
(it fixes some of the older core's shortcomings), but the removal of ARM
binary compatibility on the M3, and the apparent pitch into a space users
are leaving void, is poorly researched.
Time will show who is right :)
-jg