PIC vs ARM assembler (no flamewar please)

Reply by Wilco Dijkstra ● February 20, 2007

"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070219.7A36678.8557@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:

>> Yes, but a pure 2-operand instruction set requires additional
>> moves to avoid overwriting the destination register.
>
> I suspect that memory-to-memory operations are more common and
> the load/store RISC requires a temporary register for this.

I'm not sure what you mean. z = x + y needs an extra move on any
2-operand architecture (assuming x and y remain live).
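
To make that concrete, here is a rough sketch in Python. It is purely illustrative: the instruction tuples and function names are made up and model no real ISA. It lowers z = x + y for both styles when x and y must stay live:

```python
# Illustrative only: count the register instructions needed for z = x + y
# when x and y must both stay live after the addition.

def lower_three_operand(dst, src1, src2):
    """A 3-operand ISA names the destination separately: one instruction."""
    return [("ADD", dst, src1, src2)]

def lower_two_operand(dst, src1, src2):
    """A 2-operand ISA overwrites its first operand (dst = dst + src)."""
    if dst == src1:
        return [("ADD", dst, src2)]
    if dst == src2:
        # Addition commutes, so the other operand can be added in place.
        return [("ADD", dst, src1)]
    # Neither source may be destroyed: copy one first, then add the other.
    return [("MOV", dst, src1), ("ADD", dst, src2)]

three = lower_three_operand("r2", "r0", "r1")   # z = x + y
two = lower_two_operand("r2", "r0", "r1")

print(len(three), three)
print(len(two), two)
```

The extra MOV only disappears when the destination happens to be one of the sources (x = x + y), which is exactly the case where nothing live gets overwritten.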

>> Thumb has various 3-opnd instructions to reduce this.
>
> Thumb?  I thought all the three-operand instructions were
> removed from Thumb.

Thumb has 3-operand adds, subtracts, loads and stores, for example:

ADDS r0,r1,#0..7
ADDS r0,r1,r2
SUBS r0,r1,#0..7
SUBS r0,r1,r2
LDR    r0,[r1,#0..124]
LDR    r0,[r1,r2]
STR    r0,[r1,#0..124]
STR    r0,[r1,r2]
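
For anyone wondering where the #0..7 limit comes from, here is a small sketch of the 16-bit add/subtract encoding as I read it in the ARM7TDMI data sheet (Thumb format 2). The helper name `thumb_add_sub` is made up for illustration and only covers this one format:

```python
# Sketch of the 16-bit Thumb add/subtract encoding:
#   bits 15-11 = 00011, bit 10 = I (immediate), bit 9 = Op (subtract),
#   bits 8-6 = Rn or imm3, bits 5-3 = Rs, bits 2-0 = Rd

def thumb_add_sub(rd, rs, operand, *, subtract=False, immediate=False):
    """Encode ADDS/SUBS Rd, Rs, Rn or ADDS/SUBS Rd, Rs, #imm3."""
    for reg in (rd, rs):
        if not 0 <= reg <= 7:
            raise ValueError("only low registers r0-r7 fit in 3 bits")
    if not 0 <= operand <= 7:
        raise ValueError("third operand is a 3-bit field (r0-r7 or #0..7)")
    return (
        (0b00011 << 11)
        | (int(immediate) << 10)
        | (int(subtract) << 9)
        | (operand << 6)
        | (rs << 3)
        | rd
    )

adds_reg = thumb_add_sub(0, 1, 2)                    # ADDS r0,r1,r2
adds_imm = thumb_add_sub(0, 1, 1, immediate=True)    # ADDS r0,r1,#1
subs_reg = thumb_add_sub(0, 1, 2, subtract=True)     # SUBS r0,r1,r2

print(hex(adds_reg), hex(adds_imm), hex(subs_reg))
```

Three 3-bit fields is all that fits, hence low registers only and immediates of 0..7. The LDR/STR immediate form uses a 5-bit word offset scaled by 4, which is where 0..124 (31 * 4) comes from.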

>> RISC focuses more on easy decoding...
>
> True to a degree, but ease of decoding doesn't require
> three-operand instructions, conditional execution of every
> instruction, etc.

Indeed, but it doesn't make decoding harder either. Other
RISCs simply have unused bits, while on ARM they have a
use that significantly improves codesize and performance.
Despite this, fixed 32-bit instructions are not optimal.

Wilco

Reply by rickman ● February 20, 2007

On Feb 20, 4:46 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> "rickman" <gnu...@gmail.com> wrote in message news:1171639930.653456.17030@t69g2000cwt.googlegroups.com...
>
> > On Feb 16, 7:39 am, "Ulf Samuelsson" <u...@a-t-m-e-l.com> wrote:
> >> <ucad...@gmail.com> wrote in message
> >> news:1171495132.249289.198340@v33g2000cwv.googlegroups.com...
>
> >> > Had a discussion with a _hardware_ guy (as in transistors and OP-amps)
> >> > about "powerful" micros.
>
> >> > He is a PIC guy and claimed that the PIC has a very nice instruction set
> >> > and is a pleasure to work with in assembly. He also mentioned that he
> >> > would rather use a dsPIC instead of an ARM7 because the ARM7 is very hard
> >> > to program and has a confusing assembly (we never talked application,
> >> > so I assume he meant this holds regardless of application). He also
> >> > said that another major advantage of the dsPIC is that it's a PIC, hence
> >> > the know-how and toolchain advantage...
>
> >> > Completely shocked, I told him that my experience was the exact
> >> > opposite, and I really enjoy ARM assembler (well, maybe not enjoy...).
> >> > Anyway, after that, the discussion turned into a flamewar...
>
> >> > So what do you say? Maybe I have been wrong all the time?
>
> >> > What do you guys think about the instruction set and architecture,
> >> > provided that you were forced to code in assembly and we ignored the
> >> > fact that this is more of an apples vs pink-flying-elephants
> >> > comparison...
>
> >> > (you can also include your background and your other favorite micros
> >> > such as AVR and MSP4xx, but _please_ don't flame. And you must REALLY
> >> > HAVE WORKED with all of them, no guesses please :)  )
>
> >> > ((yes, I REALLY do want your answers. Because I suspect the answer
> >> > will differ very much dependent on your background, and experience and
> >> > your application, and I think that information would benefit this
> >> > little community))
>
> >> > -shocked
>
> >> The Series 32000 instruction set is way superior to anything mentioned so
> >> far.
> >> The MC68000 instruction set is a murky wannabe in comparison.
>
> >>     Try doing this in a single instruction on another architecture...
>
> >>     pointer1->field1[ix1] = (unsigned int) pointer2->field2[ix2];
>
> >>     maps to:
>
> >>     movzbd     field1(pointer1(sb))[r0:d], field2(pointer2(sp))[r3:d]
>
> >> Elegance is Everything!
>
> >> --
> >> Best Regards
> >> Ulf Samuelsson
>
> > Yeah, but how many days does it take to execute?
>
> > ;^)
>
> The NS32532 was about twice the speed of its x86/68k competition of the
> day (the 68030).
> The NS32764 (later Swordfish) was one of the first superscalar RISC
> processors and would execute code faster than any of the other RISC
> processors of that time period.
> It would decode each instruction into RISC instructions before execution.
> The final version of course skipped the 32000 instruction set altogether.

Nice diversion, but you didn't answer the question.  How long did it
take to execute the instruction you describe above?

But your comments bring up another question.  If it was so good, what
happened???

Obviously there is more to a processor than just how fast it runs
code.  Just ask Intel, they will tell you so.

Reply by rickman ● February 20, 2007

On Feb 18, 2:03 pm, Jim Granville <no.s...@designtools.maps.co.nz>
wrote:
> werty wrote:
> >   Long ago TI had its registers in RAM !
> >  ( 64 pin  "99000" ) , context switch was
> >  fast ! and you could have hundreds !
>
>   Zilog have this in their current uC.
> It's also something Atmel could/should have done with their AVR,
> and it is a common shortcoming of RISC devices in Microcontrollers.
> RISC cores are historically pitched by die area, and memory is 'NIH',
> and so we see pointer/stack thrashing.

Using RAM for registers in the way that the TMS9900 did is a concept
that had its time and the world has moved on.  It made sense when
registers and memory had nearly the same speed.  Now that memory is the
speed bottleneck for CPUs, it would be horribly slow to implement.
The TMS9900 used a pointer register (that's right, registers did not
go away) to point to the first register in memory.  An ADD would then
take three memory accesses to complete rather than one clock cycle.
Even if you put the memory on chip, you either have to limit the
location of the registers to a special bank of fast, multiport memory
(register bank) or you have to accept multiple memory cycles for a
single instruction, even when working in registers.
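
As a rough illustration of the access count, here is a toy model in Python. It is a sketch only: the class and its names are made up, the workspace pointer is a word index rather than the real part's byte address, and opcode fetches are not counted.

```python
# Toy model of a workspace-pointer machine: the "registers" R0..R15 are
# just 16 consecutive words of RAM starting at the workspace pointer (WP).

class WorkspaceMachine:
    def __init__(self, ram_words=1024):
        self.ram = [0] * ram_words
        self.wp = 0            # word index of R0 (the one real register here)
        self.accesses = 0      # data-memory accesses, excluding opcode fetch

    def _read(self, addr):
        self.accesses += 1
        return self.ram[addr]

    def _write(self, addr, value):
        self.accesses += 1
        self.ram[addr] = value & 0xFFFF   # 16-bit words

    def add(self, src, dst):
        """A Rsrc,Rdst : Rdst = Rdst + Rsrc, with both operands in RAM."""
        a = self._read(self.wp + src)
        b = self._read(self.wp + dst)
        self._write(self.wp + dst, a + b)

m = WorkspaceMachine()
m.ram[m.wp + 1], m.ram[m.wp + 2] = 5, 7
m.add(1, 2)
print(m.ram[m.wp + 2], m.accesses)
```

Two reads and one write for a register-to-register ADD: that is the three accesses I mean, and on a single-ported memory they have to happen one after the other.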

I think I still have a TMS9900 board around somewhere along with a
TMS9995 board I built myself in wirewrap.

Reply by David Brown ● February 20, 2007

Wilco Dijkstra wrote:
> "David Brown" <david@westcontrol.removethisbit.com> wrote in message 
> news:45dae320$0$24609$8404b019@news.wineasy.se...
>> Wilco Dijkstra wrote:
> 
>>> Correct, the information content of ARM instructions is around 19-20 bits
>>> per instruction. However you can't get there using a fixed length 
>>> encoding,
>>> so neither ARM nor Thumb are optimal. Thumb-2 uses mixed 16/32-bit
>>> encodings to get the best of both worlds.
>> That's beginning to sound like the ColdFire (which Freescale refers to as 
>> a "variable instruction length RISC processor").
> 
> Not really. ColdFire is a 68K variant removing some of the less
> frequently used instructions and complex addressing modes.

True enough.

> Although this allows for simpler and faster implementations, the
> instruction set remains as CISCy as the 68K. It's all marketing...
> 

Marketing terms are not necessarily the same thing as technical terms - 
I was careful to say "Freescale refers to as ..." rather than "is".

However, there is no fixed distinction between RISC and CISC.  The two 
terms refer to a range of characteristics commonly associated with RISC 
cpus and CISC cpus.  Some chips clearly fall into one camp or the other, 
but most have at least slightly mixed characteristics.  The ColdFire 
core is very much such a mixed chip - in terms of the ISA, it is 
noticeably more RISCy than the 68k (especially the later cores with 
their more complex addressing modes), and in terms of its 
implementation, it is even more so.  Even the original 68k, with its 
multiple registers and (mostly) orthogonal instruction set is pretty RISCy.

So the ARM is moving from a fairly pure RISC architecture, through the 
Thumb (with its more CISCy smaller register set and more specialised 
register usage) and now Thumb-2 (with variable length instructions). 
It's gaining CISC attributes in a move to improve code density at the 
expense of more complex instruction decoding.
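
To give an idea of how much extra decoding is involved, here is a small sketch of the length check in Python. The function names are made up for illustration; the rule itself is that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit Thumb-2 instruction.

```python
def thumb2_length(first_halfword):
    """Return the instruction size in bytes, judged from the first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

def split_stream(halfwords):
    """Walk a list of halfwords and yield (index, size_in_bytes) pairs."""
    i = 0
    while i < len(halfwords):
        size = thumb2_length(halfwords[i])
        yield i, size
        i += size // 2   # step over one or two halfwords

# Sample stream: a 16-bit ADDS, a 32-bit MOV.W (two halfwords), a 16-bit BX LR.
stream = [0x1888, 0xF04F, 0x0001, 0x4770]
boundaries = list(split_stream(stream))
print(boundaries)
```

So the length is known from five bits of the first halfword, which is a good deal simpler than finding instruction boundaries on a 68k or x86, but it is still a step that a fixed-width decoder does not need.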

The ColdFire, on the other hand, has moved from the original 68k to a 
more RISCy core, with a much greater emphasis on single-cycle 
register-to-register instructions and a simpler and more efficient core, 
in order to improve performance and lead to a smaller implementation.

There are still plenty of differences between the architectures, but 
there is no doubt that there are a lot more similarities between the ARM 
Thumb-2 and the ColdFire than between the original ARM and the original 68k.

> There are few RISCs with variable length instructions.
> 

The AVR?  I can't think of any others.

> Wilco 
> 
>

Reply by Jim Granville ● February 20, 2007

rickman wrote:

> On Feb 18, 2:03 pm, Jim Granville <no.s...@designtools.maps.co.nz>
> wrote:
> 
>>werty wrote:
>>
>>>  Long ago TI had its registers in RAM !
>>> ( 64 pin  "99000" ) , context switch was
>>> fast ! and you could have hundreds !
>>
>>  Zilog have this in their current uC.
>>It's also something Atmel could/should have done with their AVR,
>>and it is a common shortcoming of RISC devices in Microcontrollers.
>>RISC cores are historically pitched by die area, and memory is 'NIH',
>>and so we see pointer/stack thrashing.
> 
> 
> Using RAM for registers in the way that the TMS9900 did is a concept
> that had its time and the world has moved on.  It made sense when
> registers and memory had nearly the same speed.  Now that memory is the
> speed bottleneck for CPUs, it would be horribly slow to implement.

I think you missed my uC = microcontroller. (AVR <> CPU)
What you state is correct for megabyte CPUs, with all the cache and 
SDRAM fruit, but certainly NOT true for single-chip microcontrollers.

CPUs being pressed into uC service is one of the drawbacks with some
approaches. Quick and dirty, yes; efficient, no.

> The TMS9900 used a pointer register (that's right, registers did not
> go away) to point to the first register in memory.  An ADD would then
> take three memory accesses to complete rather than one clock cycle.
> Even if you put the memory on chip, you either have to limit the
> location of the registers to a special bank of fast, multiport memory
> (register bank) or you have to accept multiple memory cycles for a
> single instruction, even when working in registers.

Sounds like a poor example of how anyone would do this today.

Look at the XC166 and eZ8 for examples of how you can do
very efficient memory overlays.

In a uC, you are talking of a few K's of memory, so speed should
not be an issue at all.

-jg

Reply by msg ● February 20, 2007

Jim Granville wrote:

> rickman wrote:
> 
>> On Feb 18, 2:03 pm, Jim Granville <no.s...@designtools.maps.co.nz>
>> wrote:
>>
>>> werty wrote:
>>>
>>>>  Long ago TI had its registers in RAM !
>>>> ( 64 pin  "99000" ) , context switch was
>>>> fast ! and you could have hundreds !
>>>
>>>
>>>  Zilog have this in their current uC.
>>> It's also something Atmel could/should have done with their AVR,
>>> and it is a common shortcoming of RISC devices in Microcontrollers.
>>> RISC cores are historically pitched by die area, and memory is 'NIH',
>>> and so we see pointer/stack thrashing.
>>
> 
> I think you missed my uC = microcontroller. (AVR <> CPU)
> What you state is correct for megabyte CPUs, with all the cache and 
> SDRAM fruit, but certainly NOT true for single chip microcontrollers.

<snip>

> CPUs being pressed into uC service, is one of the drawbacks with some
> approaches. Quick and dirty, yes, efficient, no.

<snip>

> In a uC, you are talking of a few K's of memory, so speed should
> not be an issue at all.

I too miss the TMS9900/99000 ISA; I was always impressed by the
performance of TI's DX-10 OS running on a 64 KB, 3.3 MHz 9900
servicing sixteen terminals with decent response time; context
switching was fast.  TI provided some good multitasking realtime
executives for industrial control as well.

I've recently enjoyed working on the (older) Intel 8096/80x196
with its 256 registers addressed as memory and three-operand-capable
instructions; it is somewhat of a challenge to limit tasks
to sets of working registers within the on-chip set for fast
context switches without using a stack.  For the small-ish
uC projects I'm doing, the 9900 ISA would be far more efficient
and useful.

Regards,

Michael

Reply by rickman ● February 20, 2007

On Feb 20, 2:14 pm, Jim Granville <no.s...@designtools.maps.co.nz>
wrote:
> rickman wrote:
> > Using RAM for registers in the way that the TMS9900 did is a concept
> > that had its time and the world has moved on.  It made sense when
> > registers and memory had nearly the same speed.  Now that memory is the
> > speed bottleneck for CPUs, it would be horribly slow to implement.
>
> I think you missed my uC = microcontroller. (AVR <> CPU)
> What you state is correct for megabyte CPUs, with all the cache and
> SDRAM fruit, but certainly NOT true for single chip microcontrollers.
>
> CPUs being pressed into uC service, is one of the drawbacks with some
> approaches. Quick and dirty, yes, efficient, no.

I understand.  The TMS9995 was much closer to an MCU with onboard RAM
and it still was much slower than register based CPUs.

> > The TMS9900 used a pointer register (that's right, registers did not
> > go away) to point to the first register in memory.  An ADD would then
> > take three memory accesses to complete rather than one clock cycle.
> > Even if you put the memory on chip, you either have to limit the
> > location of the registers to a special bank of fast, multiport memory
> > (register bank) or you have to accept multiple memory cycles for a
> > single instruction, even when working in registers.
>
> Sounds like a poor example of how anyone would do this today.
>
> Look at the XC166, and eZ8, for examples of how you can do
> very efficient memory overlays.
>
> In a uC, you are talking of a few K's of memory, so speed should
> not be an issue at all.

But for RAM to be as efficient as a register file it has to be triple
ported so you can read two operands and write back another... or you
have to go to an accumulator based design.  Once you have triple
ported RAM, you have just added a register file!  A rose by any other
name still smells as sweet...
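
A back-of-envelope way to put numbers on that, as a sketch only: it assumes each port does one access per cycle and ignores instruction fetch and any pipelining or overlap between instructions. The helper name is made up.

```python
import math

def operand_cycles(accesses, ports):
    """Cycles spent moving operands if each port does one access per cycle."""
    return math.ceil(accesses / ports)

ACCESSES_PER_ADD = 3   # read two sources, write one result

single_port = operand_cycles(ACCESSES_PER_ADD, ports=1)
dual_port = operand_cycles(ACCESSES_PER_ADD, ports=2)
triple_port = operand_cycles(ACCESSES_PER_ADD, ports=3)

# An accumulator design keeps one source and the result in a real register,
# so only the other source has to come from memory.
accumulator = operand_cycles(1, ports=1)

print(single_port, dual_port, triple_port, accumulator)
```

Only the triple-ported case and the accumulator case get down to one operand cycle per ADD, and the triple-ported RAM is a register file in everything but name.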

Reply by Roberto Waltman ● February 20, 2007

msg wrote:
>...
>I too miss the TMS9900/99000 ISA; I was always impressed by the
>performance of TI's DX-10 o/s running on 64kbyte, 3.3MHz 9900
>servicing sixteen terminals with decent response time; context
>switching was fast.  TI provided some good multitasking realtime
>executives for industrial control as well.

Aha. And the PDP-11s running RSX-11 or "Young-UNIX". Or HP-1000 under
RTE-II/III. Or NOVAs, Burroughs, or ...

But the real reason for the good performance of these very limited
(by today's standards) systems was not their "advanced" architectural
features, but that the people writing their software were aware of the
systems' limitations, and acted accordingly...

Roberto Waltman

[ Please reply to the group,
  return address is invalid ]

Reply by Jim Granville ● February 20, 2007

rickman wrote:
> But for RAM to be as efficient as a register file it has to be triple
> ported so you can read two operands and write back another... or you
> have to go to an accumulator based design.  Once you have triple
> ported RAM, you have just added a register file!  A rose by any other
> name still smells as sweet...

Correct, that's the hardware level detail.

The really important point is that, at the SW level, you can now access any
small cluster of register-mappable RAM variables VERY efficiently indeed,
using register opcodes.
- Such clusters of variables are very common in code
- e.g. a real-time clock subroutine could be fully coded using register
opcodes, with a single RAM-locate operation on entry.

Fast context switching is also now built in. Stack usage drops.
Lots of benefits, but you DO have to design the chip more as a system,
and not simply buy and paste-in an IP core.
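
A toy comparison of the two context-switch styles, in Python. It is a sketch only: the function names, the 16-register size and the dict standing in for the machine are all made up, and saving the PC and status flags is ignored.

```python
# Toy comparison of two ways to switch tasks on a 16-register machine.

NUM_REGS = 16

def switch_by_copying(regs, save_area, restore_area):
    """Conventional core: spill every register, then reload the new set."""
    moves = 0
    for i in range(NUM_REGS):
        save_area[i] = regs[i]
        moves += 1
    for i in range(NUM_REGS):
        regs[i] = restore_area[i]
        moves += 1
    return moves

def switch_by_pointer(machine, new_base):
    """Register-mapped RAM: repoint the window, the data stays where it is."""
    machine["reg_base"] = new_base
    return 1   # a single pointer update

def read_reg(machine, n):
    """Register n is simply RAM at the current window base plus n."""
    return machine["ram"][machine["reg_base"] + n]

# Conventional core: task A is running, task B's registers are saved in RAM.
regs = list(range(100, 100 + NUM_REGS))
saved_a = [0] * NUM_REGS
saved_b = list(range(200, 200 + NUM_REGS))
copy_moves = switch_by_copying(regs, saved_a, saved_b)

# Register-mapped RAM: both tasks' register sets already live in RAM.
machine = {"ram": list(range(100, 116)) + list(range(200, 216)), "reg_base": 0}
pointer_moves = switch_by_pointer(machine, NUM_REGS)

print(copy_moves, pointer_moves, read_reg(machine, 3))
```

The copying version moves 32 words per switch, the pointer version changes one value. Whether each register access afterwards is as fast as a real register file is the hardware question discussed above.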

It's also backward compatible. If you are uncomfortable with the 
overlay, or the tools are catching up, just leave the register pointer 
alone, and you have plain-old-vanilla-RISC.

See the XC166, and IIRC the Sun CPUs used to allow
a partial page overlap, so you could pass params in RAM registers and 
allow locals as well, with very low pointer thrashing.

-jg

Reply by Wilco Dijkstra ● February 20, 2007

"Jim Granville" <no.spam@designtools.maps.co.nz> wrote in message 
news:45db4806$1@clear.net.nz...
> rickman wrote:

>> The TMS9900 used a pointer register (that's right, registers did not
>> go away) to point to the first register in memory.  An ADD would then
>> take three memory accesses to complete rather than one clock cycle.
>> Even if you put the memory on chip, you either have to limit the
>> location of the registers to a special bank of fast, multiport memory
>> (register bank) or you have to accept multiple memory cycles for a
>> single instruction, even when working in registers.
>
> Sounds like a poor example of how anyone would do this today.
>
> Look at the XC166, and eZ8, for examples of how you can do
> very efficient memory overlays.
>
> In a uC, you are talking of a few K's of memory, so speed should
> not be an issue at all.

These are not examples of a RAM mapped register file, just of a
hardware assisted context switch. So the contents of the RAM are
copied to/from the register file but are not kept in sync until the next
context switch.

Even a few KB of SRAM is much slower than a register file.

Wilco