EmbeddedRelated.com
Forums

PIC vs ARM assembler (no flamewar please)

Started by Unknown February 14, 2007
On Thu, 22 Feb 2007 11:43:32 GMT, "Wilco Dijkstra"
<Wilco_dot_Dijkstra@ntlworld.com> wrote:

>"Jonathan Kirwan" <jkirwan@easystreet.com> wrote in message
>news:6nkpt2he5ovvonio95dk1op99nm42q6kfp@4ax.com...
>> On Wed, 21 Feb 2007 22:27:27 GMT, "Wilco Dijkstra"
>> <Wilco_dot_Dijkstra@ntlworld.com> wrote:
>>
>>>"Jonathan Kirwan" <jkirwan@easystreet.com> wrote in message
>>>news:ea7pt21omlbn70fujpjtd5b91ror4gc9l3@4ax.com...
>
>>>> I think of SRAM as "static RAM." Nothing more than that.
>
>> My first reaction to the above is that ASIC cpu designs have control
>> over all this and they use that flexibility as a matter of course,
>> too. And none of this addresses itself to the fact that registers
>> are, in fact, SRAM. So your differentiation is without a difference.
>
>You're using a different definition of SRAM.
It's the definition I was taught in the 1970s, both by engineers who practiced at the time and by manufacturers who made the parts I used. I can refer you to data books on the subject, I suppose. Not that it would change your point... or mine.
>Wikipedia defines SRAM as a regular single ported cell structure with
>word and bitlines which is laid out in a 2 dimensional array and typically
>uses sense amps. It mentions dual ported SRAM and calls it DPRAM.
I retain the general classification of the term 'SRAM' from the roots by which it got its name. Not the wiki definition, where new terms are applied and old ones redefined.
>> What exactly is the difference in your mind between a flip-flop and an
>> SRAM bit cell? I'm curious.
>
>An SRAM bit cell is designed to be laid out in a 2 dimensional structure
>sharing bit and word lines, thus taking minimal area. A flop is completely
>different. There are lots of variants, but they typically have a clock, may
>contain a scan chain for debug and sometimes have special features
>to save power. Note that flops are typically used in synthesized logic
>rather than latches and precharged logic. They are irregular and much
>larger than an SRAM cell, but they have a good fanout and can drive
>logic directly, unlike SRAM.
>
>So while logically they both store 1 bit, they have different interfaces,
>characteristics, layout and uses. I hope that clears things up...
And that explains your use of the term and my difference with it. I don't think I'll change my use, yet. Jon
On Thu, 22 Feb 2007 09:51:18 -0800, I wrote:

><snip>
>And that explains your use of the term and my difference with it. I
>don't think I'll change my use, yet.
Sidebar: The reason I won't is that I need a term that retains the general classification. It's meaningful to me. And if I adopted your use, I'd lose that word's denotation without another to replace it. Unless you can tell me what replaces that usage, today..... Jon
"Jonathan Kirwan" <jkirwan@easystreet.com> wrote in message 
news:f1mrt214rb2f2osns0fmps79fgm1uqeu5v@4ax.com...
> On Thu, 22 Feb 2007 09:51:18 -0800, I wrote:
>
>><snip>
>
>>And that explains your use of the term and my difference with it. I
>>don't think I'll change my use, yet.
>
> Sidebar: The reason I won't is that I need a term that retains the
> general classification. It's meaningful to me. And if I adopted your
> use, I'd lose that word's denotation without another to replace it.
> Unless you can tell me what replaces that usage, today.....
You (and Jim) are free to use your definition - it just may cause some confusion every now and again... I have no idea whether there is any term that still has the meaning you use; I think each kind of memory got its own name. There are so many variations, and new memories are appearing all the time which don't fall into existing categories... Wilco
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "David Brown" <david@westcontrol.removethisbit.com> wrote
[snip]
> > * Specialised accumulator: no
>
> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
> System/360 etc. Accumulators are typically used in 8-bitters where
> most instructions are 1 or 2 bytes for good codesize.
What is your definition of accumulator-based? I would consider nearly all processors to be accumulator-based with very few exceptions. [snip]
> > Adding these variable length instructions is a good thing, if it doesn't
> > cost too much at the decoder. It increases both code density and
> > instruction speed, since it opens the path for 32-bit immediate data (or
> > addresses) to be included directly in a single instruction.
>
> Actually, embedding large immediates in the instruction stream is
> bad for codesize because they cannot be shared.
So what if they can't be shared? To share or not to share is a programming consideration.
> For Thumb-2 the main goal was to allow access to 32-bit ARM > instructions for cases where a single 16-bit instruction was > not enough.
ARM doesn't have universal immediates due to the limitations of the values that can be generated.
> Thumb-2 doesn't have immediates like 68K/CF.
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070222.7A41868.96EE@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> "David Brown" <david@westcontrol.removethisbit.com> wrote
>
> [snip]
>> > * Specialised accumulator: no
>>
>> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
>> System/360 etc. Accumulators are typically used in 8-bitters where
>> most instructions are 1 or 2 bytes for good codesize.
>
> What is your definition of accumulator-based?
>
> I would consider nearly all processors to be accumulator-based
> with very few exceptions.
The easiest way to explain is the syntax of a basic operation like add:

  add          -> stack based
  add r0       -> accumulator based (likely CISC)
  add r0,r1    -> 2 operand (likely CISC)
  add r0,r1,r2 -> 3 operand (likely RISC)

In an accumulator based architecture there is an implied operand, namely the accumulator. An advantage is that instructions have only one explicit operand, but a disadvantage is that most operations overwrite the accumulator, so you may need additional moves/stores to save it.
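Wilco's four-way classification can be mimicked with a toy model (a sketch in Python rather than real assembler; the function and register names are mine, purely for illustration):

```python
# Toy models of the operand styles above (illustrative only, not any real ISA).

def accumulator_add(state, src):
    """'add src': the accumulator A is an implied source and the destination."""
    state["A"] = state["A"] + state[src]

def two_operand_add(regs, dst, src):
    """'add dst, src': dst is both a source and the destination."""
    regs[dst] = regs[dst] + regs[src]

def three_operand_add(regs, dst, src1, src2):
    """'add dst, src1, src2': everything explicit; both sources survive."""
    regs[dst] = regs[src1] + regs[src2]
```

The cost of the implied operand shows up when you need both the sum and an original value: the 3-operand form leaves its sources intact in a single instruction, while the accumulator form must first copy A elsewhere - exactly the extra moves/stores Wilco mentions.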
>> Actually, embedding large immediates in the instruction stream is
>> bad for codesize because they cannot be shared.
>
> So what if they can't be shared? To share or not to share
> is a programming consideration.
Absolutely.
>> For Thumb-2 the main goal was to allow access to 32-bit ARM
>> instructions for cases where a single 16-bit instruction was
>> not enough.
>
> ARM doesn't have universal immediates due to the limitations
> of the values that can be generated.
Yes. Although the shifted immediates cover most of the frequently occurring constants, you need several instructions for more complex constants. New instructions were added to create 16- and 32-bit constants easily. Wilco
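The shifted immediates referred to here are the classic ARM data-processing immediates: an 8-bit value rotated right by an even amount. A rough encodability check (a Python sketch; the function name is mine):

```python
def is_arm_immediate(value):
    """True if value fits the classic ARM data-processing immediate form:
    an 8-bit constant rotated right by an even amount (0..30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a rotate-right by 'rot' with a rotate-left by 'rot'.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 256:
            return True
    return False
```

This is why a constant like 0xFF000000 fits in one instruction while something like 0x12345678 needs a multi-instruction sequence or a literal-pool load.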
Everett M. Greene wrote:
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> "David Brown" <david@westcontrol.removethisbit.com> wrote
>
> [snip]
>>> * Specialised accumulator: no
>>
>> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
>> System/360 etc. Accumulators are typically used in 8-bitters where
>> most instructions are 1 or 2 bytes for good codesize.
>
> What is your definition of accumulator-based?
>
Accumulator-based architectures have a special register which is used as one of the sources and/or the destination of ALU instructions. It may or may not be possible to do loads and stores from other registers.

Wilco's comments about the format of an "add" instruction are helpful here, but there are other good indicators too. If your cpu has a register called "A", then there is a very good chance that "A" is an accumulator. In fact, if the registers are named rather than numbered, the architecture is probably accumulator based.

It is possible to have more than one accumulator while still being accumulator based. That's quite common in DSPs, I believe.
> I would consider nearly all processors to be accumulator-based
> with very few exceptions.
>
Some random examples of non-accumulator architectures are AVR, msp430, 68k (including ColdFire), ARM, MIPS, PPC. Accumulators are common in CISC architectures, but not essential. I can't think of any way to make an accumulator-based RISC architecture.
> [snip]
>>> Adding these variable length instructions is a good thing, if it doesn't
>>> cost too much at the decoder. It increases both code density and
>>> instruction speed, since it opens the path for 32-bit immediate data (or
>>> addresses) to be included directly in a single instruction.
>> Actually, embedding large immediates in the instruction stream is
>> bad for codesize because they cannot be shared.
>
> So what if they can't be shared? To share or not to share
> is a programming consideration.
>
It's simple maths. Let's assume we have a 32-bit processor using instructions of 1, 2 or 3 16-bit words, and a 32-bit constant "x" that is needed in various functions. There are two strategies to load "x" into a register - use "move d0, #x" (taking 48 bits), or "move d0, (x, a5)" (using 32 bits, assuming a 16-bit displacement for the address of x). The first strategy takes 48 bits per usage of x. The second takes 32 bits per usage of x, plus a full 32-bit shared copy of x. Thus 32-bit immediate data is smaller in code size unless the same data is used at least twice in the program.

And the first strategy requires 48 bits of read bandwidth per access, while the second requires 64 bits and almost certainly takes longer to execute. The difference gets more dramatic if you have caches - the immediate data is likely to be prefetched by the instruction cache logic, while the shared value may not be.

So even if your RISC processor requires 64 bits of instruction to load a 32-bit immediate (not an uncommon situation), it still often makes sense to put the immediate data directly in the instruction stream. Of course, in most code, 32-bit immediate data that can't be generated as a shifted 16-bit value or an offset from a base data pointer is relatively rare, and thus it is not a major issue.
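The break-even point in David's argument is easy to verify with the sizes he assumes (a quick sketch; 48 bits per use for the immediate form, versus 32 bits per use plus one shared 32-bit copy for the load form):

```python
def immediate_bits(uses):
    # "move d0, #x": a 16-bit opcode word plus a 32-bit inline constant, per use
    return uses * 48

def shared_bits(uses):
    # "move d0, (x, a5)": a 32-bit instruction per use,
    # plus one shared 32-bit copy of x somewhere in the image
    return uses * 32 + 32
```

With one use the immediate form wins (48 vs 64 bits); at two uses the two strategies tie at 96 bits; from three uses onward, sharing is smaller - matching the "unless the same data is used at least twice" conclusion above.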
>> For Thumb-2 the main goal was to allow access to 32-bit ARM
>> instructions for cases where a single 16-bit instruction was
>> not enough.
>
> ARM doesn't have universal immediates due to the limitations
> of the values that can be generated.
>
>> Thumb-2 doesn't have immediates like 68K/CF.
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote
> > "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> >> "David Brown" <david@westcontrol.removethisbit.com> wrote
> >
> > [snip]
> >> > * Specialised accumulator: no
> >>
> >> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
> >> System/360 etc. Accumulators are typically used in 8-bitters where
> >> most instructions are 1 or 2 bytes for good codesize.
> >
> > What is your definition of accumulator-based?
> >
> > I would consider nearly all processors to be accumulator-based
> > with very few exceptions.
>
> The easiest way to explain is the syntax of a basic operation like add:
>
>   add          -> stack based
>   add r0       -> accumulator based (likely CISC)
>   add r0,r1    -> 2 operand (likely CISC)
>   add r0,r1,r2 -> 3 operand (likely RISC)
>
> In an accumulator based architecture there is an implied operand,
> namely the accumulator. An advantage is instructions have only
> 1 operand, but a disadvantage is most operations overwrite the
> accumulator, so you may need additional moves/stores to save it.
An accumulator is an accumulator is... Except for the first, all your scenarios involve accumulators. The latter two involve multiple accumulators.

Aside: Do the 68K A registers qualify as accumulators when you can only add/subtract to them?
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070223.7951888.963F@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote
>> > What is your definition of accumulator-based?
>> >
>> > I would consider nearly all processors to be accumulator-based
>> > with very few exceptions.
>>
>> The easiest way to explain is the syntax of a basic operation like add:
>>
>>   add          -> stack based
>>   add r0       -> accumulator based (likely CISC)
>>   add r0,r1    -> 2 operand (likely CISC)
>>   add r0,r1,r2 -> 3 operand (likely RISC)
>>
>> In an accumulator based architecture there is an implied operand,
>> namely the accumulator. An advantage is instructions have only
>> 1 operand, but a disadvantage is most operations overwrite the
>> accumulator, so you may need additional moves/stores to save it.
>
> An accumulator is an accumulator is... Except for
> the first, all your scenarios involve accumulators.
> The latter two involve multiple accumulators.
No. An accumulator is typically hardwired to the ALU, so there is one per ALU. The Z-80 A register is a good example of this. If the result can be written in any register, it is not an accumulator. The stack top in a stack based architecture is often implemented as an accumulator (but this doesn't show in the ISA).
> Aside: Do the 68K A registers qualify as accumulators > when you can only add/subtract to them?
No, like D registers they are not accumulators. Wilco
Wilco Dijkstra wrote:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote:
>> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>>> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote
>
>>>> What is your definition of accumulator-based?
>>>>
>>>> I would consider nearly all processors to be accumulator-based
>>>> with very few exceptions.
>>>
>>> The easiest way to explain is the syntax of a basic operation
>>> like add:
>>>
>>>   add          -> stack based
>>>   add r0       -> accumulator based (likely CISC)
>>>   add r0,r1    -> 2 operand (likely CISC)
>>>   add r0,r1,r2 -> 3 operand (likely RISC)
>>>
>>> In an accumulator based architecture there is an implied operand,
>>> namely the accumulator. An advantage is instructions have only
>>> 1 operand, but a disadvantage is most operations overwrite the
>>> accumulator, so you may need additional moves/stores to save it.
>>
>> An accumulator is an accumulator is... Except for
>> the first, all your scenarios involve accumulators.
>> The latter two involve multiple accumulators.
>
> No. An accumulator is typically hardwired to the ALU, so there is
> one per ALU. The Z-80 A register is a good example of this. If the
> result can be written in any register, it is not an accumulator.
Also the HL register, as a 16 bit accumulator.
>
> The stack top in a stack based architecture is often implemented
> as an accumulator (but this doesn't show in the ISA).
Several levels of the stack may be so implemented. The HP3000 did this. So did my 8080 pcode translator, for one level. The speed improvement can be impressive.

--
 Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>
"David Brown" <david@westcontrol.removethisbit.com> wrote in message 
news:45deb79f$0$22513$8404b019@news.wineasy.se...

>>>> Adding these variable length instructions is a good thing, if it
>>>> doesn't cost too much at the decoder. It increases both code density
>>>> and instruction speed, since it opens the path for 32-bit immediate
>>>> data (or addresses) to be included directly in a single instruction.
>>> Actually, embedding large immediates in the instruction stream is
>>> bad for codesize because they cannot be shared.
>>
>> So what if they can't be shared? To share or not to share
>> is a programming consideration.
>>
>
> It's simple maths. Let's assume we have a 32-bit processor using
> instructions of 1, 2 or 3 16-bit words. You have a 32-bit constant "x"
> that is needed in various functions. There are two strategies to load "x"
> into a register - use "move d0, #x" (taking 48 bits), or "move d0, (x,
> a5)" (using 32 bits, assuming a 16-bit displacement for the address of x).
> The first strategy takes 48 bits per usage of x. The second takes 32 bits
> per usage of x, plus a full 32-bit shared copy of x. Thus 32-bit
> immediate data is smaller in code size unless the same data is used at
> least twice in the program.
Yes. How often you need the same constant depends a bit on how you deal with global variables, but unless you only use global variables once, it's always a win to share the addresses.
>And the first strategy requires 48 bits of read bandwidth per access, while
>the second requires 64 bits and almost certainly takes longer to execute.
However the second method makes good use of a Harvard architecture. Instruction fetch typically requires the most bandwidth, so inline constants make fetching slower (and this effect gets worse on superscalar CPUs). So which method is faster depends a bit on the microarchitecture, but my money is on loading, as fetching is usually bandwidth limited.
> The difference gets more dramatic if you have caches - the immediate data > is likely to be prefetched by the instruction cache logic, while the > shared value may not be.
In both cases there will be a cache miss on the first use. Whether it is I-cache or D-cache is not really relevant. Smaller code has fewer misses overall.
> So even if your RISC processor requires 64 bits of instruction to load a > 32-bit immediate (not an uncommon situation), it still often makes sense > to put the immediate data directly in the instruction stream.
On a wide superscalar RISC, maybe. The 2-instruction sethi/setlo sequence will take 2 cycles on a scalar CPU, which is slower than a load. Thumb needs 48 bits to load a constant even if it is never reused, so it's always better to load immediates than to inline them in the instruction stream.
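The 2-instruction high/low sequence being discussed (sethi/setlo on SPARC-like machines, or MOVW/MOVT in later ARM) builds a 32-bit constant in two steps. A sketch of the effect (Python, with made-up function names; for simplicity the split is modelled as two 16-bit halves, whereas SPARC's sethi actually carries 22 high bits):

```python
def set_high(imm16):
    """First instruction: write the upper halfword, clearing the rest."""
    return (imm16 & 0xFFFF) << 16

def set_low(reg, imm16):
    """Second instruction: fill in the low halfword, keeping the top half."""
    return (reg & 0xFFFF0000) | (imm16 & 0xFFFF)

# Two instructions (hence two cycles on a scalar CPU) to materialise a constant:
r = set_low(set_high(0x1234), 0x5678)  # r == 0x12345678
```

Because both halves are encoded as immediates inside the two instructions, this trades instruction-stream bits against the data access a literal-pool load would need - the bandwidth trade-off argued over in this subthread.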
> Of course, in most code, 32-bit immediate data that can't be generated as > a shifted 16-bit value or an offset from a base data pointer is relatively > rare, and thus it is not a major issue.
Complex immediates that need more than 16 bits are very rare indeed. Address constants are common, but there are ways to reduce the number of unique constants to encourage sharing. Several global variables can be accessed via the same base address, for example. Wilco