EmbeddedRelated.com
Forums

PIC vs ARM assembler (no flamewar please)

Started by Unknown February 14, 2007
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote > > "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: > >> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote > > >> > What is your definition of accumulator-based? > >> > > >> > I would consider nearly all processors to be accumulator-based > >> > with very few exceptions. > >> > >> The easiest way to explain is the syntax of a basic operation like add: > >> > >> add -> stack based > >> add r0 -> accumulator based (likely CISC) > >> add r0,r1 -> 2 operand (likely CISC) > >> add r0,r1,r2 -> 3 operand (likely RISC) > >> > >> In an accumulator based architecture there is an implied operand, > >> namely the accumulator. An advantage is instructions have only > >> 1 operand, but a disadvantage is most operations overwrite the > >> accumulator, so you may need additional moves/stores to save it. > > > > An accumulator is an accumulator is... Except for > > the first, all your scenarios involve accumulators. > > The latter two involve multiple accumulators. > > No. An accumulator is typically hardwired to the ALU, so there is > one per ALU. The Z-80 A register is a good example of this. > If the result can be written in any register, it is not an accumulator. > > The stack top in a stack based architecture is often implemented as > an accumulator (but this doesn't show in the ISA). > > > Aside: Do the 68K A registers qualify as accumulators > > when you can only add/subtract to them? > > No, like D registers they are not accumulators.
Now you've really baffled me as to what you consider to be an accumulator. I thought an accumulator is a register/device/location to/on/in which arithmetic and logical operations can be performed. Have I missed something over the years?
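To make the four-way classification quoted above a little more concrete, here is a small C sketch. The mnemonics in the comment are purely illustrative (loosely Z80/x86/ARM flavoured), not the exact syntax of any particular assembler:

#include <stdint.h>
#include <stdio.h>

/* How  r = a + b;  might be written under each style in the quoted
 * classification (illustrative mnemonics only):
 *
 *   stack based:        push a / push b / add / pop r
 *   accumulator based:  lda a / add b / sta r          (result forced into A)
 *   2-operand:          mov r0,a / add r0,b / mov r,r0
 *   3-operand (RISC):   add r0,r1,r2                   (a, b already in r1, r2)
 */
static uint32_t add_example(uint32_t a, uint32_t b)
{
    return a + b;
}

int main(void)
{
    printf("%u\n", (unsigned)add_example(2, 3));
    return 0;
}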
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070224.79EC8E8.8B06@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> An accumulator is typically hardwired to the ALU, so there is
>> one per ALU. The Z-80 A register is a good example of this.
>> If the result can be written in any register, it is not an accumulator.
> Now you've really baffled me as to what you consider to be
> an accumulator. I thought an accumulator is a register/
> device/location to/on/in which arithmetic and logical
> operations can be performed. Have I missed something
> over the years?
Correct, but it must also be the *only* possible register where the result of those operations is written. So when you can't choose the destination, it is the accumulator.

Examples of accumulator architectures are Z-80, 6502, 8051.

See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29

Wilco
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote > > "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: > > >> An accumulator is typically hardwired to the ALU, so there is > >> one per ALU. The Z-80 A register is a good example of this. > >> If the result can be written in any register, it is not an accumulator. > > > Now you've really baffled me as to what you consider to be > > an accumulator. I thought an accumulator is a register/ > > device/location to/on/in which arithmetic and logical > > operations can be performed. Have I missed something > > over the years? > > Correct, but it must also be the *only* possible register where the > result of those operations is written. So when you can't choose the > destination, it is the accumulator.
So you're saying that single-accumulator processors are accumulator-based but multiple-accumulator processors are not?
> Examples of accumulator architectures are Z-80, 6502, 8051.
The 6502 has a single accumulator whereas the 6800 has two but the two are otherwise architecturally the same. The 6502 is accumulator-based and the 6800 is not?
> See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29
I thought wikipedia is a collection of opinions, not an authoritative source.
On Sat, 24 Feb 2007 19:50:00 GMT, "Wilco Dijkstra"
<Wilco_dot_Dijkstra@ntlworld.com> wrote:

> >"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message >news:20070224.79EC8E8.8B06@mojaveg.lsan.mdsg-pacwest.com... >> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: > >>> An accumulator is typically hardwired to the ALU, so there is >>> one per ALU. The Z-80 A register is a good example of this. >>> If the result can be written in any register, it is not an accumulator. > >> Now you've really baffled me as to what you consider to be >> an accumulator. I thought an accumulator is a register/ >> device/location to/on/in which arithmetic and logical >> operations can be performed. Have I missed something >> over the years? > >Correct, but it must also be the *only* possible register where the >result of those operations is written. So when you can't choose the >destination, it is the accumulator. > >Examples of accumulator architectures are Z-80, 6502, 8051. > >See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29
The Goldstine/Neumann paper referenced clearly shows that the accumulator originally had some independent arithmetic capabilities. In ENIAC the numeric registers had quite a lot of independent functionality.

Even in the 1970s, it might have been viable to implement the accumulator with 4-bit parallel-load bidirectional shift registers available as TTL MSI chips. That accumulator could independently do single-bit shifts and parallel clear, in addition to just storing the result from the ALU.

These days the accumulator would be a simple latch, and for instance clearing the accumulator would require something like disabling both the ALU input multiplexors (producing a constant 0 or 1 at the mux outputs), setting the ALU into XOR mode, and then latching the result (0x0000) into the accumulator.

When the accumulator these days is just a trivial latch, discussing whether some architecture is accumulator based or general register based is just a question of operation code bit allocations. With potentially more targets for the ALU result, more bits would have to be allocated for the result, extending the length of the instruction word.

With the current level of integration, it would make sense to add more processing power to the accumulator/register, e.g. closely integrate a simple ALU and flags register with each of the 8-256 general purpose registers. This would simplify overlapping operations, since it would reduce the risk of stalling, either due to limited internal bus bandwidth or e.g. due to conflicts if only a single flags register were available.

Paul
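Paul's clear-by-XOR example is easy to model in a few lines of C. This is only a toy sketch with invented names, not a description of any real datapath, but it shows why forcing both ALU inputs to the same constant and selecting XOR always leaves zero in the accumulator latch:

#include <stdint.h>
#include <stdio.h>

/* Toy model of the clear-by-XOR step described above (hypothetical names,
 * not a real core): both ALU input muxes are "disabled" so they drive the
 * same constant, the ALU is put into XOR mode, and the result is latched
 * back into the accumulator. x ^ x == 0 for any x, so the latch ends up
 * cleared regardless of what the forced constant is. */
typedef enum { ALU_ADD, ALU_XOR } alu_op;

static uint16_t alu(alu_op op, uint16_t a, uint16_t b)
{
    return (op == ALU_XOR) ? (uint16_t)(a ^ b) : (uint16_t)(a + b);
}

int main(void)
{
    uint16_t acc = 0xBEEF;      /* accumulator latch with stale contents */
    uint16_t forced = 0xFFFF;   /* disabled muxes driving all-ones       */

    acc = alu(ALU_XOR, forced, forced);
    printf("acc after clear: 0x%04X\n", (unsigned)acc);   /* 0x0000 */
    return 0;
}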
Wilco Dijkstra wrote:
> "David Brown" <david@westcontrol.removethisbit.com> wrote in message > news:45deb79f$0$22513$8404b019@news.wineasy.se... > >>>>> Adding these variable length instructions is a good thing, if it >>>>> doesn't cost too much at the decoder. It increases both code density >>>>> and instruction speed, since it opens the path for 32-bit immediate >>>>> data (or addresses) to be included directly in a single instruction. >>>> Actually, embedding large immediates in the instruction stream is >>>> bad for codesize because they cannot be shared. >>> So what if they can't be shared? To share or not to share >>> is a programming consideration. >>> >> It's simple maths. Lets assume we have a 32-bit processor using >> instructions of 1, 2 or 3 16-bit words. You have a 32-bit constant "x" >> that is needed in various functions. There are two strategies to load "x" >> into a register - use "move d0, #x" (taking 48 bits), or "move d0, (x, >> a5)" (using 32 bits, assuming a 16-bit displacement for the address of x). >> The first strategy takes 48 bits per usage of x. The second takes 32 bits >> per usage of x, plus a full 32-bit shared copy of x in code. Thus 32-bit >> immediate data is smaller in code size unless the same data is used at >> least twice in the program. > > Yes. How often you need the same constant depends a bit on how > you deal with global variables, but unless you only use global variables > once, it's always a win to share the addresses. > >> And the first strategy requires 48 bits of read bandwidth per access, while >> the second requires 64 bits and almost certainly takes longer to execute. > > However the second method makes good use of a Harvard architecture. > Instruction fetch typically requires the most bandwidth, so inline constants > make fetching slower (and this effect gets worse on superscalar CPUs). > So which method is faster depends a bit on the micro architecture, but my > money is on loading as fetching is usually bandwidth limited. >
As always, the answer is "it depends". On most fast processors, except for specialised architectures like DSPs, it all boils down to accesses of a unified memory space which has limited bandwidth, even if the internal buses are separated into code and data buses. And as for whether it is the data or the code bus that is most heavily loaded, *my* money is on profiling the application in question on the architecture in question, because anything else is pure guesswork.
>> The difference gets more dramatic if you have caches - the immediate data
>> is likely to be prefetched by the instruction cache logic, while the
>> shared value may not be.
>
> In both cases there will be a cache miss on the first use. Whether it is
> I-cache or D-cache is not really relevant. Smaller code has fewer misses
> overall.
>
That's not the way caches and prefetching work. The instruction cache logic will prefetch streams of instructions, based on branch prediction logic. So even though the inlined value needs to be loaded into the instruction cache, it will be (at the very latest) right round the corner by the time the "load immediate" instruction is actually executed. If the data has to be loaded through the data cache, however, then a miss means starting from scratch with an external memory access, and a lot of processor cycles lost while waiting.

Of course, it's a different matter entirely with a small processor and single-cycle (or two-cycle) internal memories. Then there is no issue with bus access latency, and it makes sense to look at the saturation of the internal buses.
>> So even if your RISC processor requires 64 bits of instruction to load a
>> 32-bit immediate (not an uncommon situation), it still often makes sense
>> to put the immediate data directly in the instruction stream.
>
> On a wide superscalar RISC, maybe. The 2 instruction sethi/setlo sequence
> will take 2 cycles on a scalar CPU, which is slower than a load. Thumb
> needs 48 bits to load a constant even if it is never reused, so it's always
> better to load immediates than inlining them in the instruction stream.
>
Sorry, my numbers were comparing a cpu with a proper "load immediate 32-bit" instruction, such as the 68k/CF - the point is the increased efficiency if you can do that in *one* instruction, not two as is needed for most RISC cpus. Whether that increased efficiency in this case is worth the more complex instruction decoder is another question entirely.
>> Of course, in most code, 32-bit immediate data that can't be generated as
>> a shifted 16-bit value or an offset from a base data pointer is relatively
>> rare, and thus it is not a major issue.
>
> Complex immediates that need more than 16 bits are very rare indeed.
> Address constants are common though, but there are ways to reduce the
> number of unique constants to encourage sharing. Several global variables
> can be accessed via the same base address for example.
>
The PPC ABI recommends using a "small data segment", so that most global addresses are formed as a 16-bit offset to the SDS register.
> Wilco
>
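The code-size argument quoted above can be checked with a little arithmetic. A quick sketch using the same illustrative numbers as the post (48 bits per inline 32-bit immediate, versus a 32-bit base+displacement load per use plus one shared 32-bit copy of the constant):

#include <stdio.h>

/* Code-size model from the post above: an inline 32-bit immediate costs
 * 48 bits per use; the shared constant costs 32 bits per use plus one
 * 32-bit copy of the value itself. These are the illustrative numbers
 * from the post, not measurements of any real toolchain. */
static unsigned bits_inline(unsigned uses) { return 48u * uses; }
static unsigned bits_shared(unsigned uses) { return 32u * uses + 32u; }

int main(void)
{
    for (unsigned n = 1; n <= 4; n++)
        printf("%u use(s): inline %3u bits, shared %3u bits\n",
               n, bits_inline(n), bits_shared(n));
    /* Inline wins at one use, it's a tie at two, and the shared copy wins
     * from three uses on - the break-even stated in the quoted post. */
    return 0;
}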
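Wilco's point about reaching several globals through one shared base address can also be sketched in C. The struct and field names below are invented for the example; the idea is simply that one address constant serves all the fields via small displacements:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical example of sharing one base address between globals:
 * grouping them in a struct means the compiler needs only one address
 * constant (typically kept in a register), and each field is reached
 * with a small displacement from it. */
struct sys_state {
    uint32_t tick_count;    /* base + 0 */
    uint32_t error_flags;   /* base + 4 */
    uint32_t baud_rate;     /* base + 8 */
};

static struct sys_state g_state;

static void tick(void)
{
    g_state.tick_count++;
    if (g_state.error_flags)
        g_state.baud_rate = 9600;
}

int main(void)
{
    g_state.error_flags = 1;
    tick();
    printf("ticks=%u baud=%u\n",
           (unsigned)g_state.tick_count, (unsigned)g_state.baud_rate);
    return 0;
}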
Everett M. Greene wrote:
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: >> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote >>> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: >>>> An accumulator is typically hardwired to the ALU, so there is >>>> one per ALU. The Z-80 A register is a good example of this. >>>> If the result can be written in any register, it is not an accumulator. >>> Now you've really baffled me as to what you consider to be >>> an accumulator. I thought an accumulator is a register/ >>> device/location to/on/in which arithmetic and logical >>> operations can be performed. Have I missed something >>> over the years? >> Correct, but it must also be the *only* possible register where the >> result of those operations is written. So when you can't choose the >> destination, it is the accumulator. > > So you're saying that single-accumulator processors are > accumulator-based but multiple-accumulator processors are > not? >
I'm with Wilco on this. The ALU in a cpu core has two input channels and one output channel. In an accumulator based architecture, one of the input channels is fixed to the accumulator register, with the other being flexible (other registers, memory, immediate data, etc.). The output channel is often also fixed to the accumulator, but may possibly be able to write back to other destinations. Thus you have a specialised accumulator implemented as a dual ported register, while the other registers are connected to the ALU with a simpler single port (read/write), or possibly separate read and write ports for some architectures.

On some CPUs, such as the 6800, there are two accumulators which can be selected, but the principle is still the same, as is the case for those (like the Z80) which can treat other registers (H, L) as accumulators for specific operations.

If you look at the 68k for comparison, you see a set of 8 general-purpose D registers. Any ALU operation can use any two of these as sources, and one as a destination (the fact that the destination must be one of the sources is mainly an artefact of limited instruction code space). You have a triple-ported register bank, and no specific "super" registers.

Ultimately, the term "accumulator" means a register that can be added to - more generally, it is a register tightly tied to the ALU.
>> Examples of accumulator architectures are Z-80, 6502, 8051.
>
> The 6502 has a single accumulator whereas the 6800 has two
> but the two are otherwise architecturally the same. The
> 6502 is accumulator-based and the 6800 is not?
>
>> See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29
>
> I thought wikipedia is a collection of opinions, not an
> authoritative source.
Indeed - but it can be constructive nonetheless.
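The distinction David draws can be captured by modelling the two datapaths directly. This is just a sketch with invented types, not any real core: in the accumulator version one ALU input and the destination are hardwired to A, while in the register-file version the instruction names all three registers:

#include <stdint.h>
#include <stdio.h>

/* Accumulator style: one ALU input and the result are hardwired to A,
 * so an "add" only has to name the other operand. */
struct acc_cpu { uint8_t a; uint8_t r[4]; };

static void acc_add(struct acc_cpu *c, unsigned src)
{
    c->a = (uint8_t)(c->a + c->r[src]);     /* destination is always A */
}

/* General-register style: sources and destination are all encoded in
 * the instruction, so any register can receive the result. */
struct gpr_cpu { uint32_t d[8]; };

static void gpr_add(struct gpr_cpu *c, unsigned rd, unsigned rn, unsigned rm)
{
    c->d[rd] = c->d[rn] + c->d[rm];         /* destination chosen freely */
}

int main(void)
{
    struct acc_cpu z = { .a = 1, .r = { 2, 3, 4, 5 } };
    struct gpr_cpu m = { .d = { 0, 10, 20 } };

    acc_add(&z, 1);          /* A += r[1]; nowhere else it could go     */
    gpr_add(&m, 3, 1, 2);    /* d3 = d1 + d2; d1 and d2 are left intact */

    printf("acc A=%u, gpr d3=%u\n", (unsigned)z.a, (unsigned)m.d[3]);
    return 0;
}

Running it prints "acc A=4, gpr d3=30"; the point is only that gpr_add gets to say where the result goes, while acc_add never does.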
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070225.7A02CD0.A549@mojaveg.lsan.mdsg-pacwest.com...

>> Examples of accumulator architectures are Z-80, 6502, 8051.
>
> The 6502 has a single accumulator whereas the 6800 has two
> but the two are otherwise architecturally the same. The
> 6502 is accumulator-based and the 6800 is not?
All 6800 instructions have an operand that selects A, B or memory. There are a few operations that have both A and B as inputs. So A and B are almost general purpose registers. You can leave one out and still have a functioning architecture (you can't remove special purpose registers such as the stack pointer, accumulator or flags). So I'm inclined to say it's a cross between a general purpose register and accumulator architecture.

Wilco
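Wilco's "cross between the two" reading fits the same sort of toy model (again with invented names, and not ISA-accurate): a one-bit field in the instruction selects A or B as both the implied input and the destination - more choice than a single accumulator, far less than a full register file:

#include <stdint.h>
#include <stdio.h>

/* Toy 6800-flavoured model (hypothetical, not ISA-accurate): a one-bit
 * field in the instruction selects which of the two accumulators is both
 * the implied input and the destination. */
struct m6800 { uint8_t acc[2]; };   /* acc[0] = A, acc[1] = B */

static void add_imm(struct m6800 *c, unsigned sel, uint8_t imm)
{
    c->acc[sel & 1u] = (uint8_t)(c->acc[sel & 1u] + imm);
}

int main(void)
{
    struct m6800 c = { { 10, 20 } };

    add_imm(&c, 0, 5);    /* roughly "ADDA #5" - result forced into A */
    add_imm(&c, 1, 7);    /* roughly "ADDB #7" - result forced into B */

    printf("A=%u B=%u\n", (unsigned)c.acc[0], (unsigned)c.acc[1]);
    return 0;
}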
David Brown <david@westcontrol.removethisbit.com> writes:

> Ultimately, the term "accumulator" means a register that
> can be added to
As you can with all 16 A and D registers in the 68K, for instance.
> -- more generally, it is a register tightly tied to
> the ALU.
I would consider this to be less general.
Everett M. Greene wrote:
> David Brown <david@westcontrol.removethisbit.com> writes:
>
>> Ultimately, the term "accumulator" means a register that
>> can be added to
>
> As you can with all 16 A and D registers in the 68K, for
> instance.
>
I was explaining where the term comes from, rather than what is meant by an "accumulator" now (perhaps I was not very clear in what I wrote).
>> -- more generally, it is a register tightly tied to
>> the ALU.
>
> I would consider this to be less general.
The clearest difference between accumulator-based cpus and general purpose register architectures is seen when using them - do a majority of the ALU instructions require a specific register or not? It's that simple.

It can also be instructive to think how you would draw out the design. If you have one branch of the ALU being fed by a specific register, it's an accumulator. If your registers consist of an array of identical registers that can be the source and destination of the ALU, it is not an accumulator-based architecture.

You can ask at what point do you move from multiple accumulators (like the 6800) to an array of registers (like the 68k). I'd say the naming convention shows fairly clearly what the core's designers think - when you have named registers "A" and "B", you have specialised registers and an accumulator architecture. When you have numbered registers "D0" ... "D7", you have an array of general purpose registers.
David Brown <david@westcontrol.removethisbit.com> writes:
> Everett M. Greene wrote:
> > David Brown <david@westcontrol.removethisbit.com> writes:
> >
> >> Ultimately, the term "accumulator" means a register that
> >> can be added to
> >
> > As you can with all 16 A and D registers in the 68K, for
> > instance.
>
> I was explaining where the term comes from, rather than what is meant by
> an "accumulator" now (perhaps I was not very clear in what I wrote).
>
> >> -- more generally, it is a register tightly tied to
> >> the ALU.
> >
> > I would consider this to be less general.
>
> The clearest difference between accumulator-based cpus and general
> purpose register architectures is seen when using them - do a majority
> of the ALU instructions require a specific register or not? It's that
> simple.
>
> It can also be instructive to think how you would draw out the design.
> If you have one branch of the ALU being fed by a specific register, it's
> an accumulator. If your registers consist of an array of identical
> registers that can be the source and destination of the ALU, it is not
> an accumulator-based architecture.
>
> You can ask at what point do you move from multiple accumulators (like
> the 6800) to an array of registers (like the 68k). I'd say the naming
> convention shows fairly clearly what the core's designers think - when
> you have named registers "A" and "B", you have specialised registers and
> an accumulator architecture. When you have numbered registers "D0" ...
> "D7", you have an array of general purpose registers.
This is a difference without a difference? I'll take multiple accumulators over single accumulators any day. I don't care if you call them accumulators, data registers, or swizzle sticks.