EmbeddedRelated.com
Forums

PIC vs ARM assembler (no flamewar please)

Started by Unknown February 14, 2007
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote > > "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: > >> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote > > >> > What is your definition of accumulator-based? > >> > > >> > I would consider nearly all processors to be accumulator-based > >> > with very few exceptions. > >> > >> The easiest way to explain is the syntax of a basic operation like add: > >> > >> add -> stack based > >> add r0 -> accumulator based (likely CISC) > >> add r0,r1 -> 2 operand (likely CISC) > >> add r0,r1,r2 -> 3 operand (likely RISC) > >> > >> In an accumulator based architecture there is an implied operand, > >> namely the accumulator. An advantage is instructions have only > >> 1 operand, but a disadvantage is most operations overwrite the > >> accumulator, so you may need additional moves/stores to save it. > > > > An accumulator is an accumulator is... Except for > > the first, all your scenarios involve accumulators. > > The latter two involve multiple accumulators. > > No. An accumulator is typically hardwired to the ALU, so there is > one per ALU. The Z-80 A register is a good example of this. > If the result can be written in any register, it is not an accumulator. > > The stack top in a stack based architecture is often implemented as > an accumulator (but this doesn't show in the ISA). > > > Aside: Do the 68K A registers qualify as accumulators > > when you can only add/subtract to them? > > No, like D registers they are not accumulators.
Now you've really baffled me as to what you consider to be an accumulator. I thought an accumulator is a register/device/location to/on/in which arithmetic and logical operations can be performed. Have I missed something over the years?
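To make the four-way classification quoted above a little more concrete, here is a small C sketch. The mnemonics in the comment are purely illustrative (loosely Z80/x86/ARM flavoured), not the exact syntax of any particular assembler:

#include <stdint.h>
#include <stdio.h>

/* How  r = a + b;  might be written under each style in the quoted
 * classification (illustrative mnemonics only):
 *
 *   stack based:        push a / push b / add / pop r
 *   accumulator based:  lda a / add b / sta r          (result forced into A)
 *   2-operand:          mov r0,a / add r0,b / mov r,r0
 *   3-operand (RISC):   add r0,r1,r2                   (a, b already in r1, r2)
 */
static uint32_t add_example(uint32_t a, uint32_t b)
{
    return a + b;
}

int main(void)
{
    printf("%u\n", (unsigned)add_example(2, 3));
    return 0;
}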
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070224.79EC8E8.8B06@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> An accumulator is typically hardwired to the ALU, so there is
>> one per ALU. The Z-80 A register is a good example of this.
>> If the result can be written in any register, it is not an accumulator.
> Now you've really baffled me as to what you consider to be
> an accumulator. I thought an accumulator is a register/
> device/location to/on/in which arithmetic and logical
> operations can be performed. Have I missed something
> over the years?
Correct, but it must also be the *only* possible register where the result of those operations is written. So when you can't choose the destination, it is the accumulator.

Examples of accumulator architectures are Z-80, 6502, 8051.

See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29

Wilco
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote > > "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: > > >> An accumulator is typically hardwired to the ALU, so there is > >> one per ALU. The Z-80 A register is a good example of this. > >> If the result can be written in any register, it is not an accumulator. > > > Now you've really baffled me as to what you consider to be > > an accumulator. I thought an accumulator is a register/ > > device/location to/on/in which arithmetic and logical > > operations can be performed. Have I missed something > > over the years? > > Correct, but it must also be the *only* possible register where the > result of those operations is written. So when you can't choose the > destination, it is the accumulator.
So you're saying that single-accumulator processors are accumulator-based but multiple-accumulator processors are not?
> Examples of accumulator architectures are Z-80, 6502, 8051.
The 6502 has a single accumulator whereas the 6800 has two but the two are otherwise architecturally the same. The 6502 is accumulator-based and the 6800 is not?
> See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29
I thought wikipedia is a collection of opinions, not an authoritative source.
On Sat, 24 Feb 2007 19:50:00 GMT, "Wilco Dijkstra"
<Wilco_dot_Dijkstra@ntlworld.com> wrote:

> >"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message >news:20070224.79EC8E8.8B06@mojaveg.lsan.mdsg-pacwest.com... >> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: > >>> An accumulator is typically hardwired to the ALU, so there is >>> one per ALU. The Z-80 A register is a good example of this. >>> If the result can be written in any register, it is not an accumulator. > >> Now you've really baffled me as to what you consider to be >> an accumulator. I thought an accumulator is a register/ >> device/location to/on/in which arithmetic and logical >> operations can be performed. Have I missed something >> over the years? > >Correct, but it must also be the *only* possible register where the >result of those operations is written. So when you can't choose the >destination, it is the accumulator. > >Examples of accumulator architectures are Z-80, 6502, 8051. > >See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29
The Goldstine/Neumann paper referenced clearly shows that the accumulator originally had some independent arithmetic capabilities. In ENIAC the numeric registers had quite a lot of independent functionality.

Even in the 1970s, it might have been viable to implement the accumulator with 4-bit parallel-load bidirectional shift registers available as TTL MSI chips. That accumulator could independently do single-bit shifts and parallel clear, in addition to just storing the result from the ALU.

These days the accumulator would be a simple latch, and for instance clearing the accumulator would require something like disabling both the ALU input multiplexors (producing a constant 0 or 1 at the mux outputs), setting the ALU into XOR mode, and then latching the result (0x0000) into the accumulator.

When the accumulator these days is just a trivial latch, discussing whether some architecture is accumulator based or general register based is just a question of operation code bit allocations. With potentially more targets for the ALU result, more bits would have to be allocated for the result, extending the length of the instruction word.

With the current level of integration, it would make sense to add more processing power to the accumulator/register, e.g. closely integrate a simple ALU and flags register with each of the 8-256 general purpose registers. This would simplify overlapping operations, since it would reduce the risk of stalling, either due to limited internal bus bandwidth or e.g. due to conflicts if only a single flags register were available.

Paul
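Paul's clear-by-XOR example is easy to model in a few lines of C. This is only a toy sketch with invented names, not a description of any real datapath, but it shows why forcing both ALU inputs to the same constant and selecting XOR always leaves zero in the accumulator latch:

#include <stdint.h>
#include <stdio.h>

/* Toy model of the clear-by-XOR step described above (hypothetical names,
 * not a real core): both ALU input muxes are "disabled" so they drive the
 * same constant, the ALU is put into XOR mode, and the result is latched
 * back into the accumulator. x ^ x == 0 for any x, so the latch ends up
 * cleared regardless of what the forced constant is. */
typedef enum { ALU_ADD, ALU_XOR } alu_op;

static uint16_t alu(alu_op op, uint16_t a, uint16_t b)
{
    return (op == ALU_XOR) ? (uint16_t)(a ^ b) : (uint16_t)(a + b);
}

int main(void)
{
    uint16_t acc = 0xBEEF;      /* accumulator latch with stale contents */
    uint16_t forced = 0xFFFF;   /* disabled muxes driving all-ones       */

    acc = alu(ALU_XOR, forced, forced);
    printf("acc after clear: 0x%04X\n", (unsigned)acc);   /* 0x0000 */
    return 0;
}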
Wilco Dijkstra wrote:
> "David Brown" <david@westcontrol.removethisbit.com> wrote in message > news:45deb79f$0$22513$8404b019@news.wineasy.se... > >>>>> Adding these variable length instructions is a good thing, if it >>>>> doesn't cost too much at the decoder. It increases both code density >>>>> and instruction speed, since it opens the path for 32-bit immediate >>>>> data (or addresses) to be included directly in a single instruction. >>>> Actually, embedding large immediates in the instruction stream is >>>> bad for codesize because they cannot be shared. >>> So what if they can't be shared? To share or not to share >>> is a programming consideration. >>> >> It's simple maths. Lets assume we have a 32-bit processor using >> instructions of 1, 2 or 3 16-bit words. You have a 32-bit constant "x" >> that is needed in various functions. There are two strategies to load "x" >> into a register - use "move d0, #x" (taking 48 bits), or "move d0, (x, >> a5)" (using 32 bits, assuming a 16-bit displacement for the address of x). >> The first strategy takes 48 bits per usage of x. The second takes 32 bits >> per usage of x, plus a full 32-bit shared copy of x in code. Thus 32-bit >> immediate data is smaller in code size unless the same data is used at >> least twice in the program. > > Yes. How often you need the same constant depends a bit on how > you deal with global variables, but unless you only use global variables > once, it's always a win to share the addresses. > >> And the first strategy requires 48 bits of read bandwidth per access, while >> the second requires 64 bits and almost certainly takes longer to execute. > > However the second method makes good use of a Harvard architecture. > Instruction fetch typically requires the most bandwidth, so inline constants > make fetching slower (and this effect gets worse on superscalar CPUs). > So which method is faster depends a bit on the micro architecture, but my > money is on loading as fetching is usually bandwidth limited. >
As always, the answer is "it depends". On most fast processors, except for specialised architectures like DSPs, it all boils down to accesses of a unified memory space which has limited bandwidth, even if the internal buses are separated into code and data buses. And as for whether it is the data or the code bus that is most heavily loaded, *my* money is on profiling the application in question on the architecture in question, because anything else is pure guesswork.
>> The difference gets more dramatic if you have caches - the immediate data
>> is likely to be prefetched by the instruction cache logic, while the
>> shared value may not be.
>
> In both cases there will be a cache miss on the first use. Whether it is
> I-cache or D-cache is not really relevant. Smaller code has fewer misses
> overall.
>
That's not the way caches and prefetching work. The instruction cache logic will prefetch streams of instructions, based on branch prediction logic. So even though the inlined value needs to be loaded into the instruction cache, it will be (at the very latest) right round the corner by the time the "load immediate" instruction is actually executed. If the data has to be loaded through the data cache, however, then a miss means starting from scratch with an external memory access, and a lot of processor cycles lost while waiting.

Of course, it's a different matter entirely with a small processor and single-cycle (or two-cycle) internal memories. Then there is no issue with bus access latency, and it makes sense to look at the saturation of the internal buses.
>> So even if your RISC processor requires 64 bits of instruction to load a
>> 32-bit immediate (not an uncommon situation), it still often makes sense
>> to put the immediate data directly in the instruction stream.
>
> On a wide superscalar RISC, maybe. The 2 instruction sethi/setlo sequence
> will take 2 cycles on a scalar CPU, which is slower than a load. Thumb
> needs 48 bits to load a constant even if it is never reused, so it's always
> better to load immediates than inlining them in the instruction stream.
>
Sorry, my numbers were comparing a cpu with a proper "load immediate 32-bit" instruction, such as the 68k/CF - the point is the increased efficiency if you can do that in *one* instruction, not two as is needed for most RISC cpus. Whether that increased efficiency in this case is worth the more complex instruction decoder is another question entirely.
>> Of course, in most code, 32-bit immediate data that can't be generated as
>> a shifted 16-bit value or an offset from a base data pointer is relatively
>> rare, and thus it is not a major issue.
>
> Complex immediates that need more than 16 bits are very rare indeed.
> Address constants are common though, but there are ways to reduce the
> number of unique constants to encourage sharing. Several global variables
> can be accessed via the same base address for example.
>
The PPC ABI recommends using a "small data segment", so that most global addresses are formed as a 16-bit offset to the SDS register.
> Wilco
>
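The code-size argument quoted above can be checked with a little arithmetic. A quick sketch using the same illustrative numbers as the post (48 bits per inline 32-bit immediate, versus a 32-bit base+displacement load per use plus one shared 32-bit copy of the constant):

#include <stdio.h>

/* Code-size model from the post above: an inline 32-bit immediate costs
 * 48 bits per use; the shared constant costs 32 bits per use plus one
 * 32-bit copy of the value itself. These are the illustrative numbers
 * from the post, not measurements of any real toolchain. */
static unsigned bits_inline(unsigned uses) { return 48u * uses; }
static unsigned bits_shared(unsigned uses) { return 32u * uses + 32u; }

int main(void)
{
    for (unsigned n = 1; n <= 4; n++)
        printf("%u use(s): inline %3u bits, shared %3u bits\n",
               n, bits_inline(n), bits_shared(n));
    /* Inline wins at one use, it's a tie at two, and the shared copy wins
     * from three uses on - the break-even stated in the quoted post. */
    return 0;
}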
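Wilco's point about reaching several globals through one shared base address can also be sketched in C. The struct and field names below are invented for the example; the idea is simply that one address constant serves all the fields via small displacements:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical example of sharing one base address between globals:
 * grouping them in a struct means the compiler needs only one address
 * constant (typically kept in a register), and each field is reached
 * with a small displacement from it. */
struct sys_state {
    uint32_t tick_count;    /* base + 0 */
    uint32_t error_flags;   /* base + 4 */
    uint32_t baud_rate;     /* base + 8 */
};

static struct sys_state g_state;

static void tick(void)
{
    g_state.tick_count++;
    if (g_state.error_flags)
        g_state.baud_rate = 9600;
}

int main(void)
{
    g_state.error_flags = 1;
    tick();
    printf("ticks=%u baud=%u\n",
           (unsigned)g_state.tick_count, (unsigned)g_state.baud_rate);
    return 0;
}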
Everett M. Greene wrote:
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: >> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote >>> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes: >>>> An accumulator is typically hardwired to the ALU, so there is >>>> one per ALU. The Z-80 A register is a good example of this. >>>> If the result can be written in any register, it is not an accumulator. >>> Now you've really baffled me as to what you consider to be >>> an accumulator. I thought an accumulator is a register/ >>> device/location to/on/in which arithmetic and logical >>> operations can be performed. Have I missed something >>> over the years? >> Correct, but it must also be the *only* possible register where the >> result of those operations is written. So when you can't choose the >> destination, it is the accumulator. > > So you're saying that single-accumulator processors are > accumulator-based but multiple-accumulator processors are > not? >
I'm with Wilco on this. The ALU in a cpu core has two input channels and one output channel. In an accumulator based architecture, one of the input channels is fixed to the accumulator register, with the other being flexible (other registers, memory, immediate data, etc.). The output channel is often also fixed to the accumulator, but may possibly be able to write back to other destinations. Thus you have a specialised accumulator implemented as a dual ported register, while the other registers are connected to the ALU with a simpler single port (read/write), or possibly separate read and write ports for some architectures.

On some CPUs, such as the 6800, there are two accumulators which can be selected, but the principle is still the same, as is the case for those (like the Z80) which can treat other registers (H, L) as accumulators for specific operations.

If you look at the 68k for comparison, you see a set of 8 general-purpose D registers. Any ALU operation can use any two of these as sources, and one as a destination (the fact that the destination must be one of the sources is mainly an artefact of limited instruction code space). You have a triple-ported register bank, and no specific "super" registers.

Ultimately, the term "accumulator" means a register that can be added to - more generally, it is a register tightly tied to the ALU.
>> Examples of accumulator architectures are Z-80, 6502, 8051.
>
> The 6502 has a single accumulator whereas the 6800 has two
> but the two are otherwise architecturally the same. The
> 6502 is accumulator-based and the 6800 is not?
>
>> See also http://en.wikipedia.org/wiki/Accumulator_%28computing%29
>
> I thought wikipedia is a collection of opinions, not an
> authoritative source.
Indeed - but it can be constructive nonetheless.
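The distinction David draws can be captured by modelling the two datapaths directly. This is just a sketch with invented types, not any real core: in the accumulator version one ALU input and the destination are hardwired to A, while in the register-file version the instruction names all three registers:

#include <stdint.h>
#include <stdio.h>

/* Accumulator style: one ALU input and the result are hardwired to A,
 * so an "add" only has to name the other operand. */
struct acc_cpu { uint8_t a; uint8_t r[4]; };

static void acc_add(struct acc_cpu *c, unsigned src)
{
    c->a = (uint8_t)(c->a + c->r[src]);     /* destination is always A */
}

/* General-register style: sources and destination are all encoded in
 * the instruction, so any register can receive the result. */
struct gpr_cpu { uint32_t d[8]; };

static void gpr_add(struct gpr_cpu *c, unsigned rd, unsigned rn, unsigned rm)
{
    c->d[rd] = c->d[rn] + c->d[rm];         /* destination chosen freely */
}

int main(void)
{
    struct acc_cpu z = { .a = 1, .r = { 2, 3, 4, 5 } };
    struct gpr_cpu m = { .d = { 0, 10, 20 } };

    acc_add(&z, 1);          /* A += r[1]; nowhere else it could go     */
    gpr_add(&m, 3, 1, 2);    /* d3 = d1 + d2; d1 and d2 are left intact */

    printf("acc A=%u, gpr d3=%u\n", (unsigned)z.a, (unsigned)m.d[3]);
    return 0;
}

Running it prints "acc A=4, gpr d3=30"; the point is only that gpr_add gets to say where the result goes, while acc_add never does.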
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070225.7A02CD0.A549@mojaveg.lsan.mdsg-pacwest.com...

>> Examples of accumulator architectures are Z-80, 6502, 8051.
>
> The 6502 has a single accumulator whereas the 6800 has two
> but the two are otherwise architecturally the same. The
> 6502 is accumulator-based and the 6800 is not?
All 6800 instructions have an operand that selects A, B or memory. There are a few operations that have both A and B as inputs. So A and B are almost general purpose registers. You can leave one out and still have a functioning architecture (you can't remove special purpose registers such as the stack pointer, accumulator or flags). So I'm inclined to say it's a cross between a general purpose register and accumulator architecture.

Wilco
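Wilco's "cross between the two" reading fits the same sort of toy model (again with invented names, and not ISA-accurate): a one-bit field in the instruction selects A or B as both the implied input and the destination - more choice than a single accumulator, far less than a full register file:

#include <stdint.h>
#include <stdio.h>

/* Toy 6800-flavoured model (hypothetical, not ISA-accurate): a one-bit
 * field in the instruction selects which of the two accumulators is both
 * the implied input and the destination. */
struct m6800 { uint8_t acc[2]; };   /* acc[0] = A, acc[1] = B */

static void add_imm(struct m6800 *c, unsigned sel, uint8_t imm)
{
    c->acc[sel & 1u] = (uint8_t)(c->acc[sel & 1u] + imm);
}

int main(void)
{
    struct m6800 c = { { 10, 20 } };

    add_imm(&c, 0, 5);    /* roughly "ADDA #5" - result forced into A */
    add_imm(&c, 1, 7);    /* roughly "ADDB #7" - result forced into B */

    printf("A=%u B=%u\n", (unsigned)c.acc[0], (unsigned)c.acc[1]);
    return 0;
}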
David Brown <david@westcontrol.removethisbit.com> writes:

> Ultimately, the term "accumulator" means a register that
> can be added to
As you can with all 16 A and D registers in the 68K, for instance.
> -- more generally, it is a register tightly tied to
> the ALU.
I would consider this to be less general.
Everett M. Greene wrote:
> David Brown <david@westcontrol.removethisbit.com> writes:
>
>> Ultimately, the term "accumulator" means a register that
>> can be added to
>
> As you can with all 16 A and D registers in the 68K, for
> instance.
>
I was explaining where the term comes from, rather than what is meant by an "accumulator" now (perhaps I was not very clear in what I wrote).
>> -- more generally, it is a register tightly tied to
>> the ALU.
>
> I would consider this to be less general.
The clearest difference between accumulator-based cpus and general purpose register architectures is seen when using them - do a majority of the ALU instructions require a specific register or not? It's that simple.

It can also be instructive to think how you would draw out the design. If you have one branch of the ALU being fed by a specific register, it's an accumulator. If your registers consist of an array of identical registers that can be the source and destination of the ALU, it is not an accumulator-based architecture.

You can ask at what point do you move from multiple accumulators (like the 6800) to an array of registers (like the 68k). I'd say the naming convention shows fairly clearly what the core's designers think - when you have named registers "A" and "B", you have specialised registers and an accumulator architecture. When you have numbered registers "D0" ... "D7", you have an array of general purpose registers.
David Brown <david@westcontrol.removethisbit.com> writes:
> Everett M. Greene wrote:
> > David Brown <david@westcontrol.removethisbit.com> writes:
> >
> >> Ultimately, the term "accumulator" means a register that
> >> can be added to
> >
> > As you can with all 16 A and D registers in the 68K, for
> > instance.
>
> I was explaining where the term comes from, rather than what is meant by
> an "accumulator" now (perhaps I was not very clear in what I wrote).
>
> >> -- more generally, it is a register tightly tied to
> >> the ALU.
> >
> > I would consider this to be less general.
>
> The clearest difference between accumulator-based cpus and general
> purpose register architectures is seen when using them - do a majority
> of the ALU instructions require a specific register or not? It's that
> simple.
>
> It can also be instructive to think how you would draw out the design.
> If you have one branch of the ALU being fed by a specific register, it's
> an accumulator. If your registers consist of an array of identical
> registers that can be the source and destination of the ALU, it is not
> an accumulator-based architecture.
>
> You can ask at what point do you move from multiple accumulators (like
> the 6800) to an array of registers (like the 68k). I'd say the naming
> convention shows fairly clearly what the core's designers think - when
> you have named registers "A" and "B", you have specialised registers and
> an accumulator architecture. When you have numbered registers "D0" ...
> "D7", you have an array of general purpose registers.
This is a difference without a difference? I'll take multiple accumulators over single accumulators any day. I don't care if you call them accumulators, data registers, or swizzle sticks.