EmbeddedRelated.com
Forums

PIC vs ARM assembler (no flamewar please)

Started by Unknown February 14, 2007
On Thu, 22 Feb 2007 11:43:32 GMT, "Wilco Dijkstra"
<Wilco_dot_Dijkstra@ntlworld.com> wrote:

>"Jonathan Kirwan" <jkirwan@easystreet.com> wrote in message
>news:6nkpt2he5ovvonio95dk1op99nm42q6kfp@4ax.com...
>> On Wed, 21 Feb 2007 22:27:27 GMT, "Wilco Dijkstra"
>> <Wilco_dot_Dijkstra@ntlworld.com> wrote:
>>
>>>"Jonathan Kirwan" <jkirwan@easystreet.com> wrote in message
>>>news:ea7pt21omlbn70fujpjtd5b91ror4gc9l3@4ax.com...
>
>>>> I think of SRAM as "static RAM." Nothing more than that.
>
>> My first reaction to the above is that ASIC cpu designs have control
>> over all this and they use that flexibility as a matter of course,
>> too. And none of this addresses itself to the fact that registers
>> are, in fact, SRAM. So your differentiation is without a difference.
>
>You're using a different definition of SRAM.
It's the definition I was taught in the 1970s, both by engineers who practiced at the time and by manufacturers who made the parts I used. I can refer you to data books on the subject, I suppose. Not that it would change your point... or mine.
>Wikipedia defines SRAM as a regular single ported cell structure with
>word and bitlines which is laid out in a 2 dimensional array and typically
>uses sense amps. It mentions dual ported SRAM and calls it DPRAM.
I retain the general classification of the term 'SRAM' from the roots by which it got its name. Not the wiki definition, where new terms are applied and old ones redefined.
>> What exactly is the difference in your mind between a flip-flop and an
>> SRAM bit cell? I'm curious.
>
>An SRAM bit cell is designed to be laid out in a 2 dimensional structure
>sharing bit and word lines, thus taking minimal area. A flop is completely
>different. There are lots of variants, but they typically have a clock, may
>contain a scan chain for debug and sometimes have special features
>to save power. Note that flops are typically used in synthesized logic
>rather than latches and precharged logic. They are irregular and much
>larger than an SRAM cell, but they have a good fanout and can drive
>logic directly, unlike SRAM.
>
>So while logically they both store 1 bit, they have different interfaces,
>characteristics, layout and uses. I hope that clears things up...
And that explains your use of the term and my difference with it. I don't think I'll change my use, yet. Jon
On Thu, 22 Feb 2007 09:51:18 -0800, I wrote:

><snip>
>And that explains your use of the term and my difference with it. I
>don't think I'll change my use, yet.
Sidebar: The reason I won't is that I need a term that retains the general classification. It's meaningful to me. And if I adopted your use, I'd lose that word's denotation without another to replace it. Unless you can tell me what replaces that usage, today..... Jon
"Jonathan Kirwan" <jkirwan@easystreet.com> wrote in message 
news:f1mrt214rb2f2osns0fmps79fgm1uqeu5v@4ax.com...
> On Thu, 22 Feb 2007 09:51:18 -0800, I wrote:
>
>><snip>
>
>>And that explains your use of the term and my difference with it. I
>>don't think I'll change my use, yet.
>
> Sidebar: The reason I won't is that I need a term that retains the
> general classification. It's meaningful to me. And if I adopted your
> use, I'd lose that word's denotation without another to replace it.
> Unless you can tell me what replaces that usage, today.....
You (and Jim) are free to use your definition - it just may cause some confusion every now and again... I have no idea whether there is any term that still has the meaning you use; I think each kind of memory got its own name. There are so many variations, and new memories are appearing all the time which don't fall into existing categories... Wilco
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "David Brown" <david@westcontrol.removethisbit.com> wrote
[snip]
> > * Specialised accumulator: no
>
> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
> System/360 etc. Accumulators are typically used in 8-bitters where
> most instructions are 1 or 2 bytes for good codesize.
What is your definition of accumulator-based? I would consider nearly all processors to be accumulator-based with very few exceptions. [snip]
> > Adding these variable length instructions is a good thing, if it doesn't
> > cost too much at the decoder. It increases both code density and
> > instruction speed, since it opens the path for 32-bit immediate data (or
> > addresses) to be included directly in a single instruction.
>
> Actually, embedding large immediates in the instruction stream is
> bad for codesize because they cannot be shared.
So what if they can't be shared? To share or not to share is a programming consideration.
> For Thumb-2 the main goal was to allow access to 32-bit ARM > instructions for cases where a single 16-bit instruction was > not enough.
ARM doesn't have universal immediates due to the limitations of the values that can be generated.
> Thumb-2 doesn't have immediates like 68K/CF.
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070222.7A41868.96EE@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> "David Brown" <david@westcontrol.removethisbit.com> wrote
>
> [snip]
>> > * Specialised accumulator: no
>>
>> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
>> System/360 etc. Accumulators are typically used in 8-bitters where
>> most instructions are 1 or 2 bytes for good codesize.
>
> What is your definition of accumulator-based?
>
> I would consider nearly all processors to be accumulator-based
> with very few exceptions.
The easiest way to explain is the syntax of a basic operation like add:

  add          -> stack based
  add r0       -> accumulator based (likely CISC)
  add r0,r1    -> 2 operand (likely CISC)
  add r0,r1,r2 -> 3 operand (likely RISC)

In an accumulator based architecture there is an implied operand, namely the accumulator. An advantage is that instructions have only one explicit operand, but a disadvantage is that most operations overwrite the accumulator, so you may need additional moves/stores to save it.
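Wilco's four-way classification can be mimicked with a toy model (a sketch in Python rather than real assembler; the function and register names are mine, purely for illustration):

```python
# Toy models of the operand styles above (illustrative only, not any real ISA).

def accumulator_add(state, src):
    """'add src': the accumulator A is an implied source and the destination."""
    state["A"] = state["A"] + state[src]

def two_operand_add(regs, dst, src):
    """'add dst, src': dst is both a source and the destination."""
    regs[dst] = regs[dst] + regs[src]

def three_operand_add(regs, dst, src1, src2):
    """'add dst, src1, src2': everything explicit; both sources survive."""
    regs[dst] = regs[src1] + regs[src2]
```

The cost of the implied operand shows up when you need both the sum and an original value: the 3-operand form leaves its sources intact in a single instruction, while the accumulator form must first copy A elsewhere - exactly the extra moves/stores Wilco mentions.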
>> Actually, embedding large immediates in the instruction stream is
>> bad for codesize because they cannot be shared.
>
> So what if they can't be shared? To share or not to share
> is a programming consideration.
Absolutely.
>> For Thumb-2 the main goal was to allow access to 32-bit ARM
>> instructions for cases where a single 16-bit instruction was
>> not enough.
>
> ARM doesn't have universal immediates due to the limitations
> of the values that can be generated.
Yes. Although the shifted immediates cover most of the frequently occurring constants, you need several instructions for more complex constants. New instructions were added to create 16- and 32-bit constants easily. Wilco
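The shifted immediates referred to here are the classic ARM data-processing immediates: an 8-bit value rotated right by an even amount. A rough encodability check (a Python sketch; the function name is mine):

```python
def is_arm_immediate(value):
    """True if value fits the classic ARM data-processing immediate form:
    an 8-bit constant rotated right by an even amount (0..30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a rotate-right by 'rot' with a rotate-left by 'rot'.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 256:
            return True
    return False
```

This is why a constant like 0xFF000000 fits in one instruction while something like 0x12345678 needs a multi-instruction sequence or a literal-pool load.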
Everett M. Greene wrote:
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> "David Brown" <david@westcontrol.removethisbit.com> wrote
>
> [snip]
>>> * Specialised accumulator: no
>>
>> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
>> System/360 etc. Accumulators are typically used in 8-bitters where
>> most instructions are 1 or 2 bytes for good codesize.
>
> What is your definition of accumulator-based?
>
Accumulator-based architectures have a special register which is used as one of the sources and/or the destination of ALU instructions. It may or may not be possible to do loads and stores from other registers.

Wilco's comments about the format of an "add" instruction are helpful here, but there are other good indicators too. If your cpu has a register called "A", then there is a very good chance that "A" is an accumulator. In fact, if the registers are named rather than numbered, the architecture is probably accumulator based.

It is possible to have more than one accumulator while still being accumulator based. That's quite common in DSPs, I believe.
> I would consider nearly all processors to be accumulator-based
> with very few exceptions.
>
Some random examples of non-accumulator architectures are AVR, msp430, 68k (including ColdFire), ARM, MIPS, PPC. Accumulators are common in CISC architectures, but not essential. I can't think of any way to make an accumulator-based RISC architecture.
> [snip]
>>> Adding these variable length instructions is a good thing, if it doesn't
>>> cost too much at the decoder. It increases both code density and
>>> instruction speed, since it opens the path for 32-bit immediate data (or
>>> addresses) to be included directly in a single instruction.
>> Actually, embedding large immediates in the instruction stream is
>> bad for codesize because they cannot be shared.
>
> So what if they can't be shared? To share or not to share
> is a programming consideration.
>
It's simple maths. Let's assume we have a 32-bit processor using instructions of 1, 2 or 3 16-bit words, and a 32-bit constant "x" that is needed in various functions. There are two strategies to load "x" into a register - use "move d0, #x" (taking 48 bits), or "move d0, (x, a5)" (using 32 bits, assuming a 16-bit displacement for the address of x). The first strategy takes 48 bits per usage of x. The second takes 32 bits per usage of x, plus a full 32-bit shared copy of x. Thus 32-bit immediate data is smaller in code size unless the same data is used at least twice in the program.

And the first strategy requires 48 bits of read bandwidth per access, while the second requires 64 bits and almost certainly takes longer to execute. The difference gets more dramatic if you have caches - the immediate data is likely to be prefetched by the instruction cache logic, while the shared value may not be.

So even if your RISC processor requires 64 bits of instruction to load a 32-bit immediate (not an uncommon situation), it still often makes sense to put the immediate data directly in the instruction stream. Of course, in most code, 32-bit immediate data that can't be generated as a shifted 16-bit value or an offset from a base data pointer is relatively rare, and thus it is not a major issue.
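The break-even point in David's argument is easy to verify with the sizes he assumes (a quick sketch; 48 bits per use for the immediate form, versus 32 bits per use plus one shared 32-bit copy for the load form):

```python
def immediate_bits(uses):
    # "move d0, #x": a 16-bit opcode word plus a 32-bit inline constant, per use
    return uses * 48

def shared_bits(uses):
    # "move d0, (x, a5)": a 32-bit instruction per use,
    # plus one shared 32-bit copy of x somewhere in the image
    return uses * 32 + 32
```

With one use the immediate form wins (48 vs 64 bits); at two uses the two strategies tie at 96 bits; from three uses onward, sharing is smaller - matching the "unless the same data is used at least twice" conclusion above.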
>> For Thumb-2 the main goal was to allow access to 32-bit ARM
>> instructions for cases where a single 16-bit instruction was
>> not enough.
>
> ARM doesn't have universal immediates due to the limitations
> of the values that can be generated.
>
>> Thumb-2 doesn't have immediates like 68K/CF.
"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote
> > "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
> >> "David Brown" <david@westcontrol.removethisbit.com> wrote
> >
> > [snip]
> >> > * Specialised accumulator: no
> >>
> >> Many famous CISCs are not accumulator based, eg PDP, VAX, 68K,
> >> System/360 etc. Accumulators are typically used in 8-bitters where
> >> most instructions are 1 or 2 bytes for good codesize.
> >
> > What is your definition of accumulator-based?
> >
> > I would consider nearly all processors to be accumulator-based
> > with very few exceptions.
>
> The easiest way to explain is the syntax of a basic operation like add:
>
>   add          -> stack based
>   add r0       -> accumulator based (likely CISC)
>   add r0,r1    -> 2 operand (likely CISC)
>   add r0,r1,r2 -> 3 operand (likely RISC)
>
> In an accumulator based architecture there is an implied operand,
> namely the accumulator. An advantage is instructions have only
> 1 operand, but a disadvantage is most operations overwrite the
> accumulator, so you may need additional moves/stores to save it.
An accumulator is an accumulator is... Except for the first, all your scenarios involve accumulators. The latter two involve multiple accumulators.

Aside: Do the 68K A registers qualify as accumulators when you can only add/subtract to them?
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20070223.7951888.963F@mojaveg.lsan.mdsg-pacwest.com...
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote
>> > What is your definition of accumulator-based?
>> >
>> > I would consider nearly all processors to be accumulator-based
>> > with very few exceptions.
>>
>> The easiest way to explain is the syntax of a basic operation like add:
>>
>>   add          -> stack based
>>   add r0       -> accumulator based (likely CISC)
>>   add r0,r1    -> 2 operand (likely CISC)
>>   add r0,r1,r2 -> 3 operand (likely RISC)
>>
>> In an accumulator based architecture there is an implied operand,
>> namely the accumulator. An advantage is instructions have only
>> 1 operand, but a disadvantage is most operations overwrite the
>> accumulator, so you may need additional moves/stores to save it.
>
> An accumulator is an accumulator is... Except for
> the first, all your scenarios involve accumulators.
> The latter two involve multiple accumulators.
No. An accumulator is typically hardwired to the ALU, so there is one per ALU. The Z-80 A register is a good example of this. If the result can be written in any register, it is not an accumulator. The stack top in a stack based architecture is often implemented as an accumulator (but this doesn't show in the ISA).
> Aside: Do the 68K A registers qualify as accumulators > when you can only add/subtract to them?
No, like D registers they are not accumulators. Wilco
Wilco Dijkstra wrote:
> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote:
>> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> writes:
>>> "Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote
>
>>>> What is your definition of accumulator-based?
>>>>
>>>> I would consider nearly all processors to be accumulator-based
>>>> with very few exceptions.
>>>
>>> The easiest way to explain is the syntax of a basic operation
>>> like add:
>>>
>>>   add          -> stack based
>>>   add r0       -> accumulator based (likely CISC)
>>>   add r0,r1    -> 2 operand (likely CISC)
>>>   add r0,r1,r2 -> 3 operand (likely RISC)
>>>
>>> In an accumulator based architecture there is an implied operand,
>>> namely the accumulator. An advantage is instructions have only
>>> 1 operand, but a disadvantage is most operations overwrite the
>>> accumulator, so you may need additional moves/stores to save it.
>>
>> An accumulator is an accumulator is... Except for
>> the first, all your scenarios involve accumulators.
>> The latter two involve multiple accumulators.
>
> No. An accumulator is typically hardwired to the ALU, so there is
> one per ALU. The Z-80 A register is a good example of this. If the
> result can be written in any register, it is not an accumulator.
Also the HL register, as a 16 bit accumulator.
>
> The stack top in a stack based architecture is often implemented
> as an accumulator (but this doesn't show in the ISA).
Several levels of the stack may be so implemented. The HP3000 did this. So did my 8080 pcode translator, for one level. The speed improvement can be impressive.

--
 Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>
"David Brown" <david@westcontrol.removethisbit.com> wrote in message 
news:45deb79f$0$22513$8404b019@news.wineasy.se...

>>>> Adding these variable length instructions is a good thing, if it
>>>> doesn't cost too much at the decoder. It increases both code density
>>>> and instruction speed, since it opens the path for 32-bit immediate
>>>> data (or addresses) to be included directly in a single instruction.
>>> Actually, embedding large immediates in the instruction stream is
>>> bad for codesize because they cannot be shared.
>>
>> So what if they can't be shared? To share or not to share
>> is a programming consideration.
>>
>
> It's simple maths. Let's assume we have a 32-bit processor using
> instructions of 1, 2 or 3 16-bit words. You have a 32-bit constant "x"
> that is needed in various functions. There are two strategies to load "x"
> into a register - use "move d0, #x" (taking 48 bits), or "move d0, (x,
> a5)" (using 32 bits, assuming a 16-bit displacement for the address of x).
> The first strategy takes 48 bits per usage of x. The second takes 32 bits
> per usage of x, plus a full 32-bit shared copy of x. Thus 32-bit
> immediate data is smaller in code size unless the same data is used at
> least twice in the program.
Yes. How often you need the same constant depends a bit on how you deal with global variables, but unless you only use global variables once, it's always a win to share the addresses.
>And the first strategy requires 48 bits of read bandwidth per access, while
>the second requires 64 bits and almost certainly takes longer to execute.
However the second method makes good use of a Harvard architecture. Instruction fetch typically requires the most bandwidth, so inline constants make fetching slower (and this effect gets worse on superscalar CPUs). So which method is faster depends a bit on the microarchitecture, but my money is on loading, as fetching is usually bandwidth limited.
> The difference gets more dramatic if you have caches - the immediate data > is likely to be prefetched by the instruction cache logic, while the > shared value may not be.
In both cases there will be a cache miss on the first use. Whether it is I-cache or D-cache is not really relevant. Smaller code has fewer misses overall.
> So even if your RISC processor requires 64 bits of instruction to load a > 32-bit immediate (not an uncommon situation), it still often makes sense > to put the immediate data directly in the instruction stream.
On a wide superscalar RISC, maybe. The 2-instruction sethi/setlo sequence will take 2 cycles on a scalar CPU, which is slower than a load. Thumb needs 48 bits to load a constant even if it is never reused, so it's always better to load immediates than to inline them in the instruction stream.
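The 2-instruction high/low sequence being discussed (sethi/setlo on SPARC-like machines, or MOVW/MOVT in later ARM) builds a 32-bit constant in two steps. A sketch of the effect (Python, with made-up function names; for simplicity the split is modelled as two 16-bit halves, whereas SPARC's sethi actually carries 22 high bits):

```python
def set_high(imm16):
    """First instruction: write the upper halfword, clearing the rest."""
    return (imm16 & 0xFFFF) << 16

def set_low(reg, imm16):
    """Second instruction: fill in the low halfword, keeping the top half."""
    return (reg & 0xFFFF0000) | (imm16 & 0xFFFF)

# Two instructions (hence two cycles on a scalar CPU) to materialise a constant:
r = set_low(set_high(0x1234), 0x5678)  # r == 0x12345678
```

Because both halves are encoded as immediates inside the two instructions, this trades instruction-stream bits against the data access a literal-pool load would need - the bandwidth trade-off argued over in this subthread.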
> Of course, in most code, 32-bit immediate data that can't be generated as > a shifted 16-bit value or an offset from a base data pointer is relatively > rare, and thus it is not a major issue.
Complex immediates that need more than 16 bits are very rare indeed. Address constants are common, but there are ways to reduce the number of unique constants to encourage sharing. Several global variables can be accessed via the same base address, for example. Wilco