Pipelined 6502/z80 with cache and 16x clock multiplier

Reply by David Brown ● December 24, 2010

On 24/12/10 03:54, D Yuniskis wrote:
> Hi Brett,
>
> On 12/23/2010 6:26 PM, Brett Davis wrote:
>>> Rabbit tried this approach with the Z80. But, decided to make
>>> something that wasn't 100% compatible with the original Z80.
>>> At least the Z180 devices didn't suffer this fate.
>>
>> I looked at the Rabbit CPUs, it's quite a nice upgrade that makes
>> using C code for it viable.
>
> I didn't see that it bought you anything "appreciable".
> I.e., if you aren't going to make a "100% compatible"
> device, then why not come up with an entirely different
> design (instead of reheating one that was decades old)?
>

I guess they wanted an almost-but-not-quite Z80 to run their 
almost-but-not-quite C :-)

>> It's not pipelined, 2 cycles per opcode byte and data byte.
>
> Most of its "improvements" come from a full 8-bit ALU
> (the original Z80 had a 4 bit ALU so had to push things
> through, "twice")
>
>> Something I would have expected ~6 years after the Z80 came out,
>> not ~26 years. ;)
>

The 6502 was a pipelined design - IIRC it overlapped at least part of 
the instructions while competing designs (like the Z80) were entirely 
non-pipelined.  This was one of the reasons that the Acorn BBC Micros 
were faster than many other home micros of that era.

Reply by larwe ● December 24, 2010

On Dec 24, 6:29 am, David Brown <david.br...@removethis.hesbynett.no>
wrote:

> non-pipelined.  This was one of the reasons that the Acorn BBC Micros
> were faster than many other home micros of that era.

Well, "6502 faster than Z80" is only valid for small values of
"faster"; are we measuring a tight loop around a NOP, or an actual
useful function?

Of course I did love the Beeb.

Reply by D Yuniskis ● December 24, 2010

Hi Lewin,

On 12/24/2010 5:43 AM, larwe wrote:
> On Dec 24, 6:29 am, David Brown<david.br...@removethis.hesbynett.no>
> wrote:
>
>> non-pipelined.  This was one of the reasons that the Acorn BBC Micros
>> were faster than many other home micros of that era.
>
> Well, "6502 faster than Z80" is only valid for small values of
> "faster"; are we measuring a tight loop around a NOP, or an actual
> useful function?

It also depends a lot on how you normalize "operating conditions"
(same clock frequency?  same memory access time?  same set of
available I/O's?  etc.).

6502/68xx vs 808x/Zx80 typified the early split between
processor designs.  Memory mapped vs. dedicated I/O space;
interrupt handling; "single accumulator" vs. register file;
etc.

I had a friend who worked in the 68xx camp while I was dealing
with 808x's... watching each other write code was almost an
"anxious" event -- wondering what was going to happen next.
E.g., I would plan ahead so everything I needed ended up in
registers; he would grab what he needed *as* he needed it...

Then TI came along suffering some major hallucinations with their
99xx(x)'s...  :-/  (clean idea but technology went a different
way).

(sigh)  It's too bad how *few* designs we have now to choose from.

Reply by David Brown ● December 25, 2010

On 24/12/10 13:43, larwe wrote:
> On Dec 24, 6:29 am, David Brown<david.br...@removethis.hesbynett.no>
> wrote:
>
>> non-pipelined.  This was one of the reasons that the Acorn BBC Micros
>> were faster than many other home micros of that era.
>
> Well, "6502 faster than Z80" is only valid for small values of
> "faster"; are we measuring a tight loop around a NOP, or an actual
> useful function?
>
> Of course I did love the Beeb.

Note that I didn't say the "6502 is faster than the Z80".  I said the 
BBC Micro was faster than most contemporary home micros - several of 
which happened to be Z80-based (such as the ZX Spectrum).  There were 
many reasons for this.  The 6502's pipelining, which made it 
significantly faster than you would expect from an 8-bit processor 
running at a slower clock rate than the Z80 (2 MHz vs. 3.5 MHz, IIRC), 
was only one of them.
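
A minimal sketch of the normalization issue raised in this thread: wall-clock time is cycles divided by clock rate, so a slower-clocked CPU wins whenever it needs proportionally fewer cycles. The clock rates are the ones recalled above; the cycle counts are made-up placeholders for "the same routine" on each CPU, not measurements.

```python
def exec_time_us(cycles: int, clock_mhz: float) -> float:
    """Wall-clock time in microseconds for a routine of `cycles` clock cycles."""
    return cycles / clock_mhz


def speed_ratio(cycles_a: int, clock_a_mhz: float,
                cycles_b: int, clock_b_mhz: float) -> float:
    """How many times sooner CPU A finishes than CPU B (> 1 means A wins)."""
    return exec_time_us(cycles_b, clock_b_mhz) / exec_time_us(cycles_a, clock_a_mhz)


# Hypothetical cycle counts: 20 on the 6502, 40 on the Z80.
t_6502 = exec_time_us(cycles=20, clock_mhz=2.0)
t_z80 = exec_time_us(cycles=40, clock_mhz=3.5)
print(f"6502 @ 2 MHz: {t_6502:.2f} us, Z80 @ 3.5 MHz: {t_z80:.2f} us")
```

At these two clock rates the Z80 can spend up to 1.75 times as many cycles as the 6502 and still tie, which is why the answer depends so much on the instruction mix being measured.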

Looking purely at the cpu, the 6502 was fast for some things and slow 
for others.  It had fast zero-page access - if you could hold your 
important data there, code size was small and speed high.  But if you 
needed to do a lot of data movement or 16-bit arithmetic, it was a lot 
slower than the Z80 (which was partly 16-bit).

Reply by Bernd Paysan ● December 25, 2010

Muzaffer Kal wrote:
> In this context pad-limited most probably means that the die size is
> decided strictly by the number of pads one has to put around the die
> even if significant core area is left unused. Of course there are
> technologies where one doesn't need to put the pads around the die and
> they can be placed inside but the cost for such a technique would
> probably be too high for the applications mentioned.

When you place pads around the die, you can pack them a lot closer 
together (staggered, minimal pitch 40 µm) than when you place them on top 
of the die for bumps (minimal pitch 250 µm, but now it's a 2d mesh, not 
just pads at the corners).
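
As a rough sketch of the pad-limited arithmetic, assuming a square die, a quarter of the pads on each side for a perimeter ring, and a square grid for bumps. It ignores corner keep-outs, pad size and staggering details. The pitches are the ones quoted above; the pad count and core size in the usage note are hypothetical.

```python
import math


def edge_perimeter_um(n_pads: int, pitch_um: float) -> float:
    """Minimum die edge when pads sit in a ring, a quarter of them per side."""
    return math.ceil(n_pads / 4) * pitch_um


def edge_array_um(n_pads: int, pitch_um: float) -> float:
    """Minimum die edge when pads form a square 2D bump grid."""
    pads_per_row = math.isqrt(n_pads - 1) + 1  # integer ceil(sqrt(n_pads))
    return pads_per_row * pitch_um


def is_pad_limited(n_pads: int, pitch_um: float, core_edge_um: float) -> bool:
    """True if the pad ring, not the core logic, sets the die size."""
    return edge_perimeter_um(n_pads, pitch_um) > core_edge_um


print(edge_perimeter_um(64, 40))   # ring at 40 um pitch
print(edge_array_um(64, 250))      # bump grid at 250 um pitch
print(is_pad_limited(64, 40, 500))
```

With a hypothetical 64 pads, the ring needs about a 640 µm edge while the bump grid needs about 2000 µm, so for small pad counts the perimeter ring gives the smaller die under this model.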

IMHO, there's absolutely no point - with actually still useful processes 
- in putting a CPU on a chip without the memory.  When you add the 
memory, how many bits your CPU uses doesn't matter that much - it's more 
how complex your CPU is.  My b16 is small enough that there is really 
little total area benefit from making it even smaller (and it's 16 bit).  
The main contributor to the area on my projects is the actual memory - 
and there, what you want is a compact program, not a low-bitsize CPU 
(where the program is significantly larger to achieve the same thing, 
since more instructions are necessary).

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Reply by Paul A. Clayton ● December 25, 2010

On Dec 19, 11:12 pm, Stephen Fuld <SF...@alumni.cmu.edu.invalid>
wrote:
[snip]
> Perhaps, but I don't think that is the target market for 4 bit
> processors.  AFAIR the big market for 4 bit processors is in the really
> low requirements consumer areas, such as remote controls for TV's etc.
> and things like microwave ovens, clothes washers, etc.  As such, speed
> isn't an issue, but low cost and perhaps low power (i.e. longer battery
> life) is.  So the market you are talking about may very well exist, but
> that doesn't mean it will replace the 4 bit market.

Performance does seem to be unimportant.  Note also that low power
can also exploit energy harvesting.  It is surprising to me, however,
that a mask-configurable 8-bit or 4-bit processor is not implemented
(or would dynamic selection be more appropriate/sufficiently power-
efficient? or fuse-configurable?), given that the registers/ALU/etc.
(or even the entire processor core) take up a small fraction of the
chip area.  (At larger bit widths, providing a half-width double-
threaded mode could be useful.)  It does seem unlikely to me that
a 4-bit processor makes sense: I strongly suspect that the area
savings relative to an 8-bit processor are not significant (when
ROM, RAM and peripherals take up most of the die space) and it seems
likely that memory accesses would consume a significant fraction of
the power reducing the impact of a wider processor.  If the processor
tends to be either fully active or fully off, then differences in
leakage (static) power would not be significant either it seems.
(For very simple processors, would wave pipelining and asynchronous
methods be attractive?)

The article linked to by the mentioned article
http://www.embeddedinsights.com/channels/2010/12/10/considerations-for-4-bit-processing/
states that "EM Microelectronics . . . approaches a developer and
works to demonstrate how the 4-bit device can provide differentiation
to the developer's design and end product."  Perhaps I am being
excessively cynical, but this sounds like the technique of some
software vendors--approach less-technically knowledgeable managers
to make the sale.  I admit that being limited to a small number of
products by ROM mask validation cost constrains the methods available
to market the product, but it seems that a bidding process would also
work and still be open.

Having recently read Stanley Mazor's "The History of the Microcomputer
- Invention and Evolution"
(http://www.xnumber.com/xnumber/Microcomputer_invention.htm),
it seems that, like the 4004 ("Dynamic RAM memory cells were also
used inside the CPU for the 64-b index register array and 48-b
Program counter/stack array."), some parts of these ultra-low power
processors could be implemented in DRAM, especially for data that
does not need to persist between active periods and is accessed
regularly during active periods (e.g., the PC might be loaded from
a vector table).  (For such specialized processors, it seems that
it might be reasonable for at least some interrupts to load values
from a local ROM table into some registers.)  (Xuejun Yang, Nathan
Cooprider, John Regehr, "Eliminating the Call Stack to Save RAM"
proposed putting return addresses into ROM to reduce RAM
requirements, which might also be useful for odd architectures with
a tiny return address stack [the paper also proposed allocating
local variables into the global variable section using global
liveness analysis to minimize memory usage].)

Paul A. Clayton
just a technophile

Reply by George Neuner ● December 25, 2010

On Sat, 25 Dec 2010 16:45:12 +0100, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>On 24/12/10 13:43, larwe wrote:
>> On Dec 24, 6:29 am, David Brown<david.br...@removethis.hesbynett.no>
>> wrote:
>>
>>> non-pipelined.  This was one of the reasons that the Acorn BBC Micros
>>> were faster than many other home micros of that era.
>>
>> Well, "6502 faster than Z80" is only valid for small values of
>> "faster"; are we measuring a tight loop around a NOP, or an actual
>> useful function?
>>
>> Of course I did love the Beeb.
>
>Note that I didn't say the "6502 is faster than the Z80".  I said the 
>BBC Micro was faster than most contemporary home micros - several of 
>which happened to be Z80-based (such as the ZX Spectrum).  There were 
>many reasons for this.  The fact that the 6502 was pipelined meant that 
>it was significantly faster than it should have been as an 8-bit 
>register-based processor at a slower clock rate than the Z80 (2 MHz vs. 
>3.5 MHz, IIRC) was only one of the reasons.

The original Z80 was 2.5MHz.  IIRC, the early Z80 machines fared
rather poorly in a number of comparisons against the Apple's 1MHz
6502.

Of course, the 6502 needed some clever coding to beat the Z80 in a
general mix of tasks (see below).  And when Zilog introduced the 4MHz
part, the Z80 became faster in general.

[At least until the 65816 came along ... it took a 16MHz Z80 to best a
3MHz 65816 at most tasks.  The 8MHz 65816 was an even match to a (real
mode) 10MHz 80286 at many tasks.  The 286 was faster at software FP
arithmetic and, of course, offered protected mode multitasking.  The
816 also could multitask, but the implementation was (usually) more
complex due to the hardware stack being restricted to the first 64KB
of memory.  And, of course, there was no memory protection.]

>Looking purely at the cpu, the 6502 was fast for some things and slow 
>for others.  It had fast zero-page access - if you could hold your 
>important data there, code size was small and speed high.  But if you 
>needed to do a lot of data movement or 16-bit arithmetic, it was a lot 
>slower than the Z80 (which was partly 16-bit).

Yes, the 8080/Z80 had a number of dual 8/16 bit registers ... but IIRC
16-bit arithmetic took 2 extra cycles.

The 6502 had only 8-bit registers.  To do multi-byte arithmetic
quickly the data had to be in the zero page - addresses 00h..FFh - for
which the 6502 had a special 3-cycle address mode vs 4..6 cycles for
accessing a general 16-bit address, depending on the index mode.  
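
To make the zero-page advantage concrete, here is a small sketch that totals cycles for a 16-bit memory-to-memory add (CLC, then LDA/ADC/STA on the low byte and again on the high byte) using the standard NMOS 6502 timings for non-indexed zero-page and absolute addressing. The instruction sequence is an assumed typical coding, and `add16` only models the carry chaining; it is not an emulator.

```python
# Standard NMOS 6502 cycle counts for the instructions used below
# (non-indexed modes, so no page-crossing penalties apply).
CYCLES = {
    ("CLC", "implied"): 2,
    ("LDA", "zeropage"): 3, ("ADC", "zeropage"): 3, ("STA", "zeropage"): 3,
    ("LDA", "absolute"): 4, ("ADC", "absolute"): 4, ("STA", "absolute"): 4,
}


def add16_cycles(mode: str) -> int:
    """Cycles for CLC + (LDA, ADC, STA) on the low byte, then the high byte."""
    per_byte = sum(CYCLES[(op, mode)] for op in ("LDA", "ADC", "STA"))
    return CYCLES[("CLC", "implied")] + 2 * per_byte


def add16(a: int, b: int) -> int:
    """16-bit add done the 6502 way: two 8-bit adds chained through carry."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                      # carry out of the low-byte add
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


print(add16_cycles("zeropage"), add16_cycles("absolute"))
print(hex(add16(0x12FF, 0x0001)))
```

Under these assumptions the add costs 20 cycles with all operands in zero page and 26 cycles with absolute addresses.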

George

Reply by George Neuner ● December 25, 2010

On Sun, 19 Dec 2010 20:25:49 -0600, Brett Davis <ggtgp@yahoo.com>
wrote:

>You could take a public CPU design like OpenRISC and replace the
>instruction decoder, and get an easy ~4x performance jump running
>65c802/65c816 code.
>
>Compatibility with the Apple// disk controller would be poor. ;)
>But let's ignore that for the moment, Apple made some workarounds
>for the Apple2GS, so that can be fixed.

The Apple ][ disk controller has precisely clocked subroutines that
require the 6502 to run at 1MHz.   The Apple //gs forced the 65816
into 6502 "compatibility mode" whenever it accessed addresses in
E00000h..E1FFFFh (where the bus slots, video and Apple ][ compatible
devices lived).
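
A minimal sketch of the address test this implies, assuming 24-bit addresses and taking the range exactly as given above. The function names are hypothetical and this models only the range check, not the hardware's actual mechanism.

```python
COMPAT_START = 0xE00000  # first address of the range named above
COMPAT_END = 0xE1FFFF    # last address of the range named above


def bank(addr: int) -> int:
    """Bank byte (bits 16..23) of a 24-bit 65816 address."""
    return (addr >> 16) & 0xFF


def in_compat_range(addr: int) -> bool:
    """True if the address falls in E00000h..E1FFFFh, i.e. banks E0h and E1h."""
    return COMPAT_START <= addr <= COMPAT_END


for a in (0x00C000, 0xE0C000, 0xE1FFFF, 0xE20000):
    print(f"{a:06X} bank {bank(a):02X} compat={in_compat_range(a)}")
```

Since the range covers exactly two whole banks, testing the bank byte against E0h and E1h is equivalent to the full comparison.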

>Step 2 would be to add a boot loader to set the cache modes up
>correctly for your memory spaces, so everything is not write through.
>That will get you another ~2x speedup.

The 65816 brought out address valid lines for cache and DMA
implementations.  The stock Apple //gs had DMA and the various
accelerator boards for it added cache to the CPU.

>Is there a market for a 6502 era CPU that ran ~10x faster at 
>~10% more cost?

I have no idea whether 6502 compatible chips are in demand for any
purpose.

Even so, I wouldn't bother with the 6502, but rather I would implement
the 65816 (or the 65802 if you need 6502 pin compatibility).  The
658xx ISA is a superset of 6502 that is cleaner and easier to work
with even for 8-bit code.

George

Reply by larwe ● December 25, 2010

On Dec 25, 9:45 pm, George Neuner <gneun...@comcast.net> wrote:

> I have no idea whether 6502 compatible chips are in demand for any
> purpose.
> Even so, I wouldn't bother with the 6502, but rather I would implement
> the 65816 (or the 65802 if you need 6502 pin compatibility).  The

At least until a year or two ago, Sunplus had a range of chips that
were 6502 core with some restrictions (IIRC no Y register and maybe a
couple of other oddities). Winbond and a couple of others also use
6502 or 65816 cores in their toy chips. I guess it depends on whether
you already have proprietary dev tools (for compiling proprietary
languages, building audio projects, assembling LCD data, etc) that
target 6502.

Reply by D Yuniskis ● December 25, 2010

On 12/25/2010 8:39 PM, larwe wrote:
> On Dec 25, 9:45 pm, George Neuner<gneun...@comcast.net>  wrote:
>
>> I have no idea whether 6502 compatible chips are in demand for any
>> purpose.
>> Even so, I wouldn't bother with the 6502, but rather I would implement
>> the 65816 (or the 65802 if you need 6502 pin compatibility).  The
>
> At least until a year or two ago, Sunplus had a range of chips that
> were 6502 core with some restrictions (IIRC no Y register and maybe a
> couple of other oddities). Winbond and a couple of others also use
> 6502 or 65816 cores in their toy chips. I guess it depends on whether
> you already have proprietary dev tools (for compiling proprietary
> languages, building audio projects, assembling LCD data, etc) that
> target 6502.

Wasn't the 2A03 also a 6502 derivative?