EmbeddedRelated.com

Pipelined 6502/z80 with cache and 16x clock multiplier

Started by Brett Davis December 19, 2010
On 24/12/10 03:54, D Yuniskis wrote:
> Hi Brett,
>
> On 12/23/2010 6:26 PM, Brett Davis wrote:
>>> Rabbit tried this approach with the Z80. But, decided to make
>>> something that wasn't 100% compatible with the original Z80.
>>> At least the Z180 devices didn't suffer this fate.
>>
>> I looked at the Rabbit CPUs, it's a quite nice upgrade that makes
>> using C code for it viable.
>
> I didn't see that it bought you anything "appreciable".
> I.e., if you aren't going to make a "100% compatible"
> device, then why not come up with an entirely different
> design (instead of reheating one that was decades old)?
>
I guess they wanted an almost-but-not-quite Z80 to run their almost-but-not-quite C :-)
>> Its not pipelined, 2 cycles per opcode byte and data byte.
>
> Most of its "improvements" come from a full 8-bit ALU
> (the original Z80 had a 4 bit ALU so had to push things
> through, "twice")
>
>> Something I would have expected ~6 years after the Z80 came out,
>> not ~26 years. ;)
>
The 6502 was a pipelined design - IIRC it overlapped at least part of the instructions while competing designs (like the Z80) were entirely non-pipelined. This was one of the reasons that the Acorn BBC Micros were faster than many other home micros of that era.
On Dec 24, 6:29 am, David Brown <david.br...@removethis.hesbynett.no>
wrote:

> non-pipelined. This was one of the reasons that the Acorn BBC Micros
> were faster than many other home micros of that era.
Well, "6502 faster than Z80" is only valid for small values of "faster"; are we measuring a tight loop around a NOP, or an actual useful function? Of course I did love the Beeb.
Hi Lewin,

On 12/24/2010 5:43 AM, larwe wrote:
> On Dec 24, 6:29 am, David Brown<david.br...@removethis.hesbynett.no>
> wrote:
>
>> non-pipelined. This was one of the reasons that the Acorn BBC Micros
>> were faster than many other home micros of that era.
>
> Well, "6502 faster than Z80" is only valid for small values of
> "faster"; are we measuring a tight loop around a NOP, or an actual
> useful function?
It also depends a lot on how you normalize "operating conditions" (same clock frequency? same memory access time? same set of available I/O's? etc.).

6502/68xx vs 808x/Zx80 typified the early split between processor designs. Memory mapped vs. dedicated I/O space; interrupt handling; "single accumulator" vs. register file; etc.

I had a friend who worked in the 68xx camp while I was dealing with 808x's... watching each other write code was almost an "anxious" event -- wondering what was going to happen next. E.g., I would plan ahead so everything I needed ended up in registers; he would grab what he needed *as* he needed it...

Then TI came along suffering some major hallucinations with their 99xx(x)'s... :-/ (clean idea but technology went a different way).

(sigh) It's too bad how *few* designs we have now to choose from.
On 24/12/10 13:43, larwe wrote:
> On Dec 24, 6:29 am, David Brown<david.br...@removethis.hesbynett.no>
> wrote:
>
>> non-pipelined. This was one of the reasons that the Acorn BBC Micros
>> were faster than many other home micros of that era.
>
> Well, "6502 faster than Z80" is only valid for small values of
> "faster"; are we measuring a tight loop around a NOP, or an actual
> useful function?
>
> Of course I did love the Beeb.
Note that I didn't say the "6502 is faster than the Z80". I said the BBC Micro was faster than most contemporary home micros - several of which happened to be Z80-based (such as the ZX Spectrum). There were many reasons for this. The fact that the 6502 was pipelined, which made it significantly faster than it should have been as an 8-bit register-based processor at a slower clock rate than the Z80 (2 MHz vs. 3.5 MHz, IIRC), was only one of the reasons.

Looking purely at the cpu, the 6502 was fast for some things and slow for others. It had fast zero-page access - if you could hold your important data there, code size was small and speed high. But if you needed to do a lot of data movement or 16-bit arithmetic, it was a lot slower than the Z80 (which was partly 16-bit).
Muzaffer Kal wrote:
> In this context pad-limited most probably means that the die size is
> decided strictly by the number of pads one has to put around the die
> even if significant core area is left unused. Of course there are
> technologies where one doesn't need to put the pads around the die and
> they can be placed inside but the cost for such a technique would
> probably be too high for the applications mentioned.
When you place pads around the die, you can pack them a lot closer together (staggered, minimal pitch 40µm) than when you place them on top of the die for bumps (minimal pitch 250µm, but now it's a 2d mesh, not just pads at the corners).

IMHO, there's absolutely no point - with actually still useful processes - in putting a CPU on a chip without the memory. When you add the memory, how many bits your CPU uses doesn't matter that much - it's more how complex your CPU is. My b16 is small enough that there is really little total area benefit from making it even smaller (and it's 16 bit). The main contributor to the area on my projects is the actual memory - and there, what you want is a compact program, not a low-bitsize CPU (where the program is significantly larger to achieve the same thing, since more instructions are necessary).

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
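To put rough numbers on the pad-pitch point above, here is a minimal Python sketch that compares the smallest square die a given pad count forces in each scheme. It uses only the two pitch figures quoted in the post (40µm for a perimeter ring, 250µm for an area array of bumps). Everything else is a simplifying assumption for illustration: a square die, pads spread evenly over all four edges, no corner keep-outs, and the hypothetical function names `pad_ring_die_area_mm2` and `bump_array_die_area_mm2`.

```python
import math


def pad_ring_die_area_mm2(n_pads: int, pitch_um: float = 40.0) -> float:
    """Smallest square die (mm^2) whose perimeter holds n_pads at pitch_um.

    Assumption: pads are spread evenly over the four edges and corner
    keep-out areas are ignored.
    """
    side_um = n_pads * pitch_um / 4.0
    return (side_um / 1000.0) ** 2


def bump_array_die_area_mm2(n_pads: int, pitch_um: float = 250.0) -> float:
    """Smallest square die (mm^2) that fits n_pads bumps on a full 2D grid.

    Assumption: a square grid, rounded up to a whole number of bumps per side.
    """
    bumps_per_side = math.ceil(math.sqrt(n_pads))
    side_um = bumps_per_side * pitch_um
    return (side_um / 1000.0) ** 2


if __name__ == "__main__":
    for n in (32, 64, 100, 625):
        ring = pad_ring_die_area_mm2(n)
        bumps = bump_array_die_area_mm2(n)
        print(f"{n:4d} pads: ring {ring:7.3f} mm^2, bump array {bumps:7.3f} mm^2")
```

Under these assumptions the perimeter ring forces a much smaller die at the low pad counts typical of small microcontrollers, and the two schemes only meet at around 625 pads. Real pad rings and bump rules differ by process, so treat the numbers as an order-of-magnitude guide only.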
On Dec 19, 11:12 pm, Stephen Fuld <SF...@alumni.cmu.edu.invalid>
wrote:
[snip]
> Perhaps, but I don't think that is the target market for 4 bit
> processors. AFAIR the big market for 4 bit processors is in the really
> low requirements consumer areas, such as remote controls for TV's etc.
> and things like microwave ovens, clothes washers, etc. As such, speed
> isn't an issue, but low cost and perhaps low power (i.e. longer battery
> life) is. So the market you are talking about may very well exist, but
> that doesn't mean it will replace the 4 bit market.
Performance does seem to be unimportant. Note also that low power can also exploit energy harvesting.

It is surprising to me, however, that a mask-configurable 8-bit or 4-bit processor is not implemented (or would dynamic selection be more appropriate/sufficiently power-efficient? or fuse-configurable?), given that the registers/ALU/etc. (or even the entire processor core) take up a small fraction of the chip area. (At larger bit widths, providing a half-width double-threaded mode could be useful.)

It does seem unlikely to me that a 4-bit processor makes sense: I strongly suspect that the area savings relative to an 8-bit processor are not significant (when ROM, RAM and peripherals take up most of the die space), and it seems likely that memory accesses would consume a significant fraction of the power, reducing the impact of a wider processor. If the processor tends to be either fully active or fully off, then differences in leakage (static) power would not be significant either, it seems. (For very simple processors, would wave pipelining and asynchronous methods be attractive?)

The article linked to by the mentioned article, http://www.embeddedinsights.com/channels/2010/12/10/considerations-for-4-bit-processing/, states that "EM Microelectronics . . . approaches a developer and works to demonstrate how the 4-bit device can provide differentiation to the developer's design and end product." Perhaps I am being excessively cynical, but this sounds like the technique of some software vendors--approach less-technically knowledgeable managers to make the sale. I admit that being limited to a small number of products by ROM mask validation cost constrains the methods available to market the product, but it seems that a bidding process would also work and still be open.
Having recently read Stanley Mazor's "The History of the Microcomputer - Invention and Evolution" (http://www.xnumber.com/xnumber/Microcomputer_invention.htm), it seems to me that, as in the 4004 ("Dynamic RAM memory cells were also used inside the CPU for the 64-b index register array and 48-b Program counter/stack array."), some parts of these ultra-low power processors could be implemented in DRAM, especially for data that does not need to persist between active periods and is accessed regularly during active periods (e.g., the PC might be loaded from a vector table). (For such specialized processors, it seems that it might be reasonable for at least some interrupts to load values from a local ROM table into some registers.)

(Xuejun Yang, Nathan Cooprider, John Regehr, "Eliminating the Call Stack to Save RAM" proposed putting return addresses into ROM to reduce RAM requirements, which might also be useful for odd architectures with a tiny return address stack [the paper also proposed allocating local variables into the global variable section using global liveness analysis to minimize memory usage].)

Paul A. Clayton
just a technophile
On Sat, 25 Dec 2010 16:45:12 +0100, David Brown
<david.brown@removethis.hesbynett.no> wrote:

>On 24/12/10 13:43, larwe wrote:
>> On Dec 24, 6:29 am, David Brown<david.br...@removethis.hesbynett.no>
>> wrote:
>>
>>> non-pipelined. This was one of the reasons that the Acorn BBC Micros
>>> were faster than many other home micros of that era.
>>
>> Well, "6502 faster than Z80" is only valid for small values of
>> "faster"; are we measuring a tight loop around a NOP, or an actual
>> useful function?
>>
>> Of course I did love the Beeb.
>
>Note that I didn't say the "6502 is faster than the Z80". I said the
>BBC Micro was faster than most contemporary home micros - several of
>which happened to be Z80-based (such as the ZX Spectrum). There were
>many reasons for this. The fact that the 6502 was pipelined meant that
>it was significantly faster than it should have been as an 8-bit
>register-based processor at a slower clock rate than the Z80 (2 MHz vs.
>3.5 MHZ, IIRC) was only one of the reasons.
The original Z80 was 2.5 MHz. IIRC, the early Z80 machines fared rather poorly in a number of comparisons against the Apple's 1 MHz 6502. Of course, the 6502 needed some clever coding to beat the Z80 in a general mix of tasks (see below). And when Zilog introduced the 4 MHz part, the Z80 became faster in general.

[At least until the 65816 came along ... it took a 16 MHz Z80 to best a 3 MHz 65816 at most tasks. The 8 MHz 65816 was an even match to a (real mode) 10 MHz 80286 at many tasks. The 286 was faster at software FP arithmetic and, of course, offered protected mode multitasking. The 816 also could multitask, but the implementation was (usually) more complex due to the hardware stack being restricted to the first 64KB of memory. And, of course, there was no memory protection.]
>Looking purely at the cpu, the 6502 was fast for some things and slow
>for others. It had fast zero-page access - if you could hold your
>important data there, code size was small and speed high. But if you
>needed to do a lot of data movement or 16-bit arithmetic, it was a lot
>slower than the Z80 (which was partly 16-bit).
Yes, the 8080/Z80 had a number of dual 8/16 bit registers ... but IIRC 16-bit arithmetic took 2 extra cycles.

The 6502 had only 8-bit registers. To do multi-byte arithmetic quickly the data had to be in the zero page - addresses 00h..FFh - for which the 6502 had a special 3-cycle address mode vs 4..6 cycles for accessing a general 16-bit address, depending on the index mode.

George
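For a feel of what the zero-page advantage described above means for multi-byte arithmetic, here is a small Python sketch that totals the cycles of the usual 6502 16-bit add (CLC, then LDA/ADC/STA for the low byte and again for the high byte). The per-instruction counts are the standard documented base timings for zero-page and plain absolute addressing. The table layout and the name `add16_cycles` are made up for this illustration, and indexed or indirect modes (the 5 and 6 cycle cases) are left out.

```python
# Base cycle counts for the documented 6502 instructions used below.
# Page-crossing penalties do not apply to these non-indexed modes.
CYCLES = {
    ("CLC", "implied"): 2,
    ("LDA", "zeropage"): 3,
    ("ADC", "zeropage"): 3,
    ("STA", "zeropage"): 3,
    ("LDA", "absolute"): 4,
    ("ADC", "absolute"): 4,
    ("STA", "absolute"): 4,
}


def add16_cycles(mode: str) -> int:
    """Cycles for a 16-bit add: CLC, then LDA/ADC/STA for each of two bytes.

    mode is "zeropage" or "absolute" and applies to every operand.
    """
    per_byte = [(op, mode) for op in ("LDA", "ADC", "STA")]
    sequence = [("CLC", "implied")] + per_byte * 2
    return sum(CYCLES[step] for step in sequence)


if __name__ == "__main__":
    for mode in ("zeropage", "absolute"):
        print(f"16-bit add, {mode} operands: {add16_cycles(mode)} cycles")
```

With all three operands in the zero page the sequence costs 20 cycles, against 26 with plain absolute addressing, and each instruction is also one byte shorter.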
On Sun, 19 Dec 2010 20:25:49 -0600, Brett Davis <ggtgp@yahoo.com>
wrote:

>You could take a public CPU design like OpenRISC and replace the
>instruction decoder, and get an easy ~4x performance jump running
>65c802/65c816 code.
>
>Compatibility with the Apple// disk controller would be poor. ;)
>But let's ignore that for the moment, Apple made some work arounds
>for the Apple2GS, so that can be fixed.
The Apple ][ disk controller has precisely clocked subroutines that require the 6502 to run at 1MHz. The Apple //gs forced the 65816 into 6502 "compatibility mode" whenever it accessed addresses in E00000h..E1FFFFh (where the bus slots, video and Apple ][ compatible devices lived).
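Taking the address range quoted above at face value, the decision such a scheme has to make on every bus access can be sketched in a few lines of Python. The range E00000h..E1FFFFh comes straight from the post and has not been checked against Apple's hardware documentation. The names `COMPAT_START`, `COMPAT_END`, and `in_compat_range` are hypothetical, and the sketch only classifies addresses, it does not model timing.

```python
# Range quoted in the post (assumption: inclusive at both ends).
COMPAT_START = 0xE00000
COMPAT_END = 0xE1FFFF


def in_compat_range(addr: int) -> bool:
    """True if a 24-bit address falls in the Apple ][ compatible region."""
    if not 0 <= addr <= 0xFFFFFF:
        raise ValueError("65816 addresses are 24 bits wide")
    return COMPAT_START <= addr <= COMPAT_END


if __name__ == "__main__":
    for addr in (0x00C030, 0xE0C030, 0xE1FFFF, 0xE20000):
        print(f"{addr:06X}: compat={in_compat_range(addr)}")
```

Since the range covers exactly banks E0h and E1h, the check reduces to comparing the top byte of the address, which is cheap to do in address-decode logic.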
>Step 2 would be to add a boot loader to set the cache modes up
>correctly for your memory spaces, so everything is not write through.
>That will get you another ~2x speedup.
The 65816 brought out address valid lines for cache and DMA implementations. The stock Apple //gs had DMA and the various accelerator boards for it added cache to the CPU.
>Is there a market for a 6502 era CPU that ran ~10x faster at
>~10% more cost?
I have no idea whether 6502 compatible chips are in demand for any purpose.

Even so, I wouldn't bother with the 6502, but rather I would implement the 65816 (or the 65802 if you need 6502 pin compatibility). The 658xx ISA is a superset of 6502 that is cleaner and easier to work with even for 8-bit code.

George
On Dec 25, 9:45 pm, George Neuner <gneun...@comcast.net> wrote:

> I have no idea whether 6502 compatible chips are in demand for any
> purpose.
> Even so, I wouldn't bother with the 6502, but rather I would implement
> the 65816 (or the 65802 if you need 6502 pin compatibility). The
At least until a year or two ago, Sunplus had a range of chips that were 6502 core with some restrictions (IIRC no Y register and maybe a couple of other oddities). Winbond and a couple of others also use 6502 or 65816 cores in their toy chips. I guess it depends on whether you already have proprietary dev tools (for compiling proprietary languages, building audio projects, assembling LCD data, etc) that target 6502.
On 12/25/2010 8:39 PM, larwe wrote:
> On Dec 25, 9:45 pm, George Neuner<gneun...@comcast.net> wrote:
>
>> I have no idea whether 6502 compatible chips are in demand for any
>> purpose.
>> Even so, I wouldn't bother with the 6502, but rather I would implement
>> the 65816 (or the 65802 if you need 6502 pin compatibility). The
>
> At least until a year or two ago, Sunplus had a range of chips that
> were 6502 core with some restrictions (IIRC no Y register and maybe a
> couple of other oddities). Winbond and a couple of others also use
> 6502 or 65816 cores in their toy chips. I guess it depends on whether
> you already have proprietary dev tools (for compiling proprietary
> languages, building audio projects, assembling LCD data, etc) that
> target 6502.
Wasn't the 2A03 also a 6502 derivative?
