Hello,
I am often away for a few days, and work a lot on
http://f-cpu.seul.org/whygee/vspsim/
It should now work a bit better (still under Mozilla/Firefox only).
I have not received any personal reply,
OTOH the posts in this thread are interesting.
-~o0o~-
Tim Wescott wrote :
> As long as you have a one-instruction bit set, I can synthesize a carry bit so I'm mostly happy.
I'm not sure I understand. Can you give an example ?
> There are times when I have done assembly language coding that I have found it convenient
> to wait a bit before I checked a condition bit, but I could probably cope with an
> add-skip-no-carry instruction (ASNC -- odd, but it'd do).
With a skip-on-no-carry, you can set a register or memory location
to a value that will be checked later.
However, most asm I have written uses the carry immediately, often to do something "else".
Bresenham-like algorithms come to mind, and there are others.
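
To illustrate the "carry used immediately" case, here is a minimal sketch in Python of a Bresenham-like line stepper driven by the carry out of an accumulator. The 16-bit width, the function names and the rounding choice are assumptions made for this example, not VSP features.

```python
WIDTH = 16                      # assumed accumulator width, in bits
MASK = (1 << WIDTH) - 1

def add_with_carry_out(acc, inc):
    """Return (sum modulo 2**WIDTH, carry flag) for an unsigned add."""
    total = acc + inc
    return total & MASK, total > MASK

def dda_line(dx, dy):
    """Yield (x, y) for a line with 0 <= dy <= dx, stepping y on each carry."""
    # Slope dy/dx scaled to the accumulator width (hypothetical scaling).
    inc = (dy << WIDTH) // dx if dx else 0
    acc = 1 << (WIDTH - 1)      # start at one half, to round to nearest
    y = 0
    for x in range(dx + 1):
        yield x, y
        acc, carry = add_with_carry_out(acc, inc)
        if carry:               # the "something else", done right after the add
            y += 1
```

With an add-and-skip instruction, the `if carry` test and the `y += 1` would be the add itself followed by one skippable instruction.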
> If you're inventing an instruction set, remember that the PowerPC architecture
> has an EIEIO instruction. Please try to top it.
hmmm i'm not trying to compete with IBM :-)
i'm trying to make "something cool, fun and maybe useful"
(and it's certainly very instructive,
it's at least a great way to learn JavaScript).
Have a look at http://f-cpu.seul.org/whygee/vspsim/doc/opcode_map.html
and tell me what names sound/look weird (or too obscure).
-~o0o~-
Paul Taylor suggested :
> With regards to SHR, SAR, SHL, ROL, ROR, are ROL and ROR _really_
> necessary? Not trying to discourage you from implementing them. My reason
> for asking is that I playing with a compact 16-bit design, and I looked at
> those and decided that I could sacrifice them.
> Regards,
> Paul.
I can't name a program that I have written that does not use shifts.
I even see the absence of a rotation operator in C as a plague.
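
For reference, this is roughly what has to be written when a language has shifts but no rotation operator. It is a small Python sketch assuming 32-bit words; the names rotl32 and rotr32 are made up for the example.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, count):
    """Rotate a 32-bit value left by count bits, built from two shifts and an OR."""
    count &= 31
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

def rotr32(value, count):
    """Rotate right, expressed as a left rotation by the complementary count."""
    return rotl32(value, (32 - (count & 31)) & 31)
```

Two shifts, an OR and a mask replace what a single ROL or ROR does in hardware.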
My VSP (I also discovered later that this name is also used by others,
if someone can find a better name, please apply :-P)
was designed for interactive/multimedia stream processing
(like : ID3 tag parsing) and user I/O (LCD matrix).
These applications require a certain amount of bit and byte-level processing.
Byte-level is ok (look at the IE group of instructions),
SHL provides some necessary functions but I'm still not satisfied
when it comes to bit stream insertion/extraction.
I'm limited to 2 reads and 1 write (with often the same address).
So yes, these 5 "bit shuffling" instructions are necessary
and IMHO not enough. I have probably found an answer in the
Cray1 architecture manual, with one clever trick, but I don't know
how/if i can implement it here.
-~o0o~-
Walter Banks remarked :
> You can certainly sacrifice left operations. Right operations will depend a
> lot on the rest of the instruction set. A single barrel rotate can replace them all.
I see ROL/ROR/SHL/SHR as different ways to use a shifter.
In the code I have written so far, I have not noticed a preference for
a specific direction. I have also examined the possibility of having
only one rotation direction but this could create problems at the algorithmic level.
The opcode space is still quite comfortable and i have seen no way or reason
to remove one of these opcodes.
Walter then added :
> There are quite a few processors that don't have a
> condition code register.
Right (MIPS and Alpha come to mind).
That's one of the cornerstones of the RISC methodology.
From my point of view, adding a separate register
is a lot of trouble, because new specific instructions
must be included.
> For extended math yours is
> one approach but you can also use some form of
> chained multiprecision math.
chained ? I don't know this method.
> Multiprecision operations
> with 32 bit processors probably could be dropped with
> very little impact on most applications.
Multiprecision is not the primary purpose.
Overflow detection is much more common.
> Conceptually skip and conditional skip are powerful tools
> that can be used in clever combinations. Generally more
> skip conditions can be used than conventional conditional
> branches. A lot of thought needs to be put into what happens
> with sequential skip instructions. Is a skip treated as a
> pre-another instruction or a separate instruction?
I'm not sure about what you mean but here is an example of VSP code :
; Addition of R2 to the 64-bit value R0:R1
adds2 r2 r0 ; r0 = r0+r2
; The next instruction is skipped if no carry was generated
add 1 r1 ; carry : r1 = r1+1 (long form : 2 half-words)
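
The same sequence, modelled in Python to show the intended effect. This sketch assumes 32-bit registers, with r0 holding the low word and r1 the high word (inferred from where the carry goes); the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF   # assumption: 32-bit registers

def add_to_64(r0, r1, r2):
    """Model of 'adds2 r2 r0' followed by 'add 1 r1'.

    r0 is taken as the low word and r1 as the high word of the 64-bit value.
    """
    total = r0 + r2
    r0 = total & MASK32
    if total > MASK32:            # carry: the increment is NOT skipped
        r1 = (r1 + 1) & MASK32
    return r0, r1
```

For example, add_to_64(0xFFFFFFFF, 0, 1) gives (0, 1): the low word wraps around and the carry goes into the high word.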
The core computes the address of the next instruction at every cycle.
Either it's a whole new address (then the prefetch mechanism is critical),
or the skip advances a small counter that addresses the prefetch buffer.
My idea is to do the following in parallel, during the same cycle:
- the prefetch buffer automatically advances by 1 or 2 half-words (16 or 32 bits)
- the new pointer into the prefetch buffer is computed in the early stages
of the pipeline (add 1 or 2 half-words to the given value,
plus 1 because skip 0 is equivalent to no-skip)
- the addition is performed and if a carry does not occur, then the
above computed pointer is committed into the buffer instead
of the automatically advanced pointer.
But that mechanism will be implemented later; first I want to make sure
that the instruction set is satisfactory.
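
The pointer selection described above can be sketched as follows. This is only a behavioural model in Python, not the hardware: pointers are counted in half-words and all names are invented for the example.

```python
def next_fetch_pointer(ptr, insn_len, skip_field, carry):
    """Select the next pointer into the prefetch buffer, in half-words.

    ptr        : pointer of the current add-and-skip instruction
    insn_len   : length of that instruction (1 or 2 half-words)
    skip_field : 2-bit field, 0..3, encoding a skip of 1..4 half-words
    carry      : True if the addition generated a carry
    """
    if insn_len not in (1, 2):
        raise ValueError("insn_len must be 1 or 2 half-words")
    if not 0 <= skip_field <= 3:
        raise ValueError("skip_field must fit in 2 bits")
    sequential = ptr + insn_len                  # automatic advance
    skip_target = sequential + skip_field + 1    # +1: field 0 still skips one
    # Both candidates are computed in parallel; the carry selects one.
    return sequential if carry else skip_target
```

With the 64-bit addition example, and assuming adds2 is 1 half-word long at pointer 0 with a field value of 1, the next pointer is 1 on carry (the long-form add is executed) and 3 otherwise.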
-~o0o~-
Terran Melconian asked :
> How about for implementing multiplication and division?
This reminds me that the core has no multiplier,
because it is not meant to compute stuff, only to move data around.
So if multiplies must be implemented, a bit-by-bit version is a good
compromise (complexity/latency/size, because a Booth multiply array
is obviously overkill).
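
As a reminder of what the bit-by-bit version involves, here is a plain shift-and-add multiply in Python. It is a generic textbook sketch, not a proposed VSP instruction; the 32-bit default width is an assumption.

```python
def mul_shift_add(a, b, width=32):
    """Unsigned multiply, one multiplier bit per iteration.

    Returns the full 2*width-bit product of two width-bit operands.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(width):
        if b & 1:          # current multiplier bit set: add the shifted operand
            product += a
        a <<= 1            # next bit has twice the weight
        b >>= 1
    return product
```

Each loop iteration is about what a single "multiply step" would have to do: test one bit, conditionally add, and shift.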
I have two options : either create "multiply/divide step" instructions,
or build a separate, asynchronous unit (accessible through special registers).
Both have drawbacks :
- mulstep/divstep instructions would use some amount of program space,
and occupy the core. Also, i'm not sure how to implement the instructions.
- a separate, asynchronous unit would allow the core to execute other
instructions in parallel. The program would write the 2 operands to
the input registers, then poll until the multiplier has finished.
The problem ? I intend the VSP to become SMT later. So several threads
could compete for the access to this "shared" unit.
Any suggestion is welcome (and will be integrated if it is elegant)
> I often use them for serialization and deserialization of I/O data
> streams when that is being done in software.
"Bit banging" is often a major headache.
I tried to take this into account.
-~o0o~-
Jim Granville noticed :
> I think you have a variable-length skip - which is a good idea.
There are good reasons for this, on top of the pure coolness factor.
The most important aspect is that the instructions are variable-length too
(but they are quite simple, anyway).
So the decoding logic has probably not yet read or decoded the next
instructions, and may not know how long they are.
The assembler must compute the skip length anyway,
so I thought: if the core can skip 1 or 2 half-words, why not 3 or 4 ?
More would create problems, though, and i'll have to make sure
that the prefetching mechanisms can prepare instructions fast enough
to keep the instruction buffers filled with at least
2+4+2=8 16-bit words, or 16 bytes, or 128 bits...
Longer skips would create a fetching penalty
so i stick to 2 bits.
> Another benefit of a short-skip opcode, is for a core you
> wish to feed from Serial memory : SPI Flash is getting faster
> all the time [Winbond have 150MBd streaming], so the sequential
> access time is reasonable, but a branch is more costly.
> That means a skip makes sense, as it does not spawn a new address,
> and for small distances, that is faster than the jump.
I have never thought about this, because i think that the most used instructions
will be stored in on-chip SRAMs. SPI Flash would be used for bootstrap only,
probably with an Alpha-like method (fill the cache from external SPI
then let the CPU execute from address 0).
However, off-chip programs are going to exist, and
a typical use of the VSP core includes a single (or a couple)
SDRAM chip (16-bit wide bus) so your streaming example
is easy to translate to SDRAM.
> Some CPUs have conditional fields in the opcodes, which mean they
> can skip. It tends to be wasteful, as this is not often needed, but the
> CC bits come along for the ride anyway.
Condition Code Registers ... what a pain...
> I've also seen Conditional RET encoded, which used an otherwise
> unused field from the conditional jump variants, and that looked
> like a useful idea - esp. for assembler coding.
where ?
> Have you looked at the Lattice Mico8, and PicoBlaze / PacoBlaze
> SoftCPUs - they have some good 'compact' ideas.
I am not trying to make "the most compact code ever".
Often, this requires a lot of instruction-specific fields here
and there in the instruction word, and their proliferation is harmful
to decoding speed and complexity.
For example, the VSP uses only one immediate field
(16 bits should be enough for most instructions ;-P)
OTOH i have not found a way to use a single place for the
2-bit skip length field (it's in bits 6-7 in the ADDSx instructions,
but in bits 8-9 for conditional skip instructions).
Compromises...
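
To make the cost of this compromise concrete, here is a tiny Python sketch of the field extraction. It assumes that bit 0 is the least significant bit, that the caller already knows the instruction class, and that both classes encode the length as field+1; the function name is made up.

```python
def skip_length(opcode, is_adds):
    """Extract the skip length (1..4 half-words) from a 16-bit opcode.

    is_adds : True for ADDSx instructions (field in bits 6-7),
              False for conditional skip instructions (field in bits 8-9).
    """
    shift = 6 if is_adds else 8      # the field position depends on the class
    field = (opcode >> shift) & 0b11
    return field + 1                 # assumption: field 0 means "skip 1"
```

The class-dependent shift is the extra multiplexer that a single common field position would have avoided.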
-~o0o~-
Ulf Samuelsson wrote :
> COP800 , HPC16xxx...
This remark made me check what the COP8 is
and I have found an instruction that decrements,
then skips if the result is zero.
That's used for loops and it's similar to one PIC instruction.
So all I did was generalise this idea.
cool :-)
Thanks everybody for the read,
Yann Guidon
http://ygdes.com
http://f-cpu.org