
This list is for discussion of the design and implementation of field-programmable gate array based processors and integrated systems. It is also for discussion and community support of the XSOC Project (see http://www.fpgacpu.org/xsoc).

IP Redux - Jan Gray - Nov 24 23:13:00 2004

All, please read http://www.fpgacpu.org/log/sep02.html#IP-redux. Agree?
Disagree? Discuss :-)

Jan.





(You need to be a member of fpga-cpu -- send a blank email to fpga-cpu-subscribe@yahoogroups.com )


Re: IP Redux - rtstofer - Nov 25 2:51:00 2004


--- In , "Jan Gray" <jsgray@a...> wrote:
> All, please read http://www.fpgacpu.org/log/sep02.html#IP-redux. Agree?
> Disagree? Discuss :-)
>
> Jan.

Jan,

I am not a developer so I don't see things from that perspective. I
do remember the early days of hobby computing (before personal
computing) and the myriad suppliers of components and software. I
won't enumerate them but a few examples like Borland, Lotus, Word
Perfect come to mind. Eventually they all rolled up into Microsoft.

In the beginning of FPGAs (from my limited perspective) there simply
wasn't enough talent available, even at the manufacturing companies,
to develop all the things that needed to be created. So, for a few
years, there was a cottage industry for IP developers until a certain
critical mass was achieved. Now, in an effort to compete for chip
sales, the manufacturers are throwing in IP cores. Altera and
Xilinx are in the business of selling chips, not IPs.

From a personal perspective I like the 'free' cores and the 'free'
tools. There is no way in the world I could afford to play with
FPGAs if I had to buy the tools. They would remain unexplored to my
loss.

Even in the highly evolved PC market there are still things that
don't come from Microsoft: specialized software like AutoCAD and
PSpice, and there are rumors of operating systems other than Windows
(yes, it's true! I have one). Even in the world of IP cores it
will be possible for a few developers to provide 'best of class'
products that integrate perfectly into the chip maker's platform.
But only the very best will survive!

In summary: chip manufacturers are in the business of selling
chips. Period. If it is necessary to give away development tools
and reference cores, so what? Sell more chips to pay for the
overhead. Best of class components will always demand a premium but
only the very best. Critical mass has been reached; everyone better
get on the same bus (pun intended?).

But what do I know, I'm retired. I will leave the battles to the
younger engineers.






Re: IP Redux - John Kent - Nov 25 6:42:00 2004

Matlab and Simulink would probably be the best examples of a
GUI/integrated IDE with libraries.
Protel and Labview are also getting into the game with parameterizable
VHDL/Verilog Libraries.
I'm not sure if they allow users to integrate their own modules, but for
small freelance designers
the cost of these packages is fairly prohibitive.

I would quite like to see some work around SciLab, which is free from
Inria in France, to integrate an
open source framework for designing VHDL or Verilog components.

There used to be a free image processing package called Khoros that had a
suite of applications for designing
and integrating image processing algorithms into glyphs. It was designed
to run under Unix and Linux.
Components communicated using intermediate files and it lacked the
Scientific / Mathematical simulation capability.

I'm not sure if there are any other GNU packages, such as schematic
capture packages, that can form
the basis for a System Builder or if they could be integrated into a
mixed signal simulation package
that allowed you to simulate the interaction between a microprocessor
program and analog circuits.

Some years ago, a friend from the internet pointed me to a mixed signal
simulation package that
allowed you to simulate Microprocessor, LCD displays and other
components, including analog designs,
and you could download and simulate microcomputer programs.

It's all out there, but whether it is affordable and an open
architecture is another matter.

John.

Jan Gray wrote:

>All, please read http://www.fpgacpu.org/log/sep02.html#IP-redux. Agree?
>Disagree? Discuss :-)
>
>Jan.

--
http://members.optushome.com.au/jekent






Re: Re: IP Redux - Jeff Brower - Nov 25 12:42:00 2004


> In summary: chip manufacturers are in the business of selling
> chips. Period. If it is necessary to give away development tools
> and reference cores, so what? Sell more chips to pay for the
> overhead. Best of class components will always demand a premium but
> only the very best. Critical mass has been reached; everyone better
> get on the same bus (pun intended?).

Like Chris, I'm another listener -- a DSP person not ASIC.

Some of Richard's posts are almost crazy -- like trying to resurrect a PDP11 on an
FPGA :-) I keep waiting for his subject line "It's alive!". But they're always fun,
and his point above is right on, exactly right on.

However, there are differences in how chip manufacturers attempt to succeed.
Although Texas Instruments beat Intel to the "first computer on a chip" in the late
'70s (remember calculators), Intel was the tortoise and stuck with the hard, slow
work of creating development tools and business models that helped developers. The
history of PCs is the result. Texas Instruments execs -- some of whom are still in
charge -- never forgot that lesson and they have since pummeled ADI and Mot (and
eliminated AT&T/Lucent completely) in DSPs, in part by focusing on development tools
and educational programs.

I would point out to Jan that TI has no qualms about stepping on its third-party
providers and stealing their products to build in qty and give away for free.
Sometimes, TI is nice about it and tries to buy the 3p :-)

The successful chip manufacturers tend to give away more things all the time. Jan's
theory that:

"If FPGA vendors give away enough free cores, the end effect could
be to discourage pure IP vendors from contributing to that device
vendor's value chain, reducing the supply of device optimized cores,
hence design wins, hence device sales."

will stick to the chip vendors like water to a duck.

As Rich said, the chip vendors are in the business of selling chips. Whatever they
need to do to accomplish that, they will.

-Jeff







Stack Machine Instruction Set - Arius - Rick Collins - Nov 25 15:38:00 2004

I am working on a custom CPU to implement in an FPGA. I am optimizing
nearly every aspect of the machine to keep the size small and the speed
high while minimizing the code size (very important since there is
little memory inside the FPGA). After going through several iterations
of designing the instruction set, I ended up with both a relative call
and an absolute call. The relative call was essentially free in terms
of the hardware since the jumps are relative. It seems to me that the
absolute call is the more useful of the two and I could live a rich,
full life without the relative call. However, I have currently
completed a first pass at the design with both call types in the
instruction set and a fair amount of the design is very well optimized.

My question is, will the relative call be pretty useless compared to the
absolute call? Or will both be useful? What situation would make the
relative call more useful? Or do I have it backwards and I should give
up the absolute call?

At this point there are some read/write internal register commands I
could use in place of either one of these call types, but they are
currently memory mapped and working fine. There is no strong need to make
any changes. I am reaching a point where changes to the instruction set
will require excessive amounts of redoing optimizing and debugging. So I
would like to make a final decision.

Any and all comments are appreciated!

Did I mention that my software target is to run Forth on this chip? I
think that may make a difference.

Rick Collins
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX






Re: Stack Machine Instruction Set - Jan Gray - Nov 25 15:59:00 2004

Both relative call and absolute call take you from a fixed address to
another fixed address. So you don't need both.
Relative call might be smaller.
Relative call might facilitate position independent code.
If your tools are always going to emit one or the other, there is no point
having both, documenting both, testing both, except that orthogonality might
make it less elegant to use the relative call opcodes for something else.
Consider that the PDP-11 had some addressing mode combinations that were little
or never used, but it was simpler and more elegant to keep 'em than to
forbid or reuse them.
You will probably also need an indirect call, I would think...

In the xr16, although you could always form any call to any address 0xABCD
via the pair of 16-bit insns
    IMM 0xABC ;; JAL rd,0xD(r0) // rd := pc, pc := 0xABCD
it was sufficiently frequent that I added a special opcode
    CALL func
that encoded all of the above in one 16-bit instruction (assuming the call
target 0xABC0 is 16-byte aligned, and assuming rd==r15, the return address
linkage register).
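
As a rough illustration of the address arithmetic in the xr16 example above, here is a minimal Python sketch. It models only how the target address is formed; the field widths (a 12-bit immediate and a 4-bit displacement) are inferred from the 0xABC / 0xD example, and the function names are made up for illustration. The real xr16 opcode bit layout is not modeled.

```python
# Address arithmetic only; the real xr16 opcode bit layout is not modeled here.
# Field widths (12-bit immediate, 4-bit displacement) are assumptions taken
# from the 0xABC / 0xD example in the post.

def imm_jal_target(imm12: int, disp4: int) -> int:
    """IMM prefix supplies the upper 12 bits, JAL's displacement the low 4."""
    assert 0 <= imm12 <= 0xFFF and 0 <= disp4 <= 0xF
    return (imm12 << 4) | disp4

def call_target(imm12: int) -> int:
    """Single-instruction CALL: a 12-bit field names a 16-byte-aligned target."""
    assert 0 <= imm12 <= 0xFFF
    return imm12 << 4

def can_use_call(addr: int) -> bool:
    """A routine is reachable by the short CALL only if it is 16-byte aligned."""
    return 0 <= addr <= 0xFFFF and addr % 16 == 0

print(hex(imm_jal_target(0xABC, 0xD)))  # 0xabcd
print(hex(call_target(0xABC)))          # 0xabc0
print(can_use_call(0xABCD))             # False
```

The trade is that the tools must place every call target on a 16-byte boundary in exchange for halving the size of the most frequent call sequence.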

You have my permission to leave things well enough alone for now. :-) And
just as you tune the ISA to the FPGA, so you should do an iteration of
tuning the ISA to the software tools that generate code for it. Once you
have a working tool flow, you can build and run some programs and that will
give you more refined ideas about what to change next -- in particular, what
to pitch that you thought you would need, but now realize you don't.

Wishing you much fun,
Jan.






Re: Stack Machine Instruction Set - rtstofer - Nov 25 21:25:00 2004



In my opinion (and yes, more often than not, it's crazy) I would keep
the relative and decide on the absolute. The reason: when you get to
building an operating system and start loading code into the machine
you don't have to translate relative addresses. It's clear you need
some absolute addressing if you are going to 'call' system routines
that are fixed in memory. In fact, absolute plus an index would be
great! Or indirect through a transfer vector, indexed of course.

And if you ever want to relocate a code block after it has been loaded,
relative addressing is the thing to have. All of the absolute
addresses were modified when the code was loaded so nothing changes
regardless of where the code is moved.

Remember guys, I am old, I am retired and I do this stuff for fun.
Oh, and the P4 machine is starting to work quite well. The hard part,
'call' and 'return' with and without parameters and return values
works as do some of the arithmetic functions. Logic functions come
next so I can start debugging 'if' 'then' 'else'.

My naive design is a gigantic synchronous state machine and I think I
will run out of space before I finish. I really want to take another
look at Jan's implementation. It is far more elegant! In fact, I
downloaded it again planning to steal from the design. As near as I
can tell the hardware platform is obsolete and the development tool
chain isn't free. So, I can start over with XSOC or keep going as I
am. Until I hit the wall, it is full steam ahead!

Richard







Re: Re: Stack Machine Instruction Set - Arius - Rick Collins - Nov 26 0:59:00 2004

Thanks to both you and Jan for your posts. This reply is to both of you.

At 09:25 PM 11/25/2004, you wrote:

>In my opinion (and yes, more often than not, it's crazy) I would keep
>the relative and decide on the absolute. The reason: when you get to
>building an operating system and start loading code into the machine
>you don't have to translate relative addresses. It's clear you need
>some absolute addressing if you are going to 'call' system routines
>that are fixed in memory. In fact, absolute plus an index would be
>great! Or indirect through a transfer vector, indexed of course.

I understand what you are saying. But one of the reasons that I decided to
go with a stack architecture is to avoid the complication of these multiple
addressing modes. I think I did not say it until the end of my message,
but I am designing this CPU to optimize forth and will not be using any
other languages (other than the assembly language, which is a lot like the
Forth primitives).

Forth is "threaded" and I expect to use subroutine threading. This will
require a call instruction with a fixed destination. There is not much
need that I know of for relocatable code, especially since compiling is
very fast. Instead of linking precompiled modules, you can just recompile
the code.
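
To make "subroutine threading" concrete, here is a minimal Python sketch of a toy two-stack machine. It is not Rick's design: the opcode names, the tuple encoding, and the example words SQUARE and FOURTH are all assumptions chosen for illustration. Each colon definition compiles to primitives plus CALL instructions with fixed (absolute) destinations, ending in RET.

```python
def run(program, entry):
    """Run a toy two-stack machine; instructions are (op, arg) tuples."""
    data, ret = [], []          # data stack and return stack
    pc = entry
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LIT":
            data.append(arg)
        elif op == "DUP":
            data.append(data[-1])
        elif op == "MUL":
            b, a = data.pop(), data.pop()
            data.append(a * b)
        elif op == "CALL":
            ret.append(pc)      # return address = instruction after the call
            pc = arg            # fixed, absolute destination
        elif op == "RET":
            pc = ret.pop()
        elif op == "HALT":
            return data
        else:
            raise ValueError(f"unknown op {op!r}")

# : SQUARE  DUP * ;
# : FOURTH  SQUARE SQUARE ;
SQUARE, FOURTH, MAIN = 0, 3, 6
program = [
    ("DUP", None), ("MUL", None), ("RET", None),        # SQUARE at 0
    ("CALL", SQUARE), ("CALL", SQUARE), ("RET", None),  # FOURTH at 3
    ("LIT", 3), ("CALL", FOURTH), ("HALT", None),       # MAIN at 6
]
print(run(program, MAIN))  # [81]
```

Because the compiler knows every word's address at the moment it compiles a reference to it, the call destinations can simply be absolute, which is the point made above about recompiling instead of relocating.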

So I don't think I need a relative call and I am certain I don't need
indexed or calculated calls. However, if I do need that, it can be done
with a pointer to a table of addresses and a call to a small routine that
calculates the address from the table, puts it on the return stack, and does
a return, which then behaves like a jump. Since the return address from the
original call is still on the stack, the routine that is jumped to will
return to the original piece of code when done.

>And it you ever want to relocate a code block after it has been loaded
>, relative addressing is the thing to have. All of the absolute
>addresses were modified when the code was loaded so nothing changes
>regardless of where the code is moved.

I think my question is really more of a Forth question. I have asked the
question there as well, I was just trying to cover my bases by discussing
it here. I think that other languages like to see a very flexible
instruction set that is not required to implement forth. In fact, the ISA
of this machine basically *is* forth. I just don't know a lot about how
best to implement forth either in hardware or software.

>Remember guys, I am old, I am retired and I do this stuff for fun.
>Oh, and the P4 machine is starting to work quite well. The hard part,
>'call' and 'return' with and without parameters and return values
>works as do some of the arithmetic functions. Logic functions come
>next so I can start debugging 'if' 'then' 'else'.
>
>My naive design is a gigantic synchronous state machine and I think I
>will run out of space before I finish. I really want to take another
>look at Jan's implementation. It is far more elegant! In fact, I
>downloaded it again planning to steal from the design. As near as I
>can tell the hardware platform is obsolete and the development tool
>chain isn't free. So, I can start over with XSOC or keep going as I
>am. Until I hit the wall, it is full steam ahead!

I remember some of your posts here. I believe you are recreating an older
machine, right?

>--- In , "Jan Gray" <jsgray@a...> wrote:
> > Both relative call and absolute call take you from a fixed address to
> > another fixed address. So you don't need both.
> > Relative call might be smaller.
> > Relative call might facilitate position independent code.
> > If your tools are always going to emit one or the other, there is no point
> > having both, documenting both, testing both, except that orthogonality might
> > make it less elegant to use the relative call opcodes for something else.
> > Consider the pdp-11 had some addressing modes combinations that were little
> > or never used, but it was simpler and more elegant to keep 'em than to
> > forbid or reuse them.
> > You will probably also need an indirect call, I would think...

I agree that having both rel and abs calls is not all that useful. The one
advantage of absolute addresses is that they are a lot more user friendly
while I am hand assembling code. So that is the way I am leaning. If I
had more opcode space, I would just leave it alone. But decoding a single
register takes a bit of work and can end up in a critical timing path if I
am not careful. With the 16 extra opcodes, I can add read and write of 8
registers as dedicated instructions or even use some of them for other
functions since I only have 3 registers at the moment.

> > In the xr16, although you could always form any call to any address 0xABCD
> > via the pair of 16-bit insns
> > IMM 0xABC ;; JAL rd,0xD(r0) // rd := pc, pc := 0xABCD
> > it was sufficiently frequent that I added a special opcode
> > CALL func
> > that encoded all of the above in one 16-bit instruction (assuming the call
> > target 0xABC0 is 16-byte aligned, and assuming rd==r15, the return address
> > linkage register).

I have looked at a lot of FPGA CPUs on the web, but I think I remember your
machine. I recall that it is *very* streamlined and is actually smaller
than mine. That is pretty good considering that it has a set of 16
registers and mine just has the two stacks. I expect the efficiency comes
from having the registers in LUT ram which can be very fast and includes
the output multiplexor while my machine has to have explicit multiplexors
for all the inputs to the stacks.

I believe my machine will have an advantage when used for implementing
Forth because it executes all the essential primitives in one clock cycle
while RISC machines will typically require multiple clock cycles. It also
has an 8 bit opcode which is a significant issue considering the small
amount of ram in the FPGA I am using, 10 blocks of 512 bytes.

> > You have my permission to leave things well enough alone for now. :-)

Thank you, I appreciate that... :)
> > And just as you tune the ISA to the FPGA, so you should do an iteration of
> > tuning the ISA to the software tools that generate code for it. Once you
> > have a working tool flow, you can build and run some programs and that will
> > give you more refined ideas about what to change next -- in particular, what
> > to pitch that you thought you would need, but now realize you don't.

That is what I don't want to do. I want to wrap up this design and move on
to other things. I probably should not even try to optimize it further for
now, but I may not be coming back to it again for changes.

I'm sure you know that tune!
Rick Collins
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX






Re: Stack Machine Instruction Set - Paul Davis - Nov 26 4:09:00 2004


Arius - Rick Collins wrote:
> I am working on a custom CPU to implement in an FPGA. I am optimizing
> nearly every aspect of the machine to keep the size small and the speed
> high while minimizing the code size (very important since there is
> little memory inside the FPGA). After going through several iterations
> of designing the instruction set, I ended up with both a relative call
> and an absolute call. The relative call was essentially free in terms
> of the hardware since the jumps are relative. It seems to me that the
> absolute call is the more useful of the two and I could live a rich,
> full life without the relative call. However, I have currently
> completed a first pass at the design with both call types in the
> instruction set and a fair amount of the design is very well optimized.
>
> My question is, will the relative call be pretty useless compared to the
> absolute call? Or will both be useful? What situation would make the
> relative call more useful? Or do I have it backwards and I should give
> up the absolute call?
>
> At this point there are some read/write internal register commands I
> could use in place of either one of these call types, but they are
> currently memory mapped and working fine. There is no strong need to make
> any changes. I am reaching a point where changes to the instruction set
> will require excessive amounts of redoing optimizing and debugging. So I
> would like to make a final decision.
>
> Any and all comments are appreciated!
>
> Did I mention that my software target is to run forth on this chip? I
> think that may make a difference.

Some random thoughts (I'm assuming you're fairly new to processors,
correct me if I'm wrong :)) -

Why is the relative call 'essentially free'? It needs an adder, whereas
an absolute call doesn't. Do you need to worry about address wrap-around
on your relative calls?

What are the sizes of your relative and absolute calls? Is the relative
call more compact, or could it be made so? Or are all instructions
fixed-length?

Out of interest, did you look at the Xilinx cores? I think there are
free ones(?) - are they any good?

It sounds like you're optimising the hardware before you've had a chance
to run any code and verify your design - is this right?

Everything else being equal (which it normally isn't), the advantage of
a relative call is that it's position-independent; it doesn't require
tools to carry out fixups after relocation in the way that absolute
calls do. So, the $M question is, how important is relocatability to
you? Will you have an OS? Do you intend to run only one process, or more
than one? If more than one, how will you locate them in memory? Can you
write the tools to do address fixup if necessary, or will it never be
necessary, because you'll only ever have one program, located at address 0?
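
The fixup point can be sketched in a few lines of Python. This is a toy model, not any particular toolchain: the instruction tuples, the `resolve` and `relocate` helpers, the base addresses, and the convention that a relative offset is measured from the call instruction's own address are all assumptions for illustration.

```python
def resolve(instr, pc):
    """Return the address that a call located at `pc` transfers to."""
    kind, operand = instr
    if kind == "CALL_ABS":
        return operand          # operand is the target address itself
    if kind == "CALL_REL":
        return pc + operand     # operand is a distance from the call site
    raise ValueError(kind)

def relocate(image, fixups, delta):
    """Move an image by `delta`; only cells listed in `fixups` are patched."""
    moved = list(image)
    for i in fixups:
        kind, operand = moved[i]
        moved[i] = (kind, operand + delta)
    return moved

# Image linked for base 0x100: cells 0 and 1 both call the routine at cell 5.
BASE = 0x100
image = [("CALL_ABS", BASE + 5), ("CALL_REL", 5 - 1)]
fixups = [0]                    # the tools must record every absolute call site

NEW_BASE = 0x400
moved = relocate(image, fixups, NEW_BASE - BASE)

print(hex(resolve(moved[0], NEW_BASE + 0)))  # 0x405
print(hex(resolve(moved[1], NEW_BASE + 1)))  # 0x405
print(hex(resolve(image[0], NEW_BASE + 0)))  # 0x105, stale without the fixup
```

If there is only ever one program at a fixed address, the fixup list is never consulted and the relative form buys nothing on this score.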

What about compilers/assemblers/etc.? Do you have a Forth compiler, and
do you know what it produces? Can you port it?

Cheers

Paul






Re: Stack Machine Instruction Set - rtstofer - Nov 26 9:32:00 2004



Rick,

I got to thinking about your original post 'after' I promoted the
idea of relative addressing. You're probably right, for a strictly
Forth type of machine it is probably unnecessary although I don't
know much about the implementation. That's the nice thing about
FPGAs - if you find out you just absolutely have to have it - add
it! Talk about the ultimate Erector Set!

My first project was to use the T80 core (Z80 emulation) and get
CP/M running with 16 ea 8 MB disk drives (2 CF modules), embedded
graphics (thanks to John Kent) and a PS/2 keyboard. A definite
retro project. Running at 12.5 MHz it is a pretty quick machine
compared to the originals - heck, it is 6 times faster than my
Altair 8800A and that's not counting the difference between CF and
8" floppies!

My current project is to implement the original Pascal P4
interpreter (stack machine) in hardware as opposed to software. My
goal is to have instruction execution speed faster than the CDC 6600
on which it was originally implemented. Faster than a speeding
mainframe!

Richard






Re: Stack Machine Instruction Set - Paul Davis - Nov 26 10:45:00 2004

Paul Davis wrote:
<snipped>

Sorry - haven't got used to the latency here - most of this stuff has
already been answered..

Paul






Re: Stack Machine Instruction Set - Arius - Rick Collins - Nov 26 13:14:00 2004

I have been discussing this in several other forums and I can't remember
what I have posted where, so I don't mind responding to your post even if
it is a bit redundant. I'll try to keep my replies short.

At 04:09 AM 11/26/2004, you wrote:

>Some random thoughts (I'm assuming you're fairly new to processors,
>correct me if I'm wrong :)) -

Yes, you are wrong. I have been designing processors of one sort or
another for some 20 years starting with a microprogrammed IO board using a
Signetics sequencer chip.

>Why is the relative call 'essentially free'? It needs an adder, whereas
>an absolute call doesn't. Do you need to worry about address wrap-around
>on your relative calls?

There are two reasons that make the adder for the relative call virtually
free. One is the fact that an adder is required to implement the PC <= PC
+ 1 used by most instructions. To calculate a relative jump or call the
offset is added by the same hardware. The other is the fact that the other
address calculations require a 4 input mux which is implemented as a pair
of 2 input muxes combined with a third 2 input mux. In my design I use the
adder as the final 2 input mux by adding enables to the first two muxes
which can zero their outputs when needed. The adder uses the same number
of LUTs and is required by the PC inc function anyway, so the relative jump
is free while the absolute call requires one of the inputs to the mux (two
actually, but they are the same two required by the relative jump/call so
that is free as well).

>What are the sizes of your relative and absolute calls? Is the relative
>call more compact, or could it be made so? Or are all instructions
>fixed-length?
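
A rough Python sketch of the adder-as-final-mux idea from the answer just above this question. The post does not list the actual mux inputs, so the four used here (PC, a return address, the constant 1, and a literal) and the names `gated_mux`, `CONTROL` and `next_pc` are assumptions for illustration only, not the real design.

```python
MASK = 0xFFFF  # 16-bit address path, matching the 16-bit stacks in the post


def gated_mux(select, enable, in0, in1):
    """2-input mux whose output is forced to zero when enable is False."""
    if not enable:
        return 0
    return in1 if select else in0


# Hypothetical control table: (select_a, enable_a, select_b, enable_b).
# Mux A chooses between PC and a return address; mux B between 1 and a literal.
CONTROL = {
    "next":     (0, True,  0, True),   # PC + 1
    "relative": (0, True,  1, True),   # PC + offset
    "absolute": (0, False, 1, True),   # 0 + target
    "return":   (1, True,  0, False),  # return address + 0
}


def next_pc(op, pc, return_addr, literal):
    """Compute the next PC; the adder doubles as the final 2-input mux."""
    sel_a, en_a, sel_b, en_b = CONTROL[op]
    a = gated_mux(sel_a, en_a, pc, return_addr)
    b = gated_mux(sel_b, en_b, 1, literal)
    return (a + b) & MASK
```

With one side zeroed, the adder simply passes the other side through, which is why the absolute and return cases need no extra mux level in this model.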

I have an 8 bit instruction with a 7 bit literal field which is appended to
any previous literal values loaded to the return stack immediately
before. This allows literals to be built up so that fewer bytes can be
used to represent smaller values.
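
The literal build-up described above can be sketched in Python. The sketch also folds in the 4-bit JMPx/CALx field that the next paragraph describes. Several details are assumptions not stated in the post: that the first field seen is sign-extended, that later fields are appended below it (most significant field first), and the function names themselves, which are made up for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def build_offset(prefix_fields, jump_field):
    """Combine 7-bit literal prefix fields with a 4-bit jump/call field.

    Assumes the first field is sign-extended and later fields are
    appended below it, most significant field first.
    """
    if not prefix_fields:
        return sign_extend(jump_field, 4)
    acc = sign_extend(prefix_fields[0], 7)
    for field in prefix_fields[1:]:
        acc = (acc << 7) | (field & 0x7F)
    return (acc << 4) | (jump_field & 0xF)


def encoded_length(offset):
    """Bytes needed for a jump: literal prefixes plus the jump opcode itself."""
    prefixes = 0
    while not -(1 << (3 + 7 * prefixes)) <= offset < (1 << (3 + 7 * prefixes)):
        prefixes += 1
    return prefixes + 1
```

Under these assumptions a single byte covers offsets -8 to +7, and one literal prefix plus the jump byte gives 11 signed bits, i.e. -1024 to +1023.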

The JMPx and CALx instructions contain a 4 bit literal value which is
appended in the same way. So a jump -8 to +7 is done with a single
byte. A jump -1 kB or +1 kB uses two bytes. The return stack and data
stack are 16 bits providing a 64 kB address space. However the program and
data memories are only 1 kB in the current implementation due to the
limited ram in the FPGA.

>Out of interest, did you look at the Xilinx cores? I think there are
>free ones(?) - are they any good?

Yes, I did. I was intrigued by the very small pico-blaze. But when I
examined it closely, I found that it was a very limited processor. I don't
recall the details, but they specifically designed the processor around
what you could do with the very least amount of logic. So there are
various limitations to allow the special features of the Xilinx CLBs to be
put to maximum use. At least that is what I seem to recall.

The micro-blaze also has a small version, but it is not free. However
someone is working on an open source duplicate, but not the smaller
version, only the middle or larger version, I don't recall which.

The nios-II processor from Altera also looks very good, but they don't have
a version for the ACEX chips I am using and it certainly won't work in the
Spartan-3 I am also using.

>It sounds like you're optimising the hardware before you've had a chance
>to run any code and verify your design - is this right?

Yes. I don't currently have any code to run. This CPU will be used to
control the operation of an FPGA that is emulating a UART interface to the
PC and performing DMA to the DSP memory on the board. I think this problem
is too complex for a state machine and a fancier processor would be
overkill (like the ARM chip I was planning to use). My only concern is the
program space available. I don't know for sure that 1 kB is large
enough. I do have room for growth, but not lots, 3 kB max if I use none
for hardware buffers.

>Everything else being equal (which it normally isn't), the advantage of
>a relative call is that it's position-independent; it doesn't require
>tools to carry out fixups after relocation in the way that absolute
>calls do. So, the $M question is, how important is relocatability to
>you? will you have an OS; do you intend to run only one process, or more
>than one? If more than one, how will you locate them in memory? Can you
>write the tools to do address fixup if necessary, or will it never be
>necessary, because you'll only ever have one program, located at address 0?

Yes, that is the question I don't have an answer to, how useful relocatable
code is. The OS will be forth which does not make use of relocatable code
AFAIK. I am expecting the code to be fully recompiled anytime a change is
made.

>What about compilers/assemblers/etc.? Do you have a Forth compiler, and
>do you know what it produces? Can you port it?

I don't have a forth tool yet. Right now my best choice seems to be the
MPE forth which I will have to port to my CPU. They have a $400 version
(which may be more if the USD keeps dropping) which is targeted to the
Z8. I am just unsure of how much support I will need to port it.

I am also posting this to the comp.lang.forth newsgroup and the hForth
yahoo group. So I have been getting some good feedback there as well.

Thanks for your reply...

Rick Collins
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX






Re: Stack Machine Instruction Set - Tomasz Sztejka - Dec 2 13:04:00 2004

--- Arius - Rick Collins <> wrote:
(....)
> Yes. I don't currently have any code to run. This CPU
> will be used to
> control the operation of an FPGA that is emulating a UART
> interface to the
> PC and performing DMA to the DSP memory on the board. I
> think this problem
> is too complex for a state machine and a fancier
> processor would be
> overkill (like the ARM chip I was planning to use).

Exactly. Of course there are other processors you could use
(e.g. AVR - 8 bit RISC at 16 MIPS max, less than $10 + free
tools) but I assume you already have a good reason to use
an FPGA in your project.

I think you may safely drop the relative call, keeping only
the absolute call and the relative branch (this saves some
decoder space). Those two can emulate a relative call very
easily:

call _fixed_bridge
_fixed_bridge: branch_relative _offset

If you like to be minimal, please take a look at:

http://www.sztejkat.prv.pl/downloads/missm/index.html

This is a stack processor project I made recently. I have
not implemented it in hardware, but the behavioral
simulation has been done. You will also find a retargetable
assembler, which you may find useful, with two example
targets, and a set of GUI libraries for building a simulator
front end linked to a Verilog behavioral simulator.

=====
Tomasz Sztejka
POLON ALFA
(work) http://www.polon-alfa.com.pl/
(private) http://www.sztejkat.prv.pl/







Re: Stack Machine Instruction Set - Arius - Rick Collins - Dec 2 14:18:00 2004

At 01:04 PM 12/2/2004, you wrote:

> Exactly. Ofcourse there are other processors you could use
>(ex. AVR - 8 bit risc at 16Mips max, less than 10$ + free
>tools) but I assume, you already have a good reason to use
>FPGA in your project.

Yes, the FPGA is already there. I also have a small MCU on the board as
well, but I am trying to minimize the parts cost and I can use a $2 very
low power MCU if I put the fast one in the FPGA. The FPGA is unpowered
(along with most of the rest of the board) for a low power standby mode
while the small MCU continues to run.

> I think you may safely throw away relative call leaving
>absolute call and relative branch (you can save some
>decoder's space). Those two can emulate relative call very
>easy:
>
> call _fixed_bridge
>_fixed_bridge: branch_relative _offset

Currently I am not using the relative call and am leaving the opcode space
for one of two possible extensions; adding IO mapped IO fetch and store vs.
combining a return with about half of the current instructions. In an
instruction frequency analysis of Forth by Koopman,
http://www.ece.cmu.edu/~koopman/stack_computers/sec6_3.html, he found that
the CALL and RETURN instructions are used very frequently both by a dynamic
measure (how often they are executed) and a static measure (how often they
appear in the code). So by adding a return operation in a large number of
instructions, it saves both the code space used and the execution time for
the return. But I need to write some of my own code to find out which
instructions will be optimal to combine with the return (since I can't
combine all of them) and see if this is practical to implement in the
instruction space.

> If you like to be minimal, please take a look at :
>
>http://www.sztejkat.prv.pl/downloads/missm/index.html
>
> This is a project of stack processor I made recently. I
>did not implemented it in hardware - the behavioral
>simulation was done however. You will also find
>retargetable assembler what you may find usefull with two
>example targets and set of GUI libraries to make simulator
>frontend linked with verilog behavioral simulator.

Yes, this is a very minimal machine. But I don't think it would be optimal
for my target. My instruction set is so simple, that I can implement the
assembler as part of my VHDL code. Once the CPU is running, I expect to
use Forth as the programming language.
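
On the earlier question of which instructions to pair with a return: once some Forth code exists, a static count of what sits immediately before each return is one cheap first measurement. The Python sketch below is only an illustration. The mnemonics (`RET`, `ADD`, and so on) and the flat list-of-opcodes input format are hypothetical, and the sketch ignores whether a return is itself a branch target, which would prevent merging it.

```python
from collections import Counter


def ops_before_return(program, ret="RET"):
    """Count which opcode immediately precedes each return in a listing."""
    counts = Counter()
    for prev, cur in zip(program, program[1:]):
        if cur == ret and prev != ret:
            counts[prev] += 1
    return counts


# Hypothetical listing of three short words, each ending in a return.
listing = ["DUP", "ADD", "RET",
           "LIT", "STORE", "RET",
           "SWAP", "ADD", "RET"]
candidates = ops_before_return(listing)
```

Sorting `candidates` by count (for example with `candidates.most_common()`) gives a ranked list of opcodes to consider for a combined "op plus return" encoding. A dynamic measure would need an execution trace instead of a listing.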

Thanks for your inputs.

Rick Collins
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX





Re: IP Redux - Alex Gibson - Dec 27 4:40:00 2004

John Kent wrote:

>Matlab and Simulink would probably be the best examples of a
>GUI/integrated IDE with libraries.
>Protel and Labview are also getting into the game with parameterizable
>VHDL/Verilog Libraries.
>I'm not sure if they allow users to integrate their own modules, but for
>small freelance designers
>the cost of these packages are fairly prohibitive.

The only catch with DXP 2004 / Protel is that it is designed to be used
with their own software.
Not portable. It will be interesting to see if Eagle follows along
(the only commercial CAD program that is available for Windows, Linux
and Mac OS X).

>I would quite like to see some work around SciLab, which is free from
>Inria in France, to integrate an
>open source framework for designing VHDL or Verilog components.

Surely it would be easier to make it as a plugin for Eclipse (like Xilinx
has done),
only that means using Java. http://www.eclipse.org/projects/index.html

Probably could get the eclipse tools guys to help with the addin.

A combination of these, would surely make a soc builder app
(well at least the front end)
http://www.eclipse.org/tools/index.html
http://www.eclipse.org/technology/index.html
http://www.eclipse.org/gef/ whats the
picture on the front page!(logic design app)

>There used to be a free image procesing package called Khoros that had a
>suite of applications for designing
>and integrating image processing algorithms into glyphs. It was designed
>to run under Unix and Linux.
>Components communicated using intermediate files and it lacked the
>Scientific / Mathematical simulation capability.

Not free any more.
Was very buggy software that crashed very regularly in my experience
(Windows or Solaris).
news:comp.soft-sys.khoros

Recently became visiquest.

Should be able to find a copy of the free student version of Khoros
floating around somewhere,
or old linux / unix version(lots from google)
or 15 day eval version of visiquest
http://www.accusoft.com/support/evalcenter/

>I'm not sure if there are any other GNU packages, such as schematic
>capture packages, that can form
>the basis for a System Builder or if they could be integrated into a
>mixed signal simulation package
>that allowed you to simulate the interaction between a microprocessor
>program and analog circuits.
>
>Some years ago, a friend from the internet pointed me to a mixed signal
>simulation package that
>allowed you to simulate Microprocessor, LCD displays and other
>components, including analog designs,
>and you could download and simulate microcomputer programs.

There are a few but not free or opensource.

Wasn't a compiler / ide like sourcebuilder /pic c compiler
www.picant.com

>It's all out there, but whether it is affordable and an open
>architecture is another matter.
>
>John.

Alex

--






Re: IP Redux - John Kent - Dec 28 1:24:00 2004

Hi Alex,

Alex Gibson wrote:

>Surely it be easier to make it as a plugin for eclipse (like xiinx has
>done),
>only that means using java. http://www.eclipse.org/projects/index.html
>
>Probably could get the eclipse tools guys to help with the addin.
>
>A combination of these, would surely make a soc builder app
>(well at least the front end)
>http://www.eclipse.org/tools/index.html
>http://www.eclipse.org/technology/index.html
>http://www.eclipse.org/gef/ whats the
>picture on the front page!(logic design app)
>

I took a look at the Eclipse web site and am downloading the code to see
what it's all about.
The thing about Scilab is that it provides a maths and scientific
package on which to model
and simulate the algorithm.

>>There used to be a free image procesing package called Khoros that had a
>>
>>
>>
snip

>Not free any more.
>Was very buggy software that crashed very regularly in my experiance.
>(windows or solaris)
>news:comp.soft-sys.khoros
>
>Recently became visiquest.
>
>Should be able to find a copy of the free student of khoras verson
>floating around somewhere,
>or old linux / unix version(lots from google)
>or 15 day eval version of visiquest
>http://www.accusoft.com/support/evalcenter/
>

From what I could tell, when I looked up Khoros on Google a few years ago,
the business had taken a different direction; whether it was more along
the lines of Eclipse I can't remember. I have an old copy of Khoros but
I think it is reliant on some of the old Linux libraries, which might be
hard to find now.

I remember in the early days of the internet downloading a 10 Mbyte
Mathematical Morphology library from Brazil, and wondering how much the
download was going to cost.
It's nothing now.

You are probably right in that it was pretty buggy.

>>I'm not sure if there are any other GNU packages, such as schematic
>>capture packages, that can form
>>the basis for a System Builder or if they could be integrated into a
>>mixed signal simulation package
>>that allowed you to simulate the interaction between a microprocessor
>>program and analog circuits.
>>
>>
>>
snip

> There are a few but not free or opensource.
> Wasn't a compiler / ide like sourcebuilder /pic c compiler
> www.picant.com

I think even the FPGA vendors are having trouble supplying
an integrated environment. I have an amateur radio friend trying to
use some old 3000 series FPGAs. The development software
was sourced from a number of companies. The license had expired
and Xilinx had fallen out with the original vendors so there was no
way of getting the software relicensed.

You could ask what he was doing still using the 3000 series.
The answer is that the product had a long lifetime and the
3000 package was particularly small so fitted in the space provided.

John.

--
http://members.optushome.com.au/jekent



