EmbeddedRelated.com
The 2024 Embedded Online Conference

Using DDR RAM

Started by rtstofer October 2, 2009
I bought a Digilent Spartan 3E Starter Board and it comes with 32M x 16 of DDR RAM. They don't provide a controller core.

So I started over at OpenCores and downloaded a DDR core but I haven't even begun to study it. Instead I went to the datasheet...

I notice that it takes 2 clocks to get the first word from memory. Then I can get a word every 1/2 clock. It's that initial 2 clocks that is bothering me. If I read randomly, I can count on at least 2 clocks of latency.
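A rough way to see what that initial latency costs: the sketch below assumes exactly the figures above (first word 2 clocks after the read command, then one word every half clock) and ignores row activation, precharge and refresh. The function names are made up for illustration.

```python
from fractions import Fraction


def burst_clocks(words: int, latency: int = 2) -> Fraction:
    """Clocks from the read command until the last word of the burst arrives.

    Assumed model: first word after `latency` clocks, then two words per
    clock (one on each edge), with no other overheads.
    """
    if words < 1:
        raise ValueError("a burst needs at least one word")
    return latency + Fraction(words - 1, 2)


def clocks_per_word(words: int, latency: int = 2) -> Fraction:
    """Average clocks per word when every word of the burst is used."""
    return burst_clocks(words, latency) / words


for n in (2, 4, 8):
    print(f"burst of {n}: {float(clocks_per_word(n)):.3f} clocks per word")
```

If only one word of each burst is actually wanted (purely random reads), you pay at least the full 2-clock latency for every word, which is why reading whole bursts into a local buffer pays off.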

Is the idea to implement some kind of cache such that I read a burst into a page cache? Then, I suppose I need to have some kind of map to keep track of pages in cache, etc. I'm thinking that writes might not be cached (write through). Or maybe they are delayed until a page is swapped out (it's dirty).
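A toy model of the page-cache idea, assuming a direct-mapped organization (each line can live in only one place) and hypothetical sizes, to compare how much DRAM write traffic write-through and write-back generate. It counts operations only and does not model timing; for simplicity both policies allocate a line on a write miss.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts DRAM traffic.

    Sizes are hypothetical. Both policies allocate a line on a write miss.
    """

    def __init__(self, num_lines=64, line_words=16, write_back=True):
        self.num_lines = num_lines
        self.line_words = line_words
        self.write_back = write_back
        self.valid = [False] * num_lines
        self.dirty = [False] * num_lines
        self.tags = [0] * num_lines
        self.line_fills = 0       # burst reads from DRAM
        self.dram_write_ops = 0   # dirty-line write-backs or single-word write-throughs

    def access(self, addr, is_write=False):
        """Access one word address; return True on a cache hit."""
        line_addr = addr // self.line_words
        index = line_addr % self.num_lines   # which cache line to look in
        tag = line_addr // self.num_lines    # remaining high address bits
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            if self.valid[index] and self.dirty[index]:
                self.dram_write_ops += 1     # evict: burst-write the dirty line
            self.line_fills += 1             # burst-read the missing line
            self.valid[index] = True
            self.dirty[index] = False
            self.tags[index] = tag
        if is_write:
            if self.write_back:
                self.dirty[index] = True     # defer the DRAM write until eviction
            else:
                self.dram_write_ops += 1     # write through to DRAM immediately
        return hit


# Store to the same 64 words ten times under each policy.
for write_back in (True, False):
    cache = DirectMappedCache(write_back=write_back)
    for _ in range(10):
        for addr in range(64):
            cache.access(addr, is_write=True)
    policy = "write-back" if write_back else "write-through"
    print(f"{policy}: {cache.line_fills} line fills, {cache.dram_write_ops} DRAM writes")
```

With a working set that fits, write-back touches DRAM only on the initial fills, while write-through pays a DRAM write for every store; the trade-off is the extra dirty bit and the eviction write.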

My question is kind of general. Is this what everybody is doing with DDR RAM? Or are they just running such a fast RAM clock that, from the point of view of the CPU core, RAM is still pretty fast?

Any reference documents (other than the datasheet) that might get me started?

Richard



To post a message, send it to: f...
To unsubscribe, send a blank message to: f...
On Fri, 2 Oct 2009, rtstofer wrote:

> Date: Fri, 02 Oct 2009 18:57:02 -0000
> From: rtstofer
> Reply-To: f...
> To: f...
> Subject: [fpga-cpu] Using DDR RAM
>
> My question is kind of general. Is this what everybody is doing with DDR
> RAM? Or are they just running such a fast RAM clock that, from the point of
> view of the CPU core, RAM is still pretty fast?

Someone said, "All programming is an exercise in caching." This also goes for
computer development.

DRAM access, especially when page boundaries are crossed, is still really slow.
Modern CPUs may spin for hundreds of clocks waiting for a new cache line
from the DRAM in this case. In fact, DRAM transfer rates have gone up by a
factor of around 1000 since the first DRAMs, but the random access time has
only been sped up by a factor of 10 or so. Even with an FPGA CPU, SDRAM is just
a cache filler...
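Peter's point can be put in numbers with the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. The 1-clock hit and 8-clock miss penalty below are illustrative assumptions, not measurements from any board in this thread.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time (clocks) for a single-level cache."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only: a 1-clock block-RAM hit and an assumed
# 8-clock SDRAM access on a miss.
for rate in (0.0, 0.02, 0.05, 0.10, 1.0):
    print(f"miss rate {rate:.0%}: {amat(1, rate, 8):.2f} clocks per access")
```

As long as the miss rate stays small, the DRAM latency mostly disappears behind the cache, which is what "SDRAM is just a cache filler" amounts to.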

Peter Wallace
Mesa Electronics

(\__/)
(='.'=) This is Bunny. Copy and paste bunny into your
(")_(") signature to help him gain world domination.



If I remember correctly, I saw a benchmark from way back when, with the best
processors at the time being a P4 at 2.2GHz, a P3 at 1.4GHz and an AMD
somewhere in between, 1.6GHz or so. I think it was SPECint 2000 with vpr
as the test. The P3 had regular SDRAM, the AMD had DDR SDRAM and the P4 was one of
those unfortunate ones to have RDRAM or something like that. The P3 was the
fastest, the AMD was next and the P4 finished last despite having the highest raw
clock frequency. The P3 had the least amount of RAM, so RAM size was not an issue,
just the RAM architecture. Even a P3 at 1.26GHz was faster than the P4 or the AMD.

-----Original Message-----
From: f... [mailto:f...] On Behalf Of Peter C. Wallace
Sent: Friday, October 02, 2009 2:33 PM
To: f...
Subject: Re: [fpga-cpu] Using DDR RAM

--- In f..., "rtstofer" wrote:

> I notice that it takes 2 clocks to get the first word from memory. Then I can get a word every 1/2 clock. It's that initial 2 clocks that is bothering me. If I read randomly, I can count on at least 2 clocks of latency.

Those 2 clocks only apply if you are already on the right page, right?
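On the open-page question: the CAS latency alone applies only on a row hit. A minimal model, assuming made-up round values of 2 clocks each for tRP (precharge), tRCD (activate to read) and CL (CAS latency); the real values depend on the part, its speed grade and the clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Timing:
    """Illustrative DDR timings in clocks (assumed round numbers)."""
    t_rp: int = 2   # PRECHARGE to ACTIVATE
    t_rcd: int = 2  # ACTIVATE to READ
    cl: int = 2     # READ to first data word (CAS latency)


def first_word_latency(open_row: Optional[int], wanted_row: int,
                       t: Timing = Timing()) -> int:
    """Clocks until the first data word, given the row currently open in the bank."""
    if open_row == wanted_row:
        return t.cl                    # row hit: issue READ straight away
    if open_row is None:
        return t.t_rcd + t.cl          # bank idle: ACTIVATE, then READ
    return t.t_rp + t.t_rcd + t.cl     # row conflict: PRECHARGE, ACTIVATE, READ


print(first_word_latency(5, 5), first_word_latency(None, 5), first_word_latency(3, 5))
```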



Hi Richard,

The DDR controller on OpenCores is very basic, but if you follow the
link to the company offering the commercial version, you will
find a detailed description of how the commercial controller core works.
The 2-clock delay (the CAS latency) depends on how the DDR is configured.
You can clock the DDR faster if you use a 2.5-clock CAS latency. I was
studying the Micron data sheet fairly closely a month or two ago, but I have
forgotten the details.
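Because the CAS latency is counted in clocks, what matters for a random read is the latency in nanoseconds. The clock rates below are examples only; which CAS latency a part supports at which clock is set by its speed grade in the datasheet.

```python
def cas_latency_ns(cl_clocks: float, clock_mhz: float) -> float:
    """Convert a CAS latency given in clocks to nanoseconds."""
    return cl_clocks * 1000.0 / clock_mhz


# Example clock rates only; which CL a part supports at which clock
# depends on its speed grade (see the datasheet).
for cl, mhz in ((2.0, 100.0), (2.0, 133.0), (2.5, 133.0), (2.5, 166.0)):
    print(f"CL {cl} at {mhz:.0f} MHz: {cas_latency_ns(cl, mhz):.1f} ns")
```

A longer latency in clocks can cost little or no absolute time when the clock is faster, while the faster clock raises burst bandwidth.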

First off, you have to initialize the SDRAM with a configuration start-up
procedure. I think the idea then is to open a row first with a row address
strobe, which is called an ACTIVATE command; then a clock or two later you can
do a burst read or write starting at a column within that row. The data is
delayed a few clock cycles from the read or write command. If you want
to do a write after a read, you have to terminate the read burst (with
a BURST TERMINATE command, or a PRECHARGE if you also need another row?),
then initiate a new write.

The DDR is pipelined, so I think you can actually overlap some of the precharge
and activate cycles with other accesses. To avoid having to terminate and
reinitiate a memory access, the chips usually have 4 banks that you can
access independently. I think the idea is that you can interleave bank accesses, so
for instance, you can be reading from one bank and writing to another
(others might care to correct me on that if I'm wrong). There are also
issues with auto refresh: all banks must be precharged before an AUTO REFRESH,
so you might have to terminate a read or write access to perform it.
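One common way to get the interleaving John describes is to put the bank bits just above the column bits, so a linear stream that runs off the end of a row continues in the next bank, whose row can be activated in advance. The sketch assumes the 32M x 16 part is organized as 4 banks of 8192 rows by 1024 columns, and that it needs 8192 AUTO REFRESH commands per 64 ms; check both against the datasheet. The mapping itself is one hypothetical choice, not what any particular controller does.

```python
COL_BITS = 10   # 1024 columns per row (assumed for a 32M x 16 part)
BANK_BITS = 2   # 4 banks
ROW_BITS = 13   # 8192 rows per bank


def split_address(word_addr: int) -> tuple:
    """Split a word address into (row, bank, column), bank bits above column bits."""
    if not 0 <= word_addr < 1 << (ROW_BITS + BANK_BITS + COL_BITS):
        raise ValueError("address out of range")
    col = word_addr & ((1 << COL_BITS) - 1)
    bank = (word_addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = word_addr >> (COL_BITS + BANK_BITS)
    return row, bank, col


def refresh_interval_us(retention_ms: float = 64.0, refresh_cmds: int = 8192) -> float:
    """Average spacing of AUTO REFRESH commands needed to cover every row in time."""
    return retention_ms * 1000.0 / refresh_cmds


# Crossing the end of a row moves to the next bank, not the next row.
print(split_address(1023), split_address(1024), split_address(4096))
print(f"refresh every {refresh_interval_us():.4f} us on average")
```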

I came up against that barrier with my 6809 design on the XESS board.
Although the SDRAM clocks at 100MHz, it takes 6 to 8 cycles for each
random access, so yes, you really need a cache to get any speed out
of it. That is where dual-ported block RAM comes to the rescue. If you had
asked me about it a few months ago, I might have been able to give you a
more coherent answer.

Check out the data sheet for the chip. There are a number of different
ways you can use the SDRAMs to optimize them for the CPU design you are
using.

John.

--
http://www.johnkent.com.au
http://members.optusnet.com.au/jekent



Hi John,

I have been thinking about the instruction cache and perhaps doing something like fetching 16 words per access. Basically, I would just mask off the lower 4 bits of the address and start a fetch into BlockRAM. As soon as I have the word the CPU wants, I can let it proceed while I grab the remainder of the block.
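Letting the CPU continue as soon as its word arrives is usually called early restart; fetching the requested word first (critical word first) is the usual refinement. The sketch only does the arithmetic, assuming the 16-word fill streams without gaps at two words per clock after a 2-clock latency (for example, back-to-back bursts within one open row). DDR bursts wrap around within the burst length according to the starting column, so part of the critical-word-first benefit may be available natively; see the burst-ordering table in the datasheet. `line_base` and `cpu_wait` are hypothetical names.

```python
from fractions import Fraction

LINE_WORDS = 16  # words per line, as proposed above


def line_base(addr: int) -> int:
    """Mask off the low 4 address bits to get the start of the 16-word line."""
    return addr & ~(LINE_WORDS - 1)


def cpu_wait(addr: int, latency: int = 2, critical_word_first: bool = False) -> Fraction:
    """Clocks until the requested word arrives while its line is being filled.

    Assumes the fill streams without gaps at two words per clock after an
    initial `latency`, e.g. back-to-back bursts within one open row.
    """
    offset = addr - line_base(addr)
    if critical_word_first:
        return Fraction(latency)            # requested word is transferred first
    return latency + Fraction(offset, 2)    # fill starts at the line base


for addr in (0x1230, 0x1237, 0x123F):
    print(hex(addr), float(cpu_wait(addr)), float(cpu_wait(addr, critical_word_first=True)))
```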

Nothing very sophisticated. The idea of wide fetches has been around for a very long time.

Interleaving is another very old idea. I haven't read enough of the datasheet to see if that is helpful.

I think for my simple projects, DDR is going to be the long way around. I'm looking at the Spartan 3E (500k) device because it comes in a PQ208 package that I might actually be able to mount. I can add a simple SRAM and I'm in business. Or maybe the Spartan 3 (400k) in the TQ144 package.

Richard



This bounced

John Kent wrote:
>
> Hi Richard,
>
> There is the Xilinx Memory Interface Generator (MIG 2.1). I notice
> Xilinx are running a webinar on the memory interface in the Virtex 6
> and Spartan 6 on Tuesday. It would be interesting to see, but it's at
> 11am US EDT, which is 2am Australian EDST.
>
> I don't think you said what CPU you were using the memory for. What
> size address bus are you using? With the block RAM, I assume it's a
> matter of storing the high-order address bits (the tag) in the cache
> alongside the data, and comparing them to determine whether the cache
> entry is valid. You could buffer 16 words, but if you are using a burst
> access you are limited to 2, 4 or 8 words. Given that the block RAM is
> going to be larger than 16 words, you may as well use it as a
> full-blown cache. I was looking at designing a 4-way set-associative
> cache, with a separate tag memory, that would allow you to cache memory
> from 4 different address ranges.
>
> I was having a discussion with Tommy Thorn on this list back in April
> this year about his MIPS-compatible YARI CPU, which used 4-way
> set-associative instruction and data caches. That might give you some
> design clues.
>
> http://yari.thorn.ws/YARI/Introduction.html
>
> I've seen very little activity on this list since then. I hope my
> email address has been working.
>
> John.
>
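A tag-only model of the 4-way set-associative cache described above: the low address bits pick a set, the remaining high bits are stored as the tag and compared in each way. LRU replacement is an assumption (one common choice), and the sizes are hypothetical; in block RAM the tags and data would sit in separate memories.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of an N-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets=64, ways=4, line_words=16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_words = line_words
        # One OrderedDict per set: tag -> None, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up one word address; return True on a hit."""
        line_addr = addr // self.line_words
        index = line_addr % self.num_sets   # which set
        tag = line_addr // self.num_sets    # stored and compared per way
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)              # mark as most recently used
            self.hits += 1
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)           # evict the least recently used way
        s[tag] = None
        self.misses += 1
        return False


cache = SetAssociativeCache()
stride = cache.num_sets * cache.line_words  # addresses this far apart share a set
pattern = [0, stride, 2 * stride, 3 * stride] * 3
print([cache.access(a) for a in pattern])
```

The four addresses in the pattern all fall in the same set, so a direct-mapped cache of the same size would thrash on them, while four ways hold all of them after the first pass.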

--
http://www.johnkent.com.au
http://members.optusnet.com.au/jekent



--- In f..., John Kent wrote:
> > I don't think you said what CPU you were using the memory for.

I bought the Spartan 3E Starter Board from Digilent because it is the basis of the latest Pacman arcade design. This is the very same project I didn't finish years ago when I bought the BurchEd board. I got as far as getting the T80 core to run, added a couple of Compact Flash devices and brought up CP/M. Great fun!

For the last 3 or 4 years I have been using the Spartan 3 Starter Board because, among other things, it has SRAM. It is also available with 1000k gates which is useful since my VHDL looks like it was written by a kindergartener with crayons.

As you may recall, I got the IBM 1130 emulation to run on that board and since then I have also gotten a PDP11-xx to run. The xx would be -40 if I had the MMU working. As it is, it runs RTL fairly well. I want the -40 version so I can run Unix V6.

The only reason I brought up the DDR question is that the 3E board has DDR but no SRAM. I was wondering if I could actually use the board for something other than an arcade machine. I was going to build a one-off PCB for the machine and I would have the 3E board for something else. But, given the lack of SRAM, maybe it is better to just bury the board inside the cabinet and go back to what I know: the Spartan 3 board. SRAM is good!

I have no idea where I am going with all this but I had thought to port both of my projects. I think I'll abandon that idea. I just keep bumping into things and sometimes my projects work out.

Richard



Hi Richard,

Yes, I remember you sending me the code for the Z80 CP/M system on the
B5-X300 board. The boot-up system was quite easy. OK on using the 1000K-gate
Spartan 3 starter board. An internet friend used the System09 CPU
core to build an FPGA CoCo3 using that board. He wrote most of it in
Verilog.

A friend from the Flex users group sent me this:

http://alexfreed.com/FPGApple/CB/System09.html

Carte Blanche is an FPGA-based card for the Apple II that has just become available

http://www.applelogic.org/CarteBlanche.html

The System09 for the Spartan 3E500 board just uses block RAM. There is
enough RAM for the monitor ROM, VDU and 32 KBytes of user RAM, although I
need another 8 KBytes of RAM to run the Flex operating system.

Ok on the PDP-11 design. Sounds pretty good. An acquaintance from the
list sent me his PDP-8 design which I was playing with a few years ago
on the B5-X300 board. I think I have told you about that some time ago.
There were a few friends who were looking at interfacing the FPGA PDP-8
to some original DEC interface boards. I have lost touch with them, so
I'm not sure what they are up to now.

I have two 3S500E boards and I bought a VDEC-1 video digitizer board to
do some image processing on one of them. I would have liked to couple
two of the boards together to do stereo vision, but there are not enough
spare pins to pipe the video stream between the two boards. I came up
against the same problem as you with the lack of a DDR controller, although
I have not tried using MIG 2.1.

Stereopsis can be performed in raster-scan fashion, which is fairly
efficient for DDR because a complete scan line can be buffered from a
single open row. But if you want to perform non-raster accesses, as might
be the case for, say, warping using back projection, the bandwidth of SDRAM
becomes a problem, and you have to work out efficient caching strategies
to overcome the bottleneck.
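To see why raster order suits DRAM, the sketch counts how often consecutive pixel accesses land in a different DRAM row, since each change costs a PRECHARGE and ACTIVATE. It assumes a hypothetical 640 x 480 frame stored row-major at one word per pixel, 1024-word DRAM rows, and ignores banks.

```python
def row_switches(addresses, row_words=1024):
    """Count how often consecutive accesses land in a different DRAM row.

    Ignores banks; every switch would cost a PRECHARGE + ACTIVATE.
    """
    switches = 0
    current = None
    for a in addresses:
        row = a // row_words
        if row != current:
            switches += 1
            current = row
    return switches


WIDTH, HEIGHT = 640, 480  # hypothetical frame, one word per pixel, row-major

raster = [y * WIDTH + x for y in range(HEIGHT) for x in range(WIDTH)]
column = [y * WIDTH + x for x in range(WIDTH) for y in range(HEIGHT)]
print(row_switches(raster), row_switches(column))
```

The column-order walk, a stand-in for non-raster access patterns such as back-projection warping, opens rows hundreds of times more often than the raster walk over the same pixels.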

I'm still using ISE and EDK 7.1 and 8.1. I can't really afford to be
upgrading EDK each time Xilinx make a new release. I did download
webpack ISE 11 the other day but I have not installed it yet. I am told
they have removed support for the Spartan 2 so I'll need to maintain the
old versions if I want to still work on the BurchED boards. The low cost
boards from most of the vendors still seem to be using the Spartan 3. I
haven't seen much around for the more recent Spartan FPGAs, but then I
have not really been looking.

Anyway, I better get on with my work.
Talk soon.

John.

--
http://www.johnkent.com.au
http://members.optusnet.com.au/jekent



Hi Richard,

> As you may recall, I got the IBM 1130 emulation to run on that board
> and since then I have also gotten a PDP11-xx to run. The xx would be
> -40 if I had the MMU working. As it is, it runs RTL fairly well. I
> want the -40 version so I can run Unix V6.

Very nice! If I remember correctly, the MMU of the -40 isn't
that complicated, so the effort of implementing it should
be moderate.

I finally had success in porting UNIX V7 to my ECO32 processor
which runs on an XESS XSA-3S1000 board. I put the board onto
an XST extender board, which gives me two serial interfaces
and an IDE connector for the hard disk (as well as a bunch of
other interfaces - audio, video, ethernet, USB - but I cannot
use them yet from within V7).

The board has 32 Mbytes of SDRAM. I used a slightly modified
version of the RAM controller that XESS is providing (thanks
to Dave van den Bout, who was always willing to help). The
memory is relatively slow, so the machine would benefit a lot
from a cache.

A port to the Digilent 3E Starter Board with its DDR RAM
is also on my agenda. But as always, there are many ideas
and little time to spare...

Hellwig


