
Rambus aims for 1 TeraByte per second memory bandwidth by 2010

Started by AirRaid December 3, 2007
On Fri, 07 Dec 2007 22:51:06 -0000, daytripper  
<day_trippr@REMOVEyahoo.com> wrote:

> So, in short, you don't think the biggest problem confronting processor
> design and performance isn't important because "it's hard"...
>
> /daytripper (well, that's one way to go, I guess ;-)
I dunno if it's a fair summary of Robert's position, but it is a fair piece of strategy. It is silly to try to solve an impossible problem. It is almost as silly to try to solve an almost impossible problem.
On Dec 8, 9:18 am, Robert Redelmeier <red...@ev1.net.invalid> wrote:
> In comp.sys.ibm.pc.hardware.chips Robert Myers <rbmyers...@gmail.com> wrote in part:
>
> > For latency, there is nowhere left to go in terms of
> > completely unpredictable reads from memory (or disk).
>
> Sure there is -- SRAM and other designs which take more xtors
> per cell. With the continually decreasing marginal cost
> of xtors and a shortage of useful things to do with them,
> I expect this transition to happen at some point.
SRAM could shave off about 15 ns in the case of a DRAM page miss, or 50-55 ns in the case of a page conflict, but page conflicts are very rare. In the supposedly most common case of a DRAM page hit, SRAM doesn't help at all. Actually, you will have a hard time finding commodity SRAM that is as fast as the now-common DDR2-800 CL5 at a page hit.

Another potential saving with SRAM comes from the fact that the memory controller is simpler. I don't know how much that could bring. The likes of Opteron and Power6 run their memory controllers at very high speed, so I'd guess it would be hard to shave off more than 1-2 ns here.

Now look at the flip side:

1. Pins. An SRAM address bus is up to twice as wide as DRAM's. You can construct SRAM with pseudo-pages and a multiplexed address bus, but then you give up part of the latency advantage.

2. Capacity. The big one. SRAM capacity lags behind DRAM by a factor of 5-10. That means you will either need more channels (expensive motherboard, expensive packaging of MPU/NB; not always possible due to mechanical constraints) or more DIMMs per channel. The latter normally means more buffering = higher latency. For example, for DDR2-667 one can put on one channel 2 unbuffered DIMMs (lowest latency), 4 registered DIMMs (medium latency) or up to 8 fully-buffered DIMMs (the highest latency).

3. Power consumption. I'm not an expert in this area, but according to my understanding, under heavy load SRAM consumes 2-3 times more power than the equivalent DRAM. That's partly compensated by lower idle power consumption (no need for refresh).

4. Cost. That's the other unfortunate effect of lower capacity.
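For context, here is a rough sketch of the cycle-to-nanosecond arithmetic behind those DDR2 figures, assuming a typical DDR2-800 5-5-5 part (CL = tRCD = tRP = 5 cycles at a 400 MHz command clock). The 15 ns and 50-55 ns savings quoted above also fold in controller and interconnect overheads that this sketch ignores, so treat the numbers as illustrative only:

/* ddr2_latency.c - back-of-the-envelope DRAM access-latency arithmetic.
   The 5-5-5 timings and 400 MHz command clock are assumed typical
   DDR2-800 values, not figures taken from this thread. */
#include <stdio.h>

int main (void)
{
	double tck = 1000.0 / 400.0 ;   /* DDR2-800: 400 MHz command clock -> 2.5 ns per cycle */
	int cl = 5, trcd = 5, trp = 5 ; /* CAS latency, RAS-to-CAS delay, row precharge */

	printf("page hit      (CL only)         : %4.1f ns\n", cl * tck) ;
	printf("page miss     (tRCD + CL)       : %4.1f ns\n", (trcd + cl) * tck) ;
	printf("page conflict (tRP + tRCD + CL) : %4.1f ns\n", (trp + trcd + cl) * tck) ;
	return 0 ;
}

A page hit costs only the CAS latency because the row is already open; a miss adds the row-activate delay, and a conflict must first precharge the previously open row before activating the new one.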
Ken Hagan wrote:
> On Fri, 07 Dec 2007 22:51:06 -0000, daytripper
> <day_trippr@REMOVEyahoo.com> wrote:
>
>> So, in short, you don't think the biggest problem confronting
>> processor design and performance isn't important because "it's hard"...
>>
>> /daytripper (well, that's one way to go, I guess ;-)
>
> I dunno if it's a fair summary of Robert's position, but it is a fair
> piece of strategy. It is silly to try to solve an impossible problem.
> It is almost as silly to try to solve an almost impossible problem.
How about "It's not important because it is not cost-effective"?

-- 
A language that doesn't affect the way you think about programming is
not worth knowing.  -- Alan Perlis
In comp.sys.ibm.pc.hardware.chips already5chosen@yahoo.com wrote in part:
> SRAM could shave off about 15 ns in the case of a DRAM page miss, or
> 50-55 ns in the case of a page conflict, but page conflicts are very
> rare. In the supposedly most common case of a DRAM page hit, SRAM
> doesn't help at all. Actually, you will have a hard time finding
> commodity SRAM that is as fast as the now-common DDR2-800 CL5 at a
> page hit.
You are talking device response times, and I appreciate your
information. However, I am interested in system response (software
performance), and my measurements are far less encouraging:

Latency  CPU@MHz    mem.ctl      RAM
  ns
  88     k8@2000    NForce3      DDR400
 144     P3@1000    laptop       SO-PC133?
 148     2*P3@860   Serverworks  ??
 178     P4@1800    i850         RDRAM
 184     K7@1667    SiS735       PC133
 185     P3@600     440BX        PC100
 217     2*Cel@500  440BX        PC90
 234     P2@350     440BX        PC100?
 288     P2@333     440BX        PC66

I do need to find & test some more modern systems, but I'm underwhelmed
by the slowness of latency improvement. CPU speed has increased at least
4x, latency response at best 2.5x.

Run this pgm from L2 (small set) and it comes back around 10 ns.

compile:  $ gcc -O2 lat10m.c
run:      $ time ./a.out    [multiply user time by 100 to give ns]

/* lat10m.c - Measure latency of 10 million fresh memory reads
   (C) Copyright 2005 Robert Redelmeier - GPL v2.0 licence granted */

int p[ 1<<21 ] ;

int main (void)
{
	int i, j ;

	for ( i=0 ; i < 1<<21 ; i++ )
		p[i] = 0x1FFFFF & (i-5000) ;

	for ( j=i=0 ; i < 9600000 ; i++ )
		j = p[j] ;

	return j ;
}

-- 
Robert
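A hypothetical self-timing variant of lat10m.c, not from the original post, which uses the standard clock() call so the per-read figure is printed directly instead of being derived from the `time` command; the file name lat10m_timed.c and the output format are illustrative assumptions:

/* lat10m_timed.c - hypothetical self-timing variant of lat10m.c above.
   Same pointer-chasing loop over an 8 MB array; prints ns per read. */
#include <stdio.h>
#include <time.h>

#define N      (1<<21)      /* 2M ints = 8 MB, well past most L2 caches */
#define READS  9600000

int p[ N ] ;

int main (void)
{
	int i, j ;
	clock_t t0, t1 ;

	for ( i=0 ; i < N ; i++ )
		p[i] = 0x1FFFFF & (i-5000) ;   /* each element points 5000 ints back, mod 2^21 */

	t0 = clock() ;
	for ( j=i=0 ; i < READS ; i++ )
		j = p[j] ;                     /* dependent loads: each read must complete before the next can issue */
	t1 = clock() ;

	printf("%.1f ns per read (j=%d)\n",
	       (double)(t1-t0) / CLOCKS_PER_SEC * 1e9 / READS, j) ;
	return 0 ;
}

Compile and run the same way ($ gcc -O2 lat10m_timed.c && ./a.out). Since the loop is CPU-bound and memory stalls are charged to process CPU time, the clock()-based figure should roughly match the "multiply user time by 100" rule of thumb above.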