
Rambus aims for 1 TeraByte per second memory bandwidth by 2010

Started by AirRaid December 3, 2007
On Fri, 07 Dec 2007 22:51:06 -0000, daytripper  
<day_trippr@REMOVEyahoo.com> wrote:

> So, in short, you don't think the biggest problem confronting processor
> design and performance isn't important because "it's hard"...
>
> /daytripper (well, that's one way to go, I guess ;-)
I dunno if it's a fair summary of Robert's position, but it is a fair piece of strategy. It is silly to try to solve an impossible problem. It is almost as silly to try to solve an almost impossible problem.
On Dec 8, 9:18 am, Robert Redelmeier <red...@ev1.net.invalid> wrote:
> In comp.sys.ibm.pc.hardware.chips Robert Myers <rbmyers...@gmail.com> wrote in part:
>
> > For latency, there is nowhere left to go in terms of
> > completely unpredictable reads from memory (or disk).
>
> Sure there is -- SRAM and other designs which take more xtors
> per cell. With the continually decreasing marginal cost
> of xtors and a shortage of useful things to do with them,
> I expect this transition to happen at some point.
SRAM could shave off about 15 ns in the case of a DRAM page miss, or 50-55 ns in the case of a page conflict, but page conflicts are very rare. In the supposedly most common case of a DRAM page hit, SRAM doesn't help at all. Actually, you will have a hard time finding commodity SRAM that is as fast as the now-common DDR2-800 CL5 at a page hit.

Another potential saving with SRAM comes from the fact that the memory controller is simpler. I don't know how much that could bring. The likes of Opteron and Power6 run their memory controllers at very high speed, so I'd guess it would be hard to shave off more than 1-2 ns here.

Now look at the flip side:

1. Pins. An SRAM address bus is up to twice as wide as DRAM's. You can construct SRAM with pseudo-pages and a multiplexed address bus, but then you give up part of the latency advantage.

2. Capacity. The big one. SRAM capacity lags behind DRAM by a factor of 5-10. That means you will either need more channels (expensive motherboard, expensive packaging of MPU/NB; not always possible due to mechanical constraints) or more DIMMs per channel. The latter normally means more buffering = higher latency. For example, for DDR2-667 one can put on one channel 2 unbuffered DIMMs (lowest latency), 4 registered DIMMs (medium latency) or up to 8 fully-buffered DIMMs (the highest latency).

3. Power consumption. I'm not an expert in this area, but according to my understanding, under heavy load SRAM consumes 2-3 times more power than the equivalent DRAM. That's partly compensated by lower idle power consumption (no need for refresh).

4. Cost. That's the other unfortunate effect of lower capacity.
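For context, here is a rough sketch of the cycle-to-nanosecond arithmetic behind those DDR2 figures, assuming a typical DDR2-800 5-5-5 part (CL = tRCD = tRP = 5 cycles at a 400 MHz command clock). The 15 ns and 50-55 ns savings quoted above also fold in controller and interconnect overheads that this sketch ignores, so treat the numbers as illustrative only:

/* ddr2_latency.c - back-of-the-envelope DRAM access-latency arithmetic.
   The 5-5-5 timings and 400 MHz command clock are assumed typical
   DDR2-800 values, not figures taken from this thread. */
#include <stdio.h>

int main (void)
{
	double tck = 1000.0 / 400.0 ;   /* DDR2-800: 400 MHz command clock -> 2.5 ns per cycle */
	int cl = 5, trcd = 5, trp = 5 ; /* CAS latency, RAS-to-CAS delay, row precharge */

	printf("page hit      (CL only)         : %4.1f ns\n", cl * tck) ;
	printf("page miss     (tRCD + CL)       : %4.1f ns\n", (trcd + cl) * tck) ;
	printf("page conflict (tRP + tRCD + CL) : %4.1f ns\n", (trp + trcd + cl) * tck) ;
	return 0 ;
}

A page hit costs only the CAS latency because the row is already open; a miss adds the row-activate delay, and a conflict must first precharge the previously open row before activating the new one.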
Ken Hagan wrote:
> On Fri, 07 Dec 2007 22:51:06 -0000, daytripper
> <day_trippr@REMOVEyahoo.com> wrote:
>
>> So, in short, you don't think the biggest problem confronting
>> processor design and performance isn't important because "it's hard"...
>>
>> /daytripper (well, that's one way to go, I guess ;-)
>
> I dunno if it's a fair summary of Robert's position, but it is a fair
> piece of strategy. It is silly to try to solve an impossible problem.
> It is almost as silly to try to solve an almost impossible problem.
How about "It's not important because it is not cost-effective"?

-- 
A language that doesn't affect the way you think about programming is
not worth knowing.  -- Alan Perlis
In comp.sys.ibm.pc.hardware.chips already5chosen@yahoo.com wrote in part:
> SRAM could shave off about 15 ns in the case of a DRAM page miss, or
> 50-55 ns in the case of a page conflict, but page conflicts are very
> rare. In the supposedly most common case of a DRAM page hit, SRAM
> doesn't help at all. Actually, you will have a hard time finding
> commodity SRAM that is as fast as the now-common DDR2-800 CL5 at a
> page hit.
You are talking device response times, and I appreciate your
information. However, I am interested in system response (software
performance), and my measurements are far less encouraging:

Latency  CPU@MHz    mem.ctl      RAM
  ns
  88     k8@2000    NForce3      DDR400
 144     P3@1000    laptop       SO-PC133?
 148     2*P3@860   Serverworks  ??
 178     P4@1800    i850         RDRAM
 184     K7@1667    SiS735       PC133
 185     P3@600     440BX        PC100
 217     2*Cel@500  440BX        PC90
 234     P2@350     440BX        PC100?
 288     P2@333     440BX        PC66

I do need to find & test some more modern systems, but I'm underwhelmed
by the slowness of latency improvement. CPU speed has increased at least
4x, latency response at best 2.5x.

Run this pgm from L2 (small set) and it comes back around 10 ns.

compile:  $ gcc -O2 lat10m.c
run:      $ time ./a.out    [multiply user time by 100 to give ns]

/* lat10m.c - Measure latency of 10 million fresh memory reads
   (C) Copyright 2005 Robert Redelmeier - GPL v2.0 licence granted */

int p[ 1<<21 ] ;

int main (void)
{
	int i, j ;

	for ( i=0 ; i < 1<<21 ; i++ )
		p[i] = 0x1FFFFF & (i-5000) ;

	for ( j=i=0 ; i < 9600000 ; i++ )
		j = p[j] ;

	return j ;
}

-- 
Robert
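A hypothetical self-timing variant of lat10m.c, not from the original post, which uses the standard clock() call so the per-read figure is printed directly instead of being derived from the `time` command; the file name lat10m_timed.c and the output format are illustrative assumptions:

/* lat10m_timed.c - hypothetical self-timing variant of lat10m.c above.
   Same pointer-chasing loop over an 8 MB array; prints ns per read. */
#include <stdio.h>
#include <time.h>

#define N      (1<<21)      /* 2M ints = 8 MB, well past most L2 caches */
#define READS  9600000

int p[ N ] ;

int main (void)
{
	int i, j ;
	clock_t t0, t1 ;

	for ( i=0 ; i < N ; i++ )
		p[i] = 0x1FFFFF & (i-5000) ;   /* each element points 5000 ints back, mod 2^21 */

	t0 = clock() ;
	for ( j=i=0 ; i < READS ; i++ )
		j = p[j] ;                     /* dependent loads: each read must complete before the next can issue */
	t1 = clock() ;

	printf("%.1f ns per read (j=%d)\n",
	       (double)(t1-t0) / CLOCKS_PER_SEC * 1e9 / READS, j) ;
	return 0 ;
}

Compile and run the same way ($ gcc -O2 lat10m_timed.c && ./a.out). Since the loop is CPU-bound and memory stalls are charged to process CPU time, the clock()-based figure should roughly match the "multiply user time by 100" rule of thumb above.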