Reply by Jeremy Williamson, January 31, 2005
"Ronald H. Nicholson Jr." <rhn@mauve.rahul.net> wrote in message
news:ctf9ic$n4c$1@blue.rahul.net...
> In article <ct3tks$8gj$1@news01.intel.com>,
> Jeremy Williamson <jeremiah.d.williamson@NOSPAMintel.com> wrote:
> >
> >"Ronald H. Nicholson Jr." <rhn@mauve.rahul.net> wrote in message
> >news:cssfcg$fva$1@blue.rahul.net...
> >> In article <name99-42B264.17530821012005@localhost>,
> >> Maynard Handley <name99@name99.org> wrote:
> >> >Bottom line is that this thing doesn't resemble any traditional CPU and
> >> >is therefore a godawful match to existing languages, compilers and
> >> >algorithms.
> >>
> >> GPU shader algorithms and languages?  Common DSP library/toolbox calls?
> ...
> >You do realize that this will still have a GPU, and as a matter of fact Sony
> >gave up the idea of doing it themselves and gave the contract to nVidia (my
> >guess was cost vs. bowing to requests from the SW community).  Essentially
> >that means all your basic T&L (including your shader algorithms) are still
> >done on the GPU.
>
> Yes, but aren't people experimenting with using shader and DSP languages
> and tools for stuff that has nothing to do with the workstation display
> or audio output?
>
> The question is whether this software is commercially interesting and
> whether this cell device is more suited for this stuff than the GPUs on
> which these algorithms were developed.
>
> IMHO. YMMV.
> --
> Ron Nicholson  rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
> #include <canonical.disclaimer>  // only my own opinions, etc.
Yes, especially since GPUs are slowly becoming more generically programmable
(thanks to the pixel and vertex shaders).  AIUI, the next generation is
likely to have primitive branching.  There was a full-day workshop on
porting applications to the GPU at SIGGRAPH last year, and there was even a
published paper on porting a database to the GPU.

But what I was trying to say is that the Cell is not a GPU, nor is it likely
to take away many of the tasks currently farmed out to today's GPUs (T&L).

Jeremy
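A minimal sketch (not from any of the posts above) of what "generically
programmable" means in practice: GPGPU work recasts an array computation as
a pure per-element kernel, the same shape a fragment shader runs once per
pixel.  The C below only mimics that mapping serially; on a GPU the outer
loop is what the hardware parallelizes across its pixel pipelines.

/* The "stream" formulation GPGPU work maps onto fragment shaders: a pure,
 * element-wise kernel with no cross-element state.  Run serially here. */
#include <stdio.h>

#define N 8

/* Per-element kernel: the body a fragment shader would run once per pixel. */
static float kernel(float a, float b)
{
    return a * b + 1.0f;                /* any pure, element-wise computation */
}

int main(void)
{
    float a[N], b[N], out[N];

    for (int i = 0; i < N; i++) {       /* fill two input "textures"          */
        a[i] = (float)i;
        b[i] = (float)(N - i);
    }

    for (int i = 0; i < N; i++)         /* on a GPU this loop is what gets    */
        out[i] = kernel(a[i], b[i]);    /* parallelized across the pipelines  */

    for (int i = 0; i < N; i++)
        printf("%g ", out[i]);
    printf("\n");
    return 0;
}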
Reply by Ronald H. Nicholson Jr., January 29, 2005
In article <ct3tks$8gj$1@news01.intel.com>,
Jeremy Williamson <jeremiah.d.williamson@NOSPAMintel.com> wrote:
> >"Ronald H. Nicholson Jr." <rhn@mauve.rahul.net> wrote in message >news:cssfcg$fva$1@blue.rahul.net... >> In article <name99-42B264.17530821012005@localhost>, >> Maynard Handley <name99@name99.org> wrote: >> >Bottom line is that this thing doesn't resemble any traditional CPU and >> >is therefore a godawful match to existing languages, compilers and >> >algorithms. >> >> GPU shader algorithms and languages? Common DSP library/toolbox calls?
...
>You do realize that this will still have a GPU, and as a matter of fact Sony
>gave up the idea of doing it themselves and gave the contract to nVidia (my
>guess was cost vs. bowing to requests from the SW community).  Essentially
>that means all your basic T&L (including your shader algorithms) are still
>done on the GPU.
Yes, but aren't people experimenting with using shader and DSP languages
and tools for stuff that has nothing to do with the workstation display
or audio output?

The question is whether this software is commercially interesting and
whether this cell device is more suited for this stuff than the GPUs on
which these algorithms were developed.

IMHO. YMMV.
--
Ron Nicholson  rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
#include <canonical.disclaimer>  // only my own opinions, etc.
Reply by Jeremy Williamson, January 24, 2005
"Ketil Malde" <ketil+news@ii.uib.no> wrote in message
news:egu0p7f6aq.fsf@ii.uib.no...
> "Xenon" <xenonxbox2@xboxnext.com> writes: > > > Cell Architecture Explained: Introduction > [...] > > 250 GFLOPS (Billion Floating Point Operations per Second) > [...] > > 6.4 Gigabit / second off-chip communication > > A little bit memory starved, I guess -- or do you have an application > that performs in the neighborhood of fifty FLOPS per *bit*? > > -kzm > -- > If I haven't seen further, it is by standing in the footprints of giants
GPUs do.  Stages and stages of pure logic circuitry.  On this beast, every
CPU/APU would have to be executing out of cache, of course.  Being memory
starved is par for the course -- one of those issues that only gets worse as
we move forward.

J
Reply by Jeremy Williamson, January 24, 2005
"Ronald H. Nicholson Jr." <rhn@mauve.rahul.net> wrote in message
news:cssfcg$fva$1@blue.rahul.net...
> In article <name99-42B264.17530821012005@localhost>,
> Maynard Handley <name99@name99.org> wrote:
> >Bottom line is that this thing doesn't resemble any traditional CPU and
> >is therefore a godawful match to existing languages, compilers and
> >algorithms.
>
> GPU shader algorithms and languages?  Common DSP library/toolbox calls?
>
> --
> Ron Nicholson  rhn AT nicholson DOT com   http://www.nicholson.com/rhn/
> #include <canonical.disclaimer>  // only my own opinions, etc.
???

You do realize that this will still have a GPU, and as a matter of fact Sony
gave up the idea of doing it themselves and gave the contract to nVidia (my
guess was cost vs. bowing to requests from the SW community).  Essentially
that means all your basic T&L (including your shader algorithms) are still
done on the GPU.

J
Reply by Andrew Ryan Chang, January 24, 2005
Xenon <xenonxbox2@xboxnext.com> wrote:
>Cell Architecture Explained: Introduction
A discussion at Joystiq points out that Mr. Blachford, from whom you stole
the article, also explained how to make an antigravity device, and how light
reduces in frequency the further it travels...

http://www.blachford.info/quantum/gravity.html
http://www.blachford.info/quantum/dimeng.html

Follow-ups set to rgv.sony; apologies for this idiot crossposter to the rest
of you all.

--
"It's only now, with 'Blinded by the Right,' that conservatives have grown a
sense of journalistic skepticism when it comes to [David] Brock."
  - "Fight or Flight", David Talbot, Salon, Apr 17, 2002
Reply by Nicholas O. Lindan, January 24, 2005
> In alt.games.video.xbox CEO Gargantua <gamers@r.lamers> wrote:
This high authority on the issue spake:
> Moore's Law is dead, and it's taking Wintel down with it.
Pantagruel sez:  Then they'll have to start making the software faster.
Won't that be a hoot!

http://www.centaurgalleries.com/Art/00077/I04274-02-500h.jpg
Reply by Doug Jacobs, January 24, 2005
In alt.games.video.xbox CEO Gargantua <gamers@r.lamers> wrote:

> Moore's Law is dead, and it's taking Wintel down with it.
But is it even needed anymore?

Think about it - how many people *NEED* that 3+GHz processor?  And the
people who do actually need that sort of power are going with SMP or
parallel processing systems already.  Even Doom3 is more concerned with the
processor and memory of the graphics card - not with your main CPU.

If anything, the days of the single-CPU system are numbered.  If you can't
make the individual chip go faster, why not throw more chips at the problem?
This won't lead to a speed-up across the board, but imagine being able to
dedicate a processor to each application you're running on your system.
BeOS used to be able to do this, and even let you set how many processors
you wanted dedicated to each process if you wanted.  Just don't set it to 0
CPUs for the OS...bad things would happen ;)

As for margins on its chips, I wouldn't worry about Intel just yet.  If they
start doing multiple-core processors (one chip, 2 or more CPUs), that will
push their prices along nicely.  After all, there's only so much room on a
standard desktop ATX motherboard.
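The per-application CPU dedication described above survives in current OSes
as processor affinity.  Here is a minimal sketch, assuming Linux and glibc's
sched_setaffinity(); BeOS exposed roughly the same control through its own
kernel calls and GUI.

/* Minimal sketch: pin the current process to CPU 0, roughly what the BeOS
 * per-process CPU setting did.  Assumes Linux and glibc's affinity API. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);            /* start with an empty CPU set              */
    CPU_SET(0, &mask);          /* allow this process to run on CPU 0 only  */

    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("now restricted to CPU 0\n");
    return 0;
}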
Reply by Douglas Siebert, January 24, 2005
Ketil Malde <ketil+news@ii.uib.no> writes:

>"Xenon" <xenonxbox2@xboxnext.com> writes:
>> Cell Architecture Explained: Introduction
> [...]
>> 250 GFLOPS (Billion Floating Point Operations per Second)
> [...]
>> 6.4 Gigabit / second off-chip communication
>A little bit memory starved, I guess -- or do you have an application
>that performs in the neighborhood of fifty FLOPS per *bit*?
The claim made in the Cell paper is that there are 8 Rambus XDR channels at
3.2 GB/s each, for a total of 25.6 GB/s.  (I could have sworn XDR was
supposed to be 6.4 GB/s, but maybe that was down the road.)  That 6.4 GB/s
off-chip communication is the hypertransport equivalent (and also supposed
to be per pin; I can't remember how wide that was supposed to be).  Not that
this gets it near 250 GFLOPS usable for problems larger than a few
megabytes.

--
Douglas Siebert                     dsiebert@excisethis.khamsin.net

"They that can give up essential liberty to obtain a little temporary safety
deserve neither liberty nor safety" -- Thomas Jefferson
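A quick back-of-the-envelope check of the figures being traded in this
thread, taking the 250 GFLOPS, 6.4 Gbit/s, and 8 x 3.2 GB/s numbers at face
value:

/* Back-of-the-envelope arithmetic-intensity check using the figures quoted
 * above (250 GFLOPS peak, a 6.4 Gbit/s off-chip link, 8 XDR channels at
 * 3.2 GB/s each).  None of these numbers are mine; only the division is. */
#include <stdio.h>

int main(void)
{
    double gflops     = 250.0;      /* claimed peak, GFLOP/s            */
    double link_gbit  = 6.4;        /* off-chip link, Gbit/s            */
    double xdr_gbytes = 8 * 3.2;    /* 8 XDR channels, GB/s = 25.6      */

    printf("FLOPs per bit over the 6.4 Gbit/s link: %.1f\n",
           gflops / link_gbit);                    /* ~39 FLOPs per bit   */
    printf("FLOPs per byte of XDR bandwidth:        %.1f\n",
           gflops / xdr_gbytes);                   /* ~9.8 FLOPs per byte */
    printf("FLOPs per 4-byte float from XDR:        %.1f\n",
           4.0 * gflops / xdr_gbytes);             /* ~39 FLOPs per float */
    return 0;
}

Either way you slice it, the peak only matters for kernels that reuse each
operand dozens of times once it is on chip.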
Reply by Ketil Malde, January 24, 2005
"Xenon" <xenonxbox2@xboxnext.com> writes:

> Cell Architecture Explained: Introduction
[...]
> 250 GFLOPS (Billion Floating Point Operations per Second)
[...]
> 6.4 Gigabit / second off-chip communication
A little bit memory starved, I guess -- or do you have an application
that performs in the neighborhood of fifty FLOPS per *bit*?

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
Reply by Robert Myers, January 22, 2005
On Sat, 22 Jan 2005 01:53:48 GMT, Maynard Handley <name99@name99.org>
wrote:

>In article <QvadnatFwfzw6W3cRVn-ow@comcast.com>,
> "Xenon" <xenonxbox2@xboxnext.com> wrote:
>
>> The lack of cache and virtual memory systems means the APUs operate in a
>> different way from conventional CPUs. This will likely make them harder to
>                                                   ^^^^^
>> program but they have been designed this way to reduce complexity and
>> increase performance.
>
>You don't say.
>Programming Itanic was a picnic compared to programming this thing; at
>least Itanic used a traditional computer architecture.
>And yet Intel/HP, with all the money in the world, couldn't make it fly.
>Please tell us why IBM/Sony/Toshiba can do what Intel/HP could not.
>
Itanium and Cell both offer advantages for problems that can be formulated
to exploit the architecture.  In the case of Itanium, the advantages have
turned out not to be overwhelming.  In the case of stream processors, there
are already off-the-shelf GPUs that can significantly outperform any
conventional microprocessor for some kinds of problems, and the advantage of
stream processors will only grow as feature sizes decrease.
>(Note, I am not denying that Cell may make a fine Playstation chip.
>I AM denying that it will make a fine workstation chip, will take over
>the computing world, make all other CPUs obsolete, blah blah blah.)
>
Predicting the future is really hard.  Genuine paradigm shifts are rare, but
I think this one is on its way.  The future of computing is more like what
happens on network processors and GPUs than what happens on x86, PowerPC, or
Itanium.  The change is being driven by physics, not marketing.
>> This may sound like an inflexible system which will be complex to program
>> and it most likely is but this system will deliver data to the APU registers
>
>So in return for giving up cache, your code has to manually move data
>to/from memory. That'll be easy for the compiler to figure out.
>
Of course it won't.  But the same problem exists--how do I figure out how to
get the data to where I need it when I need it?--in any architecture.  Cache
and registers add a set of tools for dealing with that problem; they don't
make it go away.  In the case of at least some stream processors, there is a
_register_ hierarchy: a low-bandwidth stream register file that faces memory
and local register files that act much like a conventional vector register
file.

<snip>
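To make the "manually move data to/from memory" point concrete, here is a
minimal sketch (not anything IBM/Sony/Toshiba have published) of
double-buffered staging into a small local store.  The dma_get() helper is a
hypothetical stand-in for whatever transfer primitive the real hardware
exposes; the point is that software, not a cache, decides what is resident
and when it moves.

/* Double-buffered staging of data from main memory into a small local store.
 * Assumes total is a multiple of CHUNK for brevity; dma_get() is made up.  */
#include <stddef.h>
#include <string.h>

#define CHUNK 1024                      /* elements that fit in local store   */

static float local_buf[2][CHUNK];       /* compute out of one, fill the other */

/* Hypothetical copy from main memory into local store (a real DMA engine
 * would run this asynchronously while the other buffer is being processed). */
static void dma_get(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}

static float process(const float *buf, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += buf[i] * buf[i];
    return sum;
}

float sum_of_squares(const float *mem, size_t total)
{
    float sum = 0.0f;
    int cur = 0;

    dma_get(local_buf[cur], mem, CHUNK);            /* prime the first buffer */
    for (size_t off = 0; off < total; off += CHUNK) {
        int nxt = cur ^ 1;
        if (off + CHUNK < total)                     /* start the next fill   */
            dma_get(local_buf[nxt], mem + off + CHUNK, CHUNK);
        sum += process(local_buf[cur], CHUNK);       /* work on resident data */
        cur = nxt;
    }
    return sum;
}

On a conventional CPU the cache does this bookkeeping invisibly; on a
local-store machine it becomes the programmer's (or the compiler's) problem,
which is exactly the dispute in this thread.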
>
>There's (much much, so much) more blather and ranting about how
>fantastic Cell is and how it will solve any problem you can possibly
>imagine, but for those of us in the reality-based community, I think the
>points I have extracted above are the highlights.
>
>Bottom line is that this thing doesn't resemble any traditional CPU and
>is therefore a godawful match to existing languages, compilers and
>algorithms. Unless IBM/Sony/Toshiba have, in some other pocket and kept
>an extremely good secret, something that solves problems many people have
>been working on for more than twenty years, you'll be programming this
>thing with an assembly language mindset, even if you are nominally using
>a high-level language --- like you program AltiVec today. Only it'll be
>so much more fun because not only will you be worrying about alignment
>and algorithm issues, you'll be trying to juggle fitting your
>instructions and data into local memory (we weren't given a size for
>this but if it is to run at L1 cache speeds, it can't be wildly far off
>from say 64K to 512K bytes); none of that getting the cache to just hide
>the problem for you if you might want to load from an infrequently used
>table, handle a rare exception condition or whatever; it'll be manual
>segment swapping all over again. Not to mention the other glorious
>aspects. You'll be using some bizarro method to handle coherency. You'll
>have the engine that drives your code and handles exceptions and such
>running on a different processor from where the compute intensive code
>lives.
>
Maybe.  Somebody must like programming these things, because people are
already doing it--just for fun, apparently.  The problems are formidable,
but it is early days yet when it comes to inventing programming models and
algorithms for stream processors.  One future I can see is that data (and
instructions) will no longer be associated with memory locations but with
labelled packets.

Will there always be something that looks like a conventional
microprocessor?  Let's wait and see what the promised workstations look
like.  Weren't we supposed to have seen them last fall?

The one thing in all this that _really_ gives me pause is that making it
work in the general case seems like getting a dataflow machine to work in
the general case.

There's a really nice summary of GPU programming entering the mainstream at

http://www.computer.org/computer/homepage/1003/entertainment/
>
>And all this from IBM/Sony/Toshiba, three companies traditionally known
>for their openness and willingness to share with the public. I imagine
>Intel, AMD and Microsoft are quaking in their boots.
>
They couldn't possibly be less open than the graphics card manufacturers
have been.

RM