Hi George,
[Early morning meeting -- WTF? Had *hoped* I'd get a nap in beforehand
but obviously got caught up in "stuff"... <frown>]
>>>> ... Previous systems just were "processor farms" -- typically
>>>> all "powered" just waiting for "workloads". The idea of *bringing*
>>>> another node on-line, ON-DEMAND to address increasing needs wasn't
>>>> part of their scope (why should it be? Unless you're concerned
>>>> with power consumption!). Nor was there a concern over taking
>>>> nodes OFF-line when they weren't technically needed.
>>>
>>> The current crop of tera-scale computers consume megawatts, and the
>>> largest peta-scale computers consume 10s of megawatts when all their
>>> CPUs and attached IO devices are active. They have extensive power
>>> control systems to manage the partitioning of active/inactive devices.
>>
>> But their goal is to *use* all of that compute power, not let it
>> idle. They tend to be more homogeneous environments with more
>> "level" I/O usage. It's not like turning on CCTV cameras "because
>> it's getting dark outside" and, as a result, *needing* that extra compute
>> power to do video processing.
>
> Supercomputing centers all are batch oriented just like mainframes
> used to be. The difference is they shut off CPUs that aren't in use -
> if any - to lower the power bills.
>
> It's true that a lot of older machines have plenty of work to keep
> them running ... but in the last 10-15 years, many newer ones have
> had odd architectures that make writing software for them difficult
> and time consuming. It is true that a lot of them use Intel or ARM
> processors, but it isn't true that they all run Linux and can be
> programmed using GCC/OpenMP. Some of the world's most powerful
> systems sit idle much of the time, simply for lack of software.
>
> And a lot of the software itself is surprisingly flexible. In most
> SCC environments there is no multi-tasking: a set of CPUs is dedicated
> to a program for its duration. But external factors may cause a
> program to be halted before finishing. The stopped program may be
> restarted later with a different number of CPUs according to the mix
> of programs that are running at that time.
Different from being "paused" while "relocated" -- yet expecting
the rest of the "system" to accommodate their time "in transit".
Note that, conceptually, my code could run on a single CPU
(with enough resources) -- it isn't *inherently* parallel.
I think this is an essential aspect for development and
future maintenance: you only need to expect the *illusion*
of parallelism.
[The other aspects we've been discussing are all hidden from
the application developer in much the same way as GC. I.e.,
if you think about things, you realize "it's happening" but
your code never really understands why/where]
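[If you want a feel for what that "illusion" looks like, here's a
toy sketch -- nothing like the real implementation, and the "job"
names are made up. Each job just advances whenever the (single-core)
loop gets around to it; none of them ever knows what ran in between:]
    /* Toy illustration: one CPU, several logical "jobs", each advanced
       a step at a time.  No job knows (or cares) what ran in between. */
    #include <stdio.h>

    #define NTASKS 3

    typedef struct {
        const char *name;
        int step;               /* how far this job has progressed */
        int done;
    } task_t;

    static void advance(task_t *t) {
        if (t->done) return;
        printf("%s: step %d\n", t->name, t->step);
        if (++t->step >= 5) t->done = 1;
    }

    int main(void) {
        task_t tasks[NTASKS] = {
            { "motion",  0, 0 },
            { "logging", 0, 0 },
            { "ui",      0, 0 },
        };
        int busy = NTASKS;
        while (busy) {                      /* round-robin on the one core */
            busy = 0;
            for (int i = 0; i < NTASKS; i++) {
                advance(&tasks[i]);
                if (!tasks[i].done) busy++;
            }
        }
        return 0;
    }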
> Programs which are expected to need many CPUs (or vast amounts of
> memory which very often is tied to the number of CPUs), or which are
> expected to run for more than a few minutes - such programs often are
> written to checkpoint intermediate processing, to be restartable from
> saved checkpoints, and to adapt dynamically to the number of CPUs they
> are given when (re)started.
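[The checkpoint part of that is conceptually trivial -- something like
this toy sketch, where the file name and the "work" are made up for
illustration; the real trick is deciding *what* state is worth saving:]
    /* Toy checkpoint/restart: periodically save the loop index and the
       partial result so the job can be killed and resumed later
       (possibly on a different allocation). */
    #include <stdio.h>

    #define CKPT_FILE "job.ckpt"

    int main(void) {
        long i = 0, total = 0;

        FILE *f = fopen(CKPT_FILE, "r");    /* resume if a checkpoint exists */
        if (f) {
            fscanf(f, "%ld %ld", &i, &total);
            fclose(f);
        }

        for (; i < 100000000L; i++) {
            if (i % 1000000 == 0) {         /* checkpoint before this chunk */
                f = fopen(CKPT_FILE, "w");
                if (f) {
                    fprintf(f, "%ld %ld\n", i, total);
                    fclose(f);
                }
            }
            total += i % 7;                 /* the "work" */
        }
        printf("total = %ld\n", total);
        return 0;
    }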
>
>>> And nobody has yet come up with a really good way to exploit massively
>>> parallel hardware for general application programming. But that's a
>>> different discussion.
>>
>> I don't believe we'll see any "effective" algorithms -- largely because
>> there aren't many "available" installations. So, you have groups
>> with very specific sorts of application sets trying to tackle the
>> problem for *their* needs; not for "general" needs.
>
> The point is that "massively parallel" is becoming the norm. There
> are commodity server chips now with 16 and 32 cores, and this year's
> high end server chip will be in a laptop in 5 years.
I consider SMP a different beast than distributed systems. E.g.,
my blade server has ~60 (?) cores in the same box -- but it's really
14 separate "machines" that are only suited to working in
a "server farm" sort of environment; tied to applications that
are simply replicated with a split workload (instead of a single
application that has been "diced up" to run on the many processors).
> [ASUS now will happily sell you a water-cooled !! laptop with an
> overclocked 6th-gen i7 paired with 2 Nvidia 1080 GPUs. If you
> disconnect the water line it melts ... or maybe just slows down 50%.
> But seriously, what do you expect for only $7K?]
>
> In any event, more cores are fine for running more programs
Exactly. Which is how I've exploited "CPUs" (and, now, cores).
Conveniently sidestepping the issue of "how do you develop
for this environment" by treating it as many "programs" instead
of a single program that magically diced itself to pieces.
> simultaneously, but there's no good *general* way to leverage more
> cores to make a single program run faster. The ways that are known to
> be automatically exploitable (by a compiler) are largely limited to
> parallelizing loops. Parallelizing non looping code [which is most of
> most programs that need it] invariably relies on the programmer to
> recognize possible parallelism and write special code to take
> advantage of it.
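[To make the "loops" case concrete: with GCC/OpenMP (mentioned above),
a single pragma is about all the compiler/runtime needs. A toy
example, illustrative only:]
    /* The one nearly "free" case: a loop the compiler/runtime can split
       across cores with a single OpenMP pragma.  gcc -fopenmp dot.c */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N];

    int main(void) {
        double sum = 0.0;

        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)         /* iterations spread over cores */
            sum += a[i] * b[i];

        printf("dot = %g (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }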
>
> Why should anyone care? Good question. I don't have a good answer,
> but a couple of data points:
>
> Lots of experience has shown that the average programmer can't write
> correct parallel code [or even just correct serial code, but that's
I think that's inherent in the way that most people *think* of algorithms:
"this, then that, followed by yet another thing". Being a hardware
person, I implicitly see walking and chewing gum as "obvious"... why
should this bit of hardware *pause* waiting for this other bit to
finish ("either-or")?
> another discussion]. Automating parallelism - via compilers or smart
> runtime systems - is the only way any significant percentage of
> programs will be able to benefit.
I don't see big gains there. E.g., I probably represent an "above
average compute load" (in terms of what and how much I use machines for
during development), yet I'm reasonably sure most of the apps that
I run would not SIGNIFICANTLY benefit from even a *second* core
(well, maybe autorouting a PCB while simultaneously doing something
else, but that's the exception, not the rule).
Note my previous comment re: how I've decided to use multicore CPUs
in the HA system: by assigning specific jobs to specific cores
KNOWING what those loads represent in my system *and* thereby
being able to sidestep the core-scheduling issue.
Will this partitioning be the most efficient use of the cores? Probably
not. But, they're inexpensive and this *will* make it easier to ignore
some of the costs that my design "naturally" incurs.
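[Mechanically, that sort of pinning is a one-liner. A sketch of the
idea only -- it assumes a Linux-ish host with the GNU
pthread_setaffinity_np() extension, and the "job" name is made up:]
    /* Pin a worker thread to a particular core so that job always lives
       there.  Build with -pthread. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    static void *motion_worker(void *arg) {
        (void)arg;
        /* ... the CCTV motion-detection loop would live here ... */
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        cpu_set_t cpus;

        pthread_create(&tid, NULL, motion_worker, NULL);

        CPU_ZERO(&cpus);
        CPU_SET(2, &cpus);                  /* this job *always* gets core 2 */
        pthread_setaffinity_np(tid, sizeof(cpus), &cpus);

        pthread_join(tid, NULL);
        return 0;
    }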
> Surveys have shown that for many people, the only "computer" they own
> or routinely use is their smartphone. The average person quite soon
> will reasonably expect their phone to able to do anything: word
> processing, spreadsheets, audio/video editing and presentation, i.e.
> general business computing, and (for those few taking college STEM
> courses) solving differential equations, performing circuit
> simulations, virtual reality walkthroughs of pyramids, galaxies,
> cadavers, etc. ...
> ... while snapchatting, tweeting, facebooking, pinteresting, and still
> providing 40 hours of use on a battery charge.
I disagree with the "while" assumption (as in true concurrency).
I still think people think about "work" serially. As long as you
*appear* to make some progress on "task A" while they are busy
with "task B", they're unable to understand how *good* your
effort happened to be.
E.g., if I'm updating a schematic *while* autorouting (another) design,
I only care that the autorouter made *some* progress while I had
the machine "tied up" with my schematic entry activities. It's
not like I review its progress and think: "Gee, it should be farther
along than it is...". OTOH, if it had (obviously) *paused*
"while I was looking elsewhere", I'd be annoyed.
With this in mind, I think it's easier to develop application
sets that can achieve "satisfactory" performance without aiming for
"ideal" performance. The challenging (portions of) applications are
those things that the user *expects* to run in "real time" (e.g.,
the "answering machine" application on your phone can't suddenly start
speaking the OGM in slow motion just because your attention is focused
on something else!).
E.g., I can do "commercial detection" in recorded video "off-line"...
as long as I can get it DONE before the user wants to view that
video! (which, of course, can't be done *while* it is streaming cuz
the processing consumes more real time than the video *occupies*).
OTOH, I have to do motion detection (CCTV) in real time cuz that
affects the latency of the actions triggered by that motion.
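[In scheduler terms, that's the whole split: motion detection gets a
fixed real-time priority because latency matters; commercial detection
just runs as an ordinary time-shared thread whenever there's slack.
A sketch only -- it assumes a POSIX/Linux host, the thread names are
made up, and SCHED_FIFO needs suitable privileges:]
    #include <pthread.h>
    #include <sched.h>

    static void *motion_detect(void *arg)     { (void)arg; /* ... */ return NULL; }
    static void *commercial_detect(void *arg) { (void)arg; /* ... */ return NULL; }

    int main(void) {
        pthread_t rt, batch;
        pthread_attr_t attr;
        struct sched_param sp = { .sched_priority = 50 };

        /* latency-critical: explicit real-time scheduling */
        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &sp);
        pthread_create(&rt, &attr, motion_detect, NULL);

        /* deadline is just "before the user hits play": default scheduling */
        pthread_create(&batch, NULL, commercial_detect, NULL);

        pthread_join(rt, NULL);
        pthread_join(batch, NULL);
        return 0;
    }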
> Unless someone comes up with a way to pack kWh into a AA sized package
> [that can be recharged in under 30 minutes], making better use of
> large numbers of low powered cores is going to be the only way
> forward.
Or, convince people that they don't need to "take it with them"!
E.g., my UIs have far lower power requirements than the "code"
to run them implies. (OTOH, they're featherweight devices so
the battery issue still applies). I don't think people care *where*
the processing is performed; they just want to ACCESS it "locally"
(wherever "locally" may be!)
I look at how my HA system (and its successor) have evolved and
that's the single most striking "optimization" in it all!
Decouple the UI from the application. And, make the UI ubiquitous
at the same time --> simpler, more desirable system!
[Of course, for most people, that means reliance on someone else to
provide that "service"... still baffles me to see what people pay
for that "convenience" re: cell phones!]
(sigh) Shower then gather up my prototypes and get my *ss out of here.
THEN, perhaps, the nap I've been awaiting?