This thread is proof that the Armed Forces have no business trying to defend
us (or our sacred oil). In this day and age, anonymity is the greatest
weapon of all. These jokers build battleships capable of sniffing out subs
at hundreds of miles, but a fishing boat a few miles away at night with a
bunch of divers experienced in underwater demolition will put it at the
bottom of the ocean. In this case, a few strings of bad code (wait,
Microsoft only writes software with hundreds of megabytes of bad code these
days) are enough to bring Rome to a grinding halt. Makes me laugh.
--
Jesse Bazarnick
President, Liquid Express Financial Services, Inc.
Commercial Finance Broker
401-369-1933 cell
"Anton Erasmus" <nobody@spam.prevent.net> wrote in message
news:1122798918.e182e2417f5b736f9209f5e6d0c4da5f@teranews...
> On 31 Jul 2005 07:10:09 GMT, Guy Macon
> <_see.web.page_@_www.guymacon.com_> wrote:
>
>>
>>
>>
>>John Perry wrote:
>>>
>>>Guy Macon wrote:
>>>
>>>> Jerry Avins wrote:
>>>>
>>>>>The navy used Windows NT to run a heavy cruiser.
>>>>>They had to tow the Yorktown home from its shakedown cruise.
>>>>
>>>> They could have had the same problem with any OS, including a RTOS.
>>>
>>>Sheesh, Guy, the silliness of your conclusion is mind-boggling!
>>>
>>>There have been high-integrity systems, including RTOS's, built since
>>>the '80's that were immune to that sort of _system_ error, much less
>>>_user_ error.
>>>
>>>Except for the suits who were trying to justify their selection of
>>>hardware, software, and integrator vendors, all the responses to the
>>>article, even in your list of URL's, recognized that the affair showed a
>>>catastrophic lack of professionalism in the selection, design,
>>>implementation, and testing of the whole system. Note my use of order
>>>here -- it's obvious that selection by suits took place before
>>>engineering. Some of the corrections were listed, and I'm sure others
>>>were buried too deep to be exposed.
>>
>>It appears to me that, your comment about mind-boggling silliness
>>notwithstanding, you agree with my conclusion - that the OS was not
>>the problem.
>
> [snipped]
>
> His order is "Selection" first. That implies they chose the wrong OS.
> If one starts off with the wrong OS, then even if all the following
> steps are done as they should be, one will end up with a product
> that would fail. If one chooses the right OS, and the rest is done
> incorrectly, then the end result would still be failure. In a
> building, if the foundations are unsuitable for the final load, it
> does not matter whether the rest is built even a 1000% over the
> required spec. The building is still unsafe and will fail. One can
> still build an unsafe building when the foundations are within spec,
> but one cannot build a safe building if the foundations are not up to
> the job.
>
> It is the same in a chain of logic. If a certain step is wrong, then
> the rest is wrong no matter how correct it would be as a separate
> chain of logic. Hence making a mistake at the end of a chain would
> influence the final conclusion only a bit. A mistake at the beginning
> of the chain makes the final conclusion totally meaningless.
>
> Regards
> Anton Erasmus
>
Reply by Guy Macon●August 1, 2005
John Perry wrote:
>
>> John Perry wrote:
>>
>> By the way, sorry about the "silliness". Wrong is not necessarily
>> silly, and I do know better.
Thanks. I was a bit put off by that.
>And, of course, disagreeing is not necessarily "wrong", either. Between
>the obvious political posturing and the possibility of disgruntled
>underlings, we'll probably never know for sure. It's remotely possible
>that the error could have caused the application code at each of the
>terminals to lock up and ignore input from a properly functioning
>network, although I've never seen any indication from even the Smart
>Ship defenders that that's what happened. That's the only thing I can
>think of that would exonerate NT.
What application code at each of the terminals? Unless I am mistaken,
this was not an NT Server system. It was an NT *Terminal* Server
system -- all applications running on the server, all terminals stop
working if the server goes down.
At the other end of the LAN they had remote operator station units
(OSUs) to talk to the humans and remote headless embedded data
acquisition units (DAUs) to talk to engines, damage control, etc.
No way for a sailor to talk to a DAU other than through the network.
As I understand it, all OSUs and DAUs were controlled by a single
giant Ada program running on the NT Terminal Server.
I think that they are calling a terminal server failure a network
failure. On an NT Server system, bringing down the network doesn't
bring down the workstations. On an NT Terminal Server system it does.
Reply by Guy Macon●August 1, 2005
John Perry wrote:
>
>Guy Macon wrote:
>
>> John Perry wrote:
>>
>>>Guy Macon wrote:
>>>
>>>>Jerry Avins wrote:
>>>>
>>>>>The navy used Windows NT to run a heavy cruiser.
>>>>>They had to tow the Yorktown home from its shakedown cruise.
>>>>
>>>>They could have had the same problem with any OS, including a RTOS.
>>>There have been high-integrity systems, including RTOS's, built since
>>>the '80's that were immune to that sort of _system_ error, much less
>>>_user_ error.
>>>...
>>
>> It appears to me that [...] you agree with my conclusion - that
>> the OS was not the problem.
>
>Well, no. The OS was _part_ of the problem.
Oh yes indeed. NT made it easier for the idiots to screw up.
A different group who were non-idiots would have had a hard time
making a reliable system based on NT, and would have had a much
easier time making a reliable system based on a better OS. A good
OS is necessary, but not sufficient.
>Good programs protect themselves against such obvious and common
>operator error. Good software libraries check the processor exceptions
>and do reasonable recovery. Good networking processes don't lock up the
>network for hours at a time. Good OS's don't hang without timing out
>when the network is unavailable.
>
>Notice that only one of those characteristics didn't depend upon NT.
>
>> In my opinion, a system with the same catastrophic lack of
>> professionalism in the selection, design, implementation, and
>> testing of the whole system but with a different OS would still
>> have failed. The specific failure mode would have been different,
>> but simply applying a magic bullet of using another OS would have
>> done nothing to address the core problem of a bad system design.
>
>This is where we can agree, assuming your statements above concede that
>the whole network would not have collapsed with another system (which is
>the point of all of us who blame NT for the collapse).
It is my considered opinion that, even if the OS didn't collapse (which
shouldn't be possible, but is with NT), the *application* would still
have failed to control the engines. Not in the same way, but IMO the
whole project was a game of whack-a-mole; take away the "NT allows the
network to crash" failure mode and one of the thousands of other failure
modes would have eventually bitten them in the arse. Just as one can
write assembly in any language, so one can write an unreliable engine
control program under any OS. The bare fact that they had no way to
start the engines with the network down tells us that the engine control
program was unreliable. There are dozens of engineers in this newsgroup
who would have built in a fallback method of manually controlling the
engines - the classic shouting at the engine room operator through a
tube, for example. "Full Speed Ahead, Mr. Perry!" "Aye sir, Full
Speed Ahead!"
So it's the fault of NT and it's the fault of the idiots who designed
the system. Just replacing the OS would not have made a good system,
but replacing the idiots with some professionals would have resulted
in a system that kept working even if NT crashed. Of course the
choice of NT was and still is an indicator of an idiot who does not
understand high-availability systems....
>> There are problems that can be fixed with a different OS. A monolithic
>> application written in Ada that controls a database and engine propulsion
>> and which crashes when someone tells the database that valve X is closed
>> is not one of them.
>
>And no one disagrees, as far as I know. Except for the ideologues, our
>point is that NT was a major part of the problem, and specifically the
>complete collapse of the ship's systems was an NT problem. The user
>input error should have been caught at many places during program
>execution, and if NT had protected itself properly, even the unit that
>had the bad data would not have crashed.
I agree 100%. Plenty of blame to lay upon NT here. Yet still, replacing
the OS would not, IMO, have resulted in a reliable ship.
Reply by John Perry●August 1, 2005
Guy Macon wrote:
> John Perry wrote:
>
>>Guy Macon wrote:
>>
>>
>>>Jerry Avins wrote:
>>>
>>>
>>>>The navy used Windows NT to run a heavy cruiser.
>>>>They had to tow the Yorktown home from its shakedown cruise.
>>>
>>>They could have had the same problem with any OS, including a RTOS.
>>
>>Sheesh, Guy, the silliness of your conclusion is mind-boggling!
>>
>>There have been high-integrity systems, including RTOS's, built since
>>the '80's that were immune to that sort of _system_ error, much less
>>_user_ error.
>>...
>
> It appears to me that, your comment about mind-boggling silliness
> notwithstanding, you agree with my conclusion - that the OS was not
> the problem.
>
Well, no. The OS was _part_ of the problem.
Good programs protect themselves against such obvious and common
operator error. Good software libraries check the processor exceptions
and do reasonable recovery. Good networking processes don't lock up the
network for hours at a time. Good OS's don't hang without timing out
when the network is unavailable.
Notice that only one of those characteristics didn't depend upon NT.
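To make the first two points concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the ValveReading record, the validate_opening and flow_per_pct helpers, and the 0..100 range are assumptions, not anything taken from the Smart Ship software (which is described in this thread as an Ada application). It only shows the general shape: reject bad operator input where it enters, and treat a divide-by-zero as something to recover from locally.

```python
from dataclasses import dataclass


class OperatorInputError(ValueError):
    """Raised when an operator-entered value is not acceptable."""


@dataclass
class ValveReading:
    # Hypothetical record; the field names are illustrative only.
    valve_id: str
    opening_pct: float  # 0.0 = closed, 100.0 = fully open


def validate_opening(raw: str) -> float:
    """First line of defence: reject bad operator input at the point of entry."""
    try:
        value = float(raw)
    except ValueError as exc:
        raise OperatorInputError(f"not a number: {raw!r}") from exc
    # NaN fails this comparison as well, so it is rejected too.
    if not 0.0 <= value <= 100.0:
        raise OperatorInputError(f"out of range 0..100: {value}")
    return value


def flow_per_pct(total_flow: float, reading: ValveReading) -> float:
    """Flow per percent of opening (an invented calculation for this sketch)."""
    try:
        return total_flow / reading.opening_pct
    except ZeroDivisionError:
        # Second line of defence: a closed valve is valid data, so recover
        # locally. Returning 0.0 is an assumed policy, not a recommendation.
        return 0.0
```

With this shape, flow_per_pct(10.0, ValveReading("X", 0.0)) returns 0.0 instead of raising, and validate_opening("abc") raises OperatorInputError before the value gets anywhere near a database.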
> In my opinion, a system with the same catastrophic lack of
> professionalism in the selection, design, implementation, and
> testing of the whole system but with a different OS would still
> have failed. The specific failure mode would have been different,
> but simply applying a magic bullet of using another OS would have
> done nothing to address the core problem of a bad system design.
>
> There are problems that can be fixed with a different OS. A monolithic
> application written in Ada that controls a database and engine propulsion
> and which crashes when someone tells the database that valve X is closed
> is not one of them.
>
>
And no one disagrees, as far as I know. Except for the ideologues, our
point is that NT was a major part of the problem, and specifically the
complete collapse of the ship's systems was an NT problem. The user
input error should have been caught at many places during program
execution, and if NT had protected itself properly, even the unit that
had the bad data would not have crashed.
jp
By the way, sorry about the "silliness". Wrong is not necessarily
silly, and I do know better.
Reply by John Perry●August 1, 2005
> John Perry wrote:
> By the way, sorry about the "silliness". Wrong is not necessarily silly, and I do know better.
And, of course, disagreeing is not necessarily "wrong", either. Between
the obvious political posturing and the possibility of disgruntled
underlings, we'll probably never know for sure. It's remotely possible
that the error could have caused the application code at each of the
terminals to lock up and ignore input from a properly functioning
network, although I've never seen any indication from even the Smart
Ship defenders that that's what happened. That's the only thing I can
think of that would exonerate NT.
jp
Reply by Anton Erasmus●July 31, 2005
On 31 Jul 2005 07:10:09 GMT, Guy Macon
<_see.web.page_@_www.guymacon.com_> wrote:
>
>
>
>John Perry wrote:
>>
>>Guy Macon wrote:
>>
>>> Jerry Avins wrote:
>>>
>>>>The navy used Windows NT to run a heavy cruiser.
>>>>They had to tow the Yorktown home from its shakedown cruise.
>>>
>>> They could have had the same problem with any OS, including a RTOS.
>>
>>Sheesh, Guy, the silliness of your conclusion is mind-boggling!
>>
>>There have been high-integrity systems, including RTOS's, built since
>>the '80's that were immune to that sort of _system_ error, much less
>>_user_ error.
>>
>>Except for the suits who were trying to justify their selection of
>>hardware, software, and integrator vendors, all the responses to the
>>article, even in your list of URL's, recognized that the affair showed a
>>catastrophic lack of professionalism in the selection, design,
>>implementation, and testing of the whole system. Note my use of order
>>here -- it's obvious that selection by suits took place before
>>engineering. Some of the corrections were listed, and I'm sure others
>>were buried too deep to be exposed.
>
>It appears to me that, your comment about mind-boggling silliness
>notwithstanding, you agree with my conclusion - that the OS was not
>the problem.
[snipped]
His order is "Selection" first. That implies they chose the wrong OS.
If one starts off with the wrong OS, then even if all the following
steps are done as they should be, one will end up with a product
that would fail. If one chooses the right OS, and the rest is done
incorrectly, then the end result would still be failure. In a
building, if the foundations are unsuitable for the final load, it
does not matter whether the rest is built even a 1000% over the
required spec. The building is still unsafe and will fail. One can
still build an unsafe building when the foundations are within spec,
but one cannot build a safe building if the foundations are not up to
the job.
It is the same in a chain of logic. If a certain step is wrong, then
the rest is wrong no matter how correct it would be as a separate
chain of logic. Hence making a mistake at the end of a chain would
influence the final conclusion only a bit. A mistake at the beginning
of the chain makes the final conclusion totally meaningless.
Regards
Anton Erasmus
Reply by Guy Macon●July 31, 2005
John Perry wrote:
>
>Guy Macon wrote:
>
>> Jerry Avins wrote:
>>
>>>The navy used Windows NT to run a heavy cruiser.
>>>They had to tow the Yorktown home from its shakedown cruise.
>>
>> They could have had the same problem with any OS, including a RTOS.
>
>Sheesh, Guy, the silliness of your conclusion is mind-boggling!
>
>There have been high-integrity systems, including RTOS's, built since
>the '80's that were immune to that sort of _system_ error, much less
>_user_ error.
>
>Except for the suits who were trying to justify their selection of
>hardware, software, and integrator vendors, all the responses to the
>article, even in your list of URL's, recognized that the affair showed a
>catastrophic lack of professionalism in the selection, design,
>implementation, and testing of the whole system. Note my use of order
>here -- it's obvious that selection by suits took place before
>engineering. Some of the corrections were listed, and I'm sure others
>were buried too deep to be exposed.
It appears to me that, your comment about mind-boggling silliness
notwithstanding, you agree with my conclusion - that the OS was not
the problem.
In my opinion, a system with the same catastrophic lack of
professionalism in the selection, design, implementation, and
testing of the whole system but with a different OS would still
have failed. The specific failure mode would have been different,
but simply applying a magic bullet of using another OS would have
done nothing to address the core problem of a bad system design.
There are problems that can be fixed with a different OS. A monolithic
application written in Ada that controls a database and engine propulsion
and which crashes when someone tells the database that valve X is closed
is not one of them.
Reply by John Perry●July 31, 2005
Guy Macon wrote:
> Jerry Avins wrote:
>
>
>>The navy used Windows NT to run a heavy cruiser.
>>They had to tow the Yorktown home from its shakedown cruise.
>
>
> They could have had the same problem with any OS, including a RTOS.
Sheesh, Guy, the silliness of your conclusion is mind-boggling!
There have been high-integrity systems, including RTOS's, built since
the '80's that were immune to that sort of _system_ error, much less
_user_ error.
Except for the suits who were trying to justify their selection of
hardware, software, and integrator vendors, all the responses to the
article, even in your list of URL's, recognized that the affair showed a
catastrophic lack of professionalism in the selection, design,
implementation, and testing of the whole system. Note my use of order
here -- it's obvious that selection by suits took place before
engineering. Some of the corrections were listed, and I'm sure others
were buried too deep to be exposed.
John Perry
"A systems administrator fed bad data into the ship's Remote
Database Manager, which caused a buffer overflow when the
software tried to divide by zero. The overflow crashed
computers on the LAN and caused the Yorktown to lose control
of its propulsion system, Navy officials said."
"Because the ships' new propulsion control system was developed
quickly, his programmers knew there were inherent risks,
Rushton said"
"NT was never the cause of any problem on the ship, Rushton said.
The problems were all in programs, database and code within
the individual pieces of software that we were using"
"Using Windows NT, which is known to have some failure modes,
on a warship is similar to hoping that luck will be in our favor,
wrote Anthony DiGiorgio, an engineer with the Atlantic Fleet
Technical Support Center, in a June 1998 article titled
"The Smart Ship is Not The Answer.""
It seems the problem has two main causes:
- The system was not designed to cope with the failure of the
database server.
- The system was developed "quickly".
It is common practice to have separate busses on systems
with safety constraints. One bus is the safety bus and
only certified devices are connected on it. Another bus is
the supervisory bus where operators can monitor and control
the installation.
There can be yet another bus for delayed tasks such as
data analysis. There are gateways between busses that ensure
that errors are not propagated. A gateway between a
safety bus and another bus ensures that data coming
from a non-safety bus cannot compromise safety.
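The gateway idea can be sketched roughly as follows, in Python for brevity. The message kinds, the setpoint limits and the gateway_filter function are all hypothetical names chosen for this illustration; a real safety bus would use a certified protocol and certified devices. The point is only that the gateway forwards a short whitelist of well-formed messages and silently drops everything else.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist of message kinds allowed to cross from the
# supervisory bus to the safety bus. Anything else is dropped.
ALLOWED_FROM_SUPERVISORY = {"STATUS_REQUEST", "SETPOINT"}

# Assumed valid range for a setpoint, in percent.
SETPOINT_LIMITS = (0.0, 100.0)


@dataclass(frozen=True)
class Message:
    kind: str
    value: Optional[float] = None


def gateway_filter(msg: Message) -> Optional[Message]:
    """Return the message if it may cross to the safety bus, else None."""
    if msg.kind not in ALLOWED_FROM_SUPERVISORY:
        return None  # unknown traffic is dropped, never forwarded
    if msg.kind == "SETPOINT":
        low, high = SETPOINT_LIMITS
        # A missing, NaN or out-of-range value fails this check.
        if msg.value is None or not (low <= msg.value <= high):
            return None
    return msg
```

In this sketch a bad database update on the supervisory side simply never reaches the safety side: gateway_filter(Message("DB_UPDATE", 1.0)) returns None, while gateway_filter(Message("SETPOINT", 50.0)) is passed through unchanged.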
If an installation has identified safety constraints,
Windows NT should not be used as a central system; only
certified safety devices should be in control. However
the use of safety devices in itself does not ensure that
safety errors cannot occur. The whole installation must
be certified.
A "quick" development is not compatible with these kinds
of constraints. This was not a programming error but a
system design error.
Windows NT has advantages at some levels of the system. The
availability of plentiful, powerful software and the
connectivity with many peripheral devices make it a
system of choice for some tasks.
The fact that it is promoted for safety functions is
probably the result of budgetary constraints becoming
central and technical constraints being neglected. This
happens more and more and can be related to the
emphasis put on revenue and the lack of expertise
of managers in technical fields.