EmbeddedRelated.com

How to choose a firmware partner

Started by robi...@tesco.net May 26, 2004
CBFalconer says...
>Guy Macon <http://www.guymacon.com> wrote:
>
>> rickman says...
>>
>>> Guy Macon <http://www.guymacon.com> wrote:
>>>>
>>>> I am typing this on a Compaq Proliant 5500R server that I got
>>>> on eBay. Corrects all 1-bit errors, detects all 2-bit errors.
>>>> Has 4 200 MHz Pentium Pro uPs with 1MB of cache on each one (I
>>>> will upgrade to 500 MHz Pentium III Xeons when the price drops)
>>>> 3GB of ECC RAM, and a twelve disk SCSI RAID array with hotswap
>>>> drives. All for under $500.
>>>
>>> I am curious, how many 1 bit or 2 bit errors has the system
>>> reported?
>>
>> None, in two years of 24/7 use and two full weeks of running
>> memory diagnostics. With older Compaq servers, if you can
>> measure the error rate it is much too high.
>
>I assume that is running under Linux or your own software.
I quadruple boot to FreeDOS, QNX, Slackware Linux, and Windows 2000.
>AIUI there is no provision in Windoze for recording memory
>failures and/or corrections.
This is true of desktop systems, but Proliant servers are another story. A desktop PC has a minimal BIOS that just gets the hardware into a state where the OS can boot. A fully equipped Proliant server has another CPU with embedded software that does such wonderful things as detecting that Windows has crashed and resetting the system, switching over to another server if there is a hardware failure, etc.

In addition Compaq provides a ProLiant System Management Driver for Windows that adds the following capabilities to Windows:

  Logging of real-time clock battery errors.
  Logging of processor errors.
  Fan outage and temperature detect alarms.
  Logging of power module errors.
  Logging of corrected memory errors.

--
Guy Macon, Electronics Engineer & Project Manager for hire. Remember Doc Brown from the _Back to the Future_ movies? Do you have an "impossible" engineering project that only someone like Doc Brown can solve? My resume is at http://www.guymacon.com/
On Thu, 27 May 2004 09:03:14 -0700, Jim Stewart <jstewart@jkmicro.com>
wrote:

>Anthony Fremont wrote:
>> "42Bastian Schick" <bastian42@yahoo.com> wrote in message
>>
>>>Forgive my ignorance (I was released 1970), what is the difference
>>>between core and RAM chips ?
>>
>> Wow, do I feel old now.
>
>No kidding. Did anyone else have the high school
>algebra book that had a picture of a core plane
>on the cover? Latest technology then.
>
>> Magnetic core was a bunch of tiny little
>> "donuts" that looked something like minuscule ferrite beads. It was
>> literally sewn together by meticulous women (peering thru microscopes)
>> into an X, Y type lattice that also had a sense/inhibit wire running
>> throughout all the "donuts". Each little donut could be magnetized into
>> one of two polarities to represent a 1 or 0. When core was read, it
>> destroyed the data stored and had to be automatically rewritten by the
>> hardware. Some guy named Wang figured all this out.
>>
Boy, you guys are really making me feel old! I started this business working on a Burroughs 205 vacuum tube system that ran at a 200 kHz clock and had drum memory. Program input was punched paper tape and punched cards. Console was a maze of neon lights. That's where I learned to read binary.

After that I worked on the Burroughs 220, which had core memory. My memory may be failing me but it seems like it had a 16K stack that was about 2'x2'x3' or so. Power was a big 400 Hz generator set.

Another memory was seeing a movie, some kind of space rocket to the moon where the crew controlled the rocket with a box that was actually a diode checker for a 205. I almost died laughing at that. All the logic diodes were clip-in and a lot of time was spent checking diodes.

It's fun going down memory lane, isn't it?

Regards,
Art
Guy Macon wrote:
> CBFalconer says...
>> Guy Macon <http://www.guymacon.com> wrote:
>>> rickman says...
>>>> Guy Macon <http://www.guymacon.com> wrote:
>>>>>
>>>>> I am typing this on a Compaq Proliant 5500R server that I got
>>>>> on eBay. Corrects all 1-bit errors, detects all 2-bit errors.
>>>>> Has 4 200 MHz Pentium Pro uPs with 1MB of cache on each one (I
>>>>> will upgrade to 500 MHz Pentium III Xeons when the price drops)
>>>>> 3GB of ECC RAM, and a twelve disk SCSI RAID array with hotswap
>>>>> drives. All for under $500.
>>>>
>>>> I am curious, how many 1 bit or 2 bit errors has the system
>>>> reported?
>>>
>>> None, in two years of 24/7 use and two full weeks of running
>>> memory diagnostics. With older Compaq servers, if you can
>>> measure the error rate it is much too high.
>>
>> I assume that is running under Linux or your own software.
>
> I quadruple boot to FreeDOS, QNX, Slackware Linux, and Windows 2000.
>
>> AIUI there is no provision in Windoze for recording memory
>> failures and/or corrections.
>
> This is true of desktop systems, but Proliant servers are another
> story. A desktop PC has a minimal BIOS that just gets the hardware
> into a state where the OS can boot. A fully equipped Proliant server
> has another CPU with embedded software that does such wonderful
> things as detecting that Windows has crashed and resetting the
> system, switching over to another server if there is a hardware
> failure, etc. In addition Compaq provides a ProLiant System
> Management Driver for Windows that adds the following capabilities
> to Windows:
>
> Logging of real-time clock battery errors.
> Logging of processor errors.
> Fan outage and temperature detect alarms.
> Logging of power module errors.
> Logging of corrected memory errors.
Nice. Would you like to make a quick $10 profit on that box? :-)

Am I correct that there is no real reason for this Windoze failing in general; i.e. it simply needs to enable the appropriate HW interrupt and service it accordingly?

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
CBFalconer <cbfalconer@yahoo.com> says...

>Am I correct that there is no real reason for this Windoze failing
>  [not showing the user any ECC errors]
>in general; i.e. it simply needs to enable the appropriate HW
>interrupt and service it accordingly?
No way to tell; Microsoft doesn't release the Windows source code. Maybe doing so would upset some fragile part of Windows. Maybe this is one of the many parts of Windows where Microsoft no longer has the knowledge to make any changes. Perhaps the memory manufacturers are paying Microsoft to not report errors. Unless software is Open Source, these kinds of questions can never be answered.

--
Guy Macon, Electronics Engineer & Project Manager for hire. Remember Doc Brown from the _Back to the Future_ movies? Do you have an "impossible" engineering project that only someone like Doc Brown can solve? My resume is at http://www.guymacon.com/
"Anthony Fremont" <spam@anywhere.com> wrote in message news:<hbPtc.24674$lY2.15714@fe1.texas.rr.com>...
> <robin.pain@tesco.net> wrote
>
> > We have to enable WDT to conform. I hate it because the only post
> > production failure we had was WDT induced:
> >
> > At cold temperatures the (CR) period of the (independent) WDT reduces
> > by 20%. I did not realise this. I set all WDT reset-times at the most
> > infrequent possible (at room temperature).
>
> It's worse than that, read on.
>
> > At low temperatures, our system became locked in a constant re-boot
> > cycle.
>
> So what you seem to be saying is, "I don't read datasheets and when
> things go wrong for me it's the manufacturer's fault". If you think it
> can only vary by 20% then you will be rudely educated again. For
> example on a 16F628 the acceptable limits (according to the datasheet)
> are 7 - 33 ms with 18 ms being "typical" (no prescaler assigned). I'm
> really not trying to be an ass, but you absolutely have to RTFM when
> working with these things.
No, I do read data sheets but I have a small attention span and a bad memory so I expect to make mistakes.

Therefore the simpler the system, the more likely it will work (for me).

The more sensible the job the more likely it will fascinate my brain.

The sillier the job the more likely my brain will wander, and make mistakes.

Therefore I expect to make more mistakes implementing e.g. <wdt> types. It may be that I am unique but I doubt it. I think some others do likewise, and I think the sheer dullness and difficulty (impossibility) of optimising the placement of the wdt_resets (ignoring future maintainability horror) will likely lead to wholesale sloppiness sooner or later, and aggressive management or timescale pressure will guarantee it.

Cheers
Robin
<robin.pain@tesco.net> wrote in message
news:bd24a397.0406022345.42dfad7b@posting.google.com...
> No, I do read data sheets but I have a small attention span and a bad
> memory so I expect to make mistakes.
Remind me not to hire you ;).
> I think the sheer dullness and difficulty
> (impossibility) of optimising the placement of the wdt_resets
> (ignoring future maintainability horror) will likely lead to wholesale
> sloppiness sooner or later and aggressive management or timescale
> pressure will guarantee it.
Errr... see my earlier post. There is no "dullness" or "horror" involved in kicking watchdogs - one just has to be methodical. And since being methodical is the name of this particular game (embedded design), I can't help but wonder whether you've chosen the right career.

Steve
http://www.sfdesign.co.uk
http://www.fivetrees.com
On 3 Jun 2004 00:45:31 -0700, robin.pain@tesco.net
(robin.pain@tesco.net) wrote:

[...]
>No, I do read data sheets but I have a small attention span and a bad
>memory so I expect to make mistakes.
>
>Therefore the simpler the system, the more likely it will work (for
>me).
>
>The more sensible the job the more likely it will fascinate my brain.
>
>The sillier the job the more likely my brain will wander, and make
>mistakes.
>
>Therefore I expect to make more mistakes implementing e.g. <wdt>
>types. It may be that I am unique but I doubt it. I think some others
>do likewise and I think the sheer dullness and difficulty
>(impossibility) of optimising the placement of the wdt_resets
>(ignoring future maintainability horror) will likely lead to wholesale
>sloppiness sooner or later and aggressive management or timescale
>pressure will guarantee it.
Misreading (or not reading, or forgetting) the datasheet was a silly mistake.

But the sillier one was trying to "optimize" your WDT updates. There is really no reason to do so.

For my superloop code, the WDT is updated in exactly two places: immediately after reset, and at the top of the superloop. Combined with special "come from" tests to ensure the program flow is as was expected, our systems have no trouble surviving some really nasty ESD testing required by our customers. (Without a WDT, I _guarantee_ your system will fail these tests.)

Multitasking systems also update the WDT in exactly two places: immediately after reset, and in a low-priority periodic task that checks to make sure the system is operating correctly.

Under normal operation the WDT is being updated several orders of magnitude more often than is necessary. Who cares? It's abnormal operation I care about, and what the WDT is supposed to remedy.

Regards,
-=Dave
--
Change is inevitable, progress is not.
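The superloop arrangement described above (kick after reset, kick at the top of the loop, "come from" tests in between) might look roughly like this in C. This is only a sketch under stated assumptions, not the poster's actual code: wdt_kick(), the step tokens, and the arrive() helper are hypothetical names, and the watchdog is a plain counter standing in for a real peripheral register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hook: on a real part this would write the
   watchdog reload register. Here it only counts calls. */
unsigned wdt_kick_count = 0;
void wdt_kick(void) { wdt_kick_count++; }

/* One token per stage of the loop (values are arbitrary). */
enum step {
    STEP_LOOP_TOP = 0xA1,
    STEP_INPUTS   = 0xB2,
    STEP_CONTROL  = 0xC3,
    STEP_OUTPUTS  = 0xD4
};

/* Token of the stage that ran most recently. */
volatile uint8_t last_step = STEP_OUTPUTS;

/* "Come from" test: a stage may only run if the stage that is
   supposed to precede it was the last one to run. */
bool arrive(uint8_t expected_prev, uint8_t me)
{
    assert(me != expected_prev);   /* a stage never follows itself */
    if (last_step != expected_prev)
        return false;              /* flow is not what we expected */
    last_step = me;
    return true;
}

/* Placeholder stages; real work would go after the arrive() check. */
bool read_inputs(void)   { return arrive(STEP_LOOP_TOP, STEP_INPUTS); }
bool run_control(void)   { return arrive(STEP_INPUTS,   STEP_CONTROL); }
bool write_outputs(void) { return arrive(STEP_CONTROL,  STEP_OUTPUTS); }

/* First of the two kick sites: immediately after reset. */
void system_init(void)
{
    wdt_kick();
    last_step = STEP_OUTPUTS;
}

/* One pass of the superloop. Returns false when a "come from" test
   fails; the caller is then expected to stop kicking and let the
   watchdog reset the system. */
bool superloop_iteration(void)
{
    if (!arrive(STEP_OUTPUTS, STEP_LOOP_TOP))
        return false;
    wdt_kick();                    /* second and last kick site */
    return read_inputs() && run_control() && write_outputs();
}
```

On a target, main() would call system_init() and then loop on superloop_iteration(), dropping into a bare for(;;) {} the first time it returns false. The point of the sketch is that a stray jump into the middle of the loop is caught by the next arrive() check before execution gets back to the single kick at the top.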
<robin.pain@tesco.net> wrote
> "Anthony Fremont" <spam@anywhere.com> wrote
> > So what you seem to be saying is, "I don't read datasheets and when
> > things go wrong for me it's the manufacturer's fault". If you think it
> > can only vary by 20% then you will be rudely educated again. For
> > example on a 16F628 the acceptable limits (according to the datasheet)
> > are 7 - 33 ms with 18 ms being "typical" (no prescaler assigned). I'm
> > really not trying to be an ass, but you absolutely have to RTFM when
> > working with these things.
>
> No, I do read data sheets but I have a small attention span and a bad
> memory so I expect to make mistakes.
I hope you read what I wrote, and will try to remember that. ;-)
> Therefore the simpler the system, the more likely it will work (for
> me).
I often find the inverse to be true. The simpler the problem, the more likely I will underthink it and make a silly mistake. At one time I was a Cobol programmer (a long time ago, so don't laugh). I often made the stupidest mistakes when writing a new program. When I wrote assembly language, I often had great success at getting it right the first time due to the larger amount of thought required to reason the problem out in my tiny brain.
> The more sensible the job the more likely it will fascinate my brain.
I think you will face many boring situations then.
> The sillier the job the more likely my brain will wander, and make
> mistakes.
Words like silly and sensible don't often correctly describe real world problems. Maybe the required solutions, but not the problems.
> Therefore I expect to make more mistakes implementing e.g. <wdt>
> types. It may be that I am unique but I doubt it. I think some others
> do likewise and I think the sheer dullness and difficulty
> (impossibility) of optimising the placement of the wdt_resets
> (ignoring future maintainability horror) will likely lead to wholesale
> sloppiness sooner or later and aggressive management or timescale
> pressure will guarantee it.
As others have said, kicking the watchdog just in the nick of time is unnecessary and foolhardy. That's why there is a prescaler. I will agree that relying on a watchdog to cover up deficient software design is a poor practice, but lousy software is not the only reason that embedded applications lock up. It's more a matter of the real world environment that necessitates the watchdogs. You can't control nature so don't waste your time trying. ;-) If I had an application that occasionally hung up in a controlled environment, I would find the software problem causing it and not rely on the watchdog to cover it up.
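The arithmetic behind "that's why there is a prescaler" can be sketched in C. The 7/18/33 ms figures are the 16F628 numbers quoted earlier in the thread (no prescaler assigned); the power-of-two prescale range of 1 to 128, the safety margin parameter, and every function name here are assumptions for illustration, so check the datasheet for the part actually in use.

```c
#include <assert.h>
#include <stdint.h>

/* Watchdog period with no prescaler, as quoted in the thread for the
   16F628. Design against the minimum, never the typical value. */
#define WDT_MIN_MS 7u
#define WDT_TYP_MS 18u
#define WDT_MAX_MS 33u

/* Earliest the watchdog can fire for a given prescale ratio. */
uint32_t wdt_shortest_timeout_ms(uint32_t prescale)
{
    return WDT_MIN_MS * prescale;
}

/* Latest the watchdog can fire, i.e. how long a hung system may sit
   before it is reset. */
uint32_t wdt_longest_timeout_ms(uint32_t prescale)
{
    return WDT_MAX_MS * prescale;
}

/* Pick the smallest power-of-two prescale (assumed range 1..128) for
   which the worst-case time between kicks, multiplied by a safety
   margin, still fits inside the shortest possible timeout.
   Returns 0 if no ratio in the range is large enough. */
uint32_t wdt_pick_prescale(uint32_t worst_loop_ms, uint32_t margin)
{
    assert(margin >= 1);
    for (uint32_t p = 1; p <= 128; p *= 2) {
        if (worst_loop_ms * margin <= wdt_shortest_timeout_ms(p))
            return p;
    }
    return 0;
}
```

For a loop that can take up to 10 ms between kicks and a margin of 2, this picks a ratio of 4: the watchdog then fires no earlier than 28 ms and no later than 132 ms. Sizing against the 18 ms "typical" figure instead would leave no room for the kind of temperature drift described at the start of the thread.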
Art K6KFH <art_horne@phase4wireless.com> writes:
Jim Stewart <jstewart@jkmicro.com> wrote:
> >Anthony Fremont wrote:
> >> Magnetic core was a bunch of tiny little
> >> "donuts" that looked something like minuscule ferrite beads. It was
> >> literally sewn together by meticulous women (peering thru microscopes)
> >> into an X, Y type lattice that also had a sense/inhibit wire running
> >> throughout all the "donuts". Each little donut could be magnetized into
> >> one of two polarities to represent a 1 or 0. When core was read, it
> >> destroyed the data stored and had to be automatically rewritten by the
> >> hardware. Some guy named Wang figured all this out.
Univac had a machine that strung the cores on all three axes in the mid-1960s. It was fascinating to watch it work.
> Boy, you guys are really making me feel old! I started this business
> working on a Burroughs 205 vacuum tube system that ran at a 200 kHz clock
> and had drum memory. Program input was punched paper tape and punched
> cards. Console was a maze of neon lights. That's where I learned to
> read binary. After that I worked on the Burroughs 220, which had core
> memory. My memory may be failing me but it seems like it had a 16K
> stack that was about 2'x2'x3' or so. Power was a big 400 Hz generator
> set. Another memory was seeing a movie, some kind of space rocket to
> the moon where the crew controlled the rocket with a box that was
> actually a diode checker for a 205. I almost died laughing at that.
> All the logic diodes were clip-in and a lot of time was spent checking
> diodes.
My first introduction to digital computers was the Bendix G15 in the mid-1960s. It was a decimal, serial, drum memory machine using a lot of diodes in its logic. It's amazing how long it takes to find all the failed diodes after a lightning strike!
iddw@hotmail.com (Dave Hansen) wrote in message news:<40bf21f8.132385989@News.individual.net>...
> On 3 Jun 2004 00:45:31 -0700, robin.pain@tesco.net
> (robin.pain@tesco.net) wrote:
>
> [...]
> >No, I do read data sheets but I have a small attention span and a bad
> >memory so I expect to make mistakes.
> >
<snip>
> Misreading (or not reading, or forgetting) the datasheet was a silly
> mistake.
Yes it's a silly mistake that *anyone* can make.
> But the sillier one was trying to "optimize" your WDT updates. There
> is really no reason to do so.
>
> For my superloop code, the WDT is updated in exactly two places:
> immediately after reset, and at the top of the superloop. Combined
> with special "come from" tests to ensure the program flow is as was
> expected, our systems have no trouble surviving some really nasty ESD
> testing required by our customers. (Without a WDT, I _guarantee_ your
> system will fail these tests.)
Very well, _guarantee_ that this code will fail without the wdt enabled:-

        ...
        jmp test
        jmp test
test
        bitset PORT,test_bit
        bitclear PORT,test_bit
        jmp test
        jmp test
        ...
>
> Multitasking systems also update the WDT in exactly two places:
> immediately after reset, and in a low-priority periodic task that
> checks to make sure the system is operating correctly.
>
> Under normal operation the WDT is being updated several orders of
> magnitude more often than is necessary. Who cares? It's abnormal
> operation I care about, and what the WDT is supposed to remedy.
>
So a high energy particle changes your program counter and you have a random GOTO occur, but your program does not reset because the chances of the GOTO landing near a wdt_reset are now "several orders of magnitude" higher?

And you say "Who cares?"

Cheers
Robin
