EmbeddedRelated.com
Forums

books for embedded software development

Started by Alessandro Basili December 12, 2011
On 12/14/2011 11:30 PM, Steve B wrote:
> Interesting. This is off the topic of the thread, but I think a star tracker will be quite difficult to get tuned and working after the fact. Not impossible, but having the optical and mechanical calibration and integration done right would be essential.
> So I bet it would make for a very interesting and challenging task.

Well, until the software can reliably take images, it will be hard to do any of what you mentioned. I don't quite understand what you mean by mechanical calibration.
>> Except for two items that were space rated, all the rest (~600 DSP units, ~20 microcontrollers and tons of fuse-logic FPGAs) have been chosen for their tolerance to radiation after doing tests on particle accelerators. That actually means that the rate - cross section - is low enough to not adversely affect operations. We have a built-in test that performs a check over the program memory and calculates the CRC; any deviation from what we expect will be handled with a reboot of the node. We haven't yet looked at a distribution of these events, but the rate is ~1/2 per day.
>
> Sounds quite good then. I guessed from the cern.ch domain on your email that whoever you're working with must have access to lots of radiation data or facilities.

Most of the testing was done off site, mostly at GSI with heavy nuclei with energies ranging from 100 to 1000 MeV/nucleon. At CERN there are several facilities which monitor the level of radiation, mostly ionization dose, but they are of course sensitive to SEE as well. There's a great deal of interest, now that the machine is working, in reassessing the status of the electronics, given the fluxes are much higher than anticipated [reference needed...].
Hi Alessandro,

>> I don't know about the "80" figure (from the sorts of bugs I have encountered in RELEASED code, I would have imagined the figure to be even HIGHER!) but I find this to be very true. Even the act of aggressively writing test cases against a specification will often turn up lots of things that weren't considered.
>>
>> E.g., I go to great lengths to try to design data representations so that "nasty" values CAN'T exist -- so that someone can't fabricate a set of inputs that I'm not prepared to handle.
>
> Could you give an example here?
Using unsigned's for counts (can you have a negative number of items?). Using relative measurements instead of absolutes (e.g., "worksurface is 23 inches from reference; position of actuator is 3.2 inches from edge of worksurface" contrast with "worksurface is at 23.0, actuator is at 22.5 -- oops!")
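A rough sketch of both ideas in C. The struct, the field names and the choice of thousandths of an inch as the unit are assumptions made for illustration only. The count is unsigned, and the actuator is stored as an unsigned offset from the worksurface edge, so an actuator "in front of" the worksurface cannot be represented at all.

```c
#include <stdint.h>

/* Hypothetical layout record; all distances in thousandths of an inch. */
typedef struct {
    uint32_t item_count;            /* a count can never be negative */
    uint32_t surface_from_ref;      /* worksurface edge, measured from the reference */
    uint32_t actuator_from_surface; /* actuator, measured from the worksurface edge */
} layout_t;

/* The absolute actuator position is derived, never stored, so it cannot
 * contradict the worksurface position (assuming the sum does not overflow). */
uint32_t actuator_from_ref(const layout_t *l)
{
    return l->surface_from_ref + l->actuator_from_surface;
}
```

With the numbers from the example (23 inches and 3.2 inches), the derived absolute position is 26.2 inches and is by construction never less than the worksurface position.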
>>> The point about writing requirements specifications is that they have to be clear, concise and testable statements of what is required in the system by way of functionality, interfaces, performance and maintainability. The specification document should be free of assumptions but if assumptions are necessary these should be very clearly identified as such and the basis for the assumptions clearly described. However, it would be better to eliminate them altogether.
>>
>> It's hard to eliminate all assumptions. E.g., I invariably assume that the next instruction executed WILL be the one that is intended to be executed :>
>
> How would you go with the assumption that the compiler of your application works? How do you check that?

<grin> Buy a compiler that has passed a validation suite. And hope your code doesn't stress it in some bizarre way :>
>> But, people seem to find it very hard to ask themselves, "What am I *assuming*, here?" Too often, assumptions are *so* fundamental that they blend into the landscape. If, instead, you approach it in an EQUIVALENT manner as "What am I RELYING ON, here?", it tends to result in a more apprehensive approach to that exercise. I.e., as if there *is* some vulnerability and you are tasked with *finding* it! (looking for "assumptions" seems to be less "threatening" -- and, perhaps, less *motivating*)
>
> That is quite an interesting point. In my previous life (five years ago!) I was building hardware for the same detector and the development cycle surely included a static timing analysis on the FPGAs, but we didn't _assume_ that was enough; rather, we decided not to _rely_ on the output of that analysis, put the electronics in the thermal chamber and did a fully functional test. Those tests not only spotted a few bad components (infant mortality!), but also gave us grounds to believe that in all possible thermal conditions the hardware behaved the way we expected.
In recent years, MIN and MAX numbers seem to have disappeared from datasheets. Everything is "typ" and at "ambient". Worst case design practices seem to be a thing of the past. :< And, if you try to do *anything* "out of the ordinary", you're often "on your own"!
> Those tests were part of our acceptance tests for the flight electronics and we are currently benefiting a lot from them!
(IMO) You have to adopt a similar cynicism when it comes to software. "Will every client follow these rules? What happens if they don't?" I try to formalize my contracts with assertions at the top of each function *proving* that the caller has "followed the rules" and that what I am about to do in the function is safe. E.g.,

    ASSERT(count > 0)
    average = total / count

I should be able to remove those ASSERT()s with no change in functionality -- they should never be tripped. (This also gives developers an unambiguous explanation of what the interface *does* guarantee.)
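A minimal compilable version of that fragment, using the standard C assert() macro in place of the poster's ASSERT(). The function name and the integer types are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mean of `count` samples whose sum is `total`.
 * Contract: the caller guarantees count > 0. */
uint32_t average(uint32_t total, uint32_t count)
{
    /* States and checks the precondition; correct callers never trip it. */
    assert(count > 0u);
    return total / count;
}
```

Building with NDEBUG defined compiles the assert() away, which matches the stated goal: removing the checks must not change the behaviour of correct code.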
>> Once you identify the assumptions ("reliances"/dependencies?), things get easier. But, you still have to be incredibly honest (cynical?) in how you assess them.
>>
>> If you dismiss an assumption as "safe", is it *really*? What PREVENTS those things that "can't happen" from actually happening? If you can't prevent it, do you at least try to *detect* it (as a safeguard)? If it TRULY can't happen, then you should feel SUPREMELY CONFIDENT adding this line of code to your product:
>>
>>     if (cant_happen) {
>>         give_away_all_assets(mine);
>>         self.shoot()
>>     }
>
> what if the cant_happen variable sits in a memory which has a bit flip?

What if the register that it gets loaded into (from UNCORRUPTED memory) gets whacked by a particle? :>

> Would you then protect the variable with a CRC and, instead of a cant_happen variable, have a cant_happen() function which retrieves the variable, calculates its CRC and compares it with the stored CRC?
What if a "Jump on Non Zero" opcode gets corrupted into a "Jump on Zero" as it is fetched from memory? What if some other UNRELATED piece of code gets corrupted resulting in a *jump* directly to the self.shoot() code fragment?

As I qualified Paul's comments about getting rid of assumptions... there are always some assumptions that you can't get away from.

If "doing something" has disproportionate consequences, then you should try to determine the predicate condition in two different INDEPENDENT ways -- hoping that BOTH can't be wrong (bug) or corrupted at the same time. The same sort of approach can be applied to the associated hardware.

>> :-/
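One cheap way to get a second opinion on a stored flag is sketched below. Note that this is a related, swapped-in technique (storing the value together with its bitwise complement), not something the posts describe, and every name and constant in it is hypothetical. The truly independent second determination of the predicate is represented only by the independent_confirmation parameter, which the caller would have to derive by some other route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guarded flag: the value and its bitwise complement. */
typedef struct {
    uint32_t value;
    uint32_t inverse;
} guarded_flag_t;

#define FLAG_ARMED 0xA5A5A5A5u  /* arbitrary non-trivial patterns, so that */
#define FLAG_SAFE  0x5A5A5A5Au  /* a single bit flip never yields ARMED    */

void guarded_set(guarded_flag_t *f, uint32_t v)
{
    assert(f != NULL);
    f->value = v;
    f->inverse = (uint32_t)~v;
}

/* True only if the two stored copies still agree. */
bool guarded_is_consistent(const guarded_flag_t *f)
{
    return f->value == (uint32_t)~f->inverse;
}

/* Act only if the stored flag is intact, says ARMED, and a separately
 * derived confirmation agrees. */
bool may_fire(const guarded_flag_t *f, bool independent_confirmation)
{
    return guarded_is_consistent(f)
        && f->value == FLAG_ARMED
        && independent_confirmation;
}
```

As the replies point out, this only narrows the window: a corrupted opcode or a stray jump can still bypass the check, so it reduces rather than removes the underlying assumption.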
[much elided]

> I think synchronization is really complex whenever you are down to the multi-thread business and/or have multiple interrupt servicing. Given the old technology and luckily very little support for an OS (I haven't found any), I was aiming to have a very simple, procedural design which I believe would be much easier to test and to make meet the specs.
>
> To back up this motivation a bit more, I just finished writing an extremely simple program to toggle a flag through the timer counter interrupt. The end result is that I failed to get the period I want and, moreover, it is clear that interrupts are lost from time to time.
>
> Since in this last case I was kicking the dog with this flag, I actually couldn't care less if I lost an interrupt as long as the period is enough to keep the dog quiet. But I got discouraged by a post on comp.dsp which stated: "This is embedded ABC basics: don't kick a dog in the interrupts." but no motivation was given.
>
> Now my point is, how much time should I invest to make it work rather than exploring a totally different path? If I had infinite time I would probably try to make this stupid interrupt work the way I expect, but these details may delay the project a lot, if not irreversibly.

[more elided]

Level-sensitive interrupts could be the answer here, if you are not already using them. From memory (and it is quite a long time since I designed one in), the ADSP-21k series allow(ed) you to specify interrupts as either edge- or level-sensitive.
Alessandro Basili wrote:
> Since in this last case I was kicking the dog with this flag, I actually couldn't care less if I lost an interrupt as long as the period is enough to keep the dog quiet. But I got discouraged by a post on comp.dsp which stated: "This is embedded ABC basics: don't kick a dog in the interrupts." but no motivation was given.

(I realize this is just a sidebar in a much more interesting post.) The motivation would have been: you should reset the watchdog in some very high level process that truly reflects that the processing is being done. Resetting the watchdog on, say, a timer interrupt keeps the watchdog happy even if the only facilities working are the timer and the interrupts. The rest of the processing could have completely fallen off the rails, and the watchdog wouldn't reset the system.

You can see this happening on a desktop sometimes when the OS has become wedged, nothing is being processed, but the little arrow on the screen still moves when you move the mouse.

Mel.

On Fri, 16 Dec 2011 08:58:18 -0500, Mel Wilson <mwilson@the-wire.com> wrote:

> Alessandro Basili wrote:
>> Since in this last case I was kicking the dog with this flag, I actually couldn't care less if I lost an interrupt as long as the period is enough to keep the dog quiet. But I got discouraged by a post on comp.dsp which stated: "This is embedded ABC basics: don't kick a dog in the interrupts." but no motivation was given.
>
> (I realize this is just a sidebar in a much more interesting post.) The motivation would have been: you should reset the watchdog in some very high level process that truly reflects that the processing is being done. Resetting the watchdog on, say, a timer interrupt keeps the watchdog happy even if the only facilities working are the timer and the interrupts. The rest of the processing could have completely fallen off the rails, and the watchdog wouldn't reset the system.
>
> You can see this happening on a desktop sometimes when the OS has become wedged, nothing is being processed, but the little arrow on the screen still moves when you move the mouse.

True. If one is stuck in an endless loop in the main process but the watchdog kick occurs in a timer interrupt then one can get hung without the watchdog reset occurring. One way to avoid that is to have the main loop set a permissive flag and the timer interrupt test for and reset the flag and then kick the dog.

Of course, if the bit of code in the main loop that sets the flag is included in the stuck endless loop, one still ends up with a broken system. A defense against that is to use multiple flags or, perhaps, a multi-valued flag: set to 1 at the top of the main loop; somewhere inside a must-run portion, if flag is 1 then flag is 2; possibly additional if/then levels; and finally have a periodic interrupt test for the terminal value. Kick the dog only if all intermediate steps have occurred.

--
Rich Webb
Norfolk, VA
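The multi-valued flag scheme can be sketched in C roughly as follows. All names are hypothetical, and the hardware watchdog is replaced by a counter so the fragment can run on a host; on a real target, hw_kick_watchdog() would write the device's watchdog register instead.

```c
#include <stdint.h>

/* Progress stages of one pass through the main loop (hypothetical names). */
enum { WD_IDLE = 0, WD_LOOP_TOP = 1, WD_MUST_RUN_DONE = 2 };

static volatile uint8_t  wd_stage = WD_IDLE;
static volatile uint32_t wd_kicks = 0;   /* stand-in for the hardware watchdog */

static void hw_kick_watchdog(void) { wd_kicks++; }

uint32_t wd_kick_count(void) { return wd_kicks; }

/* Called at the top of the main loop. */
void main_loop_top(void) { wd_stage = WD_LOOP_TOP; }

/* Called inside a portion of the main loop that must run on every pass. */
void must_run_section(void)
{
    if (wd_stage == WD_LOOP_TOP) {
        wd_stage = WD_MUST_RUN_DONE;
    }
}

/* Periodic timer interrupt: kick only if every stage was reached in order. */
void timer_isr(void)
{
    if (wd_stage == WD_MUST_RUN_DONE) {
        wd_stage = WD_IDLE;
        hw_kick_watchdog();
    }
}
```

If the main loop stalls anywhere before the must-run section, the stage never reaches its terminal value, the interrupt stops kicking, and the watchdog is left to expire.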
On 2011-12-16, Don Y <not.to.be@seen.com> wrote:
> Using unsigned's for counts (can you have a negative number of items?). Using relative measurements instead of absolutes (e.g., "worksurface is 23 inches from reference; position of actuator is 3.2 inches from edge of worksurface" contrast with "worksurface is at 23.0, actuator is at 22.5 -- oops!")

I strongly agree about the unsigned int issue. _Every_ integer I declare in C is unsigned unless I actually need a signed integer.

I find that the number of unsigned integers in my code is _vastly_ greater overall than the number of signed integers.

Personally, I think C should have made unsigned integers the default.

BTW, on a related data representation note, another thing I like to do when I build something like a state machine is to start the numbers assigned to the state symbols at value 1, instead of value 0, so that I stand a greater chance of catching uninitialised state variables.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Simon Clubley wrote:

> I find that the number of unsigned integers in my code is _vastly_ greater overall than the number of signed integers.
>
> Personally, I think C should have made unsigned integers the default.

Hell NO.

Having been burned many times by mixed signed/unsigned arithmetic, I treat all integers as signed unless they are explicitly meant to be unsigned.

> BTW, on a related data representation note, another thing I like to do when I build something like a state machine is to start the numbers assigned to the state symbols at value 1, instead of value 0, so that I stand a greater chance of catching uninitialised state variables.

Bad style. State variables should be a special class or an enumerated type.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On 16.12.2011 03:13, Alessandro Basili wrote:

> The only missing piece is the archiver (the /ar/ utility does not support the COFF format!)

I'm quite certain you got that wrong. 'ar' doesn't even care about object file formats at all unless you try to use the 's' modifier. And I've been using ar with COFF format files for about a decade --- it's what the DOS port of the GNU tools has been using forever.

You may need to use a target-specific build of 'ar', though. I.e. g21-whatever-ld and g21-whatever-gcc are meant to go with g21-whatever-ar.
On 2011-12-16, Vladimir Vassilevsky <nospam@nowhere.com> wrote:
> Simon Clubley wrote:
>
>> I find that the number of unsigned integers in my code is _vastly_ greater overall than the number of signed integers.
>>
>> Personally, I think C should have made unsigned integers the default.
>
> Hell NO.
>
> Been burned many times with mixed signed/unsigned arithmetic, I consider all of the integers signed unless they explicitly meant to be unsigned.

Is that more to do with how C handles signed/unsigned type conversions or some issue around signed type conversions in general?

I think I know where you are coming from; I've seen reports of various type conversion issues in C which surprised me, but given the type of things I use C for (mainly low level work), I have not yet been caught by them.

Still, I know this is an issue that people have differing opinions on for various reasons and I realise that not everyone prefers unsigned integers.
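For anyone who has not met the conversion problem being alluded to, here is a small sketch (function names are made up for illustration). When an int is compared with an unsigned int, C's usual arithmetic conversions turn the int into an unsigned int first, so a negative value becomes a very large positive one.

```c
#include <stdbool.h>

/* Looks harmless, but `a` is converted to unsigned before the comparison,
 * so naive_less(-1, 1u) compares UINT_MAX with 1 and returns false. */
bool naive_less(int a, unsigned int b)
{
    return a < b;
}

/* Handles the negative case explicitly before converting. */
bool safe_less(int a, unsigned int b)
{
    return (a < 0) || ((unsigned int)a < b);
}
```

GCC and Clang can flag the first form with -Wsign-compare.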
>> BTW, on a related data representation note, another thing I like to do when I build something like a state machine is to start the numbers assigned to the state symbols at value 1, instead of value 0, so that I stand a greater chance of catching uninitialised state variables.
>
> Bad style. State variables should be special class or enumerated type.

Sorry, bad wording. They are in an enumerated type; it's just that I set the first state symbol in the type to start at one instead of zero.

Simon.
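A sketch of what that looks like (the state names and the validity helper are hypothetical). Static storage in C is zero-initialised, so a state variable that was never explicitly set holds 0, which is not a member of the enumeration and can be caught by a range check.

```c
#include <stdbool.h>

typedef enum {
    STATE_IDLE = 1,   /* deliberately not 0 */
    STATE_RUNNING,
    STATE_DONE
} state_t;

/* Rejects the all-zero pattern of a never-initialised state variable,
 * as well as anything past the last defined state. */
bool state_is_valid(state_t s)
{
    return s >= STATE_IDLE && s <= STATE_DONE;
}
```

This only catches zero-filled storage; an uninitialised automatic variable can hold any value, including a valid-looking state.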
On 12/16/2011 09:01 PM, Simon Clubley wrote:

>>> BTW, on a related data representation note, another thing I like to do when I build something like a state machine is to start the numbers assigned to the state symbols at value 1, instead of value 0, so that I stand a greater chance of catching uninitialised state variables.
>>
>> Bad style. State variables should be special class or enumerated type.
>
> Sorry, bad wording. They are in an enumerated type; it's just that I set the first state symbol in the type to start at one instead of zero.
I start at 0, and make that the initial state.