EmbeddedRelated.com
The 2026 Embedded Online Conference

Same code, same data, different results

Started by Tim Wescott October 6, 2015
Paul Rubin wrote:
> Tim Wescott <seemywebsite@myfooter.really> writes:
>> I was thinking of doing something like your UDP suggestion. If Labview provides a mechanism for doing it without even interfacing with a DLL I may go there -- I'll have to see what NI says about it.
>
> 1) If you mean UDP to a customer site 14 hours away by air, you probably better use TCP to not get further confused by lost packets.
Oh, probably not.
> Visual Studio may supply some convenient drag and drop tools for this TCP wrapping, using what used to be called Windows Automation interfaces.
If that's what I think it is, it bit me - hard - last year.
> Labview might also support that. I used Automation in the distant past but I'm way behind the times about how it's done now.
>
> 2) This doesn't sound like a calling format issue though, since the customer's installation works most of the time.
-- Les Cargill
On Monday, October 12, 2015 at 6:55:34 PM UTC-4, Clifford Heath wrote:
> On 12/10/15 23:55, Dave Nadler wrote:
> > I last saw one in 1975, as I was running faster than it was...
>
> It was running, but you saw it coming ;)
>
> The 7/16 was a 16-bit machine, obviously. We learned low-level assembly language programming on it - and that train set sure could produce a lot of interrupts - every current spike from the wheels of eight locos rolling along any of the 30-ish segments of track. It was amazing that the low-priority (non-interrupt) code got anything done.
I think I used a 7/32 to write some microcode for fast graphics; we had a home-brew memory-mapped color display (super-high-tech for 1975). This was at the Architecture Machine Group, shortly after renamed the MIT Media Lab, working for Nicholas Negroponte, before I moved over to Project MAC (Laboratory for Computer Science). 40+ years ago! Yikes...
Dave Nadler <drn@nadler.com> wrote:


(snip on the Interdata 8/32, then I wrote)
>> Is the 7/32 the same?
>> When did you last see one run?
> IIRC from 40 years ago, the 8/32 was a slightly-enhanced 7/32, but I don't remember the difference. I remember colleagues trying to convince engineers from Perkin-Elmer to add MMU and paging support, to which the Perkin-Elmer guys said "Paging? that went out with the PDP-8!".
> I last saw one in 1975, as I was running faster than it was...
I saw one last month.

Anyone remember Unix from before the cd command, when you had to chdir, instead? (and before alias)

-- glen
On Mon, 12 Oct 2015 15:48:45 -0700, Paul Rubin
<no.email@nospam.invalid> wrote:

>Tim Wescott <seemywebsite@myfooter.really> writes:
>> I was thinking of doing something like your UDP suggestion. If Labview provides a mechanism for doing it without even interfacing with a DLL I may go there -- I'll have to see what NI says about it.
>
>1) If you mean UDP to a customer site 14 hours away by air, you probably better use TCP to not get further confused by lost packets.
The point was to run LabView and a separate application process in the customer's embedded machine and these would communicate with each other within that machine using UDP. Then you use Remote Desktop or Telnet to communicate with that embedded system from the other side of the world.
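To make the same-machine UDP idea concrete, here is a minimal sketch, not code from this thread. It assumes POSIX sockets (the customer's Windows target would need the Winsock equivalents), and the function name `udp_loopback_roundtrip` is made up for illustration. Both ends live in one process here only to keep the example self-contained; in the setup described above, the sender and receiver would be LabView and the separate application.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg from one UDP socket to another on
   127.0.0.1 and copies whatever arrives into out.
   Returns the number of bytes received, or -1 on any error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t out_size)
{
    ssize_t received = -1;
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    struct timeval timeout = { .tv_sec = 2, .tv_usec = 0 };

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    /* Bind the receiver to the loopback address; port 0 lets the OS
       pick a free port, which getsockname() then reports. */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &addr_len) < 0)
        goto done;

    /* UDP gives no delivery guarantee, so never block forever. */
    if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof timeout) < 0)
        goto done;

    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;

    received = recv(rx, out, out_size, 0);

done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return received;
}
```

Because the datagrams never leave the machine, packet loss is much less of a concern than it would be across the internet, though a receive timeout is still cheap insurance.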
upsidedown@downunder.com wrote:
> On Tue, 06 Oct 2015 22:19:53 -0500, Tim Wescott <seemywebsite@myfooter.really> wrote:
>
>> On Wed, 07 Oct 2015 01:36:33 +0000, Rob Gaddi wrote:
>>
>>> On Tue, 06 Oct 2015 19:35:13 -0500, Tim Wescott wrote:
>>>
>>>> This is about code that clings to "embedded" by it's fingernails -- it's running on a fast PC-compatible single-board computer, under Windows, as a DLL. So it's not exactly some little thing shoehorned into 4kB of flash.
>
> Exactly what kind of processor is the target using and from which manufacturer ? There might be some minor differences e.g. in IEEE floating point such as handling of non-normalized values.
* On rare occasion I have seen weird results, like <fpvar1>-<fpvar2> producing what is technically minus zero; depending on the environment, it can act as a true zero or can make a program go tits up. Gotta look at the binary results to find that negative zero...
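As a rough illustration of the minus-zero point, here is a minimal C sketch. It is not from the thread, and the helper names `is_negative_zero` and `double_bits` are made up for this example. Negative zero compares equal to 0.0, so an ordinary `==` test cannot see it; `signbit` from `<math.h>` or the raw bit pattern can. The sketch assumes IEEE 754 64-bit doubles.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True only for negative zero: the value compares equal to 0.0
   but has its sign bit set. */
static bool is_negative_zero(double x)
{
    return x == 0.0 && signbit(x);
}

/* Raw bit pattern of a double, for inspecting "the binary results".
   memcpy avoids the aliasing problems of a pointer cast. */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For instance, `-0.0 - 0.0` evaluates to minus zero, while `x - x` gives plus zero under the default round-to-nearest mode. Printing `double_bits(result)` in hex shows the difference: only the top bit is set for minus zero.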
> >>>> >>>> At any rate: >>>> >>>> I have a rather complicated algorithm that I've coded up, to do >>>> marvelous stuff for my customer. It recently grew quite a bit, and in >>>> the process I've introduced some subtle bugs. I'm looking for ideas on >>>> things to look for to see if I can figure out what's going on. >>>> >>>> Here's the deal: >>>> >>>> First, some time this spring I got a shiny new machine, and went ahead >>>> and loaded 64-bit Linux onto it, with all its 64-bit appurtenances. >>>> This did not, at the time, cause problems. >>>> >>>> I coded up a bunch of changes, tested it on my 64-bit machine, and >>>> happily shipped it off to my customer -- who reported that it broke, >>>> horribly. >>>> >>>> Oh drat. On top of this, at some point the MinGW stream library broke, >>>> so my test code no longer worked under Wine -- I could only test with >>>> the Linux version. >>>> >>>> After much trial and tribulation, I managed to get Linux 32 and 64-bit >>>> versions, and Windows 32-bit versions all working. I tracked down my >>>> problems (size_t and unsigned int are not the same size in gcc 64 bit >>>> for Linux), fixed them, and shipped. >>>> >>>> So now I'm getting four different results from three different software >>>> loads and two different circumstances. I can't go into detail, but I'm >>>> going to give a general story 'cause I'm looking for general things to >>>> look for: >>>> >>>> Under Linux 32-bit I get behavior A (correct operation) >>>> >>>> Under Linux 64-bit I get behavior B (correct operation, just different) >>>> >>>> Under Wine running a 32-bit Windows program I get behavior B >>>> >>>> My customer calls my DLL from Labview. Nine times out of ten he gets >>>> some correct behavior -- he's not sophisticated enough that I can know >>>> whether it's A, B or something else. The tenth time the thing fails to >>>> work correctly. >>>> >>>> So, I suspect that I've got some uninitialized memory someplace. 
But, >>>> I'm running the Linux versions under Valgrind and it's not finding any >>>> problems (Valgrind is great, by the way -- great enough that for my >>>> embedded ARM stuff I do unit testing under Linux and Valgrind). >>>> >>>> I'm going through the code with a fine-toothed comb, and so far I've >>>> only found a few very minor problems that border on the stylistic, >>>> although one of the changes that I made did improve things a bit. >>>> >>>> So -- other than picking through the code line by line, can you guys >>>> suggest anything that I can do or look for in specific? >>>> >>>> Also, does anyone know of a Linux tool that'll randomly populate the >>>> heap with junk then call a program? I suspect that I'm not seeing the >>>> "sometimes it is, sometimes not" behavior that my customer is because >>>> of the different environment, not because Linux is magically fixing my >>>> bugs. Suggestions on how to make the bugs apparent would be helpful. >>>> >>>> Thanks for reading, suggestions welcome -- I'm becoming a candidate for >>>> a rubber room over this one. >>> >>> Without getting into the A/B specifics, is the difference something that >>> could be chalked up to floating point error? >> >> Between A and B, yes. In fact, it was tweaks to some floating point >> calculations to make them more kosher that caused the change in the >> Windows version. >> >> However, the customer's one out of ten problem is, I'm pretty sure, >> different -- first, because it's a failure and not just a little >> difference, and second, he's running the same file through all the time, >> and occasionally it's spitting up. I don't know what could cause that in >> my code other than using an uninitialized variable. > > Sometimes an interrupt occurs during your code and sometimes not ? > Any bugs in the interrupt processing (either HW or SW) would cause > such problems. 
> >> >> It may possibly be a bug on his side, but I don't want to start pointing >> at his side of things unless I'm pretty certain of mine. >
On Fri, 30 Oct 2015 22:33:07 -0800, Robert Baer
<robertbaer@localnet.com> wrote:

>upsidedown@downunder.com wrote:
>> On Tue, 06 Oct 2015 22:19:53 -0500, Tim Wescott <seemywebsite@myfooter.really> wrote:
>>
>>> On Wed, 07 Oct 2015 01:36:33 +0000, Rob Gaddi wrote:
>>>
>>>> On Tue, 06 Oct 2015 19:35:13 -0500, Tim Wescott wrote:
>>>>
>>>>> This is about code that clings to "embedded" by it's fingernails -- it's running on a fast PC-compatible single-board computer, under Windows, as a DLL. So it's not exactly some little thing shoehorned into 4kB of flash.
>>
>> Exactly what kind of processor is the target using and from which manufacturer ? There might be some minor differences e.g. in IEEE floating point such as handling of non-normalized values.
>* On rare occasion i have seen weird results.
> Like <fpvar1>-<fpvar2> producing what is technically minus zero, and depending on the environment, it can act as a true zero or can make a program go tits up.
> Gotta look at the binary results to find that negative zero...
Doing fpvar1-fpvar2 when they are nearly but not exactly as big will cause a lot of problems. There might be very few significant bits (perhaps only one) and the rounding/truncating mode might make things worse.

    Result = BigMultiplier * (fpvar1-fpvar2) ;

might give different results compared to

    SmallDiff = fpvar1-fpvar2 ;
    /* Some other instructions */
    Result = BigMultiplier * SmallDiff ;

especially if the first example is executed on the x87 80-bit stack and then rounded, or the latter with 32-bit SIMD instructions only.

Make sure you have proper divide-by-zero trapping also for 1/-0. Also check that you handle NaN (Not a Number) cases properly.

You should try to rearrange your algorithms to avoid fpvar1 - fpvar2 cases in which both are nearly equal.

If you accept IEEE floating-point values from some external device (such as a PLC), check for NaN, +/-INF and non-normalized values at entry to your system and handle them properly, before letting them go deep into your calculations, causing havoc like NaN * -INF calculations.
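A small C sketch of two of the points above, offered as an illustration rather than as anyone's actual code. All function and type names here are hypothetical. The first pair of functions does not reproduce x87 behaviour; it only emulates "same formula, different intermediate precision" by explicitly rounding the operands to 32-bit float in one version. The last function is one possible shape for the entry check on values from an external device, using the standard `fpclassify` macro.

```c
#include <math.h>

/* Difference taken while the operands are still in double precision. */
double scaled_diff_double(double x, double y, double big_multiplier)
{
    return big_multiplier * (x - y);
}

/* Same formula, but the operands are first rounded to 32-bit float,
   standing in for a build that carries less precision. */
double scaled_diff_float(double x, double y, double big_multiplier)
{
    float fx = (float)x;
    float fy = (float)y;
    float small_diff = fx - fy;   /* nearly equal operands: few bits survive */
    return big_multiplier * (double)small_diff;
}

/* Hypothetical status codes for values arriving from outside. */
typedef enum {
    INPUT_OK,
    INPUT_NAN,
    INPUT_INF,
    INPUT_SUBNORMAL
} input_status;

/* Classify a value at the system boundary, before it reaches the
   calculations. Zero (of either sign) and normal values pass. */
input_status check_external_input(double value)
{
    switch (fpclassify(value)) {
    case FP_NAN:
        return INPUT_NAN;
    case FP_INFINITE:
        return INPUT_INF;
    case FP_SUBNORMAL:
        return INPUT_SUBNORMAL;
    default:
        return INPUT_OK;
    }
}
```

With `x = 1.0 + 1e-9`, `y = 1.0` and a multiplier of `1e9`, the double version returns a value close to 1 while the float version returns exactly 0, because `1.0 + 1e-9` rounds to `1.0f`. What to do with a rejected input (clamp, substitute, raise an error) depends on the application and is left open here.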