EmbeddedRelated.com
Forums

linear interpolation / Assembler / ATMega32

Started by Rolf Bredemeier June 12, 2004
On Sat, 12 Jun 2004 19:35:39 +0200, "Rolf Bredemeier"
<rolf_gsxr@gmx.net> wrote:

>Hi all, > >i`m searching an ASM-example for 2d lin. interpolation. >(No need for floating point) > >In the EEPROM is stored an table, like so > >.eseg >LTable: >.db 0, 75 >.db 40, 37 >.db 60, 48 >.db 80, 180 >.db 100, 170 > >(The first byte in every .db line is the index, the second the value.)
If possible, use a table with evenly spaced indexes, thus eliminating the need to store the index into the table. If possible, fill the tables with indexes that are spaced with some power of 2, e.g. using point values at 0, 16, 32, 48 and so on, thus the range is always 16 and the final division by 16 can be implemented by a 4 position right shift.
>Now i need the interpolated value for an index 55.
If the indexes are spaced by 16 into a byte table, simply shifting the index right by 4 places (3 in this case) will give the offset into a byte table. Thus, the value can be found from address Table+3 in this case.
> >The calculation for that is not so difficult: > > 48 - 37 >X = 37 + --------- * (55 - 40) > 60 - 40 > >So result is 45 for the index 55
Do the multiplication before division to avoid truncation problems. The multiplication should produce a 16 bit result. If the difference between the index values is 16 in the table, the division by 16 can be replaced by shifting the 16 bit product right by four places.
>But, how is this do do in Assembler? >And how to handle negative slope?
If signed multiply and division are used, there should not be a problem. If arithmetic shift right is used to handle 2's complement negative numbers, the results might not be quite as expected, since e.g. -1 >> 4 = -1. If the 16 bit multiplication product is negative, convert it first from 2's complement to 1's complement (or negate it), shift it right 4 places and then convert it back to 2's complement (or negate the result respectively). If the product was positive, just shift it right 4 places. Paul
"Rolf Bredemeier" <rolf_gsxr@gmx.net> writes:

> i`m searching an ASM-example for 2d lin. interpolation. > (No need for floating point)
You have already got several good answers, but there are two things which might help you: #1 Can the intervals in the EEPROM be a power of two? #2 There is a simple way for proper rounding of the results.
> LTable: > .db 0, 75 > .db 40, 37 > .db 60, 48 > .db 80, 180 > .db 100, 170
If you could make this table so that the intervals are powers of two, then you can replace the division by a simple shift (arithmetic shift right, ASR). Then you can also find the table position by shifting instead of dividing. This will save a lot of time and code. Let's say you have the tabulated data with 32-unit intervals. Then when you need to find the value for f(x): ; index to the table (i <- x/32, rounded down) i = x / 32 ; sub-index (remainder of the previous calculation) s = x mod 32 ; base value from the table b = t[i] ; difference to the next tabulated value d = t[i+1] - t[i] ; the result r = b + s/32 * d Now, here we still have all the precision and performance worries. This can be simplified significantly because of the simpler division and modulo with a power-of-two. i = x shr 5 s = x and 0x1f b = t[i] d = t[i+1] - t[i] r = b + (s * d) asr 5 Beware, be sure to check there is no possibility for any overflow on the last row. Also, if d can be negative, you have to use asr intead of shr. Alternatively, you may branch the algorithm: ... if t[i+1] < t[i] then d = t[i] - t[i+1] r = b - (s * d) shr 5 else d = t[i+1] - t[1] r = b + (s * d) shr 5 This enables the use of positive (unsigned) numbers only. This will give you two bits more headroom and sometimes simpler code, depending on the machine. (This does not apply to ATmega, since there is a built-in signed multiply instruction.) There is still one more problem: rounding. Basically, the standard method is to do something like this: r = b + ((s * d shl 1) + 1) asr 6 This adds a one bit bias to the first drop-out bit. Everything above 1/2 will be rounded up, just as it should be. Of course, you may shift your look-up table by one if you want to avoid the shift at this point. If you want to use this rounding algorithm with the sign branching shown above, be sure to _subtract_ the half-bit in the negative branch. Otherwise -1.5 will round to -2 instead of -1. This will introduce an odd jump around zero. Difficult to spot and gives very annoying noise (been there, done that). There are at least umpteen ways to polish the algorithm to give the fastest performance. Some machines might benefit from using biased unsigned instead of signed, and the best way to perform the look-up depends on the machine. If you have a lot of room in the memory, you may calculate the differences beforehand. If you do not have a hardware multiplier, you may be fastest and smallest off by rolling your own: (Doing this on ATmega isn't a very good idea, as there is the hw multiplier.) i = x shr 5 ; this becomes unnecessary: ; s = x and 0x1f r = t[i] d = t[i+1] - t[i] ; check the highest bit of s (16's in x) a = 0 if (x and 0x10) = 0x10 then a = a + d ; second highest (8's) d = d shr 1 x = x shl 1 if (x and 0x10) = 0x10 then a = a + d ; 4's d = d shr 1 x = x shl 1 if (x and 0x10) = 0x10 then a = a + d ; 2's d = d shr 1 x = x shl 1 if (x and 0x10) = 0x10 then a = a + d ; 1's d = d shr 1 x = x shl 1 if (x and 0x10) = 0x10 then a = a + d ; round-up a = a + 1 a = a shr 1 r = r + a In the code above, the multiplication loop has been rolled out, but there is no reason why it couldn't be written as a loop (apart from slight performance hit and need for yet another variable). Many small processors offer a simple method for checking a bit, something like "skip if bit n in register r is set". One of the advantages of this method is that it requires only one extra bit. If you can toss rounding (by taking it into account in the tabulated data!), then even this bit is not required. - Ville -- Ville Voipio, Dr.Tech., M.Sc. (EE)
Paul Keinanen <keinanen@sci.fi> writes:

> If signed multiply and division are used, there should not be a > problem. If arithmetic shift right is used to handle 2's complement > negative numbers, the results might not be quite as expected, since > e.g. -1 >> 4 = -1.
Well, no problem, as: -17 >> 4 = 11101111b >> 4 = 11111110b = -2 -16 >> 4 = 11110000b >> 4 = 11111111b = -1 ... -1 >> 4 = 11111111b >> 4 = 11111111b = -1 0 >> 4 = 00000000b >> 4 = 00000000b = 0 ... 15 >> 4 = 00001111b >> 4 = 00000000b = 0 16 >> 4 = 00010000b >> 4 = 00000001b = 1 So, everything is rounded down: -2 <= x < -1 -> -2 (bin average -1.5) -1 <= x < 0 -> -1 (bin average -0.5) 0 <= x < 1 -> 0 (bin average 0.5) 1 <= x < 2 -> 1 (bin average 1.5) The results are half-step lower than they should be (they should coincide with the bin averages). This can be corrected by adding 8 (that is 0.5 * 2^4) to the initial value: (-9 + 8) >> 4 = 11111111b >> 4 = -1 (-8 + 8) >> 4 = 00000000b >> 4 = 0 ... ( 7 + 8) >> 4 = 00001111b >> 4 = 0 ( 8 + 8) >> 4 = 00010000b >> 4 = 1 Now the limits are exactly where they shoud be: -9 / 16 = -0.5625 -> -1 -8 / 16 = -0.5 -> 0 ... 7 / 16 = 0.4375 -> 0 8 / 16 = 0.5 -> 1 Everything is rounded to the nearest integer, halves are rounded up. So, the plain ASR works fine apart from the average -0.5 bias in the result. The bias can be compensated for in the tabulated data or by adding the half before shifting. The worst thing you may do is to unsignedify the number, round it down, and then put the sign back. Because: -17 / 16 -> -(17 >> 4) -> -1 -16 / 16 -> -(16 >> 4) -> -1 -15 / 16 -> -(15 >> 4) -> 0 ... 0 / 16 -> ( 0 >> 4) -> 0 ... 15 / 16 -> (15 >> 4) -> 0 16 / 16 -> (16 >> 4) -> 1 The function becomes dead around zero! The result is zero over two units, and everything linear becomes non-linear. Cross-over distortion sets in... Unfortunately, this is exactly what C does when you write: int z, x, y; z = x / y; There is a technical reason for this; division is easier to do, if you do not need to write different branches for different signs. C rounds towards zero. Not up, not down, but towards zero. (This may even be a desired behaviour in some cases, but with measurements it is a very bad thing.) One simple way around this is to use biased unsigned divisions: // bias by 15: z = (x + 15 * y) / y - 15; If x + 15 * y always >= 0. This is rather easy with constant divisors. With variable divisors finding a suitable constant is very difficult. Pick a too low one, and the sum will underflow. Pick a too high one, and the sum will overflow. The case with the possibility of a negative divisor is even more complicated. Then the only alternative is to branch the calculations by the signs before doing anything; int x, y, z; unsigned int ax, ay, az; bool sign; // find out the sign of the result and the absolute values of x, y if (x < 0) { ax = -x; sign = false; } else { ax = x; sign = true; } if (y < 0) { ay = -y; sign = ~sign; } else ay = y; // if the sign of the result is negative, round away from zero if (sign == false) { az = (ax + ay - 1) / ay; z = -az; } else { az = ax / ay; z = az; } Yes, there are a lot of extra variables which may be avoided by using type casts. However, I think it is better avoid the type casts and let the compiler get rid of the extra variables. So, twenty-something lines of code to achive a very simple thing. The good thing is that this does not give a very large performance hit, as the signed division algorithm does this anyway. To make the algorithm round right (towards nearest, halves up) requires only the following changes: negative branch: az = (ax + ay - 1 - ay/2) / ay positive branch: az = (ax + ay/2) / ay Even though it is very tempting to say: ay - 1 - ay/2 == ay/2 - 1, don't. Because it isn't. For, e.g., ay = 17: 17 - 1 - 8 == 8 - 1 8 == 7 Oops. I know this is somewhat complicated. The effect of false rounding is usually quite small. I personally crashed into this well-known (generally, that is, not well-known by me) problem when I was looking at a F-transform of some real-world data. There were some harmonics there shouldn't've been. I scratched through the skin of my head before finding out the problem. If off-by-one is bad, off-by-half is not much nicer, either. - Ville -- Ville Voipio, Dr.Tech., M.Sc. (EE)
Hello Ville, hello Paul!

I'm very astonished about the lot of answers you
have written for me.

Many, many thanks! :-))

So my goal is to do the implementation personally.
While doing this, i will learn much for writing
better asm code.

Perhaps, in 10 or 12 years, i can help other
people for implementing algorithm! ;-)

Thanks again for the lot of time you spend to me.
Now i will try it!

Best regards, Rolf


Ville Voipio <vvoipio@kosh.hut.fi> wrote in message news:<i3k3c4zg27a.fsf_-_@kosh.hut.fi>...
> Paul Keinanen <keinanen@sci.fi> writes: >
---snip---
> > The worst thing you may do is to unsignedify the number, round > it down, and then put the sign back. Because: > > -17 / 16 -> -(17 >> 4) -> -1 > -16 / 16 -> -(16 >> 4) -> -1 > -15 / 16 -> -(15 >> 4) -> 0 > ... > 0 / 16 -> ( 0 >> 4) -> 0 > ... > 15 / 16 -> (15 >> 4) -> 0 > 16 / 16 -> (16 >> 4) -> 1 > > The function becomes dead around zero! The result is zero over > two units, and everything linear becomes non-linear. Cross-over > distortion sets in... > > Unfortunately, this is exactly what C does when you write: >
---snip--- Hi Most high level languages round towards zero, mostly because it is traditional. Doing floored rounding makes more sense but even that has to be done carefully. In something like DSP, one can accumulate a DC offset because of floored rounding. As you point out, you get a cross over distortion for rounding towards zero so there is no absolutely right way to do this. One has to look carefully at the application one intends to use it in before determining which way to go on this. One might even use both methods in the same stretch of code. I once designed a XY table that requires 24 bit math that a fellow was doing 16 bit math by simply extending it to 32 bits while using the round towards zero. It was a disaster. The table would occasionally jump by a sizable amount while incrememnting by a joy stick. This was a table that should have positioned to about two 10/1000th of an inch. This was traced to his round towards zero math in the Pascal he was using to do the extended math. It would have been nice if the uP makes had included both types in the hardware so we could choose which to use and where. Dwight
On 14 Jun 2004 17:48:02 -0700, dkelvey@hotmail.com (dwight elvey)
wrote:

> Most high level languages round towards zero, mostly >because it is traditional.
I think it should be more appropriate to talk about truncation in this case and this behaviour makes sense in many situations. IMHO, rounding should be reserved for rounding towards the nearest integer. At least old Pascal did not allow direct assignment of floating point values to integer, but instead the trunc or round functions had to be used.
>Doing floored rounding makes >more sense but even that has to be done carefully.
This is a third method and makes sense in this interpolation case, however, this is more of a special case.
>In something like DSP, one can accumulate a DC offset >because of floored rounding. As you point out, you >get a cross over distortion for rounding towards zero >so there is no absolutely right way to do this.
It should be noted that if the hardware uses 1's complement or sign/magnitude representation, there is the problem with +0 and -0 and thus the potential for "cross over distortion". Paul
On Sun, 13 Jun 2004 07:38:13 +0200, "Rolf Bredemeier" 


>In the subject of my posting the type of MC is given, AVR ATMega32. >This device does not support DIV, and has only 8 bit Registers. >But math software routines are available. > >Unfortunately i'm not an good asm-programmer.
Than I'd say: Write it in C, compile it to assembly and see where you can improve it. But first: Improve your C code ! -- 42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use <same-name>@epost.de instead !
Ville Voipio <vvoipio@kosh.hut.fi> wrote in message news:<i3k7jubg5j5.fsf@kosh.hut.fi>...
> "Rolf Bredemeier" <rolf_gsxr@gmx.net> writes: > > > i`m searching an ASM-example for 2d lin. interpolation. > > (No need for floating point) > > You have already got several good answers, but there are two > things which might help you: > > #1 Can the intervals in the EEPROM be a power of two? > #2 There is a simple way for proper rounding of the results. > > > > LTable: > > .db 0, 75 > > .db 40, 37 > > .db 60, 48 > > .db 80, 180 > > .db 100, 170 > > If you could make this table so that the intervals are powers > of two, then you can replace the division by a simple shift > (arithmetic shift right, ASR). Then you can also find the > table position by shifting instead of dividing. This will > save a lot of time and code. > > Let's say you have the tabulated data with 32-unit intervals. > Then when you need to find the value for f(x): > > ; index to the table (i <- x/32, rounded down) > i = x / 32 > > ; sub-index (remainder of the previous calculation) > s = x mod 32 > > ; base value from the table > b = t[i] > > ; difference to the next tabulated value > d = t[i+1] - t[i] > > ; the result > r = b + s/32 * d > > Now, here we still have all the precision and performance worries. > This can be simplified significantly because of the simpler division > and modulo with a power-of-two. > > i = x shr 5 > s = x and 0x1f > b = t[i] > d = t[i+1] - t[i] > r = b + (s * d) asr 5 > > Beware, be sure to check there is no possibility for any > overflow on the last row. Also, if d can be negative, you have to > use asr intead of shr. Alternatively, you may branch the > algorithm: > > ... > if t[i+1] < t[i] then > d = t[i] - t[i+1] > r = b - (s * d) shr 5 > else > d = t[i+1] - t[1] > r = b + (s * d) shr 5 > > This enables the use of positive (unsigned) numbers only. > This will give you two bits more headroom and sometimes simpler > code, depending on the machine. (This does not apply to ATmega, > since there is a built-in signed multiply instruction.) > > There is still one more problem: rounding. Basically, the standard > method is to do something like this: > > r = b + ((s * d shl 1) + 1) asr 6 > > This adds a one bit bias to the first drop-out bit. Everything > above 1/2 will be rounded up, just as it should be. Of course, > you may shift your look-up table by one if you want to avoid the > shift at this point. > > If you want to use this rounding algorithm with the sign branching > shown above, be sure to _subtract_ the half-bit in the negative > branch. Otherwise -1.5 will round to -2 instead of -1. This > will introduce an odd jump around zero. Difficult to spot > and gives very annoying noise (been there, done that). > > There are at least umpteen ways to polish the algorithm to > give the fastest performance. Some machines might benefit from > using biased unsigned instead of signed, and the best way to > perform the look-up depends on the machine. If you have a > lot of room in the memory, you may calculate the differences > beforehand. > > If you do not have a hardware multiplier, you may be fastest > and smallest off by rolling your own: (Doing this on ATmega > isn't a very good idea, as there is the hw multiplier.) > > i = x shr 5 > > ; this becomes unnecessary: > ; s = x and 0x1f > > r = t[i] > d = t[i+1] - t[i] > > ; check the highest bit of s (16's in x) > a = 0 > if (x and 0x10) = 0x10 then > a = a + d > > ; second highest (8's) > d = d shr 1 > x = x shl 1 > if (x and 0x10) = 0x10 then > a = a + d > > ; 4's > d = d shr 1 > x = x shl 1 > if (x and 0x10) = 0x10 then > a = a + d > > ; 2's > d = d shr 1 > x = x shl 1 > if (x and 0x10) = 0x10 then > a = a + d > > ; 1's > d = d shr 1 > x = x shl 1 > if (x and 0x10) = 0x10 then > a = a + d > > ; round-up > a = a + 1 > a = a shr 1 > > r = r + a > > > In the code above, the multiplication loop has been rolled out, > but there is no reason why it couldn't be written as a loop > (apart from slight performance hit and need for yet another > variable). Many small processors offer a simple method for > checking a bit, something like "skip if bit n in register r is > set". > > One of the advantages of this method is that it requires only > one extra bit. If you can toss rounding (by taking it into > account in the tabulated data!), then even this bit is not > required. > > - Ville
Here's a suggestion. Do you have a C compiler for your CPU. If not try to get a demo version. THen write the solution is C and look at the asm output. Then you have the asnwer you're looking for. George