EmbeddedRelated.com
Forums

Adding a percentage to a 16-bit number in 8051

Started by mik3ca 5 years ago15 replieslatest reply 5 years ago1105 views

I am currently using early 8051's in my major project (specifically AT89S52 and AT89C4051 with 22.1184Mhz crystals attached), but one thing I want to be able to do with them is take a 24-bit number stored in XRAM and add a percentage to it. I can easily add two 24-bit numbers together using the following code:

;Load data from NUMBER1 to R4:R3:R2
;R4=most significant byte
;Address NUMBER1+2=most significant byte
mov DPTR,#NUMBER1
movx A,@DPTR
mov R2,A
inc DPL
movx A,@DPTR
mov R3,A
inc DPL
movx A,@DPTR
mov R4,A
;Load old result from NUMBER2 and store new result back into NUMBER2
mov DPTR,#NUMBER2
movx A,@DPTR
add A,R2
movx @DPTR,A
inc DPL
movx A,@DPTR
addc A,R3
movx @DPTR,A
inc DPL
movx A,@DPTR
addc A,R4
movx @DPTR,A


Anyways, What I really want to do is take a 24-bit number and add a percentage to it. The original 24-bit number can be overwritten with the new result.

Here are my thoughts about solving this problem...

Thought 1 - slowest: Increment the memory address by 1 until the full percentage is reached and decrement by something? But then incrementing alone will be a pain as it takes at least 3 cycles per increment and slowness is not desired.

Thought 2 - Shifting: If the percentages to add were of fixed values and were always not prime (example: 2, 4, or 8%), then I could shift the whole 24-bit number a few times and shift again, but afterwards, that's still mundane slow.

Thought 3 - A series of 8-bit MUL: Could I somehow use MUL and something as a carry much like how ADD and ADDC can produce a carry?

As for normal calculation of a percent, I would generally take the number and multiply it by whatever the percent value is at the time and divide by 100 and add the percent value to the result. For example, if the percent value is 13, then I'd multiply the result by 13 and divide by 100. Both multiply and divide are very expensive operations especially when dealing with 16-bits on an 8-bit micro.

How can I tackle this issue without wasting a ton of clock cycles? P.S. let's assume the percent to add is 25.64. Then I would have to shift it to do any math and that makes 2564 a 16-bit number. Any ideas where to start? I mean I searched for this on the internet and found nothing.
[ - ]
Reply by mr_banditDecember 7, 2018

Not quite understanding what you want to do.

1. what are the 24-bit numbers for?

2. are you using fixed point math (eg the 24-bit numbers already *100 for two-bits of resolution)?

2. if your percentage calculates to 25.64, and your results are not using fixed point, the .64 is rounded out

3. shifting is the quickest way to multiply/divide for factors of 2. If you want to multiply by, say 3, binary is 0011. Take a mask of 1000 and AND it with your mult factor. if the AND != 0, then shift by the bit position of the mask.

Remember, multiply is like repeated adds, divide like repeated subtracts. By looking at the binary values of one of the values (multiply, does not matter; divide you need the divisor [denominator] (x/y  .. divisor is y). If a bit in the divisor is 1, then shift right by the bit index. Basically, the concept is to add or subtract the powers of 2.

A * 3 == A + A + A == 2A + A

A * 5 == A + A + A + A + A = 4A + A ; 5 == b0101

A * 4 == A + A + A + A == 4A; ; 4 = b0100

mask == b0010; mult factor == 0011, AND == 0010: shift left 1 times (== 2A) and add A (+A)

4. No, silly boy, CARRY is for adds.

5.  Why do you care about clock cycles? If this project is so time critical, use something else. (Yes, I realize 40% - last stat I saw - of the micro market is 8051.) 


[ - ]
Reply by mik3caDecember 7, 2018

>> 1. what are the 24-bit numbers for?

24-bit is the maximum bit capacity. Meaning the numbers I want to store would be as large as 1,000,000, so therefore I need to use 3 consecutive bytes of memory in an 8051 to store such a large number

>> 2. are you using fixed point math (eg the 24-bit numbers already *100 for two-bits of resolution)?

I plan to use fixed point math on the percent values.

>> 2. if your percentage calculates to 25.64, and your results are not using fixed point, the .64 is rounded out

Ok so it seems my first step is to multiply by 100.

>> 3. shifting is the quickest way to multiply/divide for factors of 2. If you want to multiply by, say 3, binary is 0011. Take a mask of 1000 and AND it with your mult factor. if the AND != 0, then shift by the bit position of the mask.

So I'm supposed to AND an 8 with the number I'm multiplying with? I don't get it.

Now if I were to multiply 257 by 14 in 8051, I'd do this:

;R7:R6 = our big number
;R3:R2 = Answer
NUM1 equ 257
MULTIPLIER equ 14
mov R7,#((NUM1 SHR 8) AND 255) ;R7=high byte of number
mov R6,#(NUM1 AND 255)         ;R6=low byte of number
mov A,#MULTIPLIER
clr C
mov R1,#8h ;counter for every bit in multiplier byte
count:
rrc A
mov R5,A ;save accumulator
jnc noadd
mov A,R2 ;then add shifted 16-bit number to answer
add A,R6
mov R2,A
mov A,R3
addc A,R7
mov R3,A
noadd:
clr C ;shift original number over by 1
mov A,R6
rlc A
mov R6,A
mov A,R7
rlc A
mov R7,A
mov A,R5 ;restore accumulator
djnz R1,count

I might have fudged up something somewhere but what I done was basically shift and add for each byte that is set.

>> 5.  Why do you care about clock cycles? 

Same reason why people want speed. lol. I mean I want something of reasonable speed like several microseconds, not something that takes one second to process.


[ - ]
Reply by mr_banditDecember 7, 2018

-- You

>> 1. what are the 24-bit numbers for?

24-bit is the maximum bit capacity. Meaning the numbers I want to store would be as large as 1,000,000, so therefore I need to use 3 consecutive bytes of memory in an 8051 to store such a large number

>> 2. are you using fixed point math (eg the 24-bit numbers already *100 for two-bits of resolution)?

I plan to use fixed point math on the percent values.

>> 2. if your percentage calculates to 25.64, and your results are not using fixed point, the .64 is rounded out

Ok so it seems my first step is to multiply by 100.

--- Me:

I would recommend just keeping everything in fixed point (mult by 100).

For this you need 4 bytes. 1,000,000 * 100 == 100,000,000 = 0x05F5E100. But it seems the extra few bytes for your storage is not a big deal.


--- you

>> 3. shifting is the quickest way to multiply/divide for factors of 2. If you want to multiply by, say 3, binary is 0011. Take a mask of 1000 and AND it with your mult factor. if the AND != 0, then shift by the bit position of the mask.

So I'm supposed to AND an 8 with the number I'm multiplying with? I don't get it.

--- me

Work through the math on paper. Remember mult is repeated add, division is repeated subtraction.

The key is one of the values (denominator in terms of division) represents the *number* of subtractions (for division) or adds (for mult). Each bit in that value specifies the number, specified in powers of 2, of the subtraction or adds.

A * 5 == A + A + A + A + A = 4A + A ; 5 == b0101

start: accumulator = 0

mask   mult  AND    action             accum

1000   0101   0     none                 0

0100   0100   1     accum += A << 2    4*A

0010   0101   0     none               4*A

0001   0101   1     accum += A << 0    5*A

For subtraction, same actions, but accum = A at the start, and "mult" is the denominator

-- you

>> 5.  Why do you care about clock cycles? 

Same reason why people want speed. lol. I mean I want something of reasonable speed like several microseconds, not something that takes one second to process.

-- me

If this is not a timing critical system (if it is, don't use an 8051), the issue is: can a human detect the difference? Does the human care?

If the answer is "human cannot detect, or care", then time what happens and get some real data.

What is the project?

If the answer is this is timing critical, the 8051 is probable not the right solution. The one 8051 project I was on, the human could not tell the difference between a second or two. It was fast enough for the system.

Personally, I am an AVR freak, and there are plenty of arduino clones out there (on small PCBs) that it is my goto choice. If I really need the horsepower, I will go with an arm, like a teensy 3.x

I hope this helps.

[ - ]
Reply by jorickDecember 7, 2018

I scraped the internet and found a discussion at http://www.8051projects.net/t9247/general-help-gui... where a zip file containing a number of 8051 math functions can be found, http://www.8051projects.net/files/public/120989391...  You may find the multiplication and division functions enlightening.

The multibyte multiplication uses multiple MUL instructions to do the job, but the multibyte division just uses shifting.

Although the 8-bit MUL and DIV instructions take only 4 instruction cycles each, multibyte math is always costly, especially with division.

There are shortcuts you can take but these reduce accuracy.  For example, dividing by 3 can be done by multiplying by 85 and dropping the least significant byte, but the result will be 0.3% too low.

[ - ]
Reply by CustomSargeDecember 7, 2018

Howdy, can't call it efficient, but when I need decimal operations on a binary, I convert first. To add a percentage operation on a 16 bit I'd:

> allocate & zero 5 count registers, 1/decade

> copy value to temp register, subtract 10000, if positive result, increment reg#5, subtract 10000 from value, repeat until negative result and exit before value sub.

> do same with 1000 & reg#4, then 100 & reg#3, etc.

> regs 5-1 are now decimal of value.

> for integer percent, convert to decimal (if not already there).

> use the percent decimal registers as counters to loop / add reg#x back into value.

> for fractional percentage, yes multiplying up to integer is required. But, you're only doing this once.

> this seems like a laborious process, but all the operations are fairly low cycle count and it's easy to write & debug.

I wrote this to calc the 16 bit number percentage of a 16 bit. It's a framing hammer on a finishing nail, but it works.  Good Hunting...   <<<)))

[ - ]
Reply by mik3caDecember 7, 2018
>> I wrote this to calc the 16 bit number percentage of a 16 bit. It's a framing hammer on a finishing nail, but it works.  Good Hunting...   <<<)))

code? where can I find it?

[ - ]
Reply by CustomSargeDecember 7, 2018

My code was for a customer's project and in Freescale (NXP) 68HC9S08 assembler, so not any help. For your task it's just convert to decimal then add back in those values for the percent required. As mentioned elsewhere it uses repeating subtraction instead of division and addition instead of multiply. 

[ - ]
Reply by Bob11December 7, 2018

On a small micro like an 8051 the details are all-important. That's because the general way to do what you're trying to do is to use floating point math or to use a Q-format decimal number (see Wikipedia for Q-format). Both take a lot of time and resources on an 8051. Often stating the exact problem allows for approximations and optimizations, and that's the only way you're going to get speed out of an 8051. For example, if decimal percentages are OK (ie 1% to 99%) it's much faster to just use the percentage as an index into a 99-element lookup table to fetch the appropriate multiplier (and of course testing for the fast special cases of 0% and 100%).

Keep in mind that adding two numbers requires one extra bit, eg, adding two 8-bit numbers requires 9 bits (the carry bit is used for this). However, multiplication requires twice the number of bits, eg, two 8-bit numbers multiplied together require 16 bits to hold the result. The exception is when the numbers are exponentially encoded, like floating-point for example. However, floating point math requires a lot of slow bit-twiddling when there's no FPU.

In a later reply you state you need up to 1E6 as a number, which is where your 24-bit number comes from. If you also need a fractional percent as in your example, you should look at something like Q24.8. That will require 32 bits for the numbers and 64 to hold the temporary results, but you won't lose any precision. With your percent value example of '25.64' you're implying you need to keep a lot of precision. In any case fractional math operations are going to be slow on an 8051. A more precise statement of the boundaries of the problem domain and the required precision might generate some replies with ideas for shortcuts. I'd also recommend the book "Hacker's Delight", which is full of mathematical tricks and shortcuts for computer math.

[ - ]
Reply by dbrion06December 7, 2018

Do you need an accurate result?

For instance, if you decide to add a fraction of 1024 (instead of a fraction of 100) division is straightforward (shifting by 10).... Remains multiplications (and 2.4% loss of accuracy can be handed by a 100 element table)

[ - ]
Reply by mik3caDecember 7, 2018

The part of my project I need this for is a bank system in a game. What needs to happen is that at any given time, one decides to borrow any amount of cash up to $999.99 and the bank will add a surcharge. the user will be notified of this surcharge as a percentage.

The rest of the system I can figure out myself, its just adding the surcharge to the borrowed amount I'm having difficulty with, just because the surcharge amount is not a fixed number, its based on the borrowed amount.

For example, If the user chooses to borrow $2.88 and the bank fee posted is 2.1% then the total borrowed amount will be $2.94. ($2.88 + $0.06048 (but I'll round to the nearest cent so its $0.06)).

[ - ]
Reply by matthewbarrDecember 7, 2018

Normally one would write code like this in C, you'd do the math with floats and convert back to integer. The overhead is not bad, particularly if you avoid division, eg. multiply by 0.01 rather than divide by 100.0, always a good habit.

If you are bound and determined to nut this out in assembler with integer math it appears you got trapped in some sort of a worm hole and it is 1980 where you are, and congratulations on proving that it is possible to travel backwards in time! :-)

One way folks used to represent numbers for these kinds of exercises when writing everything in assembler was in BCD (binary coded decimal). Each decimal digit 0-9 is represented as a 4-bit value, packed BCD gives you two decimal digits per byte, so in 24 bits you can easily represent 999.99. A 4-bit digit shift implements a multiply or divide by 10.

Now you have to do BCD arithmetic to take your percentage. Lets drop the decimal point and consider that we have an integer penny value in BCD, so $456.78 becomes 0x04_56_78. To compute percentage values with 0.1% precision, eg. 0.1%, 0.2% and so on, you can use 10%, 1% and 0.1% of your original value.

You have a spare BCD digit on the left, I suggest using this and a guard byte (two digits) in the LS position to achieve 3 overall LS guard digits. This allows you to do all your math as 32-bit packed BCD arithmetic with no loss of precision.

For example, lets take 34.7% of $987.65. Represent the integer pennies value as 0x98_76_50_00. The 10%, 1% and 0.1% values are respectively 0x09_87_65_00, 0x00_98_76_50 and 0x00_09_87_65.

To compute your percentage you'd add 10% 3 times, 1% 4 times and 0.1% 7 times. This all adds up to 0x34_27_14_55. Your answer is 0x03_42_71 but you need to pay attention to those three guard digits 4_55. If they were >= 5_00 you should round up to 0x03_42_72, but in this case you shouldn't.

Note that $987.65 * 0.347 = $342.71455 which is $342.71 correctly rounded.

You'll need a 32-bit packed BCD addition function. It should add the digit positions pair-wise starting from the LS digit, computing the BCD result and keep track of carry-in to the next most significant digit position.

So this is really pretty simple. Generate the three partial products, accumulate each of them a variable number of times based on the digits in your percentage target, then shift and round the result. This requires only integer shift and addition, no lookup tables.

The only rub you may run into is your input and output format. If you can convert text to BCD and vice versa directly, this is pretty trivial. If you have to accept and return 24-bit binary values, you will also need to create binary-to-BCD and vice versa conversion routines along with your BCD add function. These aren't difficult and don't involve multiplication or division, and you can probably find BCD addition and BCD<->binary conversion examples on the web. The initial part of CustomSarge's solution is essentially a binary to BCD conversion.

Again, things really aren't done with BCD any more. The overhead for a little floating point multiplication just isn't that great, the time to code the float solution in C/C++ is negligible, the code is much more portable, and you have far fewer range and precision limitations.

[ - ]
Reply by mik3caDecember 7, 2018

Ok, the multiplication I figured out and I'm doing it as follows:

Lets call X the input 24-bit value, 
and Y the multiplier, and Z the answer.
if the LSB is set in Y, then X is added to Z.
Then Y gets shifted right by 1 and X is shifted left by 1.
Process continues until the input value is zero and the carry is zero.

Now for division, I did figure it out but I could use improvement since I'm always dividing by 100...

Here's my algorithm so far that's slowish but works:

Let Y be a 24-bit value from ram to initially
equal 11001000:00000000:00000000b 
(equals 100 already shifted over to save clock cycles) 
set N to 17 (number of shifts to make that large number = 100 unshifted)
Let Z be the 24-bit answer which is initially set to 0.
Let X be the 24-bit number we want to divide by 100.
Then in the loop...
shift Y to the right once
result = X - Y
only store result as X if there's no carry.
shift the inverted carry value left by 1 into the answer.
repeat loop 17 times

I borrowed part of my idea from http://8052mcu.com/div16

But now I need to do the division faster somehow because in my code I'm using 3 subb's and 3 temporary storage variables and somehow I think I could get away with one temporary variable if I did my division in reverse somehow. So I attempted this and got massive failures:

Set N (loop count) to 24 (24-bit)
Set X (8-bit value to check for 100) to 0.
Set Y to original 24-bit number
Set Z = 24-bit result which initially = 0.
Inside loop...
clear carry
shift Y right by 1 (to carry) **
shift X left by 1 and include carry in MSB **
clear carry
subtract 100 from X
if carry isn't set then make X the result of subtraction
Shift carry left into Z
repeat 24 times to complete loop
What could I be doing wrong?
[ - ]
Reply by dbrion06December 7, 2018

Do you need to do divisions?

I do not know, but your bank might have fixed interest rates (say, 2.1%). and you could decide to store the interests  for a given quantity (say 65536 * 1$ * 2.1 == 137625.6 ca  137626) -this may work for a list of interest rates).

Then suppose you borrow xx $ from your bank: you multiply xx by  137626 , then you "divide" by 65536: but dividing by a power of 2 is just shifting (in this case, it amounts to mere copies of two bytes, way faster than successive; like, in the decimal word, when you divide by 10, 100 ....)  

[ - ]
Reply by mik3caDecember 7, 2018

The interest rates are also variable

[ - ]
Reply by mr_banditDecember 7, 2018

Write the algorithm in C, run it on a PC, and print out the results of each step. Pick some obvious sets of values and run the C code. That way you will see the algorithm work in a friendly environment.

As an example, I had a fairly complex state machine to control a power supply and battery backup for a medical device. It took me a month to design it and code it in C. The target was a PIC16C73, with limited dbug facilities. Even worst, the C compiler for the PIC was a CCS C compiler, which has some limitations due to the PIC architecture.

I coded it up and ran it on a PC. I was able to debug it fairly quickly (a week), and the port to the PIC was pretty painless. I had it working in 2 days on the target HW. (I had a SPI interface to a Coldfire, and was not sure of who needed to go first. I bit-banged a 300 baud UART, looked at the protocol dance, and had that working in an hour.)

(BTW, the CCS C compiler for the PICs (PIC32 IIRC) with a real stack (more advanced than the PIC16C73) is a POS. I had a choice of changing to a gcc compiler, and decided to stick with the CCS because the conversion would take 2 weeks. Instead, I LOST 2 weeks to the bugs in the CCS POS. It could not reliably do a 32-bit divide, so I coded it up as I describe in this thread.)