Forums

What are the odds? (Random Numbers)

Started by MaxMaxfield 1 month ago20 replieslatest reply 1 month ago62 views

I'm going to be writing a short column on the Arduino's random() function. Remember that I'm a hardware designer by trade, so I'm fighting my way through the software. Also, after talking to a few embedded friends, my impression is that some of them use random numbers a lot, while others use them rarely, if al all -- there don;t seem to be many in the middle (do you agree?).

I'm also discovering all sorts of things about random numbers that are new to me, such as the fact that the C/C++ rand() function returns an integer value between 0 and RAND_MAX where the size of RAND_MAX can (ironically) randomly vary from one machine to another. By comparison, the rand() function in Excel generates a real number between 0 and 1 -- also this function is volatile, so it generates a different value every time you open that spreadsheet.

So... I was wondering if you had any interesting nuggets of knowledge or tidbits of trivia you would care to share on the topic of random numbers?

[ - ]
Reply by jms_nhFebruary 1, 2021

Oh, boy. "Random" numbers are an often misunderstood topic. Here are the best places I'd go to improve your understanding of pseudorandom number generators (PRNG):

- Numerical Recipes (chapter 7)

- Knuth's "The Art of Computer Programming" (vol 2 seminumerical algorithms)

I know there's at least one other key reference on PRNGs but I can't think of it off the top of my head.

The built-in rand() function should be avoided for any serious applications. If you read https://en.cppreference.com/w/cpp/numeric/random/r... it says

rand() is not recommended for serious random-number generation needs. It is recommended to use C++11's random number generation facilities to replace rand(). (since C++11) 

The modern random number generators include:

- https://en.wikipedia.org/wiki/Mersenne_Twister

https://en.wikipedia.org/wiki/Xorshift and related generators -- Sebastian Vigna has a good summary of these and how they compare to other generators

- PCG

One of George Marsaglia's papers I found really insightful but I can't remember which one, and unfortunately his website on FSU was removed after his death.


The other important point is that you really need to understand what you need as a consumer of "random" numbers. Are they supposed to be uniformly generated? Or from some other distribution like Gaussian? Do you care about statistical randomness flaws? Do you care about cryptographic randomness -- that is, no information can be gained about future generated numbers from past generated numbers? Do they need to be "true" random numbers, in which case a hardware source must be used to gather entropy?

This is one topic where half-measures can be dangerous.

[ - ]
Reply by MaxMaxfieldFebruary 1, 2021

"This is one topic where half-measures can be dangerous." I totally agree -- I'm just using them to control pixels wandering around my 12x12 ping pong ball array https://www.youtube.com/channel/UCQVqp_L4hKqF1uZ3t...  In my column, I'll make sure to note that you have to really know what you are doing if you are using them for serious applications. And thanks for all the references.


[ - ]
Reply by MaxMaxfieldFebruary 1, 2021

Do you know of any funny stories involving problems with RNGs and/or the use thereof?

[ - ]
Reply by waydanFebruary 2, 2021

I recall an interesting story from NPR’s Planet Money about cracking slot machines which used a poor random number generator.

https://www.npr.org/sections/money/2017/05/24/529865107/episode-773-slot-flaw-scofflaws

[ - ]
Reply by MaxMaxfieldFebruary 2, 2021

I'd forgotten about that -- thanks for reminding me -- Max

[ - ]
Reply by gregha04756February 1, 2021

For what it's worth, the "Numerical Recipes" book includes a chapter on Random Number generation.

[ - ]
Reply by jms_nhFebruary 1, 2021

You beat me to it, that's the first place I'd go. More in another comment I'm writing as a direct reply.

[ - ]
Reply by MaxMaxfieldFebruary 1, 2021

Thanks Greg

[ - ]
Reply by DKWatsonFebruary 1, 2021

For interest, I once used an algorithm for encryption based on a deck of playing cards. This method was used during WWII as soldiers often carried cards and would not likely be thought of as spies. The decks were seeded using the bridge hands often published in newspapers around the world. It was important that each participant maintain the same order in each deck to encrypt/decrypt. If caught, they could simply disperse the deck and it would be (nearly) impossible to reconstruct it. Each use of the deck entailed a reshuffle according to the algorithm which used the message itself as the new key. If memory serves (not a guarantee) this was called 'Solitaire' and can be Googled. In context, if you dealt a unique bridge hand (all 52 cards) each second since the beginning of time (ie. the big bang), you wouldn't yet be half way through the possible combinations. Seems like there might be room in there for some RNG.

[ - ]
Reply by MaxMaxfieldFebruary 1, 2021

This is the first I've heard of this -- very interesting -- thanks for sharing.

[ - ]
Reply by ve3idFebruary 1, 2021

Interesting that you should mention solitaire.  I am addicted to solitaire, or 'patience' as it is called in the English-speaking world. Standard game in linux is kpat.

I have noticed a strange thing in my years of playing.  Once, I got as far as 8000 winning games in a row before the winchester crashed.  I was on a roll and most of the time got the hands out in 80-85 moves, about three or four percent of the time less than 80 moves.

After replacing the disk, I continued in the same general number of moves to get it out.

Then I moved to a different CPU.  It seems that then, the hands I was dealt were much harder, and I rarely got it out in less than 100 moves.  I rare time, I would get it down below 90, maybe one or two percent.  Only once in the last year have I got it out in less than 80.

It seems to me that the randomiser that is creating the deals is affected by the number of processors - originally I had only one, moved to two and now 8.

I haven't had time to try any experiments to prove this, and I also am a hardware man as well.

Btw, I only place when waiting for a compile, sysgen, or download.  And of course when I absent-mindedly use windows for a change!

[ - ]
Reply by MatthewEshlemanFebruary 1, 2021

I'll just do the silly response today:

https://xkcd.com/221/


https://dilbert.com/strip/2001-10-25

For serious info:

https://www.random.org/

:-)

Best regards,

Matthew

https://covemountainsoftware.com/



[ - ]
Reply by MaxMaxfieldFebruary 1, 2021
I knew the Dilbert one -- I should have guessed xkcd would have something to say -- thanks for the link to random.org
[ - ]
Reply by igendelFebruary 1, 2021

There's an Arduino riddle I posted once in my blog: if you produce lots and lots of (pseudo-)random numbers in the range 0 to 1,431,655,765, what percentage of them will fall between 0 and 715,827,883 (which is half the previous number)? ;-)


Also, back in 2015 I found a bug in the implementation of Arduino's randomSeed() function; if you send it the value 0, it does nothing at all. I reported it but it wasn't fixed - at least until I grew tired of checking.

[ - ]
Reply by MaxMaxfieldFebruary 1, 2021

Re your question -- my knee-jerk reaction would be 50% -- but I'm guessing you are going to explain the error of my ways :-)

Re seeding with a value of 0 doing nothing -- that suggests the Arduino is using a Linear Feedback Shift Register (LFSR) to generate its random numbers -- when would make sense because this is low-impact in terms of memory and clock cycles. Tell me more...

[ - ]
Reply by jms_nhFebruary 1, 2021
ouch, LFSRs suck as a general PRNG source; they are very good for individual random bits. As a PRNG there is too much correlation between successive values.
[ - ]
Reply by MaxMaxfieldFebruary 1, 2021
I thought that selecting tap points so as to ensure maximal displacement worked reasonably well
[ - ]
Reply by jms_nhFebruary 1, 2021

No, selection of the polynomial (tap points in a hardware implementation) as a primitive polynomial just lets you ensure that the LFSR is a maximal-length sequence (2^N - 1 covering all state patterns except zero).

Correlation between samples is a separate issue, and will suck no matter what polynomial is used. It's important for any PRNG to be a good source of independent and identically distributed (IID) random samples.

See https://www.embeddedrelated.com/showarticle/1121.php

[ - ]
Reply by igendelFebruary 1, 2021

Indeed, the correct value is hopelessly not 50% - it's actually 2/3 of the produced numbers being below that midpoint. This is caused by careless use of the modulo operator. Arduino's random() first generates a random integer in the range 0 to RAND_MAX, then returns the remainder from dividing that by the maximum requested by the user. 

Imagine you take all the numbers 0-6 (inclusive), and you take all the remainders from dividing them by 4. This'll give you 0,1,2,3,0,1,2 - you see the result "3" has half the chance of appearing as any other number! Same principle works in my riddle, just on a much larger scale.

Regarding randomSeed(), I don't remember if it's using LFSR or another method, but either way 0 is a problematic seed. In "regular" C++ they took it into account by switching implicitly to another seed, if I remember correctly. But with Arduino, they totally ignore the request, altogether; meaning that if you re-seed several times with 0, you won't get the same sequences! [Edit: checked now using a test program, this issue still isn't fixed]

[ - ]
Reply by MaxMaxfieldFebruary 1, 2021
I just confirmed this myself -- Max