I'm going to be writing a short column on the Arduino's random() function. Remember that I'm a hardware designer by trade, so I'm fighting my way through the software. Also, after talking to a few embedded friends, my impression is that some of them use random numbers a lot, while others use them rarely, if al all -- there don;t seem to be many in the middle (do you agree?).
I'm also discovering all sorts of things about random numbers that are new to me, such as the fact that the C/C++ rand() function returns an integer value between 0 and RAND_MAX where the size of RAND_MAX can (ironically) randomly vary from one machine to another. By comparison, the rand() function in Excel generates a real number between 0 and 1 -- also this function is volatile, so it generates a different value every time you open that spreadsheet.
So... I was wondering if you had any interesting nuggets of knowledge or tidbits of trivia you would care to share on the topic of random numbers?
Oh, boy. "Random" numbers are an often misunderstood topic. Here are the best places I'd go to improve your understanding of pseudorandom number generators (PRNG):
- Numerical Recipes (chapter 7)
- Knuth's "The Art of Computer Programming" (vol 2 seminumerical algorithms)
I know there's at least one other key reference on PRNGs but I can't think of it off the top of my head.
The built-in rand() function should be avoided for any serious applications. If you read https://en.cppreference.com/w/cpp/numeric/random/r... it says
rand() is not recommended for serious random-number generation needs. It is recommended to use C++11's random number generation facilities to replace rand(). (since C++11)
The modern random number generators include:
- https://en.wikipedia.org/wiki/Xorshift and related generators -- Sebastian Vigna has a good summary of these and how they compare to other generators
One of George Marsaglia's papers I found really insightful but I can't remember which one, and unfortunately his website on FSU was removed after his death.
The other important point is that you really need to understand what you need as a consumer of "random" numbers. Are they supposed to be uniformly generated? Or from some other distribution like Gaussian? Do you care about statistical randomness flaws? Do you care about cryptographic randomness -- that is, no information can be gained about future generated numbers from past generated numbers? Do they need to be "true" random numbers, in which case a hardware source must be used to gather entropy?
This is one topic where half-measures can be dangerous.
"This is one topic where half-measures can be dangerous." I totally agree -- I'm just using them to control pixels wandering around my 12x12 ping pong ball array https://www.youtube.com/channel/UCQVqp_L4hKqF1uZ3t... In my column, I'll make sure to note that you have to really know what you are doing if you are using them for serious applications. And thanks for all the references.
Do you know of any funny stories involving problems with RNGs and/or the use thereof?
I recall an interesting story from NPR’s Planet Money about cracking slot machines which used a poor random number generator.
I'd forgotten about that -- thanks for reminding me -- Max
For what it's worth, the "Numerical Recipes" book includes a chapter on Random Number generation.
You beat me to it, that's the first place I'd go. More in another comment I'm writing as a direct reply.
For interest, I once used an algorithm for encryption based on a deck of playing cards. This method was used during WWII as soldiers often carried cards and would not likely be thought of as spies. The decks were seeded using the bridge hands often published in newspapers around the world. It was important that each participant maintain the same order in each deck to encrypt/decrypt. If caught, they could simply disperse the deck and it would be (nearly) impossible to reconstruct it. Each use of the deck entailed a reshuffle according to the algorithm which used the message itself as the new key. If memory serves (not a guarantee) this was called 'Solitaire' and can be Googled. In context, if you dealt a unique bridge hand (all 52 cards) each second since the beginning of time (ie. the big bang), you wouldn't yet be half way through the possible combinations. Seems like there might be room in there for some RNG.
This is the first I've heard of this -- very interesting -- thanks for sharing.
Interesting that you should mention solitaire. I am addicted to solitaire, or 'patience' as it is called in the English-speaking world. Standard game in linux is kpat.
I have noticed a strange thing in my years of playing. Once, I got as far as 8000 winning games in a row before the winchester crashed. I was on a roll and most of the time got the hands out in 80-85 moves, about three or four percent of the time less than 80 moves.
After replacing the disk, I continued in the same general number of moves to get it out.
Then I moved to a different CPU. It seems that then, the hands I was dealt were much harder, and I rarely got it out in less than 100 moves. I rare time, I would get it down below 90, maybe one or two percent. Only once in the last year have I got it out in less than 80.
It seems to me that the randomiser that is creating the deals is affected by the number of processors - originally I had only one, moved to two and now 8.
I haven't had time to try any experiments to prove this, and I also am a hardware man as well.
Btw, I only place when waiting for a compile, sysgen, or download. And of course when I absent-mindedly use windows for a change!
There's an Arduino riddle I posted once in my blog: if you produce lots and lots of (pseudo-)random numbers in the range 0 to 1,431,655,765, what percentage of them will fall between 0 and 715,827,883 (which is half the previous number)? ;-)
Also, back in 2015 I found a bug in the implementation of Arduino's randomSeed() function; if you send it the value 0, it does nothing at all. I reported it but it wasn't fixed - at least until I grew tired of checking.
Re your question -- my knee-jerk reaction would be 50% -- but I'm guessing you are going to explain the error of my ways :-)
Re seeding with a value of 0 doing nothing -- that suggests the Arduino is using a Linear Feedback Shift Register (LFSR) to generate its random numbers -- when would make sense because this is low-impact in terms of memory and clock cycles. Tell me more...
No, selection of the polynomial (tap points in a hardware implementation) as a primitive polynomial just lets you ensure that the LFSR is a maximal-length sequence (2^N - 1 covering all state patterns except zero).
Correlation between samples is a separate issue, and will suck no matter what polynomial is used. It's important for any PRNG to be a good source of independent and identically distributed (IID) random samples.
Indeed, the correct value is hopelessly not 50% - it's actually 2/3 of the produced numbers being below that midpoint. This is caused by careless use of the modulo operator. Arduino's random() first generates a random integer in the range 0 to RAND_MAX, then returns the remainder from dividing that by the maximum requested by the user.
Imagine you take all the numbers 0-6 (inclusive), and you take all the remainders from dividing them by 4. This'll give you 0,1,2,3,0,1,2 - you see the result "3" has half the chance of appearing as any other number! Same principle works in my riddle, just on a much larger scale.
Regarding randomSeed(), I don't remember if it's using LFSR or another method, but either way 0 is a problematic seed. In "regular" C++ they took it into account by switching implicitly to another seed, if I remember correctly. But with Arduino, they totally ignore the request, altogether; meaning that if you re-seed several times with 0, you won't get the same sequences! [Edit: checked now using a test program, this issue still isn't fixed]