On 1/16/2023 10:41 PM, antispam@math.uni.wroc.pl wrote:
> Don Y <blockedofcourse@foo.invalid> wrote:
>> On 1/12/2023 7:28 PM, antispam@math.uni.wroc.pl wrote:
>>>> Build a random number generator with a long, relatively prime period.
>>>> Fill memory with successive values from that RNG. (The primeness
>>>> ensures the pattern doesn't repeat at intervals that might be
>>>> related to the addressing decoder)
>>>>
>>>> Reseed the RNG and run it, again -- this time checking values
>>>> read from sequential locations against the computed random number.
>>>> The first fault indicates an unexpected decoder error (i.e.,
>>>> the address space "wrapping" at that value), a fault in the
>>>> memory (I use this technique as a quick memory test) *or*
>>>> the end of physical memory.
>>>
>>> Long period is easy:
>>
>> "Long" isn't the critical aspect. What you want is a period that is
>> "relatively prime" wrt the likely address decoding. So, a write of
>> "random value X" to location N can't appear at any LIKELY "bad decode"
>> of that address elsewhere in the address space you are probing.
>
> I do not see "relatively prime" as really helpful. I mean:
> I would use only a tiny part of the period, so for the purpose
> of memory testing it does not matter too much.
A period that divides a power of two will miss problems:
Loc Value
0 01
1 11
2 00
3 10
4 01
5 11
6 00
7 10
If the address bit that identifies 0-3 from 4-7 is broken,
you will never know.
By contrast:
Loc Value
0 01
1 11
2 00
3 01
4 11
5 00
6 01
7 11
The values that are returned by a "stuck address bit"
wouldn't agree with your *expectations* of those values,
because the period "3" is relatively prime wrt the
powers of two used in the addressing.
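A minimal sketch of this effect (the stuck bit, decoder mask, and fill
patterns are all hypothetical, chosen to mirror the tables above): writes
and reads both go through a decoder whose high address bit is broken, so
locations 4-7 alias onto 0-3. A period-4 pattern verifies cleanly; a
period-3 pattern exposes the fault.

```c
/* Sketch only: simulate a decoder with address bit 2 stuck low,
 * so locations 4-7 alias onto 0-3.  STUCK_MASK and the patterns
 * are illustrative, not taken from any real part. */
#include <stdint.h>

#define MEM_SIZE   8
#define STUCK_MASK 0x3   /* the faulty decoder drops address bit 2 */

/* Fill memory with a repeating pattern, then verify it, both through
 * the faulty decoder.  Returns the number of read-back mismatches. */
static int test_pattern(const uint8_t *pattern, int period)
{
    uint8_t mem[MEM_SIZE];
    for (int a = 0; a < MEM_SIZE; a++)
        mem[a & STUCK_MASK] = pattern[a % period];      /* faulty write */

    int errors = 0;
    for (int a = 0; a < MEM_SIZE; a++)
        if (mem[a & STUCK_MASK] != pattern[a % period]) /* faulty read */
            errors++;
    return errors;
}
```

With the period-4 pattern every aliased write lands on a location already
holding the expected value, so the test reports zero errors; the period-3
pattern mismatches at the first four locations.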
>>> I used a counter, actually 3 versions
>>> of a counter. One version of the counter used a large increment which
>>> ensured that all bytes were changing quickly (with low increments
>>> there were long stretches where high bytes were the same).
>>
>> A generic random number generator doesn't confine changes
>> to "local regions" of the value. E.g.,
>> 00000101
>> 00001010
>> 00010100
>> 00101001
>> 01010011
>> 10100110
>> The goal being that bits/bytes don't repeat "regularly"
>> and that every bit sees some activity, in any set of
>> consecutively generated values.
>
> Do you realize that popular PRNG-s are perfectly regular, following
> relatively simple rules? Better play tricks so that it is
> hard to detect this regularity. My first two counters
> had the problem that low-order bits were changing with small
> periods; the third one counted modulo a prime. When viewed
> as numbers, once you knew the rule it was perfectly
> regular (and probably guessing the rule would be not
Counters are bad sources of "random numbers".
Note the 8-bit example immediately above. *You* can
predict the MSBs without knowing the generator polynomial
(you can't predict the LSBs, though). But, the memory can't
"predict" any of this.
If, the second time through, the RNG is seeded differently:
00110101
01101010
11010100
10101001
01010011
10100110
then each address "sees" a different pattern from the first
pass so if the first memory value was "stuck" at, e.g., "00000101",
this would appear on the second pass when you *expected* it
to be 00110101.
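A compact sketch of the two-pass scheme (the generator choice and every
name here are mine, purely for illustration): pass one fills memory from
a seeded PRNG, pass two reseeds, regenerates the identical stream, and
compares.

```c
/* Sketch of the fill-then-verify passes described above.  xorshift32
 * stands in for whatever generator you prefer; all names are
 * illustrative. */
#include <stdint.h>
#include <stddef.h>

static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Pass 1: fill memory with successive values from the seeded PRNG. */
static void prng_fill(uint32_t *mem, size_t words, uint32_t seed)
{
    uint32_t s = seed;
    for (size_t i = 0; i < words; i++)
        mem[i] = xorshift32(&s);
}

/* Pass 2: reseed, regenerate the same stream, and compare.  Returns
 * the index of the first mismatch (a decoder wrap, a bad cell, or
 * the end of physical memory), or -1 if everything verifies. */
static long prng_verify(const uint32_t *mem, size_t words, uint32_t seed)
{
    uint32_t s = seed;
    for (size_t i = 0; i < words; i++)
        if (mem[i] != xorshift32(&s))
            return (long)i;
    return -1;
}
```

Repeating both passes with a different seed, as described above, catches
cells that happened to be stuck at whatever the first pass wrote.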
> that hard). At bit level carries and modulo meant
> that logic working on separate bits or small groups of
> bits would have trouble following/predicting the
> sequence.
>
>>> I also tried 3 lousy random number generators,
>>> but in this problem I do not think that much more testing is
>>> needed: AFAICS to pass my test with less memory the chip would need
>>> quite complicated address remapping working at bit level.
>>
>> Quite possible. I was merely offering a tool that one can
>> fall back on in these sorts of problems. I use this to
>> size and grossly test memory at POST; some number of passes
>> of writes with LATER read-verifies to ensure data is
>> retained and there are no decode failures (e.g., caused
>> by passing the end of physical memory).
>>
>> I learned this trick from a friend from school, in the video
>> (arcade) game heyday -- you don't have a lot of resources
>> *or* time to do comprehensive tests. Yet, you have to have
>> a reasonable level of confidence that the game is working
>> properly (folks get mad if you accept their coins and
>> don't deliver the product/experience!)
>
> If what I wrote sounded negative, let me say that I in
> general have doubts about the efficiency of many tests. For
> example, I waited on PC-s doing POST. Yet, I saw a failing
> POST maybe one or two times and IIRC the errors were so gross
> that a very simple and much faster test should catch them.
> OTOH I have seen machines which passed POST but failed a more
> comprehensive memory test. And machines that in use exhibited
> what looked like memory errors but passed available tests
> (I "cured" a few by underclocking them).
If you want to stress memory, then you play games with Vcc
and temperature -- as well as access *patterns* (e.g.,
checkerboard to maximize noise within the array, look for
read-disturb errors, etc.).
What you are looking to do, at POST, is to detect gross errors.
Or, *here*, to detect where the memory exists and doesn't
exist (to "size" it).
A deployed device doesn't have the luxury of being able to
comprehensively test itself or its components.
Do you know how many ECC errors are being thrown and corrected
in your PC? At what level would you lose confidence in the
memory's ability to retain data "correctly"?
[This ignores the fact that modern processors use ECC internally
to correct *their* errors!]
> In software context I saw projects that religiously added
> tests to have "100% test coverage", yet those tests were
> too weak to catch major breakage. OTOH I also had
> situations where a _new_ approach to testing allowed me to
> identify and consequently fix several bugs.
If you want to test a MCU-related device, you are best
advised to purchase a library designed by the device's
manufacturer. *They* know the patterns that will reveal
the most information on the health of the device. These
are neither inexpensive nor universally available (as
there are so many different devices in production).
> One approach to testing which is reasonably effective
> is to first identify likely "failure modes" and invent
> tests that detect each specific failure mode. In this case
> a possible "failure mode" was some address scrambling
> scheme. In fact there is some kind of "scrambling",
> namely the chip seems to use incomplete address decoding.
> Now, for scrambling at word level already the simplest
> counter would detect it.
No, it wouldn't. With an 8-bit counter the pattern wraps
modulo 256, so it will not see a fault in any of the
address lines that distinguish between addresses
0-255 vs 256-511 vs 512-767...
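To make this concrete (sizes and mask are hypothetical): a byte-wide
counter fill wraps with period 256, so a broken address line that
aliases 256-511 onto 0-255 rewrites every location with the value it
already holds, and the verify pass sees nothing wrong.

```c
/* Sketch: simulate a broken address line that aliases the upper half
 * of a 512-byte region onto the lower half, then run a counter-pattern
 * fill and verify through that fault.  Values are illustrative. */
#include <stdint.h>

#define SIZE       512
#define ALIAS_MASK 0xFF   /* line separating 0-255 from 256-511 is broken */

static int counter_test_errors(void)
{
    uint8_t mem[SIZE];
    for (int a = 0; a < SIZE; a++)
        mem[a & ALIAS_MASK] = (uint8_t)a;        /* counter fill, faulty decode */

    int errors = 0;
    for (int a = 0; a < SIZE; a++)
        if (mem[a & ALIAS_MASK] != (uint8_t)a)   /* read back through same fault */
            errors++;
    return errors;
}
```

The verify pass reports zero errors even though half the address space
does not actually exist.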
> That still leaves the possibility
> of scrambling at byte or bit level. In particular,
> only carries prevent periodicity with period 256 and
> a limited number of un-aliased extra cells possibly
> could handle the places where carries occur. The second test
> had many more carries, making this highly unlikely.
> In the third counter no bit position was periodic and
> carries were in different places than in the second test.
> So a scrambling scheme that passes all 3 counter
> tests is getting more weird. Now, address decoding
> lies on the performance-critical path. An MCU manufacturer
> needs a good reason to put extra logic there. Skipping
> gates and using incomplete decoding is cheap. I am
> not sure if there is a good reason to add any scrambling
> beyond incomplete decoding. But certainly there is
> no reason to add several complicated scrambling circuits
> working differently on different byte (or bit) planes.
>
> Also, the goal of my testing was to find out if the extra memory
> is there. Once you accept that there is 32kB of memory
> a natural question is why the manufacturer says that there is
> 20kB. One possible answer is that the manufacturer does not
> test the extra memory. So one may further ask if the extra
> memory works reliably. Answering this last question
> goes well beyond the goal of my testing; it is a much bigger
> project and for that matter even if it works reliably for
> me it does not prove that it will work reliably for
> other people.
Using what you know to be a counterfeit device is the
first mistake.
>>> I doubt that lousy random number generators will be of
>>> much use in future. But writing a good one is more work,
>>> and even linking to some has the disadvantage of size: memory
>>> taken by the test program was excluded from modification so
>>> I wanted the test program to be as small as possible. My program
>>> was 680 bytes which together with variables and stack comfortably
>>> fit in 1kB of RAM...
>>
>> The RNG isn't a big piece of code. And, it can be parameterized
>> to fit your future needs.
>
> The Mersenne twister which is present in several libraries has
> a rather large state and long code. I am not sure how large
> exactly it is but my impression was that its state consists
> of hundreds of 64-bit integers...
>
> And concerning needs, FCM32 can efficiently do 64-bit arithmetic.
> Already Cortex-M0 will have suboptimal performance with 64-bit
> numbers. And on 8-bitters or processors like the smallest MSP430,
> which lack hardware multiplication, arithmetic on bigger numbers
> can be quite expensive. Hence, a PRNG which works fast on big
> machines could be quite slow on small machines. Up to now
> I did not really need a PRNG on small machines, so I postponed
> implementing one. If/when I have a need I will know which
> compromises/tradeoffs are appropriate for a given application.
You only need simple arithmetic/logic operators to implement
random number generators. When really interested in (provable)
"randomness", your entropy source is the bigger issue. And,
proving that there are no side-channel attacks that can bias
it.
I.e., how many times a steel ball hits a particular target isn't
very random -- nor is it beyond the CONTROL of the party who
is likely to benefit from influencing the RNG.
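One such generator, as a sketch (the function name is mine): a Galois
LFSR needs only a shift, a test, and an XOR per step -- no multiply --
so it is cheap even on an 8-bitter or the smallest MSP430.

```c
/* Sketch: 16-bit Galois LFSR using only shift/test/XOR, so no hardware
 * multiplier is needed.  0xB400 encodes taps at bits 16,14,13,11, a
 * maximal-length polynomial (period 65535).  Seed must be nonzero:
 * state 0 is the one fixed point and never advances. */
#include <stdint.h>

static uint16_t lfsr16(uint16_t *state)
{
    uint16_t s   = *state;
    uint16_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0xB400u;        /* apply the feedback taps */
    return *state = s;
}
```

For a memory test, a period of 65535 is already relatively prime wrt
every power-of-two address boundary.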
1 2 3 4 5 is as good a lottery number as any -- perhaps more so
as there will be folks who *believe* a number like that CAN'T
win (so, if it does, there will be fewer folks you'll have
to share the jackpot with!)