timestamp in ms and 64-bit counter

Reply by David Brown ● February 8, 2020

On 08/02/2020 18:41, Rick C wrote:
> On Saturday, February 8, 2020 at 10:24:33 AM UTC-5, David Brown
> wrote:
>> On 07/02/2020 23:03, Jim Jackson wrote:
>>>>> A 32 bit counter incremented at 1 kHz will roll over every 3
>>>>> years or so.
>>>> 
>>>> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
>>> 
>>> Ah yes. Early versions of Windows NT. Crashed if they had an
>>> uptime of 49 and a bit days - I wonder why :-)
>>> 
>>> Mind you getting early NT to stay up that long without crashing
>>> or needing a reboot was bloody difficult.
>>> 
>> 
>> I believe it was Windows 95 that had this problem - and it was not 
>> discovered until about 2005, because no one had kept Windows 95
>> running for 49 days.
> 
> 49 days?  You mean 49 minutes?

Oh, Win95 OSR 2 was not /too/ bad.  It could keep going long enough to
play a game or two.  (For actual work, I moved from Win3.1 to OS/2,
until NT 4.0 came out.)

> 
> 
>> Maybe early NT had a similar fault, of course.  But people /did/
>> have NT running for long uptimes from very early on, so such a bug
>> would have been found fairly quickly.
> 
> Never used NT, but I used W2k and it was great!  W2k was widely
> pirated so MS started a phone home type of licensing with XP which
> was initially not well received, but over time became accepted.  Now
> people reminisce about the halcyon days of XP.

Did you not use NT 4.0?  It was quite solid.  W2K was also good, but XP 
took a few service packs before it became reliable enough for serious use.

> 
> Networking under W2k required a lot of manual setting up.  But it was
> not hard to do.  A web site, World of Windows Networking made it easy
> until it was bought and ruined with advertising and low quality
> content.

I don't remember any significant issues with networking with W2K.  I see 
a lot more now with Win10, and even Win7 when people have forgotten to 
turn off the automatic updates.

> 
> Now I have trouble just getting to Win10 computers to share a file
> directory.
> 

Indeed.  And a recent update just stopped local names on the network 
working properly with DNS - Windows 10 has just decided that you need to 
use the full name (with the ".network.net" or whatever added), or you 
have to manually force it to use that suffix automatically.  The DNS and 
DHCP setup we have has been working happily for many years - I don't 
know what MS have done to screw it up in the latest Win 10 and Win 7 
updates.

Reply by Kent Dickey ● February 8, 2020

In article <5jC%F.78169$8Y7.67931@fx05.iad>,
Richard Damon  <Richard@Damon-Family.org> wrote:
>On 2/8/20 12:03 PM, Kent Dickey wrote:
>> Shown more explicitly, the following are all valid states (let's assume
>> ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
>> 
>> Time            read_low32()            ticks_high
>> -------------------------------------------------
>> 0               0xffff_fffe             0
>> 1ms             0xffff_ffff             0
>> 1.99999ms       0xffff_ffff             0
>> 2ms             0x0000_0000             0
>> Interrupt is sent and is now pending
>> 2ms+delta       0x0000_0000             1
>> 
>> The issue is: what is "delta", and can other code (including your GetTick()
>> function) run between "2ms" and "2ms+delta"?  And the answer is almost
>> assuredly "yes".  This is a problem.
>
>But, as long as the timing is such that we can not do BOTH the 
>read_low32() and the read of ticks_high in that delta, we can't get the 
>wrong number.
>
>This is somewhat a function of the processor, and how much the 
>instruction pipeline 'skids' when an interrupt occurs. The processor 
>that he mentioned, A STM32L4R9, which uses an M4 processor, doesn't have 
>this much of a skid, so that can't be a problem unless you do something 
>foolish like disable the interrupts while doing the sequence.

The interrupt skid matters for how large the window is, but the problem
happens even if the "skid" was 0.

Look at it this way: the hardware counter logic is something like:

	always @(posedge clk) begin
		if(do_inc) begin
			cntr += 1;
			if(cntr == 0) begin
				interrupt = 1;
			end
		end
	end

Then at cycle 0 cntr=ffff_ffff and do_inc=0.  At cycle 1, do_inc=1 and cntr=0
and interrupt=1.

In that cycle, software could read cntr=0.  The interrupt CANNOT have been
taken yet, since interrupts aren't instantaneous: the signal hasn't even made
it to the interrupt controller; the timer module has merely decided to
request an interrupt.  (The ARM GIC supports asynchronous interrupts, so it
takes several clocks just to register the interrupt.)

This is always somewhat a function of the processor, but the problem is
inherent to all CPUs.  A simple 6502 or 8086 or whatever has the same problem
and cannot fix it easily either.

The hardware cannot get this case right without some extreme craziness.  That
would be a pre-interrupt detection circuit, prepared to drive the interrupt
early so the CPU reacts in time.

The right way to look at it: hardware interrupts always arrive tens or
hundreds of cycles after the event you think triggered them.  Design with
that in mind and you'll get your algorithms right.

Kent
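
To make that window concrete, here is a minimal sketch in C that models the
counter and the pending interrupt purely in software.  Everything in it
(hw_tick, cpu_take_irq, get_tick_naive and the variables) is hypothetical and
stands in for real hardware; it only shows that a read taken after the wrap,
but before the ISR has run, returns a value about 49 days in the past.

```c
#include <stdint.h>

/* Hypothetical software model of the timer; no real hardware is touched. */
static uint32_t cntr = 0xFFFFFFFEu;      /* free-running 32-bit counter     */
static int irq_pending = 0;              /* overflow requested, not taken   */
static volatile uint32_t ticks_high = 0; /* upper word, written by the ISR  */

/* One 1 ms tick of the counter.  A wrap only *requests* the interrupt. */
void hw_tick(void)
{
    cntr += 1u;
    if (cntr == 0u) {
        irq_pending = 1;
    }
}

/* The CPU gets around to taking the interrupt some cycles later. */
void cpu_take_irq(void)
{
    if (irq_pending) {
        irq_pending = 0;
        ticks_high++;
    }
}

/* Naive reader: high, low, high, retry if the high word changed. */
uint64_t get_tick_naive(void)
{
    uint32_t h1, lo, h2;
    do {
        h1 = ticks_high;
        lo = cntr;
        h2 = ticks_high;
    } while (h1 != h2);
    return ((uint64_t)h1 << 32) | lo;
}
```

Starting from the initial state, calling hw_tick() and reading gives
0x0000_0000_ffff_ffff; calling hw_tick() again and reading while the
interrupt is still pending gives 0; only after cpu_take_irq() does a read
give 0x0000_0001_0000_0000.  The retry loop never triggers, because both
reads of ticks_high agree inside the window.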

Reply by Ed Prochak ● February 8, 2020

On Thursday, February 6, 2020 at 1:02:49 PM UTC-5, Rick C wrote:
> On Thursday, February 6, 2020 at 7:43:35 AM UTC-5, pozz wrote:
> > I need a timestamp in millisecond in linux epoch. It is a number that 
> > doesn't fit in a 32-bits number.
> > 
> > I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw 
> > counter. I need to create a mixed sw/hw 64-bits counter. It's very 
> > simple, I configure a 32-bits hw timer to run at 1kHz and increment an 
> > uint32_t variable in timer overflow ISR.
> > 
> > Now I need to implement a GetTick() function that returns a uint64_t. I 
> > know it could be difficult, because of race conditions. One solutions is 
> > to disable interrupts, but I remember another solution.
> > 
> > extern volatile uint32_t ticks_high;
> > 
> > uint64_t
> > GetTick(void)
> > {
> >    uint32_t h1 = ticks_high;
> >    uint32_t l1 = hwcnt_get();
> >    uint32_t h2 = ticks_high;
> > 
> >    if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
> >    else          return ((uint64_t)h1 << 32);
> > }
> > 
> > Is it correct in single-tasking? In the else branch, I decided to set 
> > the low part to zero. I think it's acceptable, because if h1!=h2, hw 
> > counter has just wrapped-around, so it is 0... maybe 1.
> > 
> > What about preemptive multi-tasking? What happens if GetTick() is 
> > preempted by another higher-priority task that calls GetTick()?
> > 
> > I think it's better to fix the else branch, because the higher-priority 
> > task could take more than a few milliseconds, so the previous assumption 
> > that hw counter is 0 (maybe 1) can be incorrect. The fix could be:
> > 
> > uint64_t
> > GetTick(void)
> > {
> >    uint32_t h1 = ticks_high;
> >    uint32_t l1 = hwcnt_get();
> >    uint32_t h2 = ticks_high;
> > 
> >    if (h1 == h2) return ((uint64_t)h1 << 32) | l1;
> >    else          return ((uint64_t)h1 << 32) | hwcnt_get();
> > }
> 
> All this seems to be more complex than it needs to be.  You guys are
> focused on limitations when things happen fast.  Do you know how
> slow things can happen? 
> 
> A 32 bit counter incremented at 1 kHz will roll over every 3 years
> or so.  If you can assure that the GetTick is called once every
> 3 years simpler code can be used.
> 
> Have a 64 bit counter value which is the 32 bit counter incremented
> by the 1kHz interrupt and another 32 bit counter (the high part of
> the 64 bits) which is incremented when needed in the GetTick code. 
> 
> uint64_t
> GetTick(void)
> {
>    static uint32_t ticks_high;
>    uint32_t ticks_hw= hwcnt_get();
>    static uint32_t ticks_last;
> 
>    if (ticks_last > ticks_hw)  ticks_high++;
>    ticks_last = ticks_hw;
>    return ((uint64_t)ticks_high << 32) | ticks_hw;
> }
> 
> I'm not so conversant in C and I'm not familiar with the
> conventions of using time variables.  Clearly the time will
> need to be initialized by some means and ticks_high would
> need to be initialized to correspond to the current time/date,
> unless this is a run time variable only tracking time since boot up.  
> 
> Is the "call this code at least once in 3 years" requirement reasonable?
>  In the systems I design that would not be a problem. 
> 
> -- 
> 
>   Rick C.
> 
>   - Get 1,000 miles of free Supercharging
>   - Tesla referral code - https://ts.la/richard11209

Good points, Rick, but this conversation has me wondering:

 Why use a design that handles the high 32 bits in the
 application layer and the low 32 bits separately in the ISR?

Apparently you are using this only for interval timing?
If you are looking to maintain calendar time, then you will
need to store the high 32 bits as Rick mentioned.

Restoring the high 32 bits from nonvolatile storage is a boot-up
issue, and storing the value may require work outside the ISR.
But that is required only once per counter rollover, as Rick pointed
out (about every 49.7 days at 1 kHz, rather than 3 years).
You would have some drift anyway, since you have no way to
measure the time the system is down.

Ed

Reply by Rick C ● February 8, 2020

On Saturday, February 8, 2020 at 2:33:47 PM UTC-5, Kent Dickey wrote:
> In article <5jC%F.78169$8Y7.67931@fx05.iad>,
> Richard Damon  <Richard@Damon-Family.org> wrote:
> >On 2/8/20 12:03 PM, Kent Dickey wrote:
> >> Shown more explicitly, the following are all valid states (let's assume
> >> ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
> >> 
> >> Time            read_low32()            ticks_high
> >> -------------------------------------------------
> >> 0               0xffff_fffe             0
> >> 1ms             0xffff_ffff             0
> >> 1.99999ms       0xffff_ffff             0
> >> 2ms             0x0000_0000             0
> >> Interrupt is sent and is now pending
> >> 2ms+delta       0x0000_0000             1
> >> 
> >> The issue is: what is "delta", and can other code (including your GetTick()
> >> function) run between "2ms" and "2ms+delta"?  And the answer is almost
> >> assuredly "yes".  This is a problem.
> >
> >But, as long as the timing is such that we can not do BOTH the 
> >read_low32() and the read of ticks_high in that delta, we can't get the 
> >wrong number.
> >
> >This is somewhat a function of the processor, and how much the 
> >instruction pipeline 'skids' when an interrupt occurs. The processor 
> >that he mentioned, A STM32L4R9, which uses an M4 processor, doesn't have 
> >this much of a skid, so that can't be a problem unless you do something 
> >foolish like disable the interrupts while doing the sequence.
> 
> The interrupt skid matters for how large the window is, but the problem
> happens even if the "skid" was 0.
> 
> Look at it this way: the hardware counter logic is something like:
> 
> 	always @(posedge clk) begin
> 		if(do_inc) begin
> 			cntr += 1;
> 			if(cntr == 0) begin
> 				interrupt = 1;
> 			end
> 		end
> 	end
> 
> Then at cycle 0 cntr=ffff_ffff and do_inc=0.  At cycle 1, do_inc=1 and cntr=0
> and interrupt=1.
> 
> In that cycle, software could read cntr=0.  The interrupt CANNOT have been
> taken yet, since interrupts aren't instantaneous: the signal hasn't even made
> it to the interrupt controller; the timer module has merely decided to
> request an interrupt.  (The ARM GIC supports asynchronous interrupts, so it
> takes several clocks just to register the interrupt.)
> 
> This is always somewhat a function of the processor, but the problem is
> inherent to all CPUs.  A simple 6502 or 8086 or whatever has the same problem
> and cannot fix it easily either.
> 
> The hardware cannot get this case right without some extreme craziness.  That
> would be a pre-interrupt detection circuit, prepared to drive the interrupt
> early so the CPU reacts in time.
> 
> The right way to look at it--hardware interrupts are delayed tens or
> hundreds of cycles always from when you think they happen to when you receive
> it.  Then you'll get your algorithms right.
> 
> Kent

That's why stuff that is hard on a CPU is so easy in an FPGA.  Even if I have a soft core in my FPGA design, I know the clock cycle timing of the CPU, and it doesn't have the many cycles of delay in processing an interrupt.  In my design, on the next clock after an interrupt is asserted, the CPU is fetching the first instruction of the interrupt handler as well as saving the return address and status register.  

Commercial CPUs provide a bunch of hard to use features like interrupts with priority, etc. because programmers think of the CPU cycles as being a precious commodity and so want to make a single CPU do many things.  The CPU is often much smaller than the memory on the die and all of it is inordinately inexpensive really.  

On an FPGA it is easier to just provide the 64 bit counter in the first place.  lol  But if you want, it is easy to make the counter interrupt the CPU before the counter rolls over and the software can increment the upper half at the correct time as well as make the full 64 bits an atomic read. 

This thread is a perfect example of why I prefer FPGAs for most applications. 

-- 

  Rick C.

  ++ Get 1,000 miles of free Supercharging
  ++ Tesla referral code - https://ts.la/richard11209

Reply by Robert Wessel ● February 8, 2020

On Sat, 08 Feb 2020 18:37:35 +0200, upsidedown@downunder.com wrote:

>On Sat, 8 Feb 2020 16:24:30 +0100, David Brown
><david.brown@hesbynett.no> wrote:
>
>>On 07/02/2020 23:03, Jim Jackson wrote:
>>>>> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.
>>>>
>>>> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
>>> 
>>> Ah yes. Early versions of Windows NT. Crashed if they had an uptime
>>> of 49 and a bit days - I wonder why :-)
>>> 
>>> Mind you getting early NT to stay up that long without crashing or needing
>>> a reboot was bloody difficult.
>>> 
>>
>>I believe it was Windows 95 that had this problem - and it was not 
>>discovered until about 2005, because no one had kept Windows 95 running 
>>for 49 days.
>
>That is a more believable explanation.
>
>
>>Maybe early NT had a similar fault, of course.  But people /did/ have NT 
>>running for long uptimes from very early on, so such a bug would have 
>>been found fairly quickly.
>
>Both VAX/VMS as well as Windows NT use 100 ns as the basic unit for
>time of the day timing. 
>
>On Windows NT on single processor the interrupt rate was 100 Hz, on
>multiprocessor 64 Hz.
>
>Some earlier Windows versions used 55 Hz (or was it 55 ms) clock
>interrupt rate, so I really don't understand from where the 1 ms clock
>tick or 49 days is from.


The "tick count" in the (Win32) OS was always 1000Hz (as reported by
GetTickCount(), for example).  The physical ticks were massaged to
correctly update that count.
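
As an illustration of what "massaged" could mean, here is a small sketch in C
that turns a slower physical interrupt into a millisecond count by carrying
the fractional remainder forward.  This is not Windows code and makes no claim
about how Windows does it; PHYS_HZ, on_physical_tick and get_tick_ms are
made-up names, and 64 Hz is simply the multiprocessor rate quoted above.

```c
#include <stdint.h>

#define PHYS_HZ 64u              /* assumed physical interrupt rate        */

static uint32_t tick_ms;         /* millisecond count; wraps after 2^32 ms */
static uint32_t remainder_acc;   /* leftover, in units of 1/PHYS_HZ ms     */

/* Called once per physical timer interrupt (hypothetical hook). */
void on_physical_tick(void)
{
    /* Each interrupt is worth 1000/PHYS_HZ ms = 15.625 ms at 64 Hz.
       Add the whole milliseconds now and keep the fraction for later,
       so no time is lost to integer truncation. */
    remainder_acc += 1000u;
    tick_ms += remainder_acc / PHYS_HZ;
    remainder_acc %= PHYS_HZ;
}

/* Reported tick count in milliseconds, whatever the physical rate is. */
uint32_t get_tick_ms(void)
{
    return tick_ms;
}
```

The first interrupt advances the count by 15 ms and exactly 64 interrupts
advance it by 1000 ms.  Because the reported unit is 1 ms regardless of the
physical rate, a 32-bit count still wraps after 2^32 ms, which is the
49.7 days computed earlier in the thread.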

Reply by Robert Wessel ● February 8, 2020

On Sat, 08 Feb 2020 11:03:23 -0600, kegs@provalid.com (Kent Dickey)
wrote:

>In article <r1h1lj$i5f$1@dont-email.me>, pozz  <pozzugno@gmail.com> wrote:
>>I need a timestamp in millisecond in linux epoch. It is a number that 
>>doesn't fit in a 32-bits number.
>>
>>I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw 
>>counter. I need to create a mixed sw/hw 64-bits counter. It's very 
>>simple, I configure a 32-bits hw timer to run at 1kHz and increment an 
>>uint32_t variable in timer overflow ISR.
>>
>>Now I need to implement a GetTick() function that returns a uint64_t. I 
>>know it could be difficult, because of race conditions. One solutions is 
>>to disable interrupts, but I remember another solution.
>
>This is actually a very tricky problem.  I believe it is not possible to
>solve it with the constraints you have laid out above.  David Brown's solution
>in his GetTick() function is correct, but it doesn't discuss why.
>
>If you have a valid 64-bit counter which you can only reference 32-bits at
>a time (which I'll make functions, read_high32() and read_low32(), but these
>can be hardware registers, volatile globals, or real functions), then an
>algorithm to read it reliably is basically your original algorithm:
>
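>uint64_t
>GetTick(void)
>{
>        uint32_t old_high32, new_high32, low32;
>
>        old_high32 = read_high32();
>        while(1) {
>                /* Retry until the high word is the same before and after
>                   the low word is read. */
>                low32 = read_low32();
>                new_high32 = read_high32();
>                if(new_high32 == old_high32) {
>                        return ((uint64_t)new_high32 << 32) | low32;
>                }
>                old_high32 = new_high32;
>        }
>}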
>
>This code does not need to mask interrupts, and it works on multiple CPUs.
>This works even if interrupts occur at any point for any duration, even
>if the code is interrupted for more than 49 days.
>
>However, you don't have a valid 64-bit counter you can only read 32-bits at a
>time.  You have a free-running hardware counter which read_low32() returns.
>It counts up every 1ms, and eventually wraps from 0xffff_ffff to 0x0000_0000
>and causes an interrupt (which lots of people have helpfully calculated at
>about 49 days).  Let's assume that interrupt calls this handler:
>
>volatile uint32_t ticks_high = 0;
>void
>timer_wrap_interrupt()
>{
>        ticks_high++;
>}
>
>where by convention only this code will write to ticks_high (this is a very
>important limitation).  And so my function read_high32() is simply:
>{ return ticks_high; }.
>
>Unfortunately, with this design, I believe it is not possible to implement
>a GetTick() function which does not sometimes fail to return a correct time.
>There is a fundamental race between the interrupt and the timer value rolling
>to 0 which software cannot account for.
>
>The problem is it's possible for software to read the HW counter and see it
>has rolled over from 0xffff_ffff to 0 BEFORE the interrupt occurs which
>increments ticks_high.  This is an inherent race: the timer wraps to 0, and
>signals an interrupt.  It's possible, even if for only a few cycles, to
>read the register and see the zero before the interrupt is taken.
>
>Shown more explicitly, the following are all valid states (let's assume
>ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
>
>Time            read_low32()            ticks_high
>-------------------------------------------------
>0               0xffff_fffe             0
>1ms             0xffff_ffff             0
>1.99999ms       0xffff_ffff             0
>2ms             0x0000_0000             0
>Interrupt is sent and is now pending
>2ms+delta       0x0000_0000             1
>
>The issue is: what is "delta", and can other code (including your GetTick()
>function) run between "2ms" and "2ms+delta"?  And the answer is almost
>assuredly "yes".  This is a problem.
>
>The GetTick() routine above can read ticks_high==0, read_low32()==0, and then
>ticks_high==0 again at around time 2ms+small_amount, and return 0, even though
>a cycle or two ago, read_low32() returned 0xffff_ffff.  So time appears to
>jump backwards 49 days when this happens.
>
>There are a variety of solutions to this problem, but they all involve
>extra work and ignoring the 32-bit rollover interrupt.  So, remove
>timer_wrap_interrupt(), and then do:
>
>1) Have a single GetTick() routine, which is single-tasking (by
>disabling interrupts, or a mutex if there are multiple processors).
>This requires something to call GetTick() at least once every 49 days
>(worst case).  This is basically the Rick C./David Brown solution, but
>they don't mention that you need to remove the interrupt on 32-bit overflow.
>
>2) Use a higher interrupt rate.  For instance, if we can take the interrupt
>when read_low32() has carry from bit 28 to bit 29, then we can piece together
>code which can work as long as GetTick() isn't delayed by more than 3-4 days.
>This requires GetTick() to change, using the code given under #4 below.
>
>3) Forget the hardware counter: just take an interrupt every 1ms, and
>increment a global variable uint64_t ticks64 on each interrupt, and then
>GetTick just returns ticks64.  This only works if the CPU hardware supports
>atomic 64-bit accesses.  It's not generally possible to write C code for a
>32-bit processor which can guarantee 64-bit atomic ops, so it's best to have
>the interrupt handler deal with two 32-bit variables ticks_low and
>ticks_high, and then you still need GetTick() to have a while loop to
>read the two variables.
>
>4) Use a regular existing interrupt which occurs at any rate, as long as it's
>well over 1ms, and well under 49 days.  Let's assume you have a 1-second
>interrupt.  This can be asynchronous to the 1ms timer.  In that interrupt
>handler, you sample the 32-bit hardware counter, and if you notice it
>wrapping (previous read value > new value), increment ticks_high.
>You need to update the global volatile variable ticks_low as well as the
>current hw count.  And this interrupt handler needs to be the only code
>changing ticks_low and ticks_high.  Then, GetTick() does the following:
>
>        uint32_t local_ticks_low, local_ticks_high;
>        [ while loop to read valid ticks_low and ticks_high value into the
>                local_* variables ]
>        uint64_t ticks64 = ((uint64_t)local_ticks_high << 32) | local_ticks_low;
>        ticks64 += (int32_t)(read_low32() - local_ticks_low);
>        return ticks64;
>
>Basically, we return the ticks64 from the last regular interrupt, which could
>be 1 second ago, and we add in the small delta from reading the hw counter.
>Again, this requires the 1-second interrupt to be guaranteed to happen before
>we get close to 49 days since the last 1-second interrupt (if it's really
>a 1-second interrupt, it easily meets that criteria.  If you try to pick
>something irregular, like a keypress interrupt, then that won't work).  It
>does not depend on the exact rate of the interrupt at all.
>
>I wrote it above with extra safety--It subtracts two 32-bit unsigned variables,
>gets a 32-bit unsigned result, treats that as a 32-bit signed result, and adds
>that to the 64-bit unsigned ticks count.  It's not strictly necessary to do
>the 32-bit signed result cast: it just makes the code more robust in case
>the HW timer moves backwards slightly.  Imagine some code tries to adjust the
>current timer value by setting it backwards slightly (say, some code trying
>to calibrate the timer with the RTC or something).  Without the cast to
>32-bit signed int, this slight backwards move would result in ticks64
>jumping ahead 49 days, which would be bad.  In C, this is pretty easy, but it
>should be carefully commented so no one removes any important casts.
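
Option 4 above can be sketched in C roughly as follows.  This is a minimal
sketch under stated assumptions, not the poster's actual code: it assumes a
single core where the periodic ISR cannot be interrupted by GetTick(),
sim_hw_counter is a hypothetical variable standing in for the hardware
counter register, and periodic_isr is a made-up name for the regular
interrupt handler.

```c
#include <stdint.h>

/* Hypothetical stand-in for the free-running 32-bit hardware counter. */
volatile uint32_t sim_hw_counter;
static uint32_t read_low32(void) { return sim_hw_counter; }

/* Snapshot taken by the periodic interrupt; only the ISR writes these. */
static volatile uint32_t ticks_low;
static volatile uint32_t ticks_high;

/* Regular interrupt (e.g. once per second): sample the counter and
   count a wrap when the new sample is smaller than the previous one. */
void periodic_isr(void)
{
    uint32_t now = read_low32();
    if (now < ticks_low) {
        ticks_high++;
    }
    ticks_low = now;
}

uint64_t GetTick(void)
{
    uint32_t hi, lo;

    /* Read a consistent (high, low) snapshot: retry if the ISR changed
       the high word while we were reading. */
    do {
        hi = ticks_high;
        lo = ticks_low;
    } while (hi != ticks_high);

    uint64_t ticks64 = ((uint64_t)hi << 32) | lo;

    /* Add the time elapsed since the snapshot.  The unsigned subtraction
       wraps correctly; the cast to int32_t keeps a small backwards step
       of the hardware counter from looking like a 49-day jump forwards. */
    ticks64 += (uint64_t)(int64_t)(int32_t)(read_low32() - lo);
    return ticks64;
}
```

With the snapshot at 0xffff_ff00 and the hardware counter already wrapped to
0x10, GetTick() returns 0x1_0000_0010 even though the periodic interrupt has
not yet noticed the wrap, and it returns the same value after the interrupt
has run, so time never steps backwards.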


If you have an atomic 32-bit read, and a 32-bit compare-and-swap, it's
not too hard.  I had to deal with this in Windows back before the OS
grew a GetTickCount64().

The basic idea is to store a 32-bit extension word with* 8 bits that
you match to the top byte of the system tick, plus 24 bits of extension
(IOW, you'll get a 56-bit tick out of the deal).

Basically, you get the system tick, then read the extension word, and
compare the high bytes.  If they're the same (IOW, the extension word
was set in the same epoch as the system tick is currently in), just
concatenate the low 24 bits of the extension word to the left of the
system tick, and return that.

If the high bytes are different, determine if there's been a rollover
(high byte of system tick lower than high byte of extension word).
Then update the extension word value appropriately (unconditionally
copy the system tick high byte to the extension word high byte, and
increment the low 24 bits if there was a rollover).  CAS that updated
value back to extension word in memory (ignore failures).  Loop to the
start of the process to re-get the extended tick.

The one thing this requires is that this routine get polled at least
once every ~49 days.


*You can fiddle the exact parameters here a bit, this is just what I
did.
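
The scheme described above could look something like the following in C11.
Treat it as a sketch under assumptions rather than the original Windows code:
sim_tick is a hypothetical variable standing in for the 32-bit system tick,
get_tick56 is a made-up name, and the extension word is read before the tick
(the reverse of the order in the description) so that the compare-and-swap
fails, and the loop retries, if another caller updated the word in between.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit system tick source. */
volatile uint32_t sim_tick;
static uint32_t read_system_tick(void) { return sim_tick; }

/* Extension word: top 8 bits  = high byte of the tick at last update,
                   low 24 bits = number of 32-bit rollovers seen so far. */
static _Atomic uint32_t ext_word;

uint64_t get_tick56(void)
{
    for (;;) {
        uint32_t ext     = atomic_load(&ext_word);
        uint32_t tick    = read_system_tick();
        uint32_t ext_hi  = ext >> 24;
        uint32_t tick_hi = tick >> 24;

        if (tick_hi == ext_hi) {
            /* Same epoch: put the 24-bit rollover count above the tick. */
            return ((uint64_t)(ext & 0x00FFFFFFu) << 32) | tick;
        }

        /* High bytes differ: refresh the extension word.  A tick high
           byte lower than the stored one means the tick has rolled over. */
        uint32_t count = ext & 0x00FFFFFFu;
        if (tick_hi < ext_hi) {
            count = (count + 1u) & 0x00FFFFFFu;
        }
        uint32_t updated = (tick_hi << 24) | count;

        /* Publish with CAS and ignore failure: if someone else got there
           first, the next loop iteration simply re-reads their value. */
        (void)atomic_compare_exchange_strong(&ext_word, &ext, updated);
    }
}
```

As in the description, this only works if get_tick56() is called at least
once every ~49 days, since a rollover is detected only by seeing the high
byte go down.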

Reply by Dimiter_Popoff ● February 8, 2020

On 2/8/2020 22:46, Robert Wessel wrote:
> On Sat, 08 Feb 2020 11:03:23 -0600, kegs@provalid.com (Kent Dickey)
> wrote:
> 
>> In article <r1h1lj$i5f$1@dont-email.me>, pozz  <pozzugno@gmail.com> wrote:
>>> I need a timestamp in millisecond in linux epoch. It is a number that
>>> doesn't fit in a 32-bits number.
>>>
>>> I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw
>>> counter. I need to create a mixed sw/hw 64-bits counter. It's very
>>> simple, I configure a 32-bits hw timer to run at 1kHz and increment an
>>> uint32_t variable in timer overflow ISR.
>>>
>>> Now I need to implement a GetTick() function that returns a uint64_t. I
>>> know it could be difficult, because of race conditions. One solutions is
>>> to disable interrupts, but I remember another solution.
>>
>> This is actually a very tricky problem.  I believe it is not possible to
>> solve it with the constraints you have laid out above.  David Brown's solution
>> in his GetTick() function is correct, but it doesn't discuss why.
>>
>> If you have a valid 64-bit counter which you can only reference 32-bits at
>> a time (which I'll make functions, read_high32() and read_low32(), but these
>> can be hardware registers, volatile globals, or real functions), then an
>> algorithm to read it reliably is basically your original algorithm:
>>
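>> uint64_t
>> GetTick(void)
>> {
>>         uint32_t old_high32, new_high32, low32;
>>
>>         old_high32 = read_high32();
>>         while(1) {
>>                 /* Retry until the high word is the same before and after
>>                    the low word is read. */
>>                 low32 = read_low32();
>>                 new_high32 = read_high32();
>>                 if(new_high32 == old_high32) {
>>                         return ((uint64_t)new_high32 << 32) | low32;
>>                 }
>>                 old_high32 = new_high32;
>>         }
>> }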
>>
>> This code does not need to mask interrupts, and it works on multiple CPUs.
>> This works even if interrupts occur at any point for any duration, even
>> if the code is interrupted for more than 49 days.
>>
>> However, you don't have a valid 64-bit counter you can only read 32-bits at a
>> time.  You have a free-running hardware counter which read_low32() returns.
>> It counts up every 1ms, and eventually wraps from 0xffff_ffff to 0x0000_0000
>> and causes an interrupt (which lots of people have helpfully calculated at
>> about 49 days).  Let's assume that interrupt calls this handler:
>>
>> volatile uint32_t ticks_high = 0;
>> void
>> timer_wrap_interrupt()
>> {
>>         ticks_high++;
>> }
>>
>> where by convention only this code will write to ticks_high (this is a very
>> important limitation).  And so my function read_high32() is simply:
>> { return ticks_high; }.
>>
>> Unfortunately, with this design, I believe it is not possible to implement
>> a GetTick() function which does not sometimes fail to return a correct time.
>> There is a fundamental race between the interrupt and the timer value rolling
>> to 0 which software cannot account for.
>>
>> The problem is it's possible for software to read the HW counter and see it
>> has rolled over from 0xffff_ffff to 0 BEFORE the interrupt occurs which
>> increments ticks_high.  This is an inherent race: the timer wraps to 0, and
>> signals an interrupt.  It's possible, even if for only a few cycles, to
>> read the register and see the zero before the interrupt is taken.
>>
>> Shown more explicitly, the following are all valid states (let's assume
>> ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
>>
>> Time            read_low32()            ticks_high
>> -------------------------------------------------
>> 0               0xffff_fffe             0
>> 1ms             0xffff_ffff             0
>> 1.99999ms       0xffff_ffff             0
>> 2ms             0x0000_0000             0
>> Interrupt is sent and is now pending
>> 2ms+delta       0x0000_0000             1
>>
>> The issue is: what is "delta", and can other code (including your GetTick()
>> function) run between "2ms" and "2ms+delta"?  And the answer is almost
>> assuredly "yes".  This is a problem.
>>
>> The GetTick() routine above can read ticks_high==0, read_low32()==0, and then
>> ticks_high==0 again at around time 2ms+small_amount, and return 0, even though
>> a cycle or two ago, read_low32() returned 0xffff_ffff.  So time appears to
>> jump backwards 49 days when this happens.
>>
>> There are a variety of solutions to this problem, but they all involve
>> extra work and ignoring the 32-bit rollover interrupt.  So, remove
>> timer_wrap_interrupt(), and then do:
>>
>> 1) Have a single GetTick() routine, which is single-tasking (by
>> disabling interrupts, or a mutex if there are multiple processors).
>> This requires something to call GetTick() at least once every 49 days
>> (worst case).  This is basically the Rick C./David Brown solution, but
>> they don't mention that you need to remove the interrupt on 32-bit overflow.
>>
>> 2) Use a higher interrupt rate.  For instance, if we can take the interrupt
>> when read_low32() has carry from bit 28 to bit 29, then we can piece together
>> code which can work as long as GetTick() isn't delayed by more than 3-4 days.
>> This requires GetTick() to change, using the code given under #4 below.
>>
>> 3) Forget the hardware counter: just take an interrupt every 1ms, and
>> increment a global variable uint64_t ticks64 on each interrupt, and then
>> GetTick just returns ticks64.  This only works if the CPU hardware supports
>> atomic 64-bit accesses.  It's not generally possible to write C code for a
>> 32-bit processor which can guarantee 64-bit atomic ops, so it's best to have
>> the interrupt handler deal with two 32-bit variables ticks_low and
>> ticks_high, and then you still need GetTick() to have a while loop to
>> read the two variables.
>>
>> 4) Use a regular existing interrupt which occurs at any rate, as long as it's
>> well over 1ms, and well under 49 days.  Let's assume you have a 1-second
>> interrupt.  This can be asynchronous to the 1ms timer.  In that interrupt
>> handler, you sample the 32-bit hardware counter, and if you notice it
>> wrapping (previous read value > new value), increment ticks_high.
>> You need to update the global volatile variable ticks_low as well as the
>> current hw count.  And this interrupt handler needs to be the only code
>> changing ticks_low and ticks_high.  Then, GetTick() does the following:
>>
>>         uint32_t local_ticks_low, local_ticks_high;
>>         [ while loop to read valid ticks_low and ticks_high value into the
>>                 local_* variables ]
>>         uint64_t ticks64 = ((uint64_t)local_ticks_high << 32) | local_ticks_low;
>>         ticks64 += (int32_t)(read_low32() - local_ticks_low);
>>         return ticks64;
>>
>> Basically, we return the ticks64 from the last regular interrupt, which could
>> be 1 second ago, and we add in the small delta from reading the hw counter.
>> Again, this requires the 1-second interrupt to be guaranteed to happen before
>> we get close to 49 days since the last 1-second interrupt (if it's really
>> a 1-second interrupt, it easily meets that criteria.  If you try to pick
>> something irregular, like a keypress interrupt, then that won't work).  It
>> does not depend on the exact rate of the interrupt at all.
>>
>> I wrote it above with extra safety--It subtracts two 32-bit unsigned variables,
>> gets a 32-bit unsigned result, treats that as a 32-bit signed result, and adds
>> that to the 64-bit unsigned ticks count.  It's not strictly necessary to do
>> the 32-bit signed result cast: it just makes the code more robust in case
>> the HW timer moves backwards slightly.  Imagine some code tries to adjust the
>> current timer value by setting it backwards slightly (say, some code trying
>> to calibrate the timer with the RTC or something).  Without the cast to
>> 32-bit signed int, this slight backwards move would result in ticks64
>> jumping ahead 49 days, which would be bad.  In C, this is pretty easy, but it
>> should be carefully commented so no one removes any important casts.
> 
> 
> If you have an atomic 32-bit read, and a 32-bit compare-and-swap, it's
> not too hard.  I had to deal with this in Windows back before the OS
> grew a GetTickCount64().
> 
> The basic idea is to store a 32-bit extension word, with* 8 bits that
> you match to the top of system tick, plus 24 bit of extension (IOW,
> you'll get a 56-bit tick out of the deal).
> 
> Basically, you get the system tick, then read the extension word, and
> compare the high bytes.  If they're the same (IOW, the extension word
> was set in the same epoch as the system tick is currently in), just
> concatenate the low 24 bits of the extension word to the left of the
> system tick, and return that.
> 
> If the high bytes are different, determine if there's been a rollover
> (high byte of system tick lower than high byte of extension word).
> Then update the extension word value appropriately (unconditionally
> copy the system tick high byte to the extension word high byte, and
> increment the low 24 bits if there was a rollover).  CAS that updated
> value back to extension word in memory (ignore failures).  Loop to the
> start of the process to re-get the extended tick.
> 
> The one thing this requires is that this routine get polled at least
> once every ~49 days.
> 
> 
> *You can fiddle the exact parameters here a bit, this is just what I
> did.
> 

I think he refers to the scenario where the lower 32 bits are a
hardware timer register and the upper 32 bits are incremented by
software in an interrupt handler.
It is possible that you read the lower 32 bits as ffffffff, then
the upper 32 bits *after* the transition ffffffff -> 0 and the
interrupt processing; there is no sensible way of knowing whether that
occurred based only on reading these two 32-bit values.

If 1 ms is the target, the solution is easy: just take an IRQ every
1 ms and handle both longwords in the IRQ service routine (while
masked). IIRC the OP suggested that in his original post.

Reading the 64 bits using 32-bit reads is then trivial; e.g. this is
how the timebase registers are read on 32-bit Power (read upper, read
lower, read upper again; if it changed, do it all over again). I think
Kent referred to that in his previous post as well, and probably more
people have mentioned it on the thread.
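
A minimal sketch of that combination, for illustration only: a 1 ms interrupt maintains both 32-bit words, and the reader uses the read-upper, read-lower, read-upper sequence. The names ticks_low, ticks_high, tick_isr() and get_tick64() are hypothetical, and the sketch assumes a single core on which the interrupt handler always runs to completion before the reader resumes.

```c
#include <stdint.h>

/* Hypothetical names. Both words are written only by the 1 ms tick
 * interrupt; everything else just reads them. */
volatile uint32_t ticks_low;
volatile uint32_t ticks_high;

/* Assumed to be called once per millisecond with interrupts masked. */
void tick_isr(void)
{
    uint32_t low = ticks_low + 1u;

    ticks_low = low;
    if (low == 0u) {
        /* Low word wrapped from 0xffffffff to 0: carry into high word. */
        ticks_high++;
    }
}

/* Read upper, read lower, read upper again; retry if upper changed,
 * because the interrupt may have carried between the two reads. */
uint64_t get_tick64(void)
{
    uint32_t high, low;

    do {
        high = ticks_high;
        low = ticks_low;
    } while (high != ticks_high);

    return ((uint64_t)high << 32) | low;
}
```

The reader never masks interrupts; it only repeats the reads in the rare case that the high word changed underneath it.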

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/

Reply by Les Cargill ●February 8, 2020

pozz wrote:
> Il 07/02/2020 16:49, Bernd Linsel ha scritto:
>> David Brown wrote:
>>> On 07/02/2020 09:27, pozz wrote:
>>>
>>> That is a useful thought.  It is very important to write code in a way
>>> that it can be tested.  And even then, remember that testing can only
>>> prove the /presence/ of bugs, never to prove their /absence/.
>>>
>>> Another trick during testing is to speed up the timers.  If you can make
>>> the 1 kHz timer run at 1 MHz for testing, you'll get similar benefits.
>>>
>>
>> The Linux approach is still better:
>> Initialize the timer count variable with a value just some seconds 
>> before it wraps.
> 
> This helps if the bug is deterministic. If it isn't and it doesn't 
> happen after startup at the first wrap-around, it takes 49 days to have 
> another possibility to see the bug.
> 
> 
This is why it's important to be able to seed the timer counter.
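
A rough sketch of what seeding can look like, assuming a software-maintained 32-bit millisecond counter. All names here (TICK_SEED, tick_ms, tick_advance(), timeout_expired()) are made up for illustration. The counter starts 5 seconds before the wrap, so every test run crosses the rollover almost immediately, and a test harness can re-seed it as often as it likes.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_HZ   1000u
/* Hypothetical test seed: 5 seconds before the 32-bit counter wraps. */
#define TICK_SEED ((uint32_t)(0u - 5u * TICK_HZ))

/* Millisecond counter; in real code the timer interrupt would advance it. */
volatile uint32_t tick_ms = TICK_SEED;

/* Stand-in for the timer interrupt, so a test can move time forward. */
void tick_advance(uint32_t ms)
{
    tick_ms += ms;
}

/* Wrap-safe timeout check: unsigned subtraction is modulo 2^32, so the
 * elapsed time is correct even if tick_ms wrapped after 'start'. */
bool timeout_expired(uint32_t start, uint32_t timeout_ms)
{
    return (uint32_t)(tick_ms - start) >= timeout_ms;
}
```

Code that compares absolute tick values instead of differences will misbehave within seconds of start-up under this seed, rather than after 49 days.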

-- 
Les Cargill

Reply by Robert Wessel ●February 9, 2020

On Sat, 8 Feb 2020 23:18:28 +0200, Dimiter_Popoff <dp@tgi-sci.com>
wrote:

>On 2/8/2020 22:46, Robert Wessel wrote:
>> On Sat, 08 Feb 2020 11:03:23 -0600, kegs@provalid.com (Kent Dickey)
>> wrote:
>> 
>>> In article <r1h1lj$i5f$1@dont-email.me>, pozz  <pozzugno@gmail.com> wrote:
>>>> I need a timestamp in milliseconds since the Linux epoch. It is a number that
>>>> doesn't fit in 32 bits.
>>>>
>>>> I'm using a 32-bit MCU (STM32L4R9...) so I don't have a 64-bits hw
>>>> counter. I need to create a mixed sw/hw 64-bits counter. It's very
>>>> simple, I configure a 32-bits hw timer to run at 1kHz and increment an
>>>> uint32_t variable in timer overflow ISR.
>>>>
>>>> Now I need to implement a GetTick() function that returns a uint64_t. I
>>>> know it could be difficult, because of race conditions. One solution is
>>>> to disable interrupts, but I remember another solution.
>>>
>>> This is actually a very tricky problem.  I believe it is not possible to
>>> solve it with the constraints you have laid out above.  David Brown's solution
>>> in his GetTick() function is correct, but it doesn't discuss why.
>>>
>>> If you have a valid 64-bit counter which you can only reference 32-bits at
>>> a time (which I'll make functions, read_high32() and read_low32(), but these
>>> can be hardware registers, volatile globals, or real functions), then an
>>> algorithm to read it reliably is basically your original algorithm:
>>>
>>> uint64_t
>>> GetTick()
>>> {
>>>         uint32_t old_high32, new_high32, low32;
>>>
>>>         old_high32 = read_high32();
>>>         while(1) {
>>>                 low32 = read_low32();
>>>                 new_high32 = read_high32();
>>>                 if(new_high32 == old_high32) {
>>>                         return ((uint64_t)new_high32 << 32) | low32;
>>>                 }
>>>                 old_high32 = new_high32;
>>>         }
>>> }
>>>
>>> This code does not need to mask interrupts, and it works on multiple CPUs.
>>> This works even if interrupts occur at any point for any duration, even
>>> if the code is interrupted for more than 49 days.
>>>
>>> However, you don't have a valid 64-bit counter you can only read 32-bits at a
>>> time.  You have a free-running hardware counter which read_low32() returns.
>>> It counts up every 1ms, and eventually wraps from 0xffff_ffff to 0x0000_0000
>>> and causes an interrupt (which lots of people have helpfully calculated at
>>> about 49 days).  Let's assume that interrupt calls this handler:
>>>
>>> volatile uint32_t ticks_high = 0;
>>> void
>>> timer_wrap_interrupt()
>>> {
>>>         ticks_high++;
>>> }
>>>
>>> where by convention only this code will write to ticks_high (this is a very
>>> important limitation).  And so my function read_high32() is simply:
>>> { return ticks_high; }.
>>>
>>> Unfortunately, with this design, I believe it is not possible to implement
>>> a GetTick() function which does not sometimes fail to return a correct time.
>>> There is a fundamental race between the interrupt and the timer value rolling
>>> to 0 which software cannot account for.
>>>
>>> The problem is it's possible for software to read the HW counter and see it
>>> has rolled over from 0xffff_ffff to 0 BEFORE the interrupt occurs which
>>> increments ticks_high.  This is an inherent race: the timer wraps to 0, and
>>> signals an interrupt.  It's possible, even if for only a few cycles, to
>>> read the register and see the zero before the interrupt is taken.
>>>
>>> Shown more explicitly, the following are all valid states (let's assume
>>> ticks_high is 0, read_low32() just ticked to 0xffff_fffe):
>>>
>>> Time            read_low32()            ticks_high
>>> -------------------------------------------------
>>> 0               0xffff_fffe             0
>>> 1ms             0xffff_ffff             0
>>> 1.99999ms       0xffff_ffff             0
>>> 2ms             0x0000_0000             0
>>> Interrupt is sent and is now pending
>>> 2ms+delta       0x0000_0000             1
>>>
>>> The issue is: what is "delta", and can other code (including your GetTick()
>>> function) run between "2ms" and "2ms+delta"?  And the answer is almost
>>> assuredly "yes".  This is a problem.
>>>
>>> The GetTick() routine above can read ticks_high==0, read_low32()==0, and then
>>> ticks_high==0 again at around time 2ms+small_amount, and return 0, even though
>>> a cycle or two ago, read_low32() returned 0xffff_ffff.  So time appears to
>>> jump backwards 49 days when this happens.
>>>
>>> There are a variety of solutions to this problem, but they all involve
>>> extra work and ignoring the 32-bit rollover interrupt.  So, remove
>>> timer_wrap_interrupt(), and then do:
>>>
>>> 1) Have a single GetTick() routine, which is single-tasking (by
>>> disabling interrupts, or a mutex if there are multiple processors).
>>> This requires something to call GetTick() at least once every 49 days
>>> (worst case).  This is basically the Rick C/David Brown solution, but
>>> they don't mention that you need to remove the interrupt on 32-bit overflow.
>>>
>>> 2) Use a higher interrupt rate.  For instance, if we can take the interrupt
>>> when read_low32() has carry from bit 28 to bit 29, then we can piece together
>>> code which can work as long as GetTick() isn't delayed by more than 3-4 days.
>>> This requires GetTick() to change to use the code given under #4 below.
>>>
>>> 3) Forget the hardware counter: just take an interrupt every 1ms, and
>>> increment a global variable uint64_t ticks64 on each interrupt, and then
>>> GetTick just returns ticks64.  This only works if the CPU hardware supports
>>> atomic 64-bit accesses.  It's not generally possible to write C code for a
>>> 32-bit processor which can guarantee 64-bit atomic ops, so it's best to have
>>> the interrupt handler deal with two 32-bit variables ticks_low and
>>> ticks_high, and then you still need GetTick() to have a while loop to
>>> read the two variables.
>>>
>>> 4) Use a regular existing interrupt which occurs at any rate, as long as it's
>>> well over 1ms, and well under 49 days.  Let's assume you have a 1-second
>>> interrupt.  This can be asynchronous to the 1ms timer.  In that interrupt
>>> handler, you sample the 32-bit hardware counter, and if you notice it
>>> wrapping (previous read value > new value), increment ticks_high.
>>> You need to update the global volatile variable ticks_low as well as the
>>> current hw count.  And this interrupt handler needs to be the only code
>>> changing ticks_low and ticks_high.  Then, GetTick() does the following:
>>>
>>>         uint32_t local_ticks_low, local_ticks_high;
>>>         [ while loop to read valid ticks_low and ticks_high value into the
>>>                 local_* variables ]
>>>         uint64_t ticks64 = ((uint64_t)local_ticks_high << 32) | local_ticks_low;
>>>         ticks64 += (int32_t)(read_low32() - local_ticks_low);
>>>         return ticks64;
>>>
>>> Basically, we return the ticks64 from the last regular interrupt, which could
>>> be 1 second ago, and we add in the small delta from reading the hw counter.
>>> Again, this requires the 1-second interrupt to be guaranteed to happen before
>>> we get close to 49 days since the last 1-second interrupt (if it's really
>>> a 1-second interrupt, it easily meets that criterion.  If you try to pick
>>> something irregular, like a keypress interrupt, then that won't work).  It
>>> does not depend on the exact rate of the interrupt at all.
>>>
>>> I wrote it above with extra safety--It subtracts two 32-bit unsigned variables,
>>> gets a 32-bit unsigned result, treats that as a 32-bit signed result, and adds
>>> that to the 64-bit unsigned ticks count.  It's not strictly necessary to do
>>> the 32-bit signed result cast: it just makes the code more robust in case
>>> the HW timer moves backwards slightly.  Imagine some code tries to adjust the
>>> current timer value by setting it backwards slightly (say, some code trying
>>> to calibrate the timer with the RTC or something).  Without the cast to
>>> 32-bit signed int, this slight backwards move would result in ticks64
>>> jumping ahead 49 days, which would be bad.  In C, this is pretty easy, but it
>>> should be carefully commented so no one removes any important casts.
>> 
>> 
>> If you have an atomic 32-bit read, and a 32-bit compare-and-swap, it's
>> not too hard.  I had to deal with this in Windows back before the OS
>> grew a GetTickCount64().
>> 
>> The basic idea is to store a 32-bit extension word, with* 8 bits that
>> you match to the top of system tick, plus 24 bit of extension (IOW,
>> you'll get a 56-bit tick out of the deal).
>> 
>> Basically, you get the system tick, then read the extension word, and
>> compare the high bytes.  If they're the same (IOW, the extension word
>> was set in the same epoch as the system tick is currently in), just
>> concatenate the low 24 bits of the extension word to the left of the
>> system tick, and return that.
>> 
>> If the high bytes are different, determine if there's been a rollover
>> (high byte of system tick lower than high byte of extension word).
>> Then update the extension word value appropriately (unconditionally
>> copy the system tick high byte to the extension word high byte, and
>> increment the low 24 bits if there was a rollover).  CAS that updated
>> value back to extension word in memory (ignore failures).  Loop to the
>> start of the process to re-get the extended tick.
>> 
>> The one thing this requires is that this routine get polled at least
>> once every ~49 days.
>> 
>> 
>> *You can fiddle the exact parameters here a bit, this is just what I
>> did.
>> 
>
>I think he refers to the scenario where the lower 32 bits are a
>hardware timer register and the upper are interrupt incremented by
>software somewhere.
>It is possible that you read the lower 32 bits as ffffffff then
>the upper 32 bits *after* the transition ffffffff -> 0 and the
>interrupt processing; there is no sensible way of knowing if that
>occurred based only on reading these two 32 bit values.


The point of the described algorithm is specifically to detect and
handle the case where the base (short) timer value overflows, without
locks and without requiring more than a single additional word of
storage.
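
For concreteness, here is a rough sketch of the extension-word scheme using C11 atomics. It is not the original Windows code. The names fake_tick32, read_tick32(), ext_word and get_tick56() are invented for illustration, and read_tick32() stands in for whatever returns the 32-bit system tick. One deliberate deviation from the description quoted above: the sketch loads the extension word before reading the tick, so that a successful CAS always stores a high byte taken from a tick read after the word it replaces.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit system tick (e.g. a hardware
 * timer or GetTickCount()); a test can set it directly. */
volatile uint32_t fake_tick32;

static uint32_t read_tick32(void)
{
    return fake_tick32;
}

/* Bits 31..24: top byte of the tick when the word was last updated.
 * Bits 23..0:  number of 32-bit rollovers seen so far. */
_Atomic uint32_t ext_word;

/* Returns a 56-bit tick. Must be called at least once per ~49 days. */
uint64_t get_tick56(void)
{
    for (;;) {
        uint32_t ext = atomic_load(&ext_word);
        uint32_t tick = read_tick32();
        uint32_t tick_hi = tick >> 24;
        uint32_t ext_hi = ext >> 24;
        uint32_t count = ext & 0x00FFFFFFu;

        if (tick_hi == ext_hi) {
            /* Same epoch: put the rollover count above the tick. */
            return ((uint64_t)count << 32) | tick;
        }
        if (tick_hi < ext_hi) {
            /* Top byte went backwards: the 32-bit tick rolled over. */
            count = (count + 1u) & 0x00FFFFFFu;
        }
        /* Publish the new word. A failed CAS means another caller
         * already updated it, so the result is ignored and we retry. */
        uint32_t updated = (tick_hi << 24) | count;
        (void)atomic_compare_exchange_strong(&ext_word, &ext, updated);
    }
}
```

The only shared state is the single 32-bit ext_word, and no caller ever blocks; a caller that loses the CAS simply loops and picks up the winner's value.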

Reply by ●February 9, 2020

On Sat, 08 Feb 2020 14:30:53 -0600, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>On Sat, 08 Feb 2020 18:37:35 +0200, upsidedown@downunder.com wrote:
>
>>On Sat, 8 Feb 2020 16:24:30 +0100, David Brown
>><david.brown@hesbynett.no> wrote:
>>
>>>On 07/02/2020 23:03, Jim Jackson wrote:
>>>>>> A 32 bit counter incremented at 1 kHz will roll over every 3 years or so.
>>>>>
>>>>> 2 ^ 32 / (24 * 60 * 60 * 1000) = 49.71026962... (days)
>>>> 
>>>> Ah yes. Early versions of Windows NT. Crashed if they had an uptime
>>>> of 49 and a bit days - I wonder why :-)
>>>> 
>>>> Mind you getting early NT to stay up that long without crashing or needing
>>>> a reboot was bloody difficult.
>>>> 
>>>
>>>I believe it was Windows 95 that had this problem - and it was not 
>>>discovered until about 2005, because no one had kept Windows 95 running 
>>>for 49 days.
>>
>>That is a more believable explanation.
>>
>>
>>>Maybe early NT had a similar fault, of course.  But people /did/ have NT 
>>>running for long uptimes from very early on, so such a bug would have 
>>>been found fairly quickly.
>>
>>Both VAX/VMS as well as Windows NT use 100 ns as the basic unit for
>>time of the day timing. 
>>
>>On Windows NT the interrupt rate was 100 Hz on a single processor and
>>64 Hz on a multiprocessor.
>>
>>Some earlier Windows versions used 55 Hz (or was it 55 ms) clock
>>interrupt rate, so I really don't understand from where the 1 ms clock
>>tick or 49 days is from.
>
>
>The "tick count" in the (Win32) OS was always 1000Hz (as reported by
>GetTickCount(), for example).  The physical ticks were massaged to
>correctly update that count.

With multimedia timers enabled that may be the case; without them,
however, the sleep-time granularity is 10 ms, which would indicate a
100 Hz update rate. Also, the SetSystemTimeAdjustment() Win32 call
assumes a 100 Hz (or 64 Hz) update rate.