timestamp in ms and 64-bit counter| page 6

Reply by Richard Damon ●February 9, 2020

On 2/9/20 6:55 PM, pozz wrote:
> Il 08/02/2020 18:03, Kent Dickey ha scritto:

>> 1) Have a single GetTick() routine, which is single-tasking (by
>> disabling interrupts, or a mutex if there are multiple processors).
>> This requires something to call GetTick() at least once every 49 days
>> (worst case).  This is basically the Rich C./David Brown solution, but
>> they don't mention that you need to remove the interrupt on 32-bit 
>> overflow.
> 
> I think you suggested disabling interrupts to avoid any preemption by the 
> RTOS scheduler, effectively blocking the scheduler altogether.
> However, I know it's a bad idea to enable/disable interrupts "manually" 
> with an RTOS.

The use of SHORT critical sections based on disabling interrupts is 
almost never an issue, and most RTOSes that I know have that ability. 
They are often given a name based on entering/exiting a critical section 
as opposed to enable/disable the interrupts, in part to remind you that 
they need to be well paired and the region short.
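
A minimal sketch of such a short critical section around the wrap detection. The names critical_enter(), critical_exit(), hwcnt_get() and fake_hw_counter are hypothetical stand-ins; a real port would map them to whatever enter/exit-critical primitives the RTOS provides and to the hardware timer register. It still assumes GetTick() is called at least once per 32-bit wrap.

```c
#include <stdint.h>

/* Hypothetical stand-ins, so the sketch compiles on a host machine. */
static uint32_t fake_hw_counter;   /* simulated 1 kHz hardware counter */
static int      critical_depth;    /* nesting depth, for illustration only */

static void     critical_enter(void) { critical_depth++; }
static void     critical_exit(void)  { critical_depth--; }
static uint32_t hwcnt_get(void)      { return fake_hw_counter; }

uint64_t GetTick(void)
{
    static uint32_t ticks_high;    /* number of 32-bit wraps seen so far */
    static uint32_t ticks_last;    /* hardware value seen on the previous call */
    uint64_t ticks;

    critical_enter();              /* keep this region as short as possible */
    uint32_t ticks_hw = hwcnt_get();
    if (ticks_hw < ticks_last) {   /* the counter wrapped since the last call */
        ticks_high++;
    }
    ticks_last = ticks_hw;
    /* Combine while still inside the critical section: ticks_high is shared. */
    ticks = ((uint64_t)ticks_high << 32) | ticks_hw;
    critical_exit();

    return ticks;
}
```

The enter/exit calls are paired on every path and the protected region is only a handful of instructions, which is the property that matters here.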

Reply by Rick C ●February 10, 2020

On Sunday, February 9, 2020 at 5:27:39 PM UTC-5, pozz wrote:
> Il 06/02/2020 19:02, Rick C ha scritto:
>  > [...]
> > Is the "call this code at least once in 3 years" requirement reasonable?  In the systems I design that would not be a problem.
> 
> Of course this limitation isn't usually a real problem.
> 
> However there could be some situation where GetTick() is called only after 
> more than 49 days. For example, you can have an IoT device that starts 
> sending data (with timestamps) after the user makes a request. And 
> timestamps/GetTick() is used only in the routine that sends data.
> 
> Maybe the user, after purchasing, is excited about this gadget and makes the 
> request multiple times every day. After some weeks, he could forget that 
> he has this gadget and maybe remember it only after many days...

Yeah, and the reliability requirements of such consumer goods are typically not so stiff, so if they are not being used for months on end a malfunction is not unexpected.  I recall having routers that needed a power cycle every week or two.  I have a wifi extender that seems to work pretty well, but has needed to be power cycled a few times a year.  Once in 3 years is one thing.  Once every 49 days is another. 

Still, such a requirement doesn't really seem to solve any problems in this case.  So no need to worry about it. 

-- 

  Rick C.

  --- Get 1,000 miles of free Supercharging
  --- Tesla referral code - https://ts.la/richard11209

Reply by ●February 10, 2020

On Sun, 9 Feb 2020 23:34:26 +0100, pozz <pozzugno@gmail.com> wrote:

>Il 07/02/2020 10:43, David Brown ha scritto:
>> On 07/02/2020 01:29, Rick C wrote:
>>> On Thursday, February 6, 2020 at 4:35:56 PM UTC-5, David Brown
>>> wrote:
>>>> On 06/02/2020 19:02, Rick C wrote:
>> 
>>>
>>>> I think it's quite likely that the code already has a 1 KHz
>>>> interrupt routine doing other things, so incrementing a "high"
>>>> counter there would not be an issue.
>>>
>>> Actually, there is no need for a 1 kHz interrupt.  I believe the OP
>>> has a hardware counter ticking at 1 kHz so the overflow event would
>>> be 49 days.  There may be a faster interrupt for some other purpose,
>>> but not needed for the hardware timer which may well be run on a low
>>> power oscillator and backup battery.
>> 
>> We don't know that - the OP hasn't told us all the details.  An
>> interrupt that hits every millisecond (or some other regular time), used
>> as part of the global timebase and for executing regular functions, is
>> very common.  Maybe he has something like this, maybe not.
>
>I think FreeRTOS is already configured to have a fast interrupt, 
>something similar to 1ms. I suspect it is used to check if some tasks, 
>blocked waiting for the expiration of a timer, must be woken.
>
>My first idea was to implement the 64-bit ms-resolution timestamp 
>counter as something completely separate from the OS ticks, but I think 
>I could add some code to the OS tick interrupt.

Nice !

With a 64 bit counter with 1 ms resolution you can easily record any
event since the days of dinosaurs.

Using the 100 ns resolution of VAX/VMS or Windows NT, a 64 bit counter
can represent a period of nearly 60 000 years. If one sets the zero
time at Julian date 1 (in 4714 BCE), any historical event can
be represented with 100 ns resolution.

Many processors have clock cycle counters. On x86 architectures, there
is a 64 bit Time Stamp Counter register, which is updated every clock
cycle. Even on a 4 GHz CPU, the counter rolls over only after about
146 years (2^64 counts at 4e9 counts per second is roughly 4.6e9
seconds). Thus the counter is sufficient to handle the counts during
the processor lifetime.
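
The rollover periods mentioned in this thread can be checked with a few lines of arithmetic. This is only a sketch; rollover_years() and SECONDS_PER_YEAR are names made up for the illustration, and a year is taken as 365.25 days.

```c
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* Years until a counter that is `bits` wide wraps when it counts at tick_hz. */
double rollover_years(unsigned bits, double tick_hz)
{
    double counts = 1.0;
    for (unsigned i = 0; i < bits; i++) {
        counts *= 2.0;             /* 2^bits, exactly representable in a double */
    }
    return counts / tick_hz / SECONDS_PER_YEAR;
}

/*
 * rollover_years(32, 1e3) * 365.25  -> about 49.7 days      (32 bit, 1 ms)
 * rollover_years(64, 1e3)           -> about 584 million years (64 bit, 1 ms)
 * rollover_years(64, 1e7)           -> about 58 000 years    (64 bit, 100 ns)
 * rollover_years(64, 4e9)           -> about 146 years       (64 bit, 4 GHz)
 */
```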

Reply by Robert Wessel ●February 10, 2020

On Mon, 10 Feb 2020 00:55:13 +0100, pozz <pozzugno@gmail.com> wrote:

>Il 08/02/2020 18:03, Kent Dickey ha scritto:
>> [...]
>> Unfortunately, with this design, I believe it is not possible to implement
>> a GetTick() function which does not sometimes fail to return a correct time.
>> There is a fundamental race between the interrupt and the timer value rolling
>> to 0 which software cannot account for.
>
>Good point, Kent. Thank you for your post that helps to fix some 
>critical bugs.
>
>You're right, ISRs aren't executed immediately after the corresponding 
>event occurs. We should assume the ISR code runs many cycles after the 
>interrupt event.
>
>
>> 1) Have a single GetTick() routine, which is single-tasking (by
>> disabling interrupts, or a mutex if there are multiple processors).
>> This requires something to call GetTick() at least once every 49 days
>> (worst case).  This is basically the Rich C./David Brown solution, but
>> they don't mention that you need to remove the interrupt on 32-bit overflow.
>
>I think you suggested disabling interrupts to avoid any preemption by the 
>RTOS scheduler, effectively blocking the scheduler altogether.
>However, I know it's a bad idea to enable/disable interrupts "manually" 
>with an RTOS.
>Maybe the mutex for GetTick() is a better idea, something similar to this:
>
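>uint64_t
>GetTick(void)
>{
>    static uint32_t ticks_high;   /* number of 32-bit wraps seen */
>    static uint32_t ticks_last;   /* hardware value at the previous call */
>
>    mutex_take();
>    uint32_t ticks_hw = hwcnt_get();
>    if (ticks_last > ticks_hw)  ticks_high++;   /* counter wrapped */
>    ticks_last = ticks_hw;
>    /* ticks_high is shared, so combine before releasing the mutex */
>    uint64_t ticks = ((uint64_t)ticks_high << 32) | ticks_hw;
>    mutex_give();
>
>    return ticks;
>}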
>
>> 2) Use a higher interrupt rate.  For instance, if we can take the interrupt
>> when read_low32() has carry from bit 28 to bit 29, then we can piece together
>> code which can work as long as GetTick() isn't delayed by more than 3-4 days.
>> This requires GetTick() to change, using the code given under #4 below.
>> 
>> 3) Forget the hardware counter: just take an interrupt every 1ms, and
>> increment a global variable uint64_t ticks64 on each interrupt, and then
>> GetTick just returns ticks64.  This only works if the CPU hardware supports
>> atomic 64-bit accesses.  It's not generally possible to write C code for a
>> 32-bit processor which can guarantee 64-bit atomic ops, so it's best to have
>> the interrupt handler deal with two 32-bit variables ticks_low and
>> ticks_high, and then you still need the GetTicks() to have a while loop to
>> read the two variables.
>
>What about?
>
>static volatile uint64_t ticks64;
>void timer_isr(void) {
>   ticks64++;
>}
>uint64_t GetTick(void) {
>   uint64_t t1 = ticks64;
>   uint64_t t2;
>   while((t2 = ticks64) - t1 > 100) {
>     t1 = t2;
>   }
>   return t2;
>}
>
>If something dangerous happens (the ISR executes while GetTick is 
>reading ticks64), t2-t1 is a very big number. 100 ms represents the 
>worst-case duration of the ISRs/tasks that could preempt or interrupt 
>GetTick. We could increase 100 even more.
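
For comparison, the two-variable variant described under option 3 in the quoted text could look roughly like the sketch below. It is only an illustration: it assumes a single core, aligned 32-bit accesses that are atomic, and a timer ISR that always runs to completion before GetTick() resumes.

```c
#include <stdint.h>

static volatile uint32_t ticks_low;
static volatile uint32_t ticks_high;

/* Called from the 1 ms timer interrupt. */
void timer_isr(void)
{
    if (++ticks_low == 0u) {       /* low word wrapped: carry into the high word */
        ticks_high++;
    }
}

uint64_t GetTick(void)
{
    uint32_t hi, lo;

    do {
        hi = ticks_high;
        lo = ticks_low;
    } while (hi != ticks_high);    /* high word changed under us: read again */

    return ((uint64_t)hi << 32) | lo;
}
```

The loop repeats only when a carry into the high word lands between the two reads of ticks_high, so no threshold such as the 100 above has to be chosen.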


As I mentioned elsewhere in the thread, if you have an atomic 32-bit
read, and a 32-bit CAS, you can do this without locks pretty simply.
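
One possible way to do this, which is not necessarily the scheme posted elsewhere in the thread: keep a 32-bit count of half periods (2^31 ticks each) of the hardware counter, and let whichever caller first notices that the counter's top bit has flipped advance that count with a CAS. The sketch uses C11 <stdatomic.h>; half_periods, sim_hw_counter and hwcnt_get() are hypothetical names. It assumes GetTick64() is called at least once every 2^31 ticks (about 24.8 days at 1 kHz) and that no caller is stalled that long between its two reads. The result has 63 usable bits.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the free-running 32-bit hardware counter. */
static uint32_t sim_hw_counter;
static uint32_t hwcnt_get(void) { return sim_hw_counter; }

/* Number of completed half periods; its bit 0 tracks bit 31 of the counter. */
static _Atomic uint32_t half_periods;

uint64_t GetTick64(void)
{
    uint32_t h  = atomic_load(&half_periods);
    uint32_t lo = hwcnt_get();

    if (((lo >> 31) ^ h) & 1u) {
        /* The counter has crossed into the next half period. Try to record
           that. If the CAS fails, another caller has already stored h + 1,
           so h + 1 is the right value to use here either way. */
        uint32_t expected = h;
        atomic_compare_exchange_strong(&half_periods, &expected, h + 1u);
        h += 1u;
    }

    /* h half periods of 2^31 ticks, plus the offset inside the current one. */
    return ((uint64_t)h << 31) | (lo & 0x7FFFFFFFu);
}
```

No lock is held and nothing is ever retried: a failed CAS simply means somebody else did the update first.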

Reply by Dimiter_Popoff ●February 11, 2020

On 2/11/2020 2:40, Robert Wessel wrote:
> On Mon, 10 Feb 2020 00:55:13 +0100, pozz <pozzugno@gmail.com> wrote:
> 
>> [...]
> 
> 
> As I mentioned elsewhere in the thread, if you have an atomic 32-bit
> read, and a 32-bit CAS, you can do this without locks pretty simply.
> 

And I replied without having understood what you meant :-). Sorry about
that.

Dimiter

Reply by Rick C ●February 12, 2020

On Monday, February 10, 2020 at 7:40:10 PM UTC-5, robert...@yahoo.com wrote:
> 
> As I mentioned elsewhere in the thread, if you have an atomic 32-bit
> read, and a 32-bit CAS, you can do this without locks pretty simply.

I did a search but it didn't turn up.  What's a CAS??? 

-- 

  Rick C.


Reply by David Brown ●February 12, 2020

On 12/02/2020 16:26, Rick C wrote:
> On Monday, February 10, 2020 at 7:40:10 PM UTC-5, robert...@yahoo.com wrote:
>>
>> As I mentioned elsewhere in the thread, if you have an atomic 32-bit
>> read, and a 32-bit CAS, you can do this without locks pretty simply.
> 
> I did a search but it didn't turn up.  What's a CAS???
> 

Compare-and-swap.  It is a common instruction for use in multi-threading 
systems as a building block for atomic accesses and lock-free algorithms 
(and for implementing locks):

<https://en.wikipedia.org/wiki/Compare-and-swap>

It corresponds roughly to this C code, executed atomically:

	bool cas(uint32_t * p, uint32_t old, uint32_t new) {
		if (*p == old) {
			*p = new;
			return true;
		} else {
			return false;
		}
	}

It is useful, but has its limits (the wikipedia page describes some, if 
you are interested).  In cases like this, it could be useful.

However, the OP is using an ARM - and like most (but not all) RISC cpus, 
ARM does not have a CAS instruction.  Instead, it has a load-link and 
store-conditional pair, which is more powerful and flexible than CAS but 
a little harder to use.
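
In portable C11 the same operation is available through <stdatomic.h>, and the compiler chooses the instructions; on ARM cores with load-link/store-conditional that is typically an LDREX/STREX loop. A small sketch follows; the wrapper names cas_u32() and add_with_cas() are made up for the illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* CAS in the sense of the pseudo-code above: store new_val only if *p still
   holds old_val, and report whether the store happened. */
bool cas_u32(_Atomic uint32_t *p, uint32_t old_val, uint32_t new_val)
{
    /* On failure the library call writes the current value into old_val;
       that copy is local here and simply discarded. */
    return atomic_compare_exchange_strong(p, &old_val, new_val);
}

/* Typical usage pattern: read, compute, try to commit, retry on conflict. */
uint32_t add_with_cas(_Atomic uint32_t *p, uint32_t delta)
{
    uint32_t seen;
    do {
        seen = atomic_load(p);
    } while (!cas_u32(p, seen, seen + delta));
    return seen + delta;
}
```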

Reply by Rick C ●February 12, 2020

On Wednesday, February 12, 2020 at 12:49:20 PM UTC-5, David Brown wrote:
> On 12/02/2020 16:26, Rick C wrote:
> > On Monday, February 10, 2020 at 7:40:10 PM UTC-5, robert...@yahoo.com wrote:
> >>
> >> As I mentioned elsewhere in the thread, if you have an atomic 32-bit
> >> read, and a 32-bit CAS, you can do this without locks pretty simply.
> > 
> > I did a search but it didn't turn up.  What's a CAS???
> > 
> 
> Compare-and-swap.  It is a common instruction for use in multi-threading 
> systems as a building block for atomic accesses and lock-free algorithms 
> (and for implementing locks):
> 
> <https://en.wikipedia.org/wiki/Compare-and-swap>
> 
> 
> It corresponds roughly to the C code, executed atomically :
> 
> 	bool cas(uint32_t * p, uint32_t old, uint32_t new) {
> 		if (*p == old) {
> 			*p = new;
> 			return true;
> 		} else {
> 			return false;
> 		}
> 	}
> 
> It is useful, but has its limits (the wikipedia page describes some, if 
> you are interested).  In cases like this, it could be useful.
> 
> However, the OP is using an ARM - and like most (but not all) RISC cpus, 
> ARM does not have a CAS instruction.  Instead, it has a load-link and 
> store-conditional pair, which is more powerful and flexible than CAS but 
> a little harder to use.

Someone was on my case about a self designed CPU not having some instruction that is essential for multitasking.  Would this be the instruction?  I'm not sure I understand.  When you say *p == old, where is old kept?  Is there really a stored value of old or is this a way of saying *p /= new???  In that case the code could be... 

bool cas(uint32_t * p, uint32_t old, uint32_t new) {
	if (*p == old) {
		*p = new;
		return true;
	} else {
		*p = new;
		return false;
	}
}

I write this because in my basic architecture memory is read/written on opposite phases of the CPU clock and all instructions are one clock cycle.  The write is predetermined in the first phase of the clock, so the CPU can't have a RMW cycle.  It can have a W/R cycle where the read data is the old data before the write.  As long as the write is always done it can do the above in a single, non interruptible cycle... not that I'm contemplating performing multitasking.  The code is more complex than warranted for a 600 LUT CPU.  Just add another CPU.  lol 

Giving what you wrote more thought it seems pretty clear it has to be implemented the way you have it written.  

I should look it up and learn something, lol. 

-- 

  Rick C.


Reply by Robert Wessel ●February 12, 2020

On Wed, 12 Feb 2020 13:14:58 -0800 (PST), Rick C
<gnuarm.deletethisbit@gmail.com> wrote:

>> [...]
>
>Someone was on my case about a self designed CPU not having some instruction that is essential for multitasking.  Would this be the instruction?  I'm not sure I understand.  When you say *p == old, where is old kept?  Is there really a stored value of old or is this a way of saying *p /= new???  In that case the code could be... 
>
>bool cas(uint32_t * p, uint32_t old, uint32_t new) {
>	if (*p == old) {
>		*p = new;
>		return true;
>	} else {
>		*p = new;
>		return false;
>	}
>}
>
>I write this because in my basic architecture memory is read/written on opposite phases of the CPU clock and all instructions are one clock cycle.  The write is predetermined in the first phase of the clock, so the CPU can't have a RMW cycle.  It can have a W/R cycle where the read data is the old data before the write.  As long as the write is always done it can do the above in a single, non interruptible cycle... not that I'm contemplating performing multitasking.  The code is more complex than warranted for a 600 LUT CPU.  Just add another CPU.  lol 
>
>Giving what you wrote more thought it seems pretty clear it has to be implemented the way you have it written.  
>
>I should look it up and learn something, lol. 


No, the idea is to update the word in memory only if it hasn't been
changed.  The classic example is using CAS to add an item to a linked
list.  You read the head pointer (that has to happen atomically, but
on most CPUs that just requires that it be aligned), construct the new
first element (most crucially the next pointer), and then if the head
pointer is unchanged, you can replace it with a pointer to the new
first item.

If the values are not equal, you don't want to update the head pointer
or you'll trash the linked list.  In that case you retry the insertion
operation using the new head pointer.

CAS is intended to be safe to use to make that update, as it's atomic
- the read of the value in memory, the compare to the old value, and
the conditional update form an atomic block, and can't be interrupted
or messed with by other CPUs in the system.
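
The linked-list insertion described above, sketched with C11 atomics. struct node, list_head and list_push() are names made up for the illustration, and only insertion at the head is shown.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int          value;
    struct node *next;
};

static _Atomic(struct node *) list_head = NULL;

void list_push(struct node *n)
{
    /* Read the head pointer atomically. */
    struct node *old_head = atomic_load(&list_head);

    do {
        /* Build the new first element against the head we last saw. */
        n->next = old_head;
        /* Swing the head only if it is still old_head. On failure the call
           reloads old_head with the current head and the loop retries. */
    } while (!atomic_compare_exchange_weak(&list_head, &old_head, n));
}
```

If another thread or CPU slips in a push between the load and the CAS, the head no longer matches, nothing is written, and the insertion is redone against the new head.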

CAS is pretty easy to simulate with LL/SC.  In some cases you'd be
better off adjusting the algorithm to better use LL/SC.  In this case
it depends on how you're accessing the low word of the timer.  If you
have only a single thread of execution, you can fake CAS by
disabling interrupts.
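
A sketch of that last option for a single-CPU system. irq_save() and irq_restore() are hypothetical stand-ins for whatever the target uses to save, mask and restore the interrupt state; here they only keep a counter so the code compiles on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt mask save/restore. */
static unsigned irq_mask_depth;
static unsigned irq_save(void)              { return irq_mask_depth++; }
static void     irq_restore(unsigned state) { irq_mask_depth = state; }

/* CAS emulated by masking interrupts. Only valid when nothing else (another
   CPU, DMA) can modify *p while interrupts are masked. */
bool cas_irq_off(volatile uint32_t *p, uint32_t old_val, uint32_t new_val)
{
    unsigned state = irq_save();
    bool swapped = (*p == old_val);
    if (swapped) {
        *p = new_val;
    }
    irq_restore(state);
    return swapped;
}
```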

What ISA is this for?

Reply by Rick C ●February 12, 2020

On Wednesday, February 12, 2020 at 4:44:13 PM UTC-5, robert...@yahoo.com wrote:
> On Wed, 12 Feb 2020 13:14:58 -0800 (PST), Rick C
> <gnuarm.deletethisbit@gmail.com> wrote:
> 
> > [...]
> 
> 
> No, the idea is to not update the word in memory unless it hasn't been
> changed.  The classic example is using CAS to add an item to a linked
> list.  You read the head pointer (that has to happen atomically, but
> on most CPUs that just requires that it be aligned), construct the new
> first element (most crucially the next pointer), and then if the head
> pointer is unchanged, you can replace it with a pointer to the new
> first item.
> 
> If the values are not equal, you don't want to update the head pointer
> or you'll trash the linked list.  In that case you retry the insertion
> operation using the new head pointer.
> 
> CAS is intended to be safe to use to make that update, as it's atomic
> - the read of the value in memory, the compare to the old value, and
> the conditional update form an atomic block, and can't be interrupted
> or messed with by other CPUs in the system.
> 
> CAS is pretty easy to simulate with LL/SC.  In some cases you'd be
> better off adjusting the algorithm to better use LL/SC.  In this case
> it depends on how you're accessing the low word of the timer.  If you
> have only a single thread of execution, you can fake CAS by
> disabling interrupts.

Ok, this is more clear now.  Wikipedia explains LL/SC pretty well.  This is actually for multiple CPUs as much as multitasking.  While you can just disable interrupts (assuming you can live with the interrupt latency issues) to make this work with a single CPU, if you are sharing the data structure with other CPUs the bus requires locking while these multiple transactions are happening.  I assume the CPU has a signal to indicate a locked operation is happening to prevent other accesses from getting in and mucking up the works.  

Is there a way to emulate this locking using semaphores?  Someone I know is a big fan of Propeller CPUs which share memory and I don't know if they have such an instruction.  They share memory by interleaving access. 

> What ISA is this for?

Custom stack processor, related to the Forth VM.  When designing FPGAs I want a CPU with deterministic timing, so 1 instruction = 1 clock cycle works well.  Interrupt latency is zero or one depending on how you count it.  The next cycle after an unmasked interrupt is asserted fetches the first instruction of the IRQ routine.  

The CPU is not pipelined but the registers are aligned through the architecture to make it decode-execute/fetch rather than fetch-decode-execute.  The fetch only depends on flags and instruction decode so it happens in parallel with the execute as far as timing is concerned.  Someone insisted this was a pipelined design because of these parallel parts. 

It's nothing special, YAMC (Yet Another MISC CPU).  I've never spent the time to optimize the design for speed.  Instead I did some work trying to hybridize the stack design with register-like access to the stack to minimize stack juggling.  Once that happened, the number of instructions for the test case I was using (an IRQ for DDS calculations) dropped by either a third or half, I forget which.  The big stumbling block for me is coming up with software to help write code for it.  lol 

-- 

  Rick C.
