On 07/08/14 04:32, Randy Yates wrote:
> Tom Gardner <spamjunk@blueyonder.co.uk> writes:
>
>> On 06/08/14 22:31, Randy Yates wrote:
>>> Tom Gardner <spamjunk@blueyonder.co.uk> writes:
>>>
>>>> On 06/08/14 20:56, Jack wrote:
>>>>> Paul Rubin <no.email@nospam.invalid> wrote:
>>>>>
>>>>>> Rob Gaddi <rgaddi@technologyhighland.invalid> writes:
>>>>>>> How do you guarantee microsecond level response from Python (and I
>>>>>>> assume Linux)?
>>>>>>
>>>>>> Linux has a realtime scheduler but guaranteeing microsecond response is
>>>>>> not realistic because of nondeterministic cache misses and that sort of
>>>>>> thing. For soft realtime maybe it's feasible. Milliseconds are easier
>>>>>> than microseconds of course.
>>>>>
>>>>> or you use something like Linux RTAI that gives you hard real time.
>>>>
>>>> ... providing, of course, the processor has neither instruction nor
>>>> data caches. If either are present then the ratio of mean:max
>>>> latency rapidly becomes very significant.
>>>>
>>>> Even a 486 with its tiny caches showed a 10:1 interrupt latency
>>>> depending on what was/wasn't in the caches. (IIRC that was measured
>>>> with a tiny kernel, certainly nothing like the size/complexity
>>>> of a Linux kernel.)
>>>
>>> Aren't interrupt routines in some permanently-cached portion of the MMU?
>>
>> No, and once an MMU is involved all the paging information
>> might or might not be cached. Double whammy.
>
> So you're telling me that Intel made a processor that, by design, could
> not service interrupts in a deterministic fashion? Hard to believe.
>
> Is that also the case for the present-day Intel architectures?

Also look at the XMOS multicore processors, which have attracted
multiple rounds of significant VC funding and can be obtained for a
pittance from large distributors such as DigiKey. Their development
environment will tell you how many microseconds each task takes.
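The "realtime scheduler" Paul Rubin refers to above is Linux's SCHED_FIFO/SCHED_RR policy class. A minimal sketch of a process electing it follows; the priority value of 50 is an arbitrary choice, and per the discussion this only improves scheduling latency - it does nothing about cache or TLB nondeterminism:

/* Minimal sketch: elect Linux's realtime SCHED_FIFO policy.
 * Priority 50 is arbitrary (valid range is 1..99).
 * Needs root or CAP_SYS_NICE.
 * Build: gcc -o rtprio rtprio.c */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    puts("now running under SCHED_FIFO");
    /* ... realtime work here ... */
    return 0;
}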
Linux question -- how to tell if serial port in /dev is for real?
Started by ●August 4, 2014
Reply by ●August 7, 2014
On 07/08/14 08:42, pozzugno wrote:
> On 06/08/2014 04:38, Tim Wescott wrote:
>> On Tue, 05 Aug 2014 16:15:44 -0700, Paul Rubin wrote:
>>> Tim Wescott <tim@seemywebsite.really> writes:
>>>> All of the desktop serial-port stuff I've done in the last decade has
>>>> been in support of embedded work, ...
>>>> So I'm constrained to C or C++.
>>>
>>> If this is about embedded Linux, Python works great for that.
>>
>> Will Python run on an ARM Cortex M0 with 64k of ROM and 8K of RAM?
>>
>> With room left over for actual application code?
>
> Linux on ARM Cortex M0? Fantastic... could you give us more details
> about the board? Is it a custom board? Are you able to run a full Linux
> (no uclinux) on an M0-based board?

I think you've got things a bit mixed up here. No one is running Linux
on an M0 (though people /have/ run Linux on M3 and M4 cores, albeit the
nommu version - what used to be called ucLinux before the nommu support
was integrated into the mainline kernel).

The thread here has got somewhat confusing, because people tried to
help Tim before he had given us his full requirements. When he wanted
to do serial stuff on a Linux system, several responses suggested
Python because it is much easier than doing it in C. When he said it
had to be portable to embedded systems, Python was still a suggestion
since it works fine in embedded Linux systems. But it turns out that he
wants to code in C so that it can be easily used on small non-Linux
embedded systems (so that he can test and debug the code on the PC,
then run it on a Cortex M0 - not a bad plan). And while it is
/possible/ to run a cut-down and limited Python on an M0, I think it is
unlikely to be a good choice in practice!

There is a lesson to be learned here in making your requirements and
constraints explicit from the start - it's a lesson we have all
"learned" many times, and all forgotten just as often :-)

If you are really keen, then I'd imagine this setup could be ported to
an M0:
<http://dangerousprototypes.com/2012/03/29/running-linux-on-a-8bit-avr/>
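For reference, the host side of the "test on the PC, then run it on a Cortex M0" plan is plain POSIX termios. A minimal sketch, with the device path and baud rate as placeholder assumptions; on the target, the same read/write-style interface would sit on top of a UART driver instead:

/* Minimal host-side sketch: open and configure a serial port with
 * POSIX termios. Device path and baud rate are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int open_serial(const char *dev)          /* e.g. "/dev/ttyUSB0" */
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) {
        perror("tcgetattr");
        close(fd);
        return -1;
    }
    cfmakeraw(&tio);                      /* raw 8-bit I/O, no echo */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cc[VMIN]  = 1;                  /* block until 1 byte arrives */
    tio.c_cc[VTIME] = 0;

    if (tcsetattr(fd, TCSANOW, &tio) != 0) {
        perror("tcsetattr");
        close(fd);
        return -1;
    }
    return fd;                            /* use read()/write() on it */
}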
Reply by ●August 7, 2014
On Thu, 07 Aug 2014 08:37:26 +0100, Tom Gardner
<spamjunk@blueyonder.co.uk> wrote:

> On 07/08/14 04:36, Randy Yates wrote:
>> Randy Yates <yates@digitalsignallabs.com> writes:
>> [snip earlier quoting]
>>
>> I should add that real-time operation is therefore not possible on such
>> processors, regardless of what operating system is used. This just
>> doesn't sound right to me...
>
> That depends on your requirements. Soft realtime certainly is
> possible. For hard realtime then you will have to determine the
> mean:max latency and "derate" the processor appropriately.
>
> As I noted, you needed 10:1 for the i486, and I have
> no idea whatsoever what you need for a current Intel
> processor.
>
> The problem is not confined to Intel; it *must* occur wherever
> there are caches. After all, the whole point of caches is to
> speed up things *on average*, so by definition there must be
> some sequences that perform worse than average.
>
> Your job, for hard realtime systems, is to determine the
> pessimal sequence :) (Optimal sequence be damned!)

In most systems, various caches (data, instruction, MMU TLB) can be
disabled or at least frequently invalidated, so you get the worst-case
performance.

Hard real-time specifies deadlines: the execution time _must_ be below
a certain limit in 100 % of cases. As long as that requirement is met,
the actual execution time could be 1 % or 99 % of that deadline time.
The only benefit of a very low execution time is that you may save
some power :-).

In a hard real-time environment, one would not use busy loops to
create some specific amount of time delay, so it does not matter how
many cycles the processor executes with or without cache. The only
interesting thing is that the worst-case execution time is _below_ the
deadline time.
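The mean:max latency ratio being argued about here is directly measurable. A cyclictest-style sketch, assuming a 1 ms period and an arbitrary iteration count: sleep to an absolute deadline, record how late each wakeup actually is, and report mean versus worst case:

/* Sketch: measure periodic-wakeup latency (mean and max).
 * Period and iteration count are arbitrary illustration values.
 * Build: gcc -o lat lat.c */
#include <stdio.h>
#include <time.h>

#define PERIOD_NS 1000000L   /* 1 ms */
#define ITERS     100000

int main(void)
{
    struct timespec next, now;
    long worst = 0, total = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < ITERS; i++) {
        next.tv_nsec += PERIOD_NS;        /* advance absolute deadline */
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        long late = (now.tv_sec - next.tv_sec) * 1000000000L
                  + (now.tv_nsec - next.tv_nsec);  /* wakeup lateness */
        if (late > worst)
            worst = late;
        total += late;
    }
    double mean = (double)total / ITERS;
    printf("mean %.0f ns, max %ld ns, ratio %.1f:1\n",
           mean, worst, mean > 0 ? worst / mean : 0.0);
    return 0;
}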
Reply by ●August 7, 2014
On 07/08/14 10:18, upsidedown@downunder.com wrote:
> [snip earlier quoting]
>
> In most systems, various caches (data, instruction, MMU TLB) can be
> disabled or at least frequently invalidated, so you get the worst-case
> performance.

Disabling resolves the problem; "frequent invalidation" merely allows
you to falsely convince yourself that the problem is resolved.

If disabled, it would be cheaper (cost, power) not to have the cache
in the first place.

> Hard real-time specifies deadlines: the execution time _must_ be below
> a certain limit in 100 % of cases. As long as that requirement is met,
> the actual execution time could be 1 % or 99 % of that deadline time.
> The only benefit of a very low execution time is that you may save
> some power :-).

That needs proof, not assertion!

> The only interesting thing is that the worst-case execution time is
> _below_ the deadline time.

Of course. Now /prove/ the worst-case timing when caches are operating.

If they aren't operating then they are a waste of money, power and
development time (ensuring they are all disabled).
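To make the "prove it" problem concrete: a measurement harness can only ever produce a lower bound on the worst case, however hard it tries to provoke misses. A sketch, with a hypothetical task() and a deliberately cache-trashing buffer (both sizes are assumptions):

/* Sketch: time a task body many times, sometimes after deliberately
 * evicting its working set from the data cache. The observed max is
 * only a LOWER bound on the true worst case - which is the point. */
#include <stdio.h>
#include <time.h>

#define RUNS 10000
static unsigned char pollute[1 << 20];    /* ~1 MB, bigger than L1/L2 */

static void task(void)                    /* hypothetical hard-RT work */
{
    static volatile int acc;
    for (int i = 0; i < 1000; i++)
        acc += i;
}

int main(void)
{
    long worst = 0, total = 0;
    struct timespec t0, t1;

    for (int r = 0; r < RUNS; r++) {
        if (r & 1)                        /* touch one byte per 64 B line */
            for (size_t i = 0; i < sizeof pollute; i += 64)
                *(volatile unsigned char *)&pollute[i] = (unsigned char)r;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        task();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                + (t1.tv_nsec - t0.tv_nsec);
        if (ns > worst)
            worst = ns;
        total += ns;
    }
    printf("mean %ld ns, observed max %ld ns (true WCET >= max)\n",
           total / RUNS, worst);
    return 0;
}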
Reply by ●August 7, 2014
On 07/08/14 12:35, Tom Gardner wrote:
> On 07/08/14 10:18, upsidedown@downunder.com wrote:
>> [snip]
>>
>> Hard real-time specifies deadlines: the execution time _must_ be below
>> a certain limit in 100 % of cases. As long as that requirement is met,
>> the actual execution time could be 1 % or 99 % of that deadline time.
>> The only benefit of a very low execution time is that you may save
>> some power :-).
>
> That needs proof, not assertion!
>
>> The only interesting thing is that the worst-case execution time is
>> _below_ the deadline time.
>
> Of course. Now /prove/ the worst-case timing when caches are operating.
>
> If they aren't operating then they are a waste of money, power and
> development time (ensuring they are all disabled).

You can usually get a pretty solid idea of the worst case cache timing,
especially if it is write-through (with write-back, you could have many
"dirty" lines that need to be written before flushing). Reading via a
cache may mean an extra couple of clock cycles to handle matching and
flushing, before the actual memory read. And it will typically mean
something like 4 times as much data is read to fill the cache line even
though you just requested one read. Using such numbers, you can work
out an absolute worst case cost for the cache if every single memory
access is independent, scattered about memory, and causes a cache
flush - say, all memory reads take four times as long as without cache.

Then you can do your deadline testing on that basis, perhaps by
reducing the memory clock to 25% (or the whole system clock if the
memory clock is not independent) when testing with caches disabled.

Another thing to remember in all this is that you do not have to prove
that your deadlines will be reached in 100% of cases. Perhaps 99.999%
is good enough, or perhaps you need 7 nines. But your task is never to
aim for "perfect" - it is to be "good enough". If you can provide
statistical evidence that it is more likely for the user to be killed
by a meteorite than for a deadline to be missed, then that is often
good enough for the job. Of course you must be careful doing this sort
of thing - but there is always a balance to be struck between the
reliability of a system and the cost.
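The all-misses bound David describes is simple arithmetic. A sketch with made-up but plausible parameters - the uncached read time, line length, and tag-match overhead are all assumptions, not measurements:

/* Back-of-envelope version of the all-misses worst case: every access
 * misses and every miss fills a whole cache line. All numbers assumed. */
#include <stdio.h>

int main(void)
{
    double t_mem      = 10.0;  /* ns per word, uncached read (assumed)  */
    double words_line = 4.0;   /* words fetched per line fill           */
    double t_match    = 2.0;   /* ns tag-match/flush overhead (assumed) */

    double t_worst = t_match + words_line * t_mem;  /* all-miss access */
    printf("worst/uncached factor = %.1f\n", t_worst / t_mem);  /* 4.2 */
    return 0;
}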
Reply by ●August 7, 2014
On 07/08/14 11:53, David Brown wrote:
> On 07/08/14 12:35, Tom Gardner wrote:
>> [snip]
>
> You can usually get a pretty solid idea of the worst case cache timing,
> especially if it is write-through (with write-back, you could have many
> "dirty" lines that need to be written before flushing). Reading via a
> cache may mean an extra couple of clock cycles to handle matching and
> flushing, before the actual memory read. And it will typically mean
> something like 4 times as much data is read to fill the cache line even
> though you just requested one read. Using such numbers, you can work
> out an absolute worst case cost for the cache if every single memory
> access is independent, scattered about memory, and causes a cache
> flush - say, all memory reads take four times as long as without cache.

And presuming the processor manufacturer doesn't "improve" the chip,
and purchasing gets exactly the same one for the next batch, etc.

> Then you can do your deadline testing on that basis, perhaps by
> reducing the memory clock to 25% (or the whole system clock if the
> memory clock is not independent) when testing with caches disabled.

I strongly suspect that a factor of 4 is way too optimistic; even the
i486 showed a factor of 10. /Demonstrating/ (i.e. not merely asserting)
that a factor of 1000 or 10000 or 100000 is appropriate is
extraordinarily difficult. Much easier not to have the issue in the
first place.

> Another thing to remember in all this is that you do not have to prove
> that your deadlines will be reached in 100% of cases. Perhaps 99.999%
> is good enough, or perhaps you need 7 nines. But your task is never to
> aim for "perfect" - it is to be "good enough". If you can provide
> statistical evidence that it is more likely for the user to be killed
> by a meteorite than for a deadline to be missed, then that is often
> good enough for the job. Of course you must be careful doing this sort
> of thing - but there is always a balance to be struck between the
> reliability of a system and the cost.

With hardware synchronisers you really want to ensure metastability
failure rates of 1 in 10^12 or better!

There is always a tension between "the best is the enemy of the good"
and "having a Ford Pinto discussion".

I'm satisfied if people realise and understand the downsides to caches
before they make the correct decision for their requirements.

... and don't use "i686 etc" in the same context as "hard realtime" :)
Reply by ●August 7, 2014
On Thu, 07 Aug 2014 11:35:48 +0100, Tom Gardner
<spamjunk@blueyonder.co.uk> wrote:

> On 07/08/14 10:18, upsidedown@downunder.com wrote:
>> [snip]
>>
>> In most systems, various caches (data, instruction, MMU TLB) can be
>> disabled or at least frequently invalidated, so you get the worst-case
>> performance.
>
> Disabling resolves the problem,

Of course, this is the preferred way of doing it. However, if this is
not possible,

> "frequent invalidation" merely allows you to falsely convince
> yourself that the problem is resolved.

If "frequent" is defined as inserting an "invalidate" command between
every actual machine instruction, shouldn't that be enough?

> If disabled, it would be cheaper (cost, power) not to have the cache
> in the first place.

If the processor is just running that HRT task and nothing else, that
would be the case. However, there are often other non-RT tasks such as
(l)user interfaces that could use the processing power not needed by
the HRT task.

>> Hard real-time specifies deadlines: the execution time _must_ be below
>> a certain limit in 100 % of cases. As long as that requirement is met,
>> the actual execution time could be 1 % or 99 % of that deadline time.
>> The only benefit of a very low execution time is that you may save
>> some power :-).
>
> That needs proof, not assertion!

Let me put in some numeric values to clarify the thing. Assume there
are exactly 1000 evenly spaced interrupts each second, i.e. the
interrupt must be served within 1000 us. As long as the worst case
(cache disabled etc.) service time is less than 990 us, this system is
OK. However, if the processing only takes 100 us on average, you do
not get Brownie points for that, at least not from the HRT community.

However, if the interrupt service routine's average CPU usage is only
100 us (with caches enabled), the CPU usage is only 10 %, so 90 % of
the CPU capacity is available for non-RT tasks, such as user
interfaces.

>> The only interesting thing is that the worst-case execution time is
>> _below_ the deadline time.
>
> Of course. Now /prove/ the worst-case timing when caches are operating.

Are you saying that there are braindead processors that are slower
when caches are enabled compared to situations in which all caches are
disabled? I guess those must be quite pathological cases :-).

> If they aren't operating then they are a waste of money, power and
> development time (ensuring they are all disabled).

It all depends on whether you have some non-RT work that can be
executed in the NULL task.
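The schedulability arithmetic behind those numbers, written out for one periodic task whose deadline equals its period:

/* Sketch: feasibility depends on the WORST-case time against the
 * deadline; spare capacity for non-RT work depends on the AVERAGE.
 * The three inputs are the figures from the post above. */
#include <stdio.h>

int main(void)
{
    double period_us = 1000.0;  /* 1000 evenly spaced interrupts/s */
    double wcet_us   = 990.0;   /* worst case, caches disabled     */
    double avg_us    = 100.0;   /* average, caches enabled         */

    printf("hard-RT feasible: %s (WCET %.0f us <= deadline %.0f us)\n",
           wcet_us <= period_us ? "yes" : "no", wcet_us, period_us);
    printf("average CPU load %.0f %%, free for non-RT work %.0f %%\n",
           100.0 * avg_us / period_us,
           100.0 * (1.0 - avg_us / period_us));
    return 0;
}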
Reply by ●August 7, 2014
On Thu, 07 Aug 2014 12:53:24 +0200, David Brown
<david.brown@hesbynett.no> wrote:

> Another thing to remember in all this is that you do not have to prove
> that your deadlines will be reached in 100% of cases.

If it is not 100 %, then it is not hard real time.

> Perhaps 99.999% is good enough, or perhaps you need 7 nines. But your
> task is never to aim for "perfect" - it is to be "good enough".

That is, by definition, soft real time.

> If you can provide statistical evidence that it is more likely for the
> user to be killed by a meteorite than for a deadline to be missed, then
> that is often good enough for the job. Of course you must be careful
> doing this sort of thing - but there is always a balance to be struck
> between the reliability of a system and the cost.

If you really need HRT, you need to keep the system as simple as
possible, in order to do meaningful worst case calculations.
Reply by ●August 7, 2014
On 07/08/14 13:59, Tom Gardner wrote:
> On 07/08/14 11:53, David Brown wrote:
>> [snip]
>>
>> Using such numbers, you can work out an absolute worst case cost for
>> the cache if every single memory access is independent, scattered
>> about memory, and causes a cache flush - say, all memory reads take
>> four times as long as without cache.
>
> And presuming the processor manufacturer doesn't "improve" the chip,
> and purchasing gets exactly the same one for the next batch, etc.

True - but that's the same regardless of the chip, the cache, and
anything else involved.

>> Then you can do your deadline testing on that basis, perhaps by
>> reducing the memory clock to 25% (or the whole system clock if the
>> memory clock is not independent) when testing with caches disabled.
>
> I strongly suspect that a factor of 4 is way too optimistic; even the
> i486 showed a factor of 10. /Demonstrating/ (i.e. not merely asserting)
> that a factor of 1000 or 10000 or 100000 is appropriate is
> extraordinarily difficult. Much easier not to have the issue in the
> first place.

Certainly the factor will be higher with desktop-oriented cpus. A
factor of 4 is realistic for embedded microcontrollers with caches. So
this method would be reasonable for a Cortex M4 + cache - but out of
the question for trying to run hard real-time on a modern x86 cpu.
Using a large "safety factor" does not relieve you of the task of
picking an appropriate architecture for the job in hand.

>> Another thing to remember in all this is that you do not have to prove
>> that your deadlines will be reached in 100% of cases. [snip]
>
> With hardware synchronisers you really want to ensure metastability
> failure rates of 1 in 10^12 or better!

And there are situations when even that is not good enough. This is
one of the reasons why you pick the architecture that suits the job -
for the highest determinism, you go closest to the hardware, or at
least use the simplest possible software.

> There is always a tension between "the best is the enemy of the good"
> and "having a Ford Pinto discussion".
>
> I'm satisfied if people realise and understand the downsides to caches
> before they make the correct decision for their requirements.
>
> ... and don't use "i686 etc" in the same context as "hard realtime" :)

Agreed!
Reply by ●August 7, 2014
David Brown <david.brown@hesbynett.no> writes:

> On 07/08/14 12:35, Tom Gardner wrote:
>> [snip]
>>
>> Of course. Now /prove/ the worst-case timing when caches are operating.
>>
>> If they aren't operating then they are a waste of money, power and
>> development time (ensuring they are all disabled).
>
> You can usually get a pretty solid idea of the worst case cache timing,
> especially if it is write-through (with write-back, you could have many
> "dirty" lines that need to be written before flushing). Reading via a
> cache may mean an extra couple of clock cycles to handle matching and
> flushing, before the actual memory read. And it will typically mean
> something like 4 times as much data is read to fill the cache line even
> though you just requested one read. Using such numbers, you can work
> out an absolute worst case cost for the cache if every single memory
> access is independent, scattered about memory, and causes a cache
> flush - say, all memory reads take four times as long as without cache.

But is it DETERMINISTIC? Can you say WITH CERTAINTY what that time
will be?

Potential scenario: a system has swapped physical memory to the hard
drive. How long does it take to read it back? Who the hell knows?!?

1. One mfr's hard drive is slower than another.
2. The drive may have gone to sleep and will require ~100x longer than
   normal.
3. The sleep-to-wakeup time of the drive changes as it ages.
4. etc.

> Then you can do your deadline testing on that basis, perhaps by
> reducing the memory clock to 25% (or the whole system clock if the
> memory clock is not independent) when testing with caches disabled.
>
> Another thing to remember in all this is that you do not have to prove
> that your deadlines will be reached in 100% of cases. Perhaps 99.999%
> is good enough, or perhaps you need 7 nines. [snip]

I agree with the other poster that this, then, is not hard real-time.

--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
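For what it's worth, the specific hazard Randy names - pages swapped out to a disk with unbounded wakeup time - can at least be excluded on Linux by locking the process's address space into RAM. A minimal sketch (requires root, CAP_IPC_LOCK, or a suitable RLIMIT_MEMLOCK); note it does nothing about cache or TLB nondeterminism:

/* Minimal sketch: lock all current and future pages into RAM so the
 * kernel can never swap them out. Does not address cache effects. */
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    puts("address space locked; no page will be swapped out");
    /* ... realtime work here ... */
    return 0;
}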







