EmbeddedRelated.com

Linux question -- how to tell if serial port in /dev is for real?

Started by Tim Wescott August 4, 2014
On 08/08/14 23:08, Niklas Holsti wrote:
> On 14-08-09 00:50 , Tom Gardner wrote:
>> I've only pointed out that, in order to have hard realtime
>> operation it is /necessary/ to avoid caches.
>
> No, static WCET analysis works well for several forms of caches. Airbus
> jets use cached processors in their flight control systems, with WCET
> analysis tools.
Interesting. Do these tools operate at one or all levels of the "stack", i.e. at the instruction/data/memory/cache/multiprocessor levels up to the effects of any software FIFOs used to communicate between tasks/processors?
> But as has been said repeatedly in this thread, modern PCs have many
> other sources of hard-to-predict execution time. Some may be amenable to
> static WCET analysis, but I don't know of any off-the-shelf tools for it.
Static analysis in other areas has been shown to have significant limitations. I would presume, without evidence, that there is no reason to believe it will be any more successful in this area.
>>> There is no practical way to give a 100% guaranteed response time on a
>>> modern PC even when using a RTOS due to the complexity and the number
>>> of unknowns in a PC. Derating by the observed worst case
>>> timing plus a significant margin can be an option if meeting the
>>> deadlines 99.9999% of the time is acceptable.
>>
>> Quite right.
>>
>> The only difficulty is in adequately demonstrating that your
>> chosen derating factor is sufficient to satisfy your objectives.
>
> Some of the people working on "probabilistic WCET analysis" claim that
> the mathematical tools of "extreme-value statistics" can provide that
> demonstration. I am not convinced, but I may be wrong.
I wonder what presumptions they make for their extrapolations, and whether the extrapolations are provably valid.

The engineering field is littered with academics who presumed normal distributions, only to discover that the real world is non-linear. A classic example is "rogue waves": ships used to be designed to cope with "100-year waves", but then real-world measurements indicated far more 100-year waves than are theoretically possible. The sailors' old wives' tales of rogue waves were dismissed as mere "bar stories" by academics, but now the stories are shown to be accurate.

Alternatively, I've come across many salesmen claiming 99.999% uptime (i.e. downtime ~5 mins/year), but no salesman able to back up that claim with hard evidence.
On Fri, 08 Aug 2014 23:11:34 +0200, Dombo <dombo@disposable.invalid>
wrote:

> Op 07-Aug-14 12:35, Tom Gardner schreef:
>> On 07/08/14 10:18, upsidedown@downunder.com wrote:
>>> On Thu, 07 Aug 2014 08:37:26 +0100, Tom Gardner
>>> <spamjunk@blueyonder.co.uk> wrote:
>>>
>>>> On 07/08/14 04:36, Randy Yates wrote:
>>>>> Randy Yates <yates@digitalsignallabs.com> writes:
>>>>>
>>>>>> Tom Gardner <spamjunk@blueyonder.co.uk> writes:
>>>>>>
>>>>>>> On 06/08/14 22:31, Randy Yates wrote:
>>>>>>>> Tom Gardner <spamjunk@blueyonder.co.uk> writes:
>>>>>>>>
>>>>>>>>> On 06/08/14 20:56, Jack wrote:
>>>>>>>>>> Paul Rubin <no.email@nospam.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Rob Gaddi <rgaddi@technologyhighland.invalid> writes:
>>>>>>>>>>>> How do you guarantee microsecond level response from Python
>>>>>>>>>>>> (and I assume Linux)?
>>>>>>>>>>>
>>>>>>>>>>> Linux has a realtime scheduler but guaranteeing microsecond
>>>>>>>>>>> response is not realistic because of nondeterministic cache
>>>>>>>>>>> misses and that sort of thing. For soft realtime maybe it's
>>>>>>>>>>> feasible. Milliseconds are easier than microseconds of course.
>>>>>>>>>>
>>>>>>>>>> or you use something like Linux RTAI that gives you hard real
>>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>> .. providing, of course, the processor has neither instruction nor
>>>>>>>>> data caches. If either are present then the ratio of mean:max
>>>>>>>>> latency rapidly becomes very significant.
>>>>>>>>>
>>>>>>>>> Even a 486 with its tiny caches showed a 10:1 interrupt latency
>>>>>>>>> depending on what was/wasn't in the caches. (IIRC that was measured
>>>>>>>>> with a tiny kernel, certainly nothing like the size/complexity
>>>>>>>>> of a linux kernel)
>>>>>>>>
>>>>>>>> Aren't interrupt routines in some permanently-cached portion of
>>>>>>>> the MMU?
>>>>>>>
>>>>>>> No, and once an MMU is involved all the paging information
>>>>>>> might or might not be cached. Double whammy.
>
> On Windows interrupt routines themselves must be located in non-paged
> memory, but don't have to be present in the cache.
And since Windows doesn't page the page tables, that whole problem is avoided (Windows would have to lock the page-table pages necessary for the kernel if those were pageable). But you can still get multiple cache misses walking the page tables.
On Sat, 09 Aug 2014 00:59:32 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>> Usually the main memory (or at least the memory interface bandwidth)
>> is very slow compared to cache and processor cycles. If dynamic RAM is
>> used, loading a cache line would typically mean
>> 1 x RAS cycle + n x CAS cycles. Depending of memory bus width and
>> hence the size of "n", this will take a while. By pessimistically
>> assuming that any memory byte access would cause a full DRAM cycle,
>> you should be on the safe side, compared to any speculative execution
>> issues.
>
> Ok, but then the analysis assumes 100% cache miss rate, which can give a
> hugely overestimated and probably useless WCET bound.
Surely this gives overestimated values and hence demands too expensive hardware. However, this assumes that all work is done in one (or more) hard-RT task(s), in which case you must ensure that the CPU load never approaches 100 %, but in normal operation might be something between 10 % and 50 % on average.

A more sensible division of labor helps a lot, between:

* Interrupt service routines
* Hard-RT task(s)
* One or more soft-RT tasks
* Optionally some non-RT tasks
* A NULL task

It does not matter if the HRT task momentarily uses up to 90 % of the CPU time, starving the soft-RT and non-RT tasks for a few hundred microseconds or even a few milliseconds (at least in non-RT tasks). These low-priority tasks still have on average 50 % or even 90 % of the CPU power available.

Granted, you have to allow for the pre-emption delay when the HRT task needs to be started. Processors with a lot of internal registers will require a lot of time to save the context.

It is also critical how this NULL task is implemented. If it is doing some real job, like burning CPU cycles and blinking a LED, the context save requirement in the HRT+NULL combination is the same as in the HRT+SRT+nRT+NULL case. However, if the processor has something like a low-power wait-for-interrupt instruction which saves the whole processor context before entering that wait, then the HRT+NULL case is faster, since no more context saves need to be done when the next interrupt arrives and potentially activates the HRT task.
On 07.8.2014 15:33, upsidedown@downunder.com wrote:
>> ....
>> Of course. Now /prove/ the worst case timing when caches
>> are operating.
>
> Are you saying that there are braindead processors that are slower
> when caches are enabled compared to situations in which all caches are
> disabled ? I guess that must be quite pathological cases :-).
ROFL, that would be pathological indeed. Once I burned the CPU of a netmca board (the CPU being an mpc5200b) very subtly [that is, the damage was subtle; my toying with the power on/off behaviour of the power supply while connected to all the circuitry, less so ...]. The system would boot pretty far, begin some disk I/O and then fail. I am a stubborn person and investigated for a few days until I found out some write to a cacheline would not write correctly or something like that (years ago). So I tried booting with the cache disabled - to my surprise, it did work - and was many many times slower, 10+ times perhaps, generally unusable.

Now back to IRQ latency. Of course it can be predicted on systems with a cache; e.g. the above mentioned system manages a few µs with no sweat at all, with disk I/O, windows refresh, data acquisition etc. all running. The DDRAM is clocked at 133 MHz; assuming each cacheline access will be a miss _and_ will take a line update and invalidation prior to loading the needed data is still in the ns range (15 cycles or so for 32 bytes). Then once the cacheline is loaded - 8 longwords - there will be subsequent hit accesses, so behaviour will still be much faster than what it would be with the cache altogether disabled.

This of course implies proper system layout from the beginning; e.g. you cannot afford to have page fault exceptions in an IRQ handler - processing these takes too long - so it must run unmasked. BAT translation and OS support take care of this. Also on some power chips (like the one mentioned) one can lock part of the cache which the IRQ handler will use and so always have a hit - but I have never had to resort to that much sweat.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 08/08/14 17:04, Randy Yates wrote:
> upsidedown@downunder.com writes:
>
>> On Fri, 08 Aug 2014 01:09:06 -0700, Paul Rubin
>> <no.email@nospam.invalid> wrote:
>>
>>> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>>>> This is attempted by static WCET (Worst-Case Execution-Time) analysis
>>>> tools such as aiT from AbsInt (www.absint.com).
>>>> Works IMO pretty well for instruction caches, less so for data caches
>>>
>>> We're talking about Linux, which means there's not just caches, but also
>>> an MMU, preemptive multitasking, etc. I think microsecond HRT in this
>>> environment is simply not on the menu. The Beaglebone Black has a pair
>>> of realtime coprocessors built into the main CPU chip because of that.
>>
>> Most RT extensions are actually true RT kernels and you can put Linux,
>> Windows etc. desktop operating systems into the NULL task to consume
>> CPU cycles not needed by RT tasks.
>
> My first thought on this was, "Yeah! That's a cool way to crack this
> nut." But what about the tasks in the NULL task (i.e., kernel tasks)
> that disable interrupts? One of the requirements for hard real-time is
> that there is an application-specific limit on the maximum time
> interrupts can be disabled.
There you have a difference with RT extensions to Windows and Linux. With Windows, the Windows kernel is the same as always - and the RT kernel has to deal with whatever nonsense it does such as disabling interrupts. With modern x86 cpus with virtualisation features, that's not too bad - it just means a bit of extra overhead. But it used to be a significant problem, and meant that Windows could affect the functionality of the RT tasks. With Linux, you just re-compile the kernel with a replacement for the interrupt disable macros that disables them within the Linux task, but leaves the hardware interrupts enabled for the RT kernel to use.
Op 09-Aug-14 0:08, Niklas Holsti schreef:
> On 14-08-09 00:50 , Tom Gardner wrote:
>> I've only pointed out that, in order to have hard realtime
>> operation it is /necessary/ to avoid caches.
>
> No, static WCET analysis works well for several forms of caches. Airbus
> jets use cached processors in their flight control systems, with WCET
> analysis tools.
Back in the nineties I wrote software for a MIPS processor whose caches had to be manually managed; the software had to tell the processor which memory ranges to cache. The ability to lock down cache lines makes it possible to eliminate one source of non-determinism without sacrificing performance (at least for the time-critical parts).
> But as has been said repeatedly in this thread, modern PCs have many
> other sources of hard-to-predict execution time. Some may be amenable to
> static WCET analysis, but I don't know of any off-the-shelf tools for it.
For such a tool to produce meaningful figures it must know exactly what components are in the PC, how these are configured, their implementation (specifications tend to be too inaccurate to accurately predict behavior) and how they interact with each other. Having been involved with the development of PC peripherals myself (admittedly a long time ago), my experience is that there are very few things you can safely assume about PCs. Hence if such a tool exists for PCs I would have very little confidence in it.
On 14-08-09 02:09 , Tom Gardner wrote:
> On 08/08/14 23:08, Niklas Holsti wrote:
>> On 14-08-09 00:50 , Tom Gardner wrote:
>>> I've only pointed out that, in order to have hard realtime
>>> operation it is /necessary/ to avoid caches.
>>
>> No, static WCET analysis works well for several forms of caches. Airbus
>> jets use cached processors in their flight control systems, with WCET
>> analysis tools.
>
> Interesting.
>
> Do these tools operate at one or all levels of the "stack", i.e. at the
> instruction/data/memory/cache/multiprocessor levels up to the effects
> of any software FIFOs used to communicate between tasks/processors?
The typical or classical WCET tool analyses a single task or thread at a time and computes a WCET bound for the task assuming no interrupts or preemptions, but using a very precise model of the processor and its timing behaviour. Then a schedulability analysis tool such as MAST (http://mast.unican.es/) or SymTA/S (http://www.symtavision.com/) is used to find bounds on response times and end-to-end transactions, also for networks of tasks and processors. As has already been said, the effect of interrupts and preemptions on the state of caches and other stateful accelerator HW is a sore point in this traditional two-stage analysis. The present solutions are far from ideal but can work.

WCET analysis for multi-core systems is the subject of much academic research and experimentation, for example in the TACLe network (http://www.cost.eu/domains_actions/ict/Actions/IC1202). Some results have been reported, for example in the WCET Analysis Workshop series (http://www.uni-ulm.de/en/in/wcet2014/program.html), but I don't know of any commercial tool with multicore capability. XMOS have a WCET-analysis tool for their multicores, but AFAIK it analyses the program in one core at a time.
>> But as has been said repeatedly in this thread, modern PCs have many
>> other sources of hard-to-predict execution time. Some may be amenable to
>> static WCET analysis, but I don't know of any off-the-shelf tools for it.
>
> Static analysis in other areas has been shown to have
> significant limitations.
It solves some problems in some contexts, typically for critical systems where less exact methods are not as trusted and where SW developers are willing to constrain SW designs to make the analysis work.
> I would presume, without evidence,
> that there is no reason to believe they will be more successful
> in this area.
Another difficulty is that a traditional static-analysis WCET tool is built on exact knowledge of the processor architecture. For PC processors such knowledge is generally not published. However, there is a very interesting new processor architecture called "the Mill" being developed by Mill Computing (formerly known as Out-of-the-Box Computing), http://millcomputing.com/. The Mill is a high-performance processor architecture intended for general computing (including desk-top apps) but it is statically scheduled and in fact the way it stores and uses intermediate results depends on the statically known latencies of each operation. There is a chance that this architecture will be amenable to static WCET analysis (necessarily including analysis of Mill caches and some more special Mill features).
>>>> There is no practical way to give a 100% guaranteed response time on a
>>>> modern PC even when using a RTOS due to the complexity and the number
>>>> of unknowns in a PC. Derating by the observed worst case
>>>> timing plus a significant margin can be an option if meeting the
>>>> deadlines 99.9999% of the time is acceptable.
>>>
>>> Quite right.
>>>
>>> The only difficulty is in adequately demonstrating that your
>>> chosen derating factor is sufficient to satisfy your objectives.
>>
>> Some of the people working on "probabilistic WCET analysis" claim that
>> the mathematical tools of "extreme-value statistics" can provide that
>> demonstration. I am not convinced, but I may be wrong.
>
> I wonder what presumptions they make for their extrapolations,
> and whether the extrapolations are provably valid.
Indeed. They use a theorem similar to the Central Limit Theorem, saying that if a number of stochastic variables are independent, then the distribution of their maximum or minimum converges to a limiting (generalized extreme value) form as the number of variables goes to infinity. AIUI, the value distributions of the variables are not important; what is important is their independence, and the limit at infinity. See http://en.wikipedia.org/wiki/Extreme_value_theory.

The early work that tried to apply extreme-value statistics to WCET analysis was IMO weakened by suspect independence assumptions. The present trend seems to be to force the HW to behave randomly -- for example, caches which choose eviction victims randomly -- and therefore ensure independence by construction. See, for example, http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=4123. I don't like the approach, and have not tracked the work closely.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On 14-08-09 07:48 , upsidedown@downunder.com wrote:
> On Sat, 09 Aug 2014 00:59:32 +0300, Niklas Holsti
> <niklas.holsti@tidorum.invalid> wrote:
>
>>> Usually the main memory (or at least the memory interface bandwidth)
>>> is very slow compared to cache and processor cycles. If dynamic RAM is
>>> used, loading a cache line would typically mean
>>> 1 x RAS cycle + n x CAS cycles. Depending of memory bus width and
>>> hence the size of "n", this will take a while. By pessimistically
>>> assuming that any memory byte access would cause a full DRAM cycle,
>>> you should be on the safe side, compared to any speculative execution
>>> issues.
>>
>> Ok, but then the analysis assumes 100% cache miss rate, which can give a
>> hugely overestimated and probably useless WCET bound.
>
> Surely this gives overestimated values and hence demand too expensive
> hardware. However, this assumes that all work is done in one (or more)
> hard-RT task(s), in which case you must ensure that the CPU load never
> approaches 100 % but in normal operation might be something between 10
> % and 50 % on average.
>
> With a more sensible division of labor helps a lot between:
Such a "sensible" division of labor is not always possible. It depends on the application.
> * Interrupt service routines
> * Hard-RT task(s)
> * One or more soft-RT tasks
> * Optionally some non-RT tasks
> * A NULL task
>
> It does not matter if the HRT task momentarily use up to 90 % of the
> CPU time starving the soft-RT and non-RT tasks for a few hundred
> microseconds or even a few milliseconds (at least in non-RT tasks).
> These low priority tasks have on average 50 % or even 90 % of the CPU
> power available.
Yes, sometimes one can soak up excess CPU power in non-HRT tasks. The question is how valuable those non-HRT tasks are for the application (i.e. the user), and if this value is high enough to pay for over-dimensioning the processor.
> Granted, you have to allow for the pre-emption delay, when the HRT
> needs to be started. Processors with a lot of internal registers will
> require a lot of time to save the context.
>
> It is also critical, how this NULL task is implemented, if it is doing
> some real job, like burning CPU cycles and blinking a LED, the context
> save requirement in HRT+NULL combination is the same as in
> HRT+SRT+nRT+NULL case.
>
> However, if the processor has something like a low power wait for
> interrupt instruction which saves the whole processor context before
> entering that wait, then the HRT+NULL is faster, since no more context
> saves needs to be done, when the next interrupt arrives and
> potentially activates the HRT task.
An anecdote about low-power waits (not relevant to HRT): A space application I did SW work for, a decade ago, had a requirement to put the processor (an 80C32) in a low-power wait between task activations. When the system was assembled and tested, the processor's variable power drain was found to couple into the low-voltage analog part, which messed up the measurements. A NOP loop gave much better results -- stable power drain, no analog interference. Perhaps the board could have provided more decoupling...

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
On 8/10/14, 3:18 PM, Niklas Holsti wrote:

> An anecdote about low-power waits (not relevant to HRT): A space
> application I did SW work for, a decade ago, had a requirement to put
> the processor (an 80C32) in a low-power wait between task activations.
> When the system was assembled and tested, the processor's variable power
> drain was found to couple into the low-voltage analog part, which messed
> up the measurements. A NOP loop gave much better results -- stable power
> drain, no analog interference. Perhaps the board could have provided
> more decoupling...
We had a similar problem. Unit powered by a single AA battery, some analog power electronics drawing directly off the battery, processor was put into low power sleep mode in the idle task, to wait for an interrupt with something to do. We had a problem with a low level, but noticeable, injection of noise in the analog output at the frequency of the system tick interrupt. The changing processor load was modulating the battery voltage, and the analog system didn't have enough power supply rejection.
On Sun, 10 Aug 2014 22:18:48 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>
> An anecdote about low-power waits (not relevant to HRT): A space
> application I did SW work for, a decade ago, had a requirement to put
> the processor (an 80C32) in a low-power wait between task activations.
> When the system was assembled and tested, the processor's variable power
> drain was found to couple into the low-voltage analog part, which messed
> up the measurements. A NOP loop gave much better results -- stable power
> drain, no analog interference. Perhaps the board could have provided
> more decoupling...
A similar situation occurred with various Windows versions in the mid 1990's. Windows NT 3.51 had always required quite good quality motherboards, so it had no problems when the NULL task with a low-power wait started execution.

With MS-DOS and Windows 3.x quite bad quality motherboards were acceptable; however, when trying to use them with Windows 95 (originally with a low-power wait state), the system crashed when exiting the wait state, due to the current peak drawn. With too few bypass capacitors, there was a big voltage swing, causing the processor to malfunction. To combat this, the distribution version of the Win95 NULL task consisted of a busy loop, so the processor dissipated the same power regardless of the amount of true work, and the voltage swing remained within limits even for bad motherboards.

Later on, with better and more stable motherboards, there was a market for "power saver" applications, which consisted of a low-power wait loop in a task executing at a low priority, just above the busy-loop real NULL task priority. When there was no application program running, the power saver application was constantly running the low-power wait loop, preventing the execution of the real busy-loop NULL task.