On 14/01/15 15:41, Dimiter_Popoff wrote:
> On 14.1.2015 г. 16:05, David Brown wrote:
>> On 14/01/15 14:14, Dimiter_Popoff wrote:
>>> On 14.1.2015 г. 14:54, David Brown wrote:
>>>> On 14/01/15 13:14, Dimiter_Popoff wrote:
>>>>> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>>>>>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>>>>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>>>>>> running linux with some SATA drives, multiple windows, ethernet,
>>>>>>> some serial interfaces. Try to give some figure - please notice
>>>>>>> the word "guaranteed", I know how much the linux crowd prefers
>>>>>>> to talk "in general".
>>>>>>
>>>>>> Having L1/L2/L3 caches will instantly introduce a high variation
>>>>>> between the mean and max latencies. Even for i486s with their
>>>>>> minimal cache and no operating system, a 10:1 variability was
>>>>>> visible.
>>>>>
>>>>> Yes, though on some processors one has the ability to lock part of the
>>>>> L1 cache - which allows it to be dedicated to interrupt handling, which
>>>>> can make things a lot tighter (by saving the necessity to update entire
>>>>> cachelines).
>>>>>
>>>>> Overall the latency variability obviously increases as processor
>>>>> sizes increase but then total execution times decrease, memories
>>>>> get faster etc. so the worst case latency can still be very low.
>>>>> On the 5200b which I use I have never needed to resort to any
>>>>> cache locks etc., all I do is just stay masked only as absolutely
>>>>> necessary.
>>>>>
>>>>>> Any variability to do with register saving will be completely
>>>>>> insignificant compared to the effects of caches. Unless, of
>>>>>> course, you are having to dump the entire hidden state of
>>>>>> an Itanic processor :)
>>>>>>
>>>>>
>>>>> Well we have not come to that obvious point yet I am afraid :-).
>>>>> Let us first have the figure on the worst-case linux IRQ latency
>>>>> I asked for, then put into its context the attempt of ARM/linux
>>>>> devotees to claim lower latency from not having enough registers :-).
>>>>>
>>>>> Dimiter
>>>>>
>>>>
>>>> Neither you nor anyone else can give worst-case IRQ latencies for Linux
>>>> running on PPC, MIPS, ARM, x86 or anything else - there is too much
>>>> variation.
>>>
>>> This answer means it is infinite - nice figure in the context of
>>> saving a few registers, no doubt about that. Am I supposed to
>>> laugh or to cry?
>>
>> You are supposed to use something other than standard Linux (or Windows)
>> when you need hard real time. If you really need to use Linux and you
>> also really need real time, then you can use one of several real-time
>> extensions to Linux which will give you a high (compared to dedicated
>> RTOS's and more suitable hardware) but definitely not infinite maximum
>> latency.
>>
>> Of course, since you sell a real-time system which /does/ have
>> guaranteed worst-case latencies, obviously you should be laughing :-)
>>
>>>
>>> I can give a figure for DPS - and guarantee it, commercially.
>>> As an OS, DPS is by now no smaller than Linux - it is just that the
>>> applications written for it are far fewer. VM, windows, filesystem,
>>> networking etc. - it is all in there.
>>
>> I am sure DPS has lots of useful and important features - including
>> everything you and your customers need. But I am also sure it /is/
>> smaller than Linux (which is currently at about 17e6 lines for the
>> kernel alone) - the comparison is not useful.
>
> Oh but it is - if we compare the OS itself, not the applications.
> Meaning what you as a programmer will have as functionality via
> system calls. 17e6 lines of wasteful programming could well be
> less than my 1.7e6 lines (not sure about the exact figure),
> hard to say. Does their kernel include support for windows,
> offscreen buffers, graphics draw calls etc.?
>
>> Comparing to vxworks,
>> QNX, RTEMS, etc., would make more sense.
>
> Do these come with all the features like windows, VM, filesystem,
> networking?
I am not sure that this is the best place to give a beginners' course on
Linux, QNX, RTEMS, or operating systems in general. Obviously you have
vast experience with DPS - yet your questions show a lack of knowledge
of how these sorts of OS's are built up and structured. I can't tell if
you really know so little about what Linux is, and what an OS kernel is,
or if you are being intentionally naïve - I have no wish to sound
patronising and write about things you have worked with every day for
twenty years, but equally I am happy to explain things if it is helpful
to you. Can I just say you should read the Wikipedia articles plus each
project's home page, and if we need to go further then we'll take it
from there?
>
>>> And I do have a figure for the latency.
>>> So this figure for linux is infinity?
>>>
>>
>> Unless you have calculated it, or at least measured it to a desired
>> statistical level of accuracy, then by the definition of "worst case",
>> it is infinite. (You might prefer to say "real time" requires
>> calculation, not just measurement - but that gets increasingly difficult
>> for more complex systems. If your tests suggest that missing a timing
>> deadline is statistically less likely than being struck by lightning,
>> that is often good enough.)
>
> Measuring is OK; calculating is not just difficult, it can be outright
> impractical nowadays. One should do it to get a ballpark figure of what
> to expect, then measure - over a long enough time the worst case response
> is not so hard to measure, provided you know what is going on.
>
Agreed.
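To make the "measure it over a long enough time" approach concrete, here is a crude userspace sketch (plain Python, function name my own invention - a real measurement would use something like cyclictest with pinned threads and an RT scheduling class): request a fixed sleep, record how late each wakeup actually is, and keep the maximum.

```python
import time

def measure_wakeup_latency(period_s=0.001, iterations=200):
    """Ask for a sleep of period_s and record how late each wakeup is.
    The maximum overshoot is a crude stand-in for worst-case latency."""
    worst = 0.0
    for _ in range(iterations):
        deadline = time.monotonic() + period_s
        time.sleep(period_s)
        overshoot = time.monotonic() - deadline  # how late we woke up
        worst = max(worst, overshoot)
    return worst

worst = measure_wakeup_latency()
print(f"worst observed wakeup overshoot: {worst * 1e6:.1f} us")
```

On a stock desktop kernel the worst observed overshoot typically dwarfs the mean - exactly the mean-versus-max variability mentioned earlier in the thread.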
>> One report I found with Google is for an 800 MHz Cortex A8 chip with
>> kernel 2.6.31, testing with and without the "real time" patch (this is
>> not a "real-time extension" to Linux, which work in a different way -
>> basically the "real-time patch" sacrifices total throughput but allows
>> most system calls and functions to be pre-emptable). Without the
>> "real-time patch", maximum measured latencies were 2465 us - with the
>> patch, the maximum measured latency was 58 us.
>
> Well 58 uS is still OK, only about 5 times (or is it 10 times, I am
> not sure whether the 10 uS figure was not on a 200 MHz machine)
> worse than DPS on a 400 MHz Power chip (the MPC5200B). The question
> why this real time patch is not universally applied remains of
> course - how much of the functionality do they have to sacrifice if
> they use it?
In Linux, interrupts get passed on to kernel interrupt threads, and thus
involve a (limited) context switch. That is always going to be more
costly than handling the interrupt directly, but allows the interrupt
code more access to kernel functions.
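As a loose analogy of that split (ordinary Python threads standing in for kernel machinery - the real kernel interface is request_threaded_irq(), and none of these names come from the kernel source): the "hard" handler does the bare minimum and hands the event to a thread, which does the real work in a context where sleeping and locking are allowed.

```python
import queue
import threading

events = queue.Queue()
handled = []

def top_half(irq_number):
    """The 'hard' handler: minimal work - just record the event and return."""
    events.put(irq_number)

def irq_thread():
    """The kernel-thread half: handles events in thread context, where it
    may sleep, take mutexes, and call most kernel functions."""
    while True:
        irq = events.get()
        if irq is None:          # shutdown sentinel, not part of the model
            break
        handled.append(irq)      # stand-in for the actual device handling

t = threading.Thread(target=irq_thread)
t.start()
top_half(5)                      # device raises IRQ 5
events.put(None)
t.join()
print(handled)                   # -> [5]
```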
As far as I understand the RT patch, there are two issues regarding
universal application in the kernel. One is that improving worst-case
response times means minimising the size of critical sections with
interrupts disabled. The other is that much more of the kernel is
preemptable and re-entrant, and uses finer grain locking. So code that
used to be "get lock, do A, B, C, release lock" might be changed to "get
lock, do A, release lock, do B, get lock, do C, release lock". The
locked (or interrupt disabled) sections are shorter, but total
throughput is reduced as there is more overhead in the locking. In
particular, I gather that most spin-locks (which are very fast at taking
a free lock) are replaced by mutexes with priority inheritance.
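The locking transformation can be illustrated in miniature (Python's threading.Lock standing in for a kernel spin-lock or rt_mutex; it has no priority inheritance, so this only shows the structural change, not the RT semantics):

```python
import threading

lock = threading.Lock()
state = {"a": 0, "b": 0, "c": 0}

def coarse_update():
    # "get lock, do A, B, C, release lock": one long critical section,
    # cheap (one acquisition) but nothing can preempt A/B/C under it
    with lock:
        state["a"] += 1
        state["b"] += 1
        state["c"] += 1

def fine_update():
    # "get lock, do A, release lock, do B, get lock, do C, release lock":
    # shorter critical sections bound how long the lock is held, at the
    # price of two acquisitions and B running outside the lock
    with lock:
        state["a"] += 1
    state["b"] += 1              # B assumed independent of the lock here
    with lock:
        state["c"] += 1

coarse_update()
fine_update()
print(state)                     # -> {'a': 2, 'b': 2, 'c': 2}
```

Both leave the state identical; the fine-grained version simply never holds the lock across all three steps - the bounded hold time is what the RT patch is buying with the extra locking overhead.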
Certainly some aspects have moved into the main kernel - modern Linux
kernels have much finer-grained locking than older ones, which tended
to use the "big kernel lock" a great deal. The main motivation here is
for SMP systems - when Linux systems were generally on one core, a
single "large" lock was okay, but with multiple cores it gets very
inefficient.
Other aspects are configurable (as are many things in Linux) - you often
want a different balance between throughput and response times for
server systems, desktops, and embedded systems.
>
> I asked for this figure only to put into its context the claim about
> the "need" to save all 32 registers. So let us see - saving 16
> registers more to say the slower of the two DDRAM-s, the one on the
> 400 MHz 5200b (assuming a complete cache miss), 133 MHz clocked
> DDRAM, which does something like a 10nS per .l IIRC on average
> will save 160 nS from the 58uS.
Register save sizes are not relevant in this context (which is why people
can't understand your jump to Linux) - clearly the number of registers
saved is going to be a drop in the ocean when you are talking about big
cpus running big OS's, rather than microcontrollers running bare-bones
or dedicated OS's (and we have long ago established that the saved
register count is usually, but not always, negligible in those systems
too). Register save size is relevant when it is useful to have a response
time of 12 cycles rather than 30 cycles - it is not an issue when the
response time is 5000 cycles!
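For concreteness, the arithmetic in question, using the figures quoted in the thread (16 extra registers, ~10 ns per store to the 133 MHz DDR on a complete cache miss, 58 us measured worst case - all assumptions taken from the posts, not my measurements):

```python
# Figures quoted in the thread (assumptions, not measurements):
extra_registers = 16       # the "extra" half of a 32-register file
ns_per_save = 10           # rough cost per store to 133 MHz DDR, cache miss
worst_case_ns = 58_000     # the 58 us measured with the RT patch

extra_ns = extra_registers * ns_per_save
share = 100 * extra_ns / worst_case_ns
print(f"{extra_ns} ns extra, {share:.2f}% of the worst-case latency")
# -> 160 ns extra, 0.28% of the worst-case latency
```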
> I think we all can only laugh here. The funnier thing of course is
> that there is no justified necessity to waste these 160nS - but I
> can understand the programmer who may have wasted them, why would
> he bother - it would be just a waste of his time to chase nanoseconds
> when the system stays masked for tens of microseconds. I would
> not have bothered.
>
Yes indeed - premature optimisation is the root of all evil, after all.