On 14/01/15 12:14, Dimiter_Popoff wrote:
> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>> running linux with some SATA drives, multiple windows, ethernet,
>>> some serial interfaces. Try to give some figure - please notice
>>> the word "guaranteed", I know how much the linux crowd prefers
>>> to talk "in general".
>>
>> Having L1/L2/L3 caches will instantly introduce a high variation
>> between the mean and max latencies. Even for i486s with their
>> minimal cache and no operating system, a 10:1 variability was
>> visible.
>
> Yes, though on some processors one has the ability to lock part of the
> L1 cache - which allows to have it dedicated to interrupts which can
> make things a lot tighter (by saving the necessity to update entire
> cachelines).
>
> Overall the latency variability obviously increases as processor
> sizes increase but then total execution times decrease, memories...

*mean* total... But you know that!

> get faster etc. so the worst case latency can still be very low.

It can be very difficult to /measure/ the maximum latency of even
main processing loops, let alone interrupts. Calculation of maximum
times is only possible on the XMOS processors AFAIK.

> On the 5200b which I use I have never needed to resort to any
> cache locks etc., all I do is just stay masked only as absolutely
> necessary.
>
>> Any variability to do with register saving will be completely
>> insignificant compared to the effects of caches. Unless, of
>> course, you are having to dump the entire hidden state of
>> an Itanic processor :)
>>
>
> Well we have not come to that obvious point yet I am afraid :-).
> Let us first have the figure on the worst-case linux IRQ latency
> I asked for then put into its context the try of ARM/linux
> devotees about lower latency by not having enough registers :-).

Well, the ARM (embedded with an FPGA) I'm about to start using is
dual core, each core with 32+32K L1, plus 512K L2, and then 256K RAM.
Maybe I'll do some serious timing, but the hard realtime stuff will
be in the FPGA.
Integrated TFT controller in PIC MCUs
Started by ●January 7, 2015
Reply by ●January 14, 2015
Reply by ●January 14, 2015
On 14/01/15 13:14, Dimiter_Popoff wrote:
> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>> running linux with some SATA drives, multiple windows, ethernet,
>>> some serial interfaces. Try to give some figure - please notice
>>> the word "guaranteed", I know how much the linux crowd prefers
>>> to talk "in general".
>>
>> Having L1/L2/L3 caches will instantly introduce a high variation
>> between the mean and max latencies. Even for i486s with their
>> minimal cache and no operating system, a 10:1 variability was
>> visible.
>
> Yes, though on some processors one has the ability to lock part of the
> L1 cache - which allows to have it dedicated to interrupts which can
> make things a lot tighter (by saving the necessity to update entire
> cachelines).
>
> Overall the latency variability obviously increases as processor
> sizes increase but then total execution times decrease, memories
> get faster etc. so the worst case latency can still be very low.
> On the 5200b which I use I have never needed to resort to any
> cache locks etc., all I do is just stay masked only as absolutely
> necessary.
>
>> Any variability to do with register saving will be completely
>> insignificant compared to the effects of caches. Unless, of
>> course, you are having to dump the entire hidden state of
>> an Itanic processor :)
>>
>
> Well we have not come to that obvious point yet I am afraid :-).
> Let us first have the figure on the worst-case linux IRQ latency
> I asked for then put into its context the try of ARM/linux
> devotees about lower latency by not having enough registers :-).
>
> Dimiter
>

Neither you nor anyone else can give worst-case IRQ latencies for Linux
running on PPC, MIPS, ARM, x86 or anything else - there is too much
variation.
It is a rare system that can give any useful worst-case IRQ latencies
for /any/ software on processors running at many hundreds of MHz, with
multi-layer caches, heavy pipelines, MMUs, etc. When you take into
account all the possible issues, your true worst-case IRQ latency can
be enormous - measured in thousands of clock cycles, and orders of
magnitude greater than the realistic average latencies. That's why such
systems are great for throughput, but poor for real-time systems.

This is comp.arch.embedded. While some people here use embedded Linux,
the majority do not - Cortex M3 cores running FreeRTOS or no OS are far
more common than Cortex A9 cores running Linux. So for a real
comparison, an M3 is ready for user interrupt code (all volatile
registers stacked, ready for a non-trivial handler) in 12 cycles.

On the 180 MHz PPC microcontroller I used, I'd guess (I haven't
measured, and don't intend to measure) a dozen cycles for the interrupt
vectoring and pipeline flushing, then 20 instructions to save the
interrupt registers and volatile registers - taking more than 20 cycles
because of the instruction fetch times. If you are maximally unlucky
with the caching, it will take perhaps twice that.

The chip with the smaller register set and dedicated interrupt hardware
reacts faster, with lower variation, and puts you directly into the
user code. The bigger and more complex chip has longer delays, more
variation, and requires more user code. And in this case, the faster
clock speed of the PPC device does not outweigh the higher clock cycle
count in interrupt handling.
Reply by ●January 14, 2015
On 2015-01-13, Dimiter_Popoff <dp@tgi-sci.com> wrote:
> On 13.1.2015 г. 22:00, Simon Clubley wrote:
>> On 2015-01-13, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>>>
>>> This is where the next few tons of ink will have to go apparently.
>>> What on Earth makes you think having 32 registers rather than 15
>>> makes you have more volatile registers.
>>
>> Because it depends on the ABI in use.
>
> This is at least the third time I explain this to you but I don't
> mind, I'll do it as many times as it takes: there are many ways
> to destroy something working other than inept programming, some
> of them much easier.
>
> So what is the guaranteed IRQ latency on your ARM core of choice
> running linux with some SATA drives, multiple windows, ethernet,
> some serial interfaces. Try to give some figure - please notice
> the word "guaranteed", I know how much the linux crowd prefers
> to talk "in general".
>

If I needed to meet guaranteed timing schedules, I wouldn't be using
Linux to try and achieve them - it simply hasn't been designed for
that. I would use an RTOS, or maybe even push the hard realtime part
of the problem onto its own bare metal board if the constraints were
too tight for even an RTOS.

Note that even in the case of the RTOS, your drivers are still
generally written in a HLL these days, so the RTOS will still push the
caller-saves registers even if your driver doesn't use them, because
the RTOS has to assume the HLL compiler could potentially use all the
caller-saves registers in the ABI that it's allowed to.

I don't understand your fixation on the number of registers pushed;
pushing a few extra registers is a _very_ small price to pay for all
the advantages of being able to write drivers and other code in a HLL.

Note that even when writing HLL code to run on bare metal, the compiler
still has to generate code against an ABI and hence follow the ABI's
rules unless you modify the compiler to use your own custom ABI.

Simon.
-- Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world
Reply by ●January 14, 2015
On 14.1.2015 г. 14:54, David Brown wrote:
> On 14/01/15 13:14, Dimiter_Popoff wrote:
>> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>>> running linux with some SATA drives, multiple windows, ethernet,
>>>> some serial interfaces. Try to give some figure - please notice
>>>> the word "guaranteed", I know how much the linux crowd prefers
>>>> to talk "in general".
>>>
>>> Having L1/L2/L3 caches will instantly introduce a high variation
>>> between the mean and max latencies. Even for i486s with their
>>> minimal cache and no operating system, a 10:1 variability was
>>> visible.
>>
>> Yes, though on some processors one has the ability to lock part of the
>> L1 cache - which allows to have it dedicated to interrupts which can
>> make things a lot tighter (by saving the necessity to update entire
>> cachelines).
>>
>> Overall the latency variability obviously increases as processor
>> sizes increase but then total execution times decrease, memories
>> get faster etc. so the worst case latency can still be very low.
>> On the 5200b which I use I have never needed to resort to any
>> cache locks etc., all I do is just stay masked only as absolutely
>> necessary.
>>
>>> Any variability to do with register saving will be completely
>>> insignificant compared to the effects of caches. Unless, of
>>> course, you are having to dump the entire hidden state of
>>> an Itanic processor :)
>>>
>>
>> Well we have not come to that obvious point yet I am afraid :-).
>> Let us first have the figure on the worst-case linux IRQ latency
>> I asked for then put into its context the try of ARM/linux
>> devotees about lower latency by not having enough registers :-).
>>
>> Dimiter
>>
>
> Neither you nor anyone else can give worst-case IRQ latencies for Linux
> running on PPC, MIPS, ARM, x86 or anything else - there is too much
> variation.

This answer means it is infinite - nice figure in the context of
saving a few registers, no doubt about that. Am I supposed to
laugh or to cry.

I can give a figure for DPS - and guarantee it, commercially.
As an OS DPS is meanwhile no smaller than linux - just the applications
written for it are much much fewer. VM, windows, filesystem, networking
etc., it is all in there.

And I do have a figure for the latency.
So this figure for linux is infinity?

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
Reply by ●January 14, 2015
On 14.1.2015 г. 15:06, Simon Clubley wrote:
> On 2015-01-13, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>> On 13.1.2015 г. 22:00, Simon Clubley wrote:
>>> On 2015-01-13, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>>>>
>>>> This is where the next few tons of ink will have to go apparently.
>>>> What on Earth makes you think having 32 registers rather than 15
>>>> makes you have more volatile registers.
>>>
>>> Because it depends on the ABI in use.
>>
>> This is at least the third time I explain this to you but I don't
>> mind, I'll do it as many times as it takes: there are many ways
>> to destroy something working other than inept programming, some
>> of them much easier.
>>
>> So what is the guaranteed IRQ latency on your ARM core of choice
>> running linux with some SATA drives, multiple windows, ethernet,
>> some serial interfaces. Try to give some figure - please notice
>> the word "guaranteed", I know how much the linux crowd prefers
>> to talk "in general".
>>
>
> If I needed to meet guaranteed timing schedules, I wouldn't be using
> Linux to try and achieve them - it simply hasn't been designed for that.

So your answer is "too huge to even look up the exact figure", fairly
similar to the "infinite" David gave.

> I don't understand your fixation on the number of registers pushed;
> pushing a few extra registers is a _very_ small price to pay for all
> the advantages of being able to write drivers and other code in a HLL.

Oh but this is your fixation, not mine. You argued that ARM is at an
advantage because it does not have 32 registers but only 15 and put
that in the linux context by talking all that ABI and whatever
abbreviation gibberish the linux crowd constantly invents to mask the
mess they live in.

My point was - still is - that ARM is a crippled load/store machine
because it has too few registers to be a viable (i.e. pipelined) one.
You (and a few others) wrote tons of irrelevant nonsense about saving
registers, latency etc. - clearly talking without knowing what you are
talking about.

> Note that even when writing HLL code to run on bare metal, the compiler
> still has to generate code against an ABI and hence follow the ABI's
> rules unless you modify the compiler to use your own custom ABI.

Oh this will be the fourth or the fifth time I have to explain this to
you: there are easier ways to destroy something working than inept
programming, a hammer or even a piece of rock will do as nicely.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
Reply by ●January 14, 2015
On 14/01/15 14:14, Dimiter_Popoff wrote:
> On 14.1.2015 г. 14:54, David Brown wrote:
>> On 14/01/15 13:14, Dimiter_Popoff wrote:
>>> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>>>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>>>> running linux with some SATA drives, multiple windows, ethernet,
>>>>> some serial interfaces. Try to give some figure - please notice
>>>>> the word "guaranteed", I know how much the linux crowd prefers
>>>>> to talk "in general".
>>>>
>>>> Having L1/L2/L3 caches will instantly introduce a high variation
>>>> between the mean and max latencies. Even for i486s with their
>>>> minimal cache and no operating system, a 10:1 variability was
>>>> visible.
>>>
>>> Yes, though on some processors one has the ability to lock part of the
>>> L1 cache - which allows to have it dedicated to interrupts which can
>>> make things a lot tighter (by saving the necessity to update entire
>>> cachelines).
>>>
>>> Overall the latency variability obviously increases as processor
>>> sizes increase but then total execution times decrease, memories
>>> get faster etc. so the worst case latency can still be very low.
>>> On the 5200b which I use I have never needed to resort to any
>>> cache locks etc., all I do is just stay masked only as absolutely
>>> necessary.
>>>
>>>> Any variability to do with register saving will be completely
>>>> insignificant compared to the effects of caches. Unless, of
>>>> course, you are having to dump the entire hidden state of
>>>> an Itanic processor :)
>>>>
>>>
>>> Well we have not come to that obvious point yet I am afraid :-).
>>> Let us first have the figure on the worst-case linux IRQ latency
>>> I asked for then put into its context the try of ARM/linux
>>> devotees about lower latency by not having enough registers :-).
>>>
>>> Dimiter
>>>
>>
>> Neither you nor anyone else can give worst-case IRQ latencies for Linux
>> running on PPC, MIPS, ARM, x86 or anything else - there is too much
>> variation.
>
> This answer means it is infinite - nice figure in the context of
> saving a few registers, no doubt about that. Am I supposed to
> laugh or to cry.

You are supposed to use something other than standard Linux (or
Windows) when you need hard real time. If you really need to use Linux
and you also really need real time, then you can use one of several
real-time extensions to Linux which will give you a high (compared to
dedicated RTOS's and more suitable hardware) but definitely not
infinite maximum latency.

Of course, since you sell a real-time system which /does/ have
guaranteed worst-case latencies, obviously you should be laughing :-)

> I can give a figure for DPS - and guarantee it, commercially.
> As an OS DPS is meanwhile no smaller than linux - just the applications
> written for it are much much fewer. VM, windows, filesystem, networking
> etc., it is all in there.

I am sure DPS has lots of useful and important features - including
everything you and your customers need. But I am also sure it /is/
smaller than Linux (which is currently at about 17e6 lines for the
kernel alone) - the comparison is not useful. Comparing to vxworks,
QNX, RTEMS, etc., would make more sense. (And these folks will also
give you figures for latencies - assuming you can give details of the
hardware, and perhaps pay them enough money!)

> And I do have a figure for the latency.
> So this figure for linux is infinity?
>

Unless you have calculated it, or at least measured it to a desired
statistical level of accuracy, then by the definition of "worst case",
it is infinite. (You might prefer to say "real time" requires
calculation, not just measurement - but that gets increasingly
difficult for more complex systems. If your tests suggest that missing
a timing deadline is statistically less likely than being struck by
lightning, that is often good enough.)

One report I found with Google is for an 800 MHz Cortex A8 chip with
kernel 2.6.31, testing with and without the "real time" patch (this is
not a "real-time extension" to Linux, which works in a different way -
basically the "real-time patch" sacrifices total throughput but allows
most system calls and functions to be pre-emptable). Without the
"real-time patch", maximum measured latencies were 2465 us - with the
patch, the maximum measured latency was 58 us.

Measurements will only be valid on a particular system, with particular
kernel versions, and typical realistic (and worst case) loads - but
that 58 us will give you a ballpark figure that's a little lower than
infinity.
Reply by ●January 14, 2015
On 14/01/15 14:58, Dimiter_Popoff wrote:
> Oh but this is your fixation, not mine. You argued that ARM is at an
> advantage because it does not have 32 registers but only 15 and
> put that in the linux context by talking all that ABI and whatever
> abbreviation gibberish the linux crowd constantly invents to
> mask the mess they live in.
>

Perhaps the abbreviation "ABI" has multiple uses, and you are thinking
of a different one than the rest of us? In this context, it is
"Application Binary Interface", and is a set of rules for code and
calling conventions for a particular target system. In some cases, the
ABI will vary from compiler to compiler, or between target OS's - in
other cases, the cpu manufacturer will control it tightly.

In the x86 world, Intel gave very little guidance on an ABI - hence x86
compilers use wildly different calling conventions. AMD did better for
amd64 - almost all compilers and OS's on amd64 use AMD's ABI, but of
course Microsoft picked their own incompatible (and inferior) ABI.

In the PPC world, PPC EABI is the standard for embedded systems, with
other ABI's used for AIX, Linux, etc. The PPC EABI (with 32-bit and
64-bit variations) covers a wide range of standardisations, including
register usage, stack alignment, size of standard types, section names,
standard functions, etc. It is the ABI that says register R1 is the
stack pointer on the PPC, and that R2 and R13 are anchors for small
data areas (constant and read/write respectively), and that registers
R0 and R3-R12 are "volatile" and must be saved by interrupt wrappers
that call other EABI functions.

I don't know why you assumed the mention of ABI meant people were
talking about Linux.
Reply by ●January 14, 2015
On 14.1.2015 г. 16:05, David Brown wrote:
> On 14/01/15 14:14, Dimiter_Popoff wrote:
>> On 14.1.2015 г. 14:54, David Brown wrote:
>>> On 14/01/15 13:14, Dimiter_Popoff wrote:
>>>> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>>>>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>>>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>>>>> running linux with some SATA drives, multiple windows, ethernet,
>>>>>> some serial interfaces. Try to give some figure - please notice
>>>>>> the word "guaranteed", I know how much the linux crowd prefers
>>>>>> to talk "in general".
>>>>>
>>>>> Having L1/L2/L3 caches will instantly introduce a high variation
>>>>> between the mean and max latencies. Even for i486s with their
>>>>> minimal cache and no operating system, a 10:1 variability was
>>>>> visible.
>>>>
>>>> Yes, though on some processors one has the ability to lock part of the
>>>> L1 cache - which allows to have it dedicated to interrupts which can
>>>> make things a lot tighter (by saving the necessity to update entire
>>>> cachelines).
>>>>
>>>> Overall the latency variability obviously increases as processor
>>>> sizes increase but then total execution times decrease, memories
>>>> get faster etc. so the worst case latency can still be very low.
>>>> On the 5200b which I use I have never needed to resort to any
>>>> cache locks etc., all I do is just stay masked only as absolutely
>>>> necessary.
>>>>
>>>>> Any variability to do with register saving will be completely
>>>>> insignificant compared to the effects of caches. Unless, of
>>>>> course, you are having to dump the entire hidden state of
>>>>> an Itanic processor :)
>>>>>
>>>>
>>>> Well we have not come to that obvious point yet I am afraid :-).
>>>> Let us first have the figure on the worst-case linux IRQ latency
>>>> I asked for then put into its context the try of ARM/linux
>>>> devotees about lower latency by not having enough registers :-).
>>>>
>>>> Dimiter
>>>>
>>>
>>> Neither you nor anyone else can give worst-case IRQ latencies for Linux
>>> running on PPC, MIPS, ARM, x86 or anything else - there is too much
>>> variation.
>>
>> This answer means it is infinite - nice figure in the context of
>> saving a few registers, no doubt about that. Am I supposed to
>> laugh or to cry.
>
> You are supposed to use something other than standard Linux (or Windows)
> when you need hard real time. If you really need to use Linux and you
> also really need real time, then you can use one of several real-time
> extensions to Linux which will give you a high (compared to dedicated
> RTOS's and more suitable hardware) but definitely not infinite maximum
> latency.
>
> Of course, since you sell a real-time system which /does/ have
> guaranteed worst-case latencies, obviously you should be laughing :-)
>
>> I can give a figure for DPS - and guarantee it, commercially.
>> As an OS DPS is meanwhile no smaller than linux - just the applications
>> written for it are much much fewer. VM, windows, filesystem, networking
>> etc., it is all in there.
>
> I am sure DPS has lots of useful and important features - including
> everything you and your customers need. But I am also sure it /is/
> smaller than Linux (which is currently at about 17e6 lines for the
> kernel alone) - the comparison is not useful.

Oh but it is - if we compare the OS itself, not the applications.
Meaning what you as a programmer will have as functionality via
system calls. 17e6 lines of wasteful programming could well be
less than my 1.7e6 lines (not sure about the exact figure),
hard to say. Does their kernel include the support for windows,
offscreen buffers, graphics draw calls etc.?

> Comparing to vxworks, QNX, RTEMS, etc., would make more sense.

Do these come with all the features like windows, VM, filesystem,
networking?

>> And I do have a figure for the latency.
>> So this figure for linux is infinity?
>>
>
> Unless you have calculated it, or at least measured it to a desired
> statistical level of accuracy, then by the definition of "worst case",
> it is infinite. (You might prefer to say "real time" requires
> calculation, not just measurement - but that gets increasingly difficult
> for more complex systems. If your tests suggest that missing a timing
> deadline is statistically less likely than being struck by lightning,
> that is often good enough.)

Measuring is OK, calculating is not just difficult, it can be outright
impractical nowadays. One should do it to get a ballpark figure what to
expect then measure it - over a long enough time the worst case response
is not so hard to measure, provided you know what is going on.

> One report I found with Google is for an 800 MHz Cortex A8 chip with
> kernel 2.6.31, testing with and without the "real time" patch (this is
> not a "real-time extension" to Linux, which works in a different way -
> basically the "real-time patch" sacrifices total throughput but allows
> most system calls and functions to be pre-emptable). Without the
> "real-time patch", maximum measured latencies were 2465 us - with the
> patch, the maximum measured latency was 58 us.

Well 58uS is still OK, only about 5 times (or is it 10 times, I am
not sure whether the 10 uS figure was not on a 200 MHz machine)
worse than DPS at a 400 MHz power (mpc5200b). The question why
is this real time patch not universally applied remains of course,
how much of the functionality do they have to sacrifice if they
use it.

I asked for this figure only to put into its context the claim about
the "need" to save all 32 registers. So let us see - saving 16
registers more to say the slower of the two DDRAM-s, the one on the
400 MHz 5200b (assuming a complete cache miss), 133 MHz clocked
DDRAM, which does something like a 10nS per .l IIRC on average
will save 160 nS from the 58uS. I think we all can only laugh here.

The funnier thing of course is that there is no justified necessity
to waste these 160nS - but I can understand the programmer who may
have wasted them, why would he bother - it would be just a waste of
his time to chase nanoseconds when the system stays masked for tens
of microseconds. I would not have bothered.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
Reply by ●January 14, 2015
On 14/01/15 15:41, Dimiter_Popoff wrote:
> On 14.1.2015 г. 16:05, David Brown wrote:
>> On 14/01/15 14:14, Dimiter_Popoff wrote:
>>> On 14.1.2015 г. 14:54, David Brown wrote:
>>>> On 14/01/15 13:14, Dimiter_Popoff wrote:
>>>>> On 14.1.2015 г. 13:42, Tom Gardner wrote:
>>>>>> On 14/01/15 02:11, Dimiter_Popoff wrote:
>>>>>>> So what is the guaranteed IRQ latency on your ARM core of choice
>>>>>>> running linux with some SATA drives, multiple windows, ethernet,
>>>>>>> some serial interfaces. Try to give some figure - please notice
>>>>>>> the word "guaranteed", I know how much the linux crowd prefers
>>>>>>> to talk "in general".
>>>>>>
>>>>>> Having L1/L2/L3 caches will instantly introduce a high variation
>>>>>> between the mean and max latencies. Even for i486s with their
>>>>>> minimal cache and no operating system, a 10:1 variability was
>>>>>> visible.
>>>>>
>>>>> Yes, though on some processors one has the ability to lock part of
>>>>> the L1 cache - which allows to have it dedicated to interrupts which
>>>>> can make things a lot tighter (by saving the necessity to update
>>>>> entire cachelines).
>>>>>
>>>>> Overall the latency variability obviously increases as processor
>>>>> sizes increase but then total execution times decrease, memories
>>>>> get faster etc. so the worst case latency can still be very low.
>>>>> On the 5200b which I use I have never needed to resort to any
>>>>> cache locks etc., all I do is just stay masked only as absolutely
>>>>> necessary.
>>>>>
>>>>>> Any variability to do with register saving will be completely
>>>>>> insignificant compared to the effects of caches. Unless, of
>>>>>> course, you are having to dump the entire hidden state of
>>>>>> an Itanic processor :)
>>>>>>
>>>>>
>>>>> Well we have not come to that obvious point yet I am afraid :-).
>>>>> Let us first have the figure on the worst-case linux IRQ latency
>>>>> I asked for then put into its context the try of ARM/linux
>>>>> devotees about lower latency by not having enough registers :-).
>>>>>
>>>>> Dimiter
>>>>>
>>>>
>>>> Neither you nor anyone else can give worst-case IRQ latencies for Linux
>>>> running on PPC, MIPS, ARM, x86 or anything else - there is too much
>>>> variation.
>>>
>>> This answer means it is infinite - nice figure in the context of
>>> saving a few registers, no doubt about that. Am I supposed to
>>> laugh or to cry.
>>
>> You are supposed to use something other than standard Linux (or Windows)
>> when you need hard real time. If you really need to use Linux and you
>> also really need real time, then you can use one of several real-time
>> extensions to Linux which will give you a high (compared to dedicated
>> RTOS's and more suitable hardware) but definitely not infinite maximum
>> latency.
>>
>> Of course, since you sell a real-time system which /does/ have
>> guaranteed worst-case latencies, obviously you should be laughing :-)
>>
>>> I can give a figure for DPS - and guarantee it, commercially.
>>> As an OS DPS is meanwhile no smaller than linux - just the applications
>>> written for it are much much fewer. VM, windows, filesystem, networking
>>> etc., it is all in there.
>>
>> I am sure DPS has lots of useful and important features - including
>> everything you and your customers need. But I am also sure it /is/
>> smaller than Linux (which is currently at about 17e6 lines for the
>> kernel alone) - the comparison is not useful.
>
> Oh but it is - if we compare the OS itself, not the applications.
> Meaning what you as a programmer will have as functionality via
> system calls. 17e6 lines of wasteful programming could well be
> less than my 1.7e6 lines (not sure about the exact figure),
> hard to say. Does their kernel include the support for windows,
> offscreen buffers, graphics draw calls etc.?
>
>> Comparing to vxworks, QNX, RTEMS, etc., would make more sense.
>
> Do these come with all the features like windows, VM, filesystem,
> networking?

I am not sure that this is the best place to give a beginners course on
Linux, QNX, RTEMS, or operating systems in general. Obviously you have
vast experience with DPS - yet your questions show a lack of knowledge
of how these sorts of OS's are built up and structured. I can't tell if
you really know so little about what Linux is, and what an OS kernel
is, or if you are being intentionally naïve - I have no wish to sound
patronising and write about things you have worked with every day for
twenty years, but equally I am happy to explain things if it is helpful
to you. Can I just say you should read the Wikipedia articles plus each
project's home page, and if we need to go further then we'll take it
from there?

>>> And I do have a figure for the latency.
>>> So this figure for linux is infinity?
>>>
>>
>> Unless you have calculated it, or at least measured it to a desired
>> statistical level of accuracy, then by the definition of "worst case",
>> it is infinite. (You might prefer to say "real time" requires
>> calculation, not just measurement - but that gets increasingly difficult
>> for more complex systems. If your tests suggest that missing a timing
>> deadline is statistically less likely than being struck by lightning,
>> that is often good enough.)
>
> Measuring is OK, calculating is not just difficult, it can be outright
> impractical nowadays. One should do it to get a ballpark figure what to
> expect then measure it - over a long enough time the worst case response
> is not so hard to measure, provided you know what is going on.

Agreed.

>> One report I found with Google is for an 800 MHz Cortex A8 chip with
>> kernel 2.6.31, testing with and without the "real time" patch (this is
>> not a "real-time extension" to Linux, which works in a different way -
>> basically the "real-time patch" sacrifices total throughput but allows
>> most system calls and functions to be pre-emptable). Without the
>> "real-time patch", maximum measured latencies were 2465 us - with the
>> patch, the maximum measured latency was 58 us.
>
> Well 58uS is still OK, only about 5 times (or is it 10 times, I am
> not sure whether the 10 uS figure was not on a 200 MHz machine)
> worse than DPS at a 400 MHz power (mpc5200b). The question why
> is this real time patch not universally applied remains of course,
> how much of the functionality do they have to sacrifice if they
> use it.

In Linux, interrupts get passed on to kernel interrupt threads, and
thus involve a (limited) context switch. That is always going to be
more costly than handling the interrupt directly, but allows the
interrupt code more access to kernel functions.

As far as I understand the RT patch, there are two issues regarding
universal application in the kernel. One is that improving worst-case
response times means minimising the size of critical sections with
interrupts disabled. The other is that much more of the kernel is
preemptable and re-entrant, and uses finer grain locking. So code that
used to be "get lock, do A, B, C, release lock" might be changed to
"get lock, do A, release lock, do B, get lock, do C, release lock".
The locked (or interrupt disabled) sections are shorter, but total
throughput is reduced as there is more overhead in the locking. In
particular, I gather that most spin-locks (which are very fast at
taking a free lock) are replaced by mutexes with priority inheritance.
Certainly some aspects have moved into the main kernel - modern Linux
kernels have much finer-grained locking than older ones, which tended to
use the "big kernel lock" a great deal. The main motivation here is SMP
systems - when Linux systems were generally on one core, a single
"large" lock was okay, but with multiple cores it gets very inefficient.
Other aspects are configurable (as are many things in Linux) - you
often want a different balance between throughput and response times
for server systems, desktops, and embedded systems.

> I asked for this figure only to put into its context the claim about
> the "need" to save all 32 registers. So let us see - saving 16
> registers more to say the slower of the two DDRAM-s, the one on the
> 400 MHz 5200b (assuming a complete cache miss), 133 MHz clocked
> DDRAM, which does something like 10 nS per .l IIRC on average,
> will save 160 nS from the 58 uS.

Register sizes are not relevant in this context (which is why people
can't understand your jump to Linux) - clearly the number of registers
saved is going to be a drop in the ocean when you are talking about big
CPUs running big OS's, rather than microcontrollers running bare-bones
or dedicated OS's (and we established long ago that the saved register
count is usually, but not always, negligible in those systems too).
Register save size is relevant when it is useful to have a response
time of 12 cycles rather than 30 cycles - it is not an issue when the
response time is 5000 cycles!

> I think we all can only laugh here. The funnier thing of course is
> that there is no justified necessity to waste these 160 nS - but I
> can understand the programmer who may have wasted them; why would he
> bother - it would be just a waste of his time to chase nanoseconds
> when the system stays masked for tens of microseconds. I would not
> have bothered.

Yes indeed - premature optimisation is the root of all evil, after all.
Reply by ●January 16, 2015
On Sat, 10 Jan 2015, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:

> Vladimir Ivanov <none@none.tld> wrote:
>> On Fri, 9 Jan 2015, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
>
>>> MIPS32 support is optional for cores that support microMIPS. In
>>> fact, the latest version of Microchip's XC32 compiler includes
>>> support for an unreleased PIC32MM family which only supports
>>> microMIPS.
>>
>> Now that you mention it, I remember seeing pointers about a future
>> PIC32MM stuck to microMIPS only. Again marketing pressure?
>
> As far as I can tell from the header files and compiler source code,
> the PIC32MM could be a replacement/follow-up for the PIC32MX1xx/2xx.
> There's no DSP ASE, and no shadow registers, so it's clearly not a
> high-performance chip, and it doesn't seem like it has any special
> peripherals either. Using microMIPS at the low end makes sense, as
> you can fit more code in a smaller flash. I don't know how much
> silicon area is saved by having only the one instruction set, but
> that kind of makes sense for a low-end chip as well.

They will shave some flash space in the MIPS16e -> microMIPS
transition, but that won't be revolutionary. The MIPS32 -> microMIPS
saving will be noticeable, though. Maybe MIPS16e is not that popular
after all.

The silicon savings of a microMIPS-only core are probably close to
none; I think this is mostly for the user's comfort of staying in a
single mode, and for having a distinct Thumb-2 competitor. I wouldn't
be surprised if the MIPS32 decoder is present in the macro cell, just
fused/disabled. That is only speculation, of course, but maintaining
fewer cores is a sane choice. Still, the PIC32MM might be interesting.







