Reply by Niklas Holsti December 1, 20092009-12-01
brOS [Bogdan Rosandic] wrote:

 > Task_Suspend_no_sched should looks like this:
 >
 > void Task_Suspend_no_sched(void){
 >   change_state();
 >   put_in_suspend_list();
 >   save_context();//it should use PC placed on stack by
 >                 //  Task_Suspend_no_sched  call
 >   //_bis(LPM1); or loop like below
 >   for(;;){
 >     Idlecnt++
 >   }

Marc Jet wrote:
> brOS [Bogdan Rosandic] wrote: >> So ISR which provide system tick should interrupt loop or low power mode, >> and then scheduler should be called. When Scheduler schedule this task >> again it should jump at the beginning of the task. > > Then all the comments about your approach (being inappropriate) where > dead-on.
Bogdan's extended description of his design closely matches my guess of his design, as far as I can tell. Therefore I don't agree with Marc's conclusion. I think Bogdan's approach is reasonable, except that it is risky to use C for code such as save_context() and the unconditional spin loop. But perhaps Bogdan is only showing C-like pseudocode? There is one possibly significant difference between Bogdan's description and my guess. The example task that Bogdan shows: brOS [Bogdan Rosandic] wrote: > Task routine should look like this : > > void TaskRoutine(void){ > while(1){ > > //do some work > > Task_Suspend_no_sched(); > } > > } contains just one call of Task_Suspend_no_sched, at the end of the eternal while(1) loop. If this is true of all tasks in Bogdan's application, the design could be simplified by inverting the control, so that instead of TaskRoutine calling the kernel/Scheduler through Task_Suspend_no_sched, and the Scheduler then resuming the TaskRoutine after this call, the Scheduler could always call TaskRoutine at its entry point, and the TaskRoutine could return to the kernel after doing its work. The while(1) in TaskRoutine would be removed as would the call of Task_Suspend_no_sched, and their functions would be taken over by things that the kernel/Scheduler does in between calls of TaskRoutine. The TaskRoutine would be just void TaskRoutine(void){ //do some work } A TaskRoutine with a single call of Task_Suspend_no_sched corresponds to a very strict design rule for real-time systems called the single suspension point rule. By this rule, each task shall have a single point where it can be suspended and resumed. The rule is good for schedulability analysis, but can be difficult to use if the task must perform complex timing or interaction sequences, because the task must then use data variables to remember what it is doing -- how far the sequence has advanced -- perhaps by means of a finite-state automaton.
> Your function Task_Suspend_no_sched() should really be named something > like WaitForTick(). And it should do just that: wait for the next > system tick.
Marc, did you not note that the Scheduler may schedule *another* task, not simply continue the one that is "waiting for the next system tick" in your terms? So the waiting task may be waiting for a longer time, several ticks, and it is necessary to save its context, including the PC. One way to get the PC is to retrieve the return address for Task_Suspend_no_sched, as Bogdan does. Another way is to put the task in a loop until it is interrupted, then retrieve the PC of the interrupt point (in other words, the return address of the interrupt handler).
> The implementation does not require breaking out of endless loops or > other fancy stuff. It is basic task switching theory, descibed in OS > literature and all over the internet.
Any kernel call that can suspend the calling task must retrieve the PC, which seems to be the kind of "fancy stuff" that Marc means. As for the endless loop, most schedulers can encounter a situation where no real task is ready to do any real work until the next interrupt happens. There are several ways to deal with this: - Use a special instruction or configuration that halts, idles, or powers-down the processor until an interrupt comes in. This is what Bogdan plans to do in the real system. (As an aside, this can have a nasty drawback: in one project where I was involved and it was tried, the resulting square-wave variation in the processor's power consumption disturbed the sensitive analog electronics on the board, so we had to use an eternal loop instead. That was not an MSP430, however, but a space-qualified 80C32, so it used rather more power.) - Schedule a lowest-priority null task that contains just an eternal loop that does nothing, or perhaps maintains a processor-load indicator such as Bogdan's Idlecnt. But this is not so easy in a non-preemptive scheduler. - Use a spin loop in the kernel itself, as Bogdan does. Earlier discussion in this thread shows that making this loop conditional is logically redundant. The only reason for not using an eternal (unconditional) loop would be to avoid confusing the C compiler, which is one reason why this loop should be written in assembly language. I have seen such eternal null loops in more than one kernel, including commercial kernels. I think they are an appropriate solution to this requirement.
> Good luck with your project!
Good luck from me, too, Bogdan. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by Marc Jet December 1, 20092009-12-01
> So ISR which provide system tick should interrupt loop or low power mode, > and then scheduler should be called. When Scheduler schedule this task > again it should jump at the beginning of the task.
Then all the comments about your approach (being inappropriate) where dead-on. Your function Task_Suspend_no_sched() should really be named something like WaitForTick(). And it should do just that: wait for the next system tick. The implementation does not require breaking out of endless loops or other fancy stuff. It is basic task switching theory, descibed in OS literature and all over the internet. Using such a function, all the rest falls nicely into place. Good luck with your project! Marc
Reply by brOS November 30, 20092009-11-30
Hi all.... :)

I 'd like to thank you for your posts because you helped me a lot. And I'm
also sorry because I couldn't reply earlier...
I would also would like to explain what I wanted with my Spin function.
First, it is infinite loop function which increments idle counter only for
testing code in simulator. For real work instead of it I'm using low power
mode.

The Spin function is part of a kernel API call Task_Suspend_no_sched().
This function should suspend task, change its state, put it in a suspended
list, save context and then enter low power mode until next tick, when
scheduler should be called. So I needed that PC to save task's  context in
such way, that when it is scheduled again it could jump outside the Spin,
or to instruction after the _bis(LPM1) or somthing like that. 
So that was idea and finally I realized it like shown below

Task routine should look like this :

void TaskRoutine(void){
   while(1){
     
     //do some work
     
     Task_Suspend_no_sched();
   }

} 

Task_Suspend_no_sched should looks like this:

void Task_Suspend_no_sched(void){
  change_state();
  put_in_suspend_list();
  save_context();//it should use PC placed on stack by
                //  Task_Suspend_no_sched  call
  //_bis(LPM1); or loop like below
  for(;;){
    Idlecnt++
  }

}
So ISR which provide system tick should interrupt loop or low power mode,
and then scheduler should be called. When Scheduler schedule this task
again it should jump at the beginning of the task.

Thank you for your comments and suggestions, it was very useful. :)  	   
					
---------------------------------------		
This message was sent using the comp.arch.embedded web interface on
http://www.EmbeddedRelated.com
Reply by brOS November 30, 20092009-11-30
Hi all.... :)

I 'd like to thank you for your posts because you helped me a lot. And I'm
also sorry because I couldn't reply earlier...
I would also would like to explain what I wanted with my Spin function.
First, it is infinite loop function which increments idle counter only for
testing code in simulator. For real work instead of it I'm using low power
mode.

The Spin function is part of a kernel API call Task_Suspend_no_sched().
This function should suspend task, change its state, put it in a suspended
list, save context and then enter low power mode until next tick, when
scheduler should be called. So I needed that PC to save task's  context in
such way, that when it is scheduled again it could jump outside the Spin,
or to instruction after the _bis(LPM1) or somthing like that. 
So that was idea and finally I realized it like shown below

Task routine should look like this :

void TaskRoutine(void){
   while(1){
     
     //do some work
     
     Task_Suspend_no_sched();
   }

} 

Task_Suspend_no_sched should looks like this:

void Task_Suspend_no_sched(void){
  change_state();
  put_in_suspend_list();
  save_context();//it should use PC placed on stack by
                //  Task_Suspend_no_sched  call
  //_bis(LPM1); or loop like below
  for(;;){
    Idlecnt++
  }

}
So ISR which provide system tick should interrupt loop or low power mode,
and then scheduler should be called. When Scheduler schedule this task
again it should jump at the beginning of the task.

Thank you for your comments and suggestions, it was very useful. :)  	   
					
---------------------------------------		
This message was sent using the comp.arch.embedded web interface on
http://www.EmbeddedRelated.com
Reply by Niklas Holsti November 27, 20092009-11-27
David Brown wrote:
> Niklas Holsti wrote: >> David Brown wrote: >>> Niklas Holsti wrote: >>>> David Brown wrote: >>>>> Niklas Holsti wrote: >>>>>> David Brown wrote: >>>>>>> Niklas Holsti wrote: >>>>>>>> [ Quotations edited severely but hopefully without misattribution.] >>>>>>>> >>>>>>>>>>>> brOS wrote: >>>>>>>>>>>>>> On Mon, 23 Nov 2009 14:19:14 -0600 >>>>>>>>>>>>>> "brOS" <bogdanrosandic@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dear all, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Does anybody knows how to force compiler to use call >>>>>>>>>>>>>>> instruction >>>>>>>>>>>>>>> instead of br(branch)for disassembling function call? >>>>>>>>>>>>>>> It is extremely important for me to specific function is >>>>>>>>>>>>>>> disassembled >>>>>>>>>>>>>>> using call instead of brunch, as compiler always does. >>>>>>>>>>>>>>> >>>>>>>>>>>> ... >>>>>>>>>>>>>> >>>>>>>>>>>>> This is why i need it.... >>>>>>>>>>>>> Function I'm calling have looks something like this: >>>>>>>>>>>>> void Spin(void){ >>>>>>>>>>>>> for(;;){} >>>>>>>>>>>>> } >>>>>>>>>>>>> So if it is disassembled with call before entering in pc >>>>>>>>>>>>> will be saved on >>>>>>>>>>>>> stack and it will point to instruction after function >>>>>>>>>>>>> spin....So I want to >>>>>>>>>>>>> use that pc and to save context so when my scheduler >>>>>>>>>>>>> schedule that task >>>>>>>>>>>>> again it will not continue spinning in that forever loop >>>>>>>>>>>>> but it will jump >>>>>>>>>>>>> to next instruction after Spin function..... >>>>>>>>>>>>> branch doesn t push pc to stack so taht s my problem;) >>
[ snip lots ]
>> Assuming that >> >> o the thread calls Spin using the normal calling sequence, in >> which the return address is left on top of the stack, >> > > You can't assume that (otherwise the OP would never have asked the > question in the first place...), although we can probably assume this > can be forced in some way.
The OP was specifically asking how to force the use of a normal calling sequence. The OP thought that a branch instruction did not represent a normal calling sequence, and it doesn't, except when it is implementing a tail call, as I think was the case in the OP's problem.
>> o the code in Spin does not push more data on the stack, and >> > > It may make sense for the end-of-time-slice function to do more than > just spin. Then it may have to make a separate call to Spin.
Agreed. Or pop whatever it has pushed on the stack, before spinning. Or even better, save the context of the thread before spinning, which would really decrease the thread-resumption latency.
>> o the handler is written in MSP430 assembly language, >> > > That's a big requirement, and totally unnecessary except as a way to > implement this bad idea.
Pooh. You have admitted that parts of a thread switch must be written in assembly language. So this code is in that part.
> Don't forget the comparisons to check that you are in the spin loop.
Yep. But if you want a check for time-slice overrun, that has to be done somehow, perhaps with a flag. Both checks are about equally fast, I think. If you don't want to check for time-slice overrun, you need a pre-emptive scheduler.
>> If Spin is inlined we lose the ability to check for time-slice overruns >> by checking that the interrupted thread is in the unique and only Spin, > > Yes, but you don't /need/ to check call or return addresses if you write > proper C code...
But if you still want to check for time-slice overrun you have to use flags, and watch out for race conditions. The amount of inlined code is growing...
>>> Even making the great leap of faith that there is a reliable way to >>> get the desired return address, >> >> Wow, this makes me feel like a preacher. But "raise your eyes to the >> text above", and believe! :-) >> > > I think we'll just have to agree to disagree on this one.
OK.
>> the concept is /still/ wrong. >> >> That sounds rather dogmatic. If it works and is reliable, why is it >> "wrong"? Too heretical? >> > > Yes, it's against my religion :-) You don't write hacked code based on > lying to the compiler, assembly code, and stack manipulation tricks when > there are perfectly safe, efficient and reliable ways to do the same job > with C.
Except for the assembly code that you need to switch threads. The only "lying" that has been done here is writing the eternal Spin loop in C, from which the compiler could deduce that no Spin call returns. I think Spin should be written in assembly language and considered part of the thread-switching code. In fact (but don't take this too grievously :-), the only non-standard C code for "stack manipulation tricks" was shown by you, when you referred to the special gcc function for getting the return address.
> And yes, I know I am pontificating > - doesn't that beat preaching?
Good one, David! Luckily I'm an atheist... well, perhaps an agnostic for the purposes of this thread. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by Niklas Holsti November 27, 20092009-11-27
Marc Jet wrote:
> Niklas Holsti ha escrito: >> The OP seems to know perfectly well how to break >> out of this loop by changing the PC in the scheduler (when the looping >> code is interrupted). > > Having seen this thread up to this point, I think the key point is > summarized in the quote. > > You assume that the OP has a very thorough knowledge of what he's > doing and why. To me, his questions show that this might not be the > case!
You may of course be right, Marc. My impression is based on two things: the OP seems to have a workable design for the scheduler, and the OP seems able to read and understand the assembly-language code the compiler has generated. The OP's worry was only that a branch instruction would not leave a return address that the scheduler could use to resume the thread. I think that the OP did not know about tail calls implemented as branches, or did not remember this possibility.
> To mention a few points that support this impression of mine: > > - What's the behaviour of a function that does not return, when it > suddenly returns? To all my knowledge the behaviour will be > undefined. You can't complain when undefined behaviour turns out to > not be what you expected.
The behaviour is defined by the code that the C compiler generates, and which the OP seems to have inspected. Of course, it is risky to rely on future compilations giving the same code. Writing Spin in assembly language would remove that risk. I do understand that your question is about behaviour as defined in the C standard, but this is not a pure C program.
> - How should a scheduler detect that the interrupted task is in fact > in the spin loop? Should it use the linker map to find the code > address (and what about optimizations)? Should it match instruction > patterns (and how do you force those to be generated by the > compiler)? In my opinion there is no safe way to detect this, unless > the function "helps" the scheduler (for example using a Yield() call, > global vars, etc).
As I understood it, the Spin function is part of the OP's kernel/scheduler. If the loop in Spin is of the form "lab: jump lab", the scheduler interrupt handler can compare the PC at the interrupt point to the address of the label "lab". If Spin is written in assembly language, "lab" can be defined as a global symbol so its address is accessible to the scheduler as a constant. Or the Spin module can define a globally visible data word that holds the address of "lab". But I don't know how, or even if, the OP intends to check that the task has reached Spin. Perhaps the OP's Spin sets a flag before entering the loop. Anyway, the check can be done easily and quickly, whether using a flag or using the PC.
> Since we're talking about a C program, we are constrained by the rules > of C.
We are talking about a program consisting of some C code, divided into several threads/tasks (which is outside C semantics, I believe), plus a thread scheduler (also outside C semantics), probably implemented by a tick interrupt handler. Typical multi-threaded embedded program.
> We programmers exhaust those rules to our advantage, and the > compiler writers exhaust "their side" of the rules to make better > compilers. Therefore we can't break the rules, and C is not the > correct tool for what the OP wants (IF it really is what he wants).
That would certainly be my approach -- for my own purposes, I am not at all an "anything that works is good" programmer; I usually write in Ada, and enjoy it. I would write Spin in assembly language, as well as any coding-sensitive parts of the scheduler.
> The clean way is to use any language or tool to create the desired > functionality and encapsulate it into a linkable object. Only then it > is compatible to the C portion of the design, and will stay so until > the compiler calling conventions change.
I agree fully, but I did not want to lecture the OP, only answer the OP's question. I did advise the OP to use separate compilation for Spin. In retrospect, I should have advised the OP to use assembly language for Spin. -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by David Brown November 27, 20092009-11-27
Niklas Holsti wrote:
> David Brown wrote: >> Niklas Holsti wrote: >>> David Brown wrote: >>>> Niklas Holsti wrote: >>>>> David Brown wrote: >>>>>> Niklas Holsti wrote: >>>>>>> [ Quotations edited severely but hopefully without misattribution.] >>>>>>> >>>>>>>>>>> brOS wrote: >>>>>>>>>>>>> On Mon, 23 Nov 2009 14:19:14 -0600 >>>>>>>>>>>>> "brOS" <bogdanrosandic@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Dear all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does anybody knows how to force compiler to use call >>>>>>>>>>>>>> instruction >>>>>>>>>>>>>> instead of br(branch)for disassembling function call? >>>>>>>>>>>>>> It is extremely important for me to specific function is >>>>>>>>>>>>>> disassembled >>>>>>>>>>>>>> using call instead of brunch, as compiler always does. >>>>>>>>>>>>>> >>>>>>>>>>> ... >>>>>>>>>>>>> >>>>>>>>>>>> This is why i need it.... >>>>>>>>>>>> Function I'm calling have looks something like this: >>>>>>>>>>>> void Spin(void){ >>>>>>>>>>>> for(;;){} >>>>>>>>>>>> } >>>>>>>>>>>> So if it is disassembled with call before entering in pc >>>>>>>>>>>> will be saved on >>>>>>>>>>>> stack and it will point to instruction after function >>>>>>>>>>>> spin....So I want to >>>>>>>>>>>> use that pc and to save context so when my scheduler >>>>>>>>>>>> schedule that task >>>>>>>>>>>> again it will not continue spinning in that forever loop but >>>>>>>>>>>> it will jump >>>>>>>>>>>> to next instruction after Spin function..... >>>>>>>>>>>> branch doesn t push pc to stack so taht s my problem;) > > [ snip ] > > Niklas Holsti wrote: >>>>> Agreed. For most embedded compilers, though, anything in a separate >>>>> compilation is considered "foreign". But as you say, there is no >>>>> guarantee in general. > > David Brown wrote: >>>> These days, full program optimisation is not uncommon. Even gcc >>>> (despite its critics' opinions) can do reasonable full program >>>> optimisation by compiling all the C modules in one shot. > > Niklas Holsti wrote: >>> Sure, but in that case it would not be "separate compilation". > > And David Brown wrote: >> Fair enough. >> >> What about gcc 4.5 with -flto ? Then you can compile C modules >> separately into object files, but the object files hold a copy of the >> internal trees as well as generated object code. When you link these >> object files, the trees are used for link-time optimisation, including >> inlining across modules. > > That is interesting info on gcc, and new to me (I don't follow gcc > development that closely). Thanks, David, this is definitely something > one needs to know about (future) gcc. >
See <http://gcc.gnu.org/gcc-4.5/changes.html> and <http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto-796> (and also <http://gcc.gnu.org/> more generally). It was news to me too that the LTO (link time optimisation) branch of gcc had been merged with the main development line. I've known of its existence for a long time, but for many years it has been (or appeared to be) a bit of a blue-sky project with a lot of ideas and limited working code. It will still take a while before LTO rolls down to gcc compilers popular in c.a.e. gcc 4.5 is in stage 3 (no new features, bug fix and testing) - it will be a early next year before we can expect a first release. It will take a while from then for Code Sourcery to qualify and verify it thoroughly on their targets, and the 32-bit embedded gcc suppliers will pick it up from there. Smaller ports, such as avr-gcc, will take longer - they have fewer developers and resources. For out-of-tree ports such as the msp430, it depends entirely on what the developers want to prioritise. At the moment (gcc 4.3), using -combine and -fwhole-program can get you quite a lot of these effects for C programming. Basically, it treats all the files in the program as a single big C file with everything declared "static". LTO will give several advantages on top of that. Files can be individually compiled - useful for large projects, when files are in different directories, or when you want different compiler options. Libraries can also have LTO information. You can use languages other than C (for example, C++), and mix them together. And gcc 4.5 has a number of new optimisations that are only relevant for whole-program compilation.
>> You lose all clarity in the definitions of "compile", "link", and >> "separate compilation". > > Yes indeed, at least for the purpose of ensuring a standard calling > sequence is used for a given function. > >> But that is a digression, especially since the msp430 gcc port is not >> (yet) updated to gcc 4.5, which is itself not yet released. > > And the OP specifically asked about the IAR MSP430 compiler, anyway. > >>> Interesting question, though: Is there a standard way in a C >>> environment to ensure that the standard calling sequence is used for >>> an extern function, with no C-calling-C optimizations? >>> >> >> I think the only way is by being sure that the compiler can't access >> the code for a function declared as "extern". It should not be hard >> to do, but you may have to do it explicitly. For example, if you use >> a compiler's IDE and project manager, you might have to go out of your >> way to force true separate compilation. > > I was afraid that would be the answer, as far as it goes. Yuck. >
It is like a lot of things in C development - finding a general, portable, standards-compliant solution is hard, even though making it work in a real life project is typically very easy. The trick is to find a balance for a solution that is general enough without being overly complicated.
>>>>> I'm not so ready to call this "bad design" without knowing more >>>>> about the OP's requirements and design. The code generated for Spin >>>>> is exactly the kind of tight eternal loop that you often find in a >>>>> kernel where the kernel has no ready tasks and waits for an >>>>> interrupt. I haven't tried it, but it seems to me that writing this >>>>> loop as a conditional, flag-checking one could increase (by a >>>>> little) the latency for resuming the right task when an interrupt >>>>> happens, compared to resuming the task directly from the interrupt >>>>> handler and simply abandoning the tight loop. >>>>> >>>> >>>> Nah, the loop overhead to continually read a flag would be a few >>>> cycles at most. The interrupt function overhead to figure out >>>> return addresses from the stack will be much, much worse. > > There are several things under discussion here: > > - Whether it makes sense to use a routine Spin, containing a loop > (whether conditional or unconditional) as the last thing a thread should > call in its time-slice, such that threads are always suspended and > resumed only at this point, that is, within the call of Spin. > > David, I think you have more or less agreed that this is a workable > design for a non-preemptive (in my definition) time-sliced system that > does not schedule other threads to use the slack left over in one > thread's time-slice. I won't say more to defend it at this point. >
OK.
> - How difficult or time-consuming is it for an interrupt handler that > interrupts the loop in Spin to find the return address of that call of > Spin? > > Assuming that > > o the thread calls Spin using the normal calling sequence, in > which the return address is left on top of the stack, >
You can't assume that (otherwise the OP would never have asked the question in the first place...), although we can probably assume this can be forced in some way.
> o the code in Spin does not push more data on the stack, and >
It may make sense for the end-of-time-slice function to do more than just spin. Then it may have to make a separate call to Spin.
> o the handler is written in MSP430 assembly language, >
That's a big requirement, and totally unnecessary except as a way to implement this bad idea.
> then this is just one POP.W instruction, executed after the interrupt > handler has popped the saved status register and saved interrupt-point > PC from the stack. (I'm not very familiar with the MSP430 instruction > set and its interrupt handling, so this may be a bit optimistic. But the > MSP430 instruction set is claimed to be strong on stack accesses, so it > should not be much harder.) >
That is mostly true (you can't just pop the top of the stack, because the interrupt function must first preserve a register or two - but as you say the msp430 has good stack access instructions), given your assumptions. But as I noted above, the assumptions are not reasonable, IMHO.
> Thus, getting the return address of Spin (under the above assumptions) > is quick and well-defined. > > In fact, the tick interrupt handler could do it smartly as follows: > > 1) Pop the saved status register. > 2) Pop the saved interrupt-point PC, check that it points to the loop in > Spin, and then discard it. > 3) Push back the saved status register. >
Once you have taken into account saving a working register or two (easy enough), then that's a fairly elegant implementation of a very ugly hack.
> This makes the two top words on the stack be the resumption PC (the > return address for the Spin call) and the saved status register, exactly > the state needed for a future RETI to resume this thread. It is not even > necessary to get and manipulate the return address for the Spin call. > (I'm assuming that each thread has its own stack area.) > > - Whether the Spin routine can or should be written in C. > > If the C compiler generates code for Spin and for the calls to Spin that > satisfies the above assumptions, it can be written in C. But David is > right to say that it is hard to be sure that the assumptions do hold, > and will continue to hold, if Spin is written in C. So let's assume that > we write Spin in assembly language, which lets us be sure that the > assumptions hold. > > - Whether the thread-resumption latency can be shorter if the Spin loop > is unconditional, and the return address of Spin is saved and used as > the resumption point (case A), compared to the latency when the Spin > loop polls a flag, the address in the interrupted loop is saved and used > as the resumption point, and the interrupt handler sets the flag to make > the loop terminate (case B). > > I comment on that below. > >>> Let's consider what the kernel has to do, in my guess of the OP's >>> design, considering the two cases of (A) an unconditional "eternal" >>> loop and (B) a flag-checking loop. >>> >>> The kernel knows which thread is running. >>> >>> When the thread finishes its job in this time-slice, it calls Spin, >>> expecting to be resumed at the next instruction after the Spin call, >>> say instruction R. >>> >>> The Spin function loops, eating up the rest of the time-slice. >>> >>> The tick interrupt comes in. >>> >>> The tick interrupt handler saves the context of the interrupted >>> thread. By comparing its PC to the address of the Spin loop, it can >>> check that the thread has not overrun its time-slice. At this point: >>> >> >> Note that a sensible Spin function would tell the kernel that it is >> finished and entering the spin loop, > > Why would the kernel need that information, if it is not going to > schedule another thread for the rest of this time-slice? >
If this system is for scheduling important or time-critical tasks, and there is no prioritised pre-emption, it is very likely that you need to track when the task's work is done for testing and verification, or for tracking errors. But you are correct that the kernel might not /need/ that information.
>> rather than leaving the interrupt handler to figure it out in this >> fragile way. > > I don't see much fragility in it. It is beautifully simple: if the > thread finished what it had to do, it is in Spin; otherwise not. (I hope > I am lauding the OP here, not my own guess about the design.) >
Working around the compiler in this way /is/ fragile. It is a hack, and it is dependent on details of the compiler, the processor, the stack structure, it requires assembly for what should be simple C code, and it hinders the compiler's optimiser. However, I have to agree that you have come up with a simple implementation of this design (and I am lauding /your/ implementation here, not the OP's bad design, or our guesses about it).
> And perhaps the OP's kernel actually has an "I_am_done" kernel call, > which just ends up in Spin. > >>> - For (A) the handler gets the return address of Spin >>> by a POP and stores this return address as the resumption >>> point for the thread to be suspended. >>> >> >> Assuming, of course, that you've figured out a way to do that safely >> and reliably.... > > See above: a POP instruction. It is safe and reliable, if Spin is > written in assembly language. > >>> In case (B), the thread is resumed in the middle of the flag-checking >>> loop. It still has to read the flag, branch out of the loop, and >>> execute a return instruction (effectively a POP from the stack), >>> before instruction R is reached. >>> >> >> Yes, you can expect it to take about 3 or 4 instructions before >> getting to R. That would still be a lot less time than you spend >> messing around getting the address of R in case (A), so case (B) wins >> here in time. > > Getting the address of R takes one POP in case (A). Probably faster than > these 3-4 instructions, at least not much slower. >
Don't forget the comparisons to check that you are in the spin loop.
>> Remember that all ideas about how case (A) could feasibly be >> implemented are based on hobbling the compiler. > > No, it is based only on putting the right code in Spin, and ensuring > that Spin is called with the standard calling sequence that leaves a > return address on top of stack. This is readily and normally done by > writing Spin in assembly language. The compiler is not hobbled in the C > code parts. >
A function as simple as Spin case B would often be inlined into the calling function (either explicitly, or via whole-program optimisation). Code like that is smaller as well as faster when inlined.
>> Write the code correctly (case B), and you can let the optimiser do >> its job - that will overwhelm any conceivable time advantage case A >> might have had. > > It can hardly overwhelm it if Spin contains just the loop -- not much to > optimise there.
The code calling Spin can be better optimised if Spin is a proper C function, and the complier knows its definition. > Or do you mean to write the tick interrupt handler in C?
> I know that some C compilers claim that you can use them to write > interrupt handlers, but to me this seems more fragile than writing Spin > in C. Especially for an interrupt handler that is meant to switch > threads, not just manage som peripheral device and return to the > interrupted thread. >
The majority of interrupt handlers in embedded systems are written in C these days. For general interrupts, if you are not happy to trust your compiler to generate good and safe interrupt code, get a better compiler! But thread switching code will almost certainly need some inline assembly at least.
> However, the OP said that Spin in the OP's kernel contains some other > things, too, so it's hard to say what the optimiser could do. > >> Among other things, Spin() could be inlined in its calling function >> and remove most of the overhead. > > What "overhead" are you talking about, David?
I am referring to the check of the exit flag that you think makes my Spin function too slow compared to your version. > If Spin's profile is as
> simple as the OP showed (void Spin (void)), inlining would directly save > only one call or branch instruction per time-slice. Perhaps the > optimiser could let the thread keep more local data in registers over > the (in-lined) Spin call, avoiding some store/load instructions. The IAR > MSP430 compiler defines R12-R15 as scratch registers (caller-save) and > R4-R11 as preserved registers (callee-save), so an inlined function > could increase the available registers from 8 to 12; hard to say if that > would be significant. >
It is, as you say, hard to be sure - especially if Spin contains other code.
> If Spin is inlined we lose the ability to check for time-slice overruns > by checking that the interrupted thread is in the unique and only Spin,
Yes, but you don't /need/ to check call or return addresses if you write proper C code...
> so I think that Spin should not be inlined. Perhaps you consider this to > be one of the "fragile" aspects of this overrun-checking method, but it > is not difficult to make sure that Spin is not inlined. > >> Even making the great leap of faith that there is a reliable way to >> get the desired return address, > > Wow, this makes me feel like a preacher. But "raise your eyes to the > text above", and believe! :-) >
I think we'll just have to agree to disagree on this one.
> There can hardly be a more reliable aspect of a standard calling > convention than the presence and location of the return address. > > > and then making a second leap of faith that >> case A is faster, the concept is /still/ wrong. > > That sounds rather dogmatic. If it works and is reliable, why is it > "wrong"? Too heretical? >
Yes, it's against my religion :-) You don't write hacked code based on lying to the compiler, assembly code, and stack manipulation tricks when there are perfectly safe, efficient and reliable ways to do the same job with C. It's about writing legible, maintainable, portable code that is clear in its purpose and easy to verify. Even if this is nothing more than a simple test program, you should maintain a certain level of development quality. I am not saying that /all/ such hacks are a bad thing - just that you have to have very good reason for using them. Shaving off a few processor cycles (if you are correct and you /do/ save time) is very seldom a good enough reason. Just because code is accepted by the compiler, and works in practice, does not mean it cannot be *wrong*. And yes, I know I am pontificating - doesn't that beat preaching?
>> There is no way that shaving a few cycles off the latency >> could justify using this horrible hack. > > Although I still think that case A is a bit faster (and feel I have > given good reasons above and in my preceding posting) it isn't the main > point in favour of case A, the unconditional loop. Given this design of > a Spin function in which the threads are suspended and resumed, a > flag-polling loop is logically unnecessary: after the tick interrupt, > the thread that gets to poll the flag is *the* scheduled thread, so > polling the flag is superfluous. > > I think the design is a neat solution to specific, limited requirements. > It is a bit tricky, but interrupt-handling and thread-switching are > often tricky. I mean "tricky" in the sense of "a trick", not in the > sense of "difficult". >
Reply by Marc Jet November 27, 20092009-11-27
Niklas Holsti ha escrito:
> The OP seems to know perfectly well how to break > out of this loop by changing the PC in the scheduler (when the looping > code is interrupted).
Having seen this thread up to this point, I think the key point is summarized in the quote. You assume that the OP has a very thorough knowledge of what he's doing and why. To me, his questions show that this might not be the case! Solving his question to the letter will probably not be of much value. To mention a few points that support this impression of mine: - What's the behaviour of a function that does not return, when it suddenly returns? To all my knowledge the behaviour will be undefined. You can't complain when undefined behaviour turns out to not be what you expected. - How should a scheduler detect that the interrupted task is in fact in the spin loop? Should it use the linker map to find the code address (and what about optimizations)? Should it match instruction patterns (and how do you force those to be generated by the compiler)? In my opinion there is no safe way to detect this, unless the function "helps" the scheduler (for example using a Yield() call, global vars, etc). Since we're talking about a C program, we are constrained by the rules of C. We programmers exhaust those rules to our advantage, and the compiler writers exhaust "their side" of the rules to make better compilers. Therefore we can't break the rules, and C is not the correct tool for what the OP wants (IF it really is what he wants). The clean way is to use any language or tool to create the desired functionality and encapsulate it into a linkable object. Only then it is compatible to the C portion of the design, and will stay so until the compiler calling conventions change. I say this as programmer who has written schedulers on various architectures, and coincidentially also a virtual CPU of the architecture used by the OP (involving detailed inspection of instruction set and compiler conventions to create efficient hooks from virtual to physical). Best regards Marc
Reply by Niklas Holsti November 26, 20092009-11-26
David Brown wrote:
> Niklas Holsti wrote: >> David Brown wrote: >>> Niklas Holsti wrote: >>>> David Brown wrote: >>>>> Niklas Holsti wrote: >>>>>> [ Quotations edited severely but hopefully without misattribution.] >>>>>> >>>>>>>>>> brOS wrote: >>>>>>>>>>>> On Mon, 23 Nov 2009 14:19:14 -0600 >>>>>>>>>>>> "brOS" <bogdanrosandic@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Dear all, >>>>>>>>>>>>> >>>>>>>>>>>>> Does anybody knows how to force compiler to use call >>>>>>>>>>>>> instruction >>>>>>>>>>>>> instead of br(branch)for disassembling function call? >>>>>>>>>>>>> It is extremely important for me to specific function is >>>>>>>>>>>>> disassembled >>>>>>>>>>>>> using call instead of brunch, as compiler always does. >>>>>>>>>>>>> >>>>>>>>>> ... >>>>>>>>>>>> >>>>>>>>>>> This is why i need it.... >>>>>>>>>>> Function I'm calling have looks something like this: >>>>>>>>>>> void Spin(void){ >>>>>>>>>>> for(;;){} >>>>>>>>>>> } >>>>>>>>>>> So if it is disassembled with call before entering in pc will >>>>>>>>>>> be saved on >>>>>>>>>>> stack and it will point to instruction after function >>>>>>>>>>> spin....So I want to >>>>>>>>>>> use that pc and to save context so when my scheduler schedule >>>>>>>>>>> that task >>>>>>>>>>> again it will not continue spinning in that forever loop but >>>>>>>>>>> it will jump >>>>>>>>>>> to next instruction after Spin function..... >>>>>>>>>>> branch doesn t push pc to stack so taht s my problem;)
[ snip ] Niklas Holsti wrote:
>>>> Agreed. For most embedded compilers, though, anything in a separate >>>> compilation is considered "foreign". But as you say, there is no >>>> guarantee in general.
David Brown wrote:
>>> These days, full program optimisation is not uncommon. Even gcc >>> (despite its critics' opinions) can do reasonable full program >>> optimisation by compiling all the C modules in one shot.
Niklas Holsti wrote:
>> Sure, but in that case it would not be "separate compilation".
And David Brown wrote:
> Fair enough. > > What about gcc 4.5 with -flto ? Then you can compile C modules > separately into object files, but the object files hold a copy of the > internal trees as well as generated object code. When you link these > object files, the trees are used for link-time optimisation, including > inlining across modules.
That is interesting info on gcc, and new to me (I don't follow gcc development that closely). Thanks, David, this is definitely something one needs to know about (future) gcc.
> You lose all clarity in the definitions of > "compile", "link", and "separate compilation".
Yes indeed, at least for the purpose of ensuring a standard calling sequence is used for a given function.
> But that is a > digression, especially since the msp430 gcc port is not (yet) updated to > gcc 4.5, which is itself not yet released.
And the OP specifically asked about the IAR MSP430 compiler, anyway.
>> Interesting question, though: Is there a standard way in a C >> environment to ensure that the standard calling sequence is used for >> an extern function, with no C-calling-C optimizations? >> > > I think the only way is by being sure that the compiler can't access the > code for a function declared as "extern". It should not be hard to do, > but you may have to do it explicitly. For example, if you use a > compiler's IDE and project manager, you might have to go out of your way > to force true separate compilation.
I was afraid that would be the answer, as far as it goes. Yuck.
>>>> I'm not so ready to call this "bad design" without knowing more >>>> about the OP's requirements and design. The code generated for Spin >>>> is exactly the kind of tight eternal loop that you often find in a >>>> kernel where the kernel has no ready tasks and waits for an >>>> interrupt. I haven't tried it, but it seems to me that writing this >>>> loop as a conditional, flag-checking one could increase (by a >>>> little) the latency for resuming the right task when an interrupt >>>> happens, compared to resuming the task directly from the interrupt >>>> handler and simply abandoning the tight loop. >>>> >>> >>> Nah, the loop overhead to continually read a flag would be a few >>> cycles at most. The interrupt function overhead to figure out return >>> addresses from the stack will be much, much worse.
There are several things under discussion here: - Whether it makes sense to use a routine Spin, containing a loop (whether conditional or unconditional) as the last thing a thread should call in its time-slice, such that threads are always suspended and resumed only at this point, that is, within the call of Spin. David, I think you have more or less agreed that this is a workable design for a non-preemptive (in my definition) time-sliced system that does not schedule other threads to use the slack left over in one thread's time-slice. I won't say more to defend it at this point. - How difficult or time-consuming is it for an interrupt handler that interrupts the loop in Spin to find the return address of that call of Spin? Assuming that o the thread calls Spin using the normal calling sequence, in which the return address is left on top of the stack, o the code in Spin does not push more data on the stack, and o the handler is written in MSP430 assembly language, then this is just one POP.W instruction, executed after the interrupt handler has popped the saved status register and saved interrupt-point PC from the stack. (I'm not very familiar with the MSP430 instruction set and its interrupt handling, so this may be a bit optimistic. But the MSP430 instruction set is claimed to be strong on stack accesses, so it should not be much harder.) Thus, getting the return address of Spin (under the above assumptions) is quick and well-defined. In fact, the tick interrupt handler could do it smartly as follows: 1) Pop the saved status register. 2) Pop the saved interrupt-point PC, check that it points to the loop in Spin, and then discard it. 3) Push back the saved status register. This makes the two top words on the stack be the resumption PC (the return address for the Spin call) and the saved status register, exactly the state needed for a future RETI to resume this thread. It is not even necessary to get and manipulate the return address for the Spin call. (I'm assuming that each thread has its own stack area.) - Whether the Spin routine can or should be written in C. If the C compiler generates code for Spin and for the calls to Spin that satisfies the above assumptions, it can be written in C. But David is right to say that it is hard to be sure that the assumptions do hold, and will continue to hold, if Spin is written in C. So let's assume that we write Spin in assembly language, which lets us be sure that the assumptions hold. - Whether the thread-resumption latency can be shorter if the Spin loop is unconditional, and the return address of Spin is saved and used as the resumption point (case A), compared to the latency when the Spin loop polls a flag, the address in the interrupted loop is saved and used as the resumption point, and the interrupt handler sets the flag to make the loop terminate (case B). I comment on that below.
>> Let's consider what the kernel has to do, in my guess of the OP's >> design, considering the two cases of (A) an unconditional "eternal" >> loop and (B) a flag-checking loop. >> >> The kernel knows which thread is running. >> >> When the thread finishes its job in this time-slice, it calls Spin, >> expecting to be resumed at the next instruction after the Spin call, >> say instruction R. >> >> The Spin function loops, eating up the rest of the time-slice. >> >> The tick interrupt comes in. >> >> The tick interrupt handler saves the context of the interrupted >> thread. By comparing its PC to the address of the Spin loop, it can >> check that the thread has not overrun its time-slice. At this point: >> > > Note that a sensible Spin function would tell the kernel that it is > finished and entering the spin loop,
Why would the kernel need that information, if it is not going to schedule another thread for the rest of this time-slice?
> rather than leaving the interrupt > handler to figure it out in this fragile way.
I don't see much fragility in it. It is beautifully simple: if the thread finished what it had to do, it is in Spin; otherwise not. (I hope I am lauding the OP here, not my own guess about the design.) And perhaps the OP's kernel actually has an "I_am_done" kernel call, which just ends up in Spin.
>> - For (A) the handler gets the return address of Spin >> by a POP and stores this return address as the resumption >> point for the thread to be suspended. >> > > Assuming, of course, that you've figured out a way to do that safely and > reliably....
See above: a POP instruction. It is safe and reliable, if Spin is written in assembly language.
>> In case (B), the thread is resumed in the middle of the flag-checking >> loop. It still has to read the flag, branch out of the loop, and >> execute a return instruction (effectively a POP from the stack), >> before instruction R is reached. >> > > Yes, you can expect it to take about 3 or 4 instructions before getting > to R. That would still be a lot less time than you spend messing around > getting the address of R in case (A), so case (B) wins here in time.
Getting the address of R takes one POP in case (A). Probably faster than these 3-4 instructions, at least not much slower.
> Remember that all ideas about how case (A) could feasibly be implemented > are based on hobbling the compiler.
No, it is based only on putting the right code in Spin, and ensuring that Spin is called with the standard calling sequence that leaves a return address on top of stack. This is readily and normally done by writing Spin in assembly language. The compiler is not hobbled in the C code parts.
> Write the code correctly (case B), > and you can let the optimiser do its job - that will overwhelm any > conceivable time advantage case A might have had.
It can hardly overwhelm it if Spin contains just the loop -- not much to optimise there. Or do you mean to write the tick interrupt handler in C? I know that some C compilers claim that you can use them to write interrupt handlers, but to me this seems more fragile than writing Spin in C. Especially for an interrupt handler that is meant to switch threads, not just manage som peripheral device and return to the interrupted thread. However, the OP said that Spin in the OP's kernel contains some other things, too, so it's hard to say what the optimiser could do.
> Among other things, > Spin() could be inlined in its calling function and remove most of the > overhead.
What "overhead" are you talking about, David? If Spin's profile is as simple as the OP showed (void Spin (void)), inlining would directly save only one call or branch instruction per time-slice. Perhaps the optimiser could let the thread keep more local data in registers over the (in-lined) Spin call, avoiding some store/load instructions. The IAR MSP430 compiler defines R12-R15 as scratch registers (caller-save) and R4-R11 as preserved registers (callee-save), so an inlined function could increase the available registers from 8 to 12; hard to say if that would be significant. If Spin is inlined we lose the ability to check for time-slice overruns by checking that the interrupted thread is in the unique and only Spin, so I think that Spin should not be inlined. Perhaps you consider this to be one of the "fragile" aspects of this overrun-checking method, but it is not difficult to make sure that Spin is not inlined.
> Even making the great leap of faith that there is a reliable way to get > the desired return address,
Wow, this makes me feel like a preacher. But "raise your eyes to the text above", and believe! :-) There can hardly be a more reliable aspect of a standard calling convention than the presence and location of the return address. > and then making a second leap of faith that
> case A is faster, the concept is /still/ wrong.
That sounds rather dogmatic. If it works and is reliable, why is it "wrong"? Too heretical?
> There is no way that shaving a few cycles off the latency > could justify using this horrible hack.
Although I still think that case A is a bit faster (and feel I have given good reasons above and in my preceding posting) it isn't the main point in favour of case A, the unconditional loop. Given this design of a Spin function in which the threads are suspended and resumed, a flag-polling loop is logically unnecessary: after the tick interrupt, the thread that gets to poll the flag is *the* scheduled thread, so polling the flag is superfluous. I think the design is a neat solution to specific, limited requirements. It is a bit tricky, but interrupt-handling and thread-switching are often tricky. I mean "tricky" in the sense of "a trick", not in the sense of "difficult". -- Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .
Reply by David Brown November 26, 20092009-11-26
Niklas Holsti wrote:
> David Brown wrote: >> Niklas Holsti wrote: >>> David Brown wrote: >>>> Niklas Holsti wrote: >>>>> [ Quotations edited severely but hopefully without misattribution.] >>>>> >>>>>>>>> brOS wrote: >>>>>>>>>>> On Mon, 23 Nov 2009 14:19:14 -0600 >>>>>>>>>>> "brOS" <bogdanrosandic@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Dear all, >>>>>>>>>>>> >>>>>>>>>>>> Does anybody knows how to force compiler to use call >>>>>>>>>>>> instruction >>>>>>>>>>>> instead of br(branch)for disassembling function call? >>>>>>>>>>>> It is extremely important for me to specific function is >>>>>>>>>>>> disassembled >>>>>>>>>>>> using call instead of brunch, as compiler always does. >>>>>>>>>>>> >>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>> This is why i need it.... >>>>>>>>>> Function I'm calling have looks something like this: >>>>>>>>>> void Spin(void){ >>>>>>>>>> for(;;){} >>>>>>>>>> } >>>>>>>>>> So if it is disassembled with call before entering in pc will >>>>>>>>>> be saved on >>>>>>>>>> stack and it will point to instruction after function >>>>>>>>>> spin....So I want to >>>>>>>>>> use that pc and to save context so when my scheduler schedule >>>>>>>>>> that task >>>>>>>>>> again it will not continue spinning in that forever loop but >>>>>>>>>> it will jump >>>>>>>>>> to next instruction after Spin function..... >>>>>>>>>> branch doesn t push pc to stack so taht s my problem;) >>>>> >>>> >>>> The OP has given us very little information to go on - a lot of what >>>> we both are writing about is speculation (and I am just as likely to >>>> guess incorrectly as you). >>> >>> Yes. >>> >>> I wrote my answer assuming that the OP knows what he or she is doing >>> but was concerned that the branch instruction might not leave a good >>> return address on the stack. >>> >> >> I don't think "the OP knows what he or she is doing" is a fair >> assumption, based on the posted code for Spin() ! >> >>>> However, since an infinite loop can clearly never be broken without >>>> pre-emption, I am assuming he /does/ want pre-emption. >>> >>> I would not call it pre-emption, but interruption. To me, pre-emption >>> means suspending a task at some arbitrary point in its execution and >>> switching control to another task. In the OP's code, the Spin >>> function seems to be the expected place for suspending and resuming >>> the task, so the task is prepared for it, at that point. This looks >>> like co-operative multi-tasking. >>> >> >> It's possible (or maybe even likely) that the OP is /trying/ to >> implement a co-operative scheduler. But it doesn't actually >> co-operate - an eternal loop is not co-operative, even if it you cheat >> and break out using interrupts. Interrupts are inherently >> asynchronous - if the thread can be suspended by an interrupt >> function, that is pre-emptive multitasking. > > Well, what constitutes "co-operation" may be a matter of precise > definition (in real life, sometimes of litigation :-). In my guess of > the OP's kernel/scheduler design, the suspension is designed to happen > only when the thread is looping in the Spin function. By calling Spin > the thread shows that it is ready to be suspended, so it is co-operating > in my view. (As discussed earlier, we don't know what happens if the > scheduler interrupt finds the thread is *not* in Spin.) > >>> Agreed. For most embedded compilers, though, anything in a separate >>> compilation is considered "foreign". But as you say, there is no >>> guarantee in general. >>> >> >> These days, full program optimisation is not uncommon. Even gcc >> (despite its critics' opinions) can do reasonable full program >> optimisation by compiling all the C modules in one shot. > > Sure, but in that case it would not be "separate compilation". >
Fair enough. What about gcc 4.5 with -flto ? Then you can compile C modules separately into object files, but the object files hold a copy of the internal trees as well as generated object code. When you link these object files, the trees are used for link-time optimisation, including inlining across modules. You lose all clarity in the definitions of "compile", "link", and "separate compilation". But that is a digression, especially since the msp430 gcc port is not (yet) updated to gcc 4.5, which is itself not yet released.
> Interesting question, though: Is there a standard way in a C environment > to ensure that the standard calling sequence is used for an extern > function, with no C-calling-C optimizations? >
I think the only way is by being sure that the compiler can't access the code for a function declared as "extern". It should not be hard to do, but you may have to do it explicitly. For example, if you use a compiler's IDE and project manager, you might have to go out of your way to force true separate compilation.
>>> I'm not so ready to call this "bad design" without knowing more about >>> the OP's requirements and design. The code generated for Spin is >>> exactly the kind of tight eternal loop that you often find in a >>> kernel where the kernel has no ready tasks and waits for an >>> interrupt. I haven't tried it, but it seems to me that writing this >>> loop as a conditional, flag-checking one could increase (by a little) >>> the latency for resuming the right task when an interrupt happens, >>> compared to resuming the task directly from the interrupt handler and >>> simply abandoning the tight loop. >>> >> >> Nah, the loop overhead to continually read a flag would be a few >> cycles at most. The interrupt function overhead to figure out return >> addresses from the stack will be much, much worse. > > Let's consider what the kernel has to do, in my guess of the OP's > design, considering the two cases of (A) an unconditional "eternal" loop > and (B) a flag-checking loop. > > The kernel knows which thread is running. > > When the thread finishes its job in this time-slice, it calls Spin, > expecting to be resumed at the next instruction after the Spin call, say > instruction R. > > The Spin function loops, eating up the rest of the time-slice. > > The tick interrupt comes in. > > The tick interrupt handler saves the context of the interrupted thread. > By comparing its PC to the address of the Spin loop, it can check that > the thread has not overrun its time-slice. At this point: >
Note that a sensible Spin function would tell the kernel that it is finished and entering the spin loop, rather than leaving the interrupt handler to figure it out in this fragile way.
> - For (A) the handler gets the return address of Spin > by a POP and stores this return address as the resumption > point for the thread to be suspended. >
Assuming, of course, that you've figured out a way to do that safely and reliably....
> - For (B) the handler stores the interrupted PC (in > the flag-checking loop) as the resumption point. >
This bit will typically require some assembly, compiler-specific features, or some knowledge of the way the compiler generates interrupt routines. But that's unavoidable when you have an interrupt-based scheduler.
> The interrupt handler (scheduler) finds the thread to run in the next > time-slice. In case (B) it then sets the (thread-specific) flag on which > Spin is waiting. In case (A) it does not need to set any flag. >
Fair enough, although setting a flag is exactly a hard job, and can be done within standard C.
> The handler restores the context of the new thread. As the last step in > this, it pushes the resumption address and the restored status register > and does return-from-interrupt (RETI). >
OK.
> In case (A), the thread is resumed immediately at the desired > instruction, the instruction R that follows the Spin call. >
Again assuming that is it is possible to figure out the address of R in a reliable way...
> In case (B), the thread is resumed in the middle of the flag-checking > loop. It still has to read the flag, branch out of the loop, and execute > a return instruction (effectively a POP from the stack), before > instruction R is reached. >
Yes, you can expect it to take about 3 or 4 instructions before getting to R. That would still be a lot less time than you spend messing around getting the address of R in case (A), so case (B) wins here in time.
> In summary, case (A) and case (B) both have to POP the stack to get to > instruction R, but case (B) also has to set a flag and check a flag. It > is a close call, but you might save some cycles in case (A). Morever, in > case (B) the flag has to be thread-specific, so it has to be passed to > Spin with a parameter, consuming more cycles. >
Remember that all ideas about how case (A) could feasibly be implemented are based on hobbling the compiler. Write the code correctly (case B), and you can let the optimiser do its job - that will overwhelm any conceivable time advantage case A might have had. Among other things, Spin() could be inlined in its calling function and remove most of the overhead. Even making the great leap of faith that there is a reliable way to get the desired return address, and then making a second leap of faith that case A is faster, the concept is /still/ wrong. There is no way that shaving a few cycles off the latency could justify using this horrible hack. If those cycles matter, you need a new design.