Niklas Holsti wrote:
> David Brown wrote:
>> Niklas Holsti wrote:
>>> David Brown wrote:
>>>> Niklas Holsti wrote:
>>>>> David Brown wrote:
>>>>>> Niklas Holsti wrote:
>>>>>>> [ Quotations edited severely but hopefully without misattribution.]
>>>>>>>
>>>>>>>>>>> brOS wrote:
>>>>>>>>>>>>> On Mon, 23 Nov 2009 14:19:14 -0600
>>>>>>>>>>>>> "brOS" <bogdanrosandic@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anybody know how to force the compiler to use a
>>>>>>>>>>>>>> CALL instruction instead of BR (branch) for a function
>>>>>>>>>>>>>> call? It is extremely important for me that a specific
>>>>>>>>>>>>>> function is called using CALL instead of the branch
>>>>>>>>>>>>>> that the compiler always emits.
>>>>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>> This is why I need it....
>>>>>>>>>>>> The function I'm calling looks something like this:
>>>>>>>>>>>> void Spin(void){
>>>>>>>>>>>> for(;;){}
>>>>>>>>>>>> }
>>>>>>>>>>>> So if it is compiled as a CALL, then before entering Spin
>>>>>>>>>>>> the PC will be saved on the stack, and it will point to
>>>>>>>>>>>> the instruction after the call to Spin.... So I want to
>>>>>>>>>>>> use that PC to save the context, so that when my scheduler
>>>>>>>>>>>> schedules that task again it will not continue spinning in
>>>>>>>>>>>> that forever loop but will jump to the next instruction
>>>>>>>>>>>> after the Spin call.....
>>>>>>>>>>>> A branch doesn't push the PC to the stack, so that's my
>>>>>>>>>>>> problem ;)
>
> [ snip ]
>
> Niklas Holsti wrote:
>>>>> Agreed. For most embedded compilers, though, anything in a separate
>>>>> compilation is considered "foreign". But as you say, there is no
>>>>> guarantee in general.
>
> David Brown wrote:
>>>> These days, full program optimisation is not uncommon. Even gcc
>>>> (despite its critics' opinions) can do reasonable full program
>>>> optimisation by compiling all the C modules in one shot.
>
> Niklas Holsti wrote:
>>> Sure, but in that case it would not be "separate compilation".
>
> And David Brown wrote:
>> Fair enough.
>>
>> What about gcc 4.5 with -flto ? Then you can compile C modules
>> separately into object files, but the object files hold a copy of the
>> internal trees as well as generated object code. When you link these
>> object files, the trees are used for link-time optimisation, including
>> inlining across modules.
>
> That is interesting info on gcc, and new to me (I don't follow gcc
> development that closely). Thanks, David, this is definitely something
> one needs to know about (future) gcc.
>
See <http://gcc.gnu.org/gcc-4.5/changes.html> and
<http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto-796>
(and also <http://gcc.gnu.org/> more generally).
It was news to me too that the LTO (link time optimisation) branch of
gcc had been merged with the main development line. I've known of its
existence for a long time, but for many years it has been (or appeared
to be) a bit of a blue-sky project with a lot of ideas and limited
working code.
It will still take a while before LTO rolls down to the gcc compilers
popular in c.a.e. gcc 4.5 is in stage 3 (no new features, only bug
fixes and testing) - it will be early next year before we can expect a
first release. It will take a while from then for CodeSourcery to qualify
and verify it thoroughly on their targets, and the 32-bit embedded gcc
suppliers will pick it up from there. Smaller ports, such as avr-gcc,
will take longer - they have fewer developers and resources. For
out-of-tree ports such as the msp430, it depends entirely on what the
developers want to prioritise.
At the moment (gcc 4.3), using -combine and -fwhole-program can get you
quite a lot of these effects for C programming. Basically, it treats
all the files in the program as a single big C file with everything
declared "static".
LTO will give several advantages on top of that. Files can be
individually compiled - useful for large projects, when files are in
different directories, or when you want different compiler options.
Libraries can also have LTO information. You can use languages other
than C (for example, C++), and mix them together. And gcc 4.5 has a
number of new optimisations that are only relevant for whole-program
compilation.
>> You lose all clarity in the definitions of "compile", "link", and
>> "separate compilation".
>
> Yes indeed, at least for the purpose of ensuring a standard calling
> sequence is used for a given function.
>
>> But that is a digression, especially since the msp430 gcc port is not
>> (yet) updated to gcc 4.5, which is itself not yet released.
>
> And the OP specifically asked about the IAR MSP430 compiler, anyway.
>
>>> Interesting question, though: Is there a standard way in a C
>>> environment to ensure that the standard calling sequence is used for
>>> an extern function, with no C-calling-C optimizations?
>>>
>>
>> I think the only way is by being sure that the compiler can't access
>> the code for a function declared as "extern". It should not be hard
>> to do, but you may have to do it explicitly. For example, if you use
>> a compiler's IDE and project manager, you might have to go out of your
>> way to force true separate compilation.
>
> I was afraid that would be the answer, as far as it goes. Yuck.
>
It is like a lot of things in C development - finding a general,
portable, standards-compliant solution is hard, even though making it
work in a real life project is typically very easy. The trick is to
find a balance for a solution that is general enough without being
overly complicated.
>>>>> I'm not so ready to call this "bad design" without knowing more
>>>>> about the OP's requirements and design. The code generated for Spin
>>>>> is exactly the kind of tight eternal loop that you often find in a
>>>>> kernel where the kernel has no ready tasks and waits for an
>>>>> interrupt. I haven't tried it, but it seems to me that writing this
>>>>> loop as a conditional, flag-checking one could increase (by a
>>>>> little) the latency for resuming the right task when an interrupt
>>>>> happens, compared to resuming the task directly from the interrupt
>>>>> handler and simply abandoning the tight loop.
>>>>>
>>>>
>>>> Nah, the loop overhead to continually read a flag would be a few
>>>> cycles at most. The interrupt function overhead to figure out
>>>> return addresses from the stack will be much, much worse.
>
> There are several things under discussion here:
>
> - Whether it makes sense to use a routine Spin, containing a loop
> (whether conditional or unconditional) as the last thing a thread should
> call in its time-slice, such that threads are always suspended and
> resumed only at this point, that is, within the call of Spin.
>
> David, I think you have more or less agreed that this is a workable
> design for a non-preemptive (in my definition) time-sliced system that
> does not schedule other threads to use the slack left over in one
> thread's time-slice. I won't say more to defend it at this point.
>
OK.
> - How difficult or time-consuming is it for an interrupt handler that
> interrupts the loop in Spin to find the return address of that call of
> Spin?
>
> Assuming that
>
> o the thread calls Spin using the normal calling sequence, in
> which the return address is left on top of the stack,
>
You can't assume that (otherwise the OP would never have asked the
question in the first place...), although we can probably assume this
can be forced in some way.
> o the code in Spin does not push more data on the stack, and
>
It may make sense for the end-of-time-slice function to do more than
just spin. Then it may have to make a separate call to Spin.
> o the handler is written in MSP430 assembly language,
>
That's a big requirement, and totally unnecessary except as a way to
implement this bad idea.
> then this is just one POP.W instruction, executed after the interrupt
> handler has popped the saved status register and saved interrupt-point
> PC from the stack. (I'm not very familiar with the MSP430 instruction
> set and its interrupt handling, so this may be a bit optimistic. But the
> MSP430 instruction set is claimed to be strong on stack accesses, so it
> should not be much harder.)
>
That is mostly true (you can't just pop the top of the stack, because
the interrupt function must first preserve a register or two - but as
you say the msp430 has good stack access instructions), given your
assumptions. But as I noted above, the assumptions are not reasonable,
IMHO.
> Thus, getting the return address of Spin (under the above assumptions)
> is quick and well-defined.
>
> In fact, the tick interrupt handler could do it smartly as follows:
>
> 1) Pop the saved status register.
> 2) Pop the saved interrupt-point PC, check that it points to the loop in
> Spin, and then discard it.
> 3) Push back the saved status register.
>
Once you have taken into account saving a working register or two (easy
enough), then that's a fairly elegant implementation of a very ugly hack.
> This makes the two top words on the stack be the resumption PC (the
> return address for the Spin call) and the saved status register, exactly
> the state needed for a future RETI to resume this thread. It is not even
> necessary to get and manipulate the return address for the Spin call.
> (I'm assuming that each thread has its own stack area.)
>
> - Whether the Spin routine can or should be written in C.
>
> If the C compiler generates code for Spin and for the calls to Spin that
> satisfies the above assumptions, it can be written in C. But David is
> right to say that it is hard to be sure that the assumptions do hold,
> and will continue to hold, if Spin is written in C. So let's assume that
> we write Spin in assembly language, which lets us be sure that the
> assumptions hold.
>
> - Whether the thread-resumption latency can be shorter if the Spin loop
> is unconditional, and the return address of Spin is saved and used as
> the resumption point (case A), compared to the latency when the Spin
> loop polls a flag, the address in the interrupted loop is saved and used
> as the resumption point, and the interrupt handler sets the flag to make
> the loop terminate (case B).
>
> I comment on that below.
>
>>> Let's consider what the kernel has to do, in my guess of the OP's
>>> design, considering the two cases of (A) an unconditional "eternal"
>>> loop and (B) a flag-checking loop.
>>>
>>> The kernel knows which thread is running.
>>>
>>> When the thread finishes its job in this time-slice, it calls Spin,
>>> expecting to be resumed at the next instruction after the Spin call,
>>> say instruction R.
>>>
>>> The Spin function loops, eating up the rest of the time-slice.
>>>
>>> The tick interrupt comes in.
>>>
>>> The tick interrupt handler saves the context of the interrupted
>>> thread. By comparing its PC to the address of the Spin loop, it can
>>> check that the thread has not overrun its time-slice. At this point:
>>>
>>
>> Note that a sensible Spin function would tell the kernel that it is
>> finished and entering the spin loop,
>
> Why would the kernel need that information, if it is not going to
> schedule another thread for the rest of this time-slice?
>
If this system is for scheduling important or time-critical tasks, and
there is no prioritised pre-emption, it is very likely that you need to
track when the task's work is done for testing and verification, or for
tracking errors.
But you are correct that the kernel might not /need/ that information.
>> rather than leaving the interrupt handler to figure it out in this
>> fragile way.
>
> I don't see much fragility in it. It is beautifully simple: if the
> thread finished what it had to do, it is in Spin; otherwise not. (I hope
> I am lauding the OP here, not my own guess about the design.)
>
Working around the compiler in this way /is/ fragile. It is a hack: it
depends on details of the compiler, the processor, and the stack
layout; it requires assembly for what should be simple C code; and it
hinders the compiler's optimiser.
However, I have to agree that you have come up with a simple
implementation of this design (and I am lauding /your/ implementation
here, not the OP's bad design, or our guesses about it).
> And perhaps the OP's kernel actually has an "I_am_done" kernel call,
> which just ends up in Spin.
>
>>> - For (A) the handler gets the return address of Spin
>>> by a POP and stores this return address as the resumption
>>> point for the thread to be suspended.
>>>
>>
>> Assuming, of course, that you've figured out a way to do that safely
>> and reliably....
>
> See above: a POP instruction. It is safe and reliable, if Spin is
> written in assembly language.
>
>>> In case (B), the thread is resumed in the middle of the flag-checking
>>> loop. It still has to read the flag, branch out of the loop, and
>>> execute a return instruction (effectively a POP from the stack),
>>> before instruction R is reached.
>>>
>>
>> Yes, you can expect it to take about 3 or 4 instructions before
>> getting to R. That would still be a lot less time than you spend
>> messing around getting the address of R in case (A), so case (B) wins
>> here in time.
>
> Getting the address of R takes one POP in case (A). Probably faster than
> these 3-4 instructions, at least not much slower.
>
Don't forget the comparisons to check that you are in the spin loop.
>> Remember that all ideas about how case (A) could feasibly be
>> implemented are based on hobbling the compiler.
>
> No, it is based only on putting the right code in Spin, and ensuring
> that Spin is called with the standard calling sequence that leaves a
> return address on top of stack. This is readily and normally done by
> writing Spin in assembly language. The compiler is not hobbled in the C
> code parts.
>
A function as simple as the case (B) Spin would often be inlined into the
calling function (either explicitly, or via whole-program optimisation).
Code like that is smaller as well as faster when inlined.
>> Write the code correctly (case B), and you can let the optimiser do
>> its job - that will overwhelm any conceivable time advantage case A
>> might have had.
>
> It can hardly overwhelm it if Spin contains just the loop -- not much to
> optimise there.
The code calling Spin can be better optimised if Spin is a proper C
function, and the compiler knows its definition.
> Or do you mean to write the tick interrupt handler in C?
> I know that some C compilers claim that you can use them to write
> interrupt handlers, but to me this seems more fragile than writing Spin
> in C. Especially for an interrupt handler that is meant to switch
> threads, not just manage some peripheral device and return to the
> interrupted thread.
>
The majority of interrupt handlers in embedded systems are written in C
these days. For general interrupts, if you are not happy to trust your
compiler to generate good and safe interrupt code, get a better
compiler! But thread switching code will almost certainly need some
inline assembly at least.
> However, the OP said that Spin in the OP's kernel contains some other
> things, too, so it's hard to say what the optimiser could do.
>
>> Among other things, Spin() could be inlined in its calling function
>> and remove most of the overhead.
>
> What "overhead" are you talking about, David?
I am referring to the check of the exit flag that you think makes my
Spin function too slow compared to your version.
> If Spin's profile is as
> simple as the OP showed (void Spin (void)), inlining would directly save
> only one call or branch instruction per time-slice. Perhaps the
> optimiser could let the thread keep more local data in registers over
> the (in-lined) Spin call, avoiding some store/load instructions. The IAR
> MSP430 compiler defines R12-R15 as scratch registers (caller-save) and
> R4-R11 as preserved registers (callee-save), so an inlined function
> could increase the available registers from 8 to 12; hard to say if that
> would be significant.
>
It is, as you say, hard to be sure - especially if Spin contains other code.
> If Spin is inlined we lose the ability to check for time-slice overruns
> by checking that the interrupted thread is in the unique and only Spin,
Yes, but you don't /need/ to check call or return addresses if you write
proper C code...
> so I think that Spin should not be inlined. Perhaps you consider this to
> be one of the "fragile" aspects of this overrun-checking method, but it
> is not difficult to make sure that Spin is not inlined.
>
>> Even making the great leap of faith that there is a reliable way to
>> get the desired return address,
>
> Wow, this makes me feel like a preacher. But "raise your eyes to the
> text above", and believe! :-)
>
I think we'll just have to agree to disagree on this one.
> There can hardly be a more reliable aspect of a standard calling
> convention than the presence and location of the return address.
>
>> and then making a second leap of faith that
>> case A is faster, the concept is /still/ wrong.
>
> That sounds rather dogmatic. If it works and is reliable, why is it
> "wrong"? Too heretical?
>
Yes, it's against my religion :-) You don't write hacked code based on
lying to the compiler, assembly code, and stack manipulation tricks when
there are perfectly safe, efficient and reliable ways to do the same job
with C. It's about writing legible, maintainable, portable code that is
clear in its purpose and easy to verify. Even if this is nothing more
than a simple test program, you should maintain a certain level of
development quality.
I am not saying that /all/ such hacks are a bad thing - just that you
have to have very good reason for using them. Shaving off a few
processor cycles (if you are correct and you /do/ save time) is very
seldom a good enough reason.
Just because code is accepted by the compiler, and works in practice,
does not mean it cannot be *wrong*. And yes, I know I am pontificating
- doesn't that beat preaching?
>> There is no way that shaving a few cycles off the latency
>> could justify using this horrible hack.
>
> Although I still think that case A is a bit faster (and feel I have
> given good reasons above and in my preceding posting) it isn't the main
> point in favour of case A, the unconditional loop. Given this design of
> a Spin function in which the threads are suspended and resumed, a
> flag-polling loop is logically unnecessary: after the tick interrupt,
> the thread that gets to poll the flag is *the* scheduled thread, so
> polling the flag is superfluous.
>
> I think the design is a neat solution to specific, limited requirements.
> It is a bit tricky, but interrupt-handling and thread-switching are
> often tricky. I mean "tricky" in the sense of "a trick", not in the
> sense of "difficult".
>