EmbeddedRelated.com

Getting the size of a C function

Started by john January 22, 2010
In article <efkt27x98l.ln2@news.flash-gordon.me.uk>, 
smap@spam.causeway.com says...
> Mark Borgerson wrote:
> > In article <uq4nl51thfeujhkgk5i7nhtmapjesi6ns8@4ax.com>,
> > jonk@infinitefactors.org says...
> >> On Sat, 23 Jan 2010 15:18:25 -0800, Mark Borgerson
> >> <mborgerson@comcast.net> wrote:
> >>
> > <<SNIP discussion of unpredictable, but legal, compiler behavior>>
> >>> I've also run across main processing loops such as
> >>>
> >>> void MainLoop(void){
> >>>    while(1){
> >>>       get user input
> >>>       execute commands
> >>>    }
> >>>    MarkEnd();
> >>> }
> >>>
> >>> where MarkEnd doesn't appear in the generated machine
> >>> code, because the compiler, even at lowest optimization
> >>> setting, recognizes that the code after the loop
> >>> will never get executed.
> >> Yes, of course.  That is another possibility.  The intended
> >> function may be essentially the "main loop" of the code and
> >> as such never returns.  However, whether or not MarkEnd()
> >> were optimized out, it wouldn't ever get executed anyway.  So
> >> you'd never get the address stuffed into something useful...
> >> and so it doesn't even matter were it that the compiler kept
> >> the function call.  So it makes a good case against your
> >> approach for an entirely different reason than optimization
> >> itself.
> >
> > Well, I would not dream of using this approach on a function
> > that never returns.  OTOH a flash-update routine had better
> > return, or it won't be particularly useful (unless your goal
> > is to test the write-endurance of the flash) ;-)
>
> <snip>
>
> Sometimes you *do* write such functions so they never return.  Whilst
> reprogramming the flash it keeps kicking the watchdog, but it stops when
> it's finished and the watchdog resets the system thus booting it in to
> the new code.  Or it might branch to the reset (or power-up) vector
> rather than return.  In fact, returning could easily be impossible
> because the code from which the function was called is no longer there!
>
Hmmmm.  I hadn't thought of that watchdog idea, since TI recommends shutting off the watchdog and disabling interrupts while programming flash.  I also agree about the return not being normal---there's probably not much chance returning to the address on the stack is going to work out, so a reset is probably the best idea after a firmware update.

I should have said that I wouldn't use this idea on a function designed to run forever---or at least not one that the compiler might think runs forever.  I would also examine the resulting code to make sure the compiler was doing what I intended.

I think that the ideas I have described will work on some processors and compilers for some functions, but not on all compilers for all processors and functions.  If you do a lot of embedded systems programming, restrictions like that are nothing new.

Mark Borgerson
In article <hjda8u$t4k$1@speranza.aioe.org>, john  <john@nospam.com> wrote:
>Hi,
>
>I need to know the size of a function or module because I need to
>temporarily relocate the function or module from flash into sram to
>do firmware updates.
>
>How can I determine that at runtime? The
>sizeof( myfunction)
>generates an error: "size of function unknown".
Admit it: you are trying to do something that can't be done in C. By far the simplest approach is to generate assembler code and add a small piece of instrumentation to that. Start by accessing the function through a pointer to the subroutine. Then you can store an sram address in that pointer when needed.
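The pointer indirection suggested here can be sketched in plain C. This is only an illustration that runs on a host: `update_in_flash`, `update_in_sram`, `redirect_update` and `run_update` are made-up names, and the second function merely stands in for a copy that, on a real target, would be an address in sram rather than another C function.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every copy of the update routine (hypothetical). */
typedef int (*update_fn)(int block);

/* Stand-in for the routine as linked into flash. */
int update_in_flash(int block) { return block + 1; }

/* Stand-in for the same routine once it lives in sram.  On a real
   target this would be an address, not a second C function. */
int update_in_sram(int block) { return block + 1000; }

/* All callers go through this pointer, never through the name. */
static update_fn do_update = update_in_flash;

/* Redirect later calls, e.g. once the sram copy is in place. */
void redirect_update(update_fn target)
{
    assert(target != NULL);
    do_update = target;
}

/* The only entry point the rest of the program uses. */
int run_update(int block)
{
    return do_update(block);
}
```

Calling `run_update()` before and after `redirect_update(update_in_sram)` reaches the two different bodies. The sketch says nothing about how the code gets into sram or how big it is, which is the part C cannot answer.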
>
>Thanks.

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
Keith Thompson wrote:
> WangoTango <Asgard24@mindspring.com> writes:
>> In article <hjda8u$t4k$1@speranza.aioe.org>, john@nospam.com says...
>>> I need to know the size of a function or module because I need to
>>> temporarily relocate the function or module from flash into sram to
>>> do firmware updates.
>>>
>>> How can I determine that at runtime? The
>>> sizeof( myfunction)
>>> generates an error: "size of function unknown".
>>>
>> Good question, and I would like to know if there is an easy way to do it
>> during runtime, and a portable way would be nice too.  I would probably
>> look at the map file and use the size I calculated from there, but
>> that's surely not runtime.
>>
>> You can get the starting address of the function pretty easy, but how
>> about the end?  Hmmm, gotta' think about that.
>
> You can't even portably assume that &func is the memory address of the
> beginning of the function.  I think there are systems (AS/400) where
> function pointers are not just machine addresses.
>
Closer to comp.arch.embedded, &func may not be the memory address of a function on smaller micros with more than 64KB (or sometimes 64K words) of flash. gcc for the AVR, for example, uses trampolines for function pointers on devices with more than 64K words flash - &func gives the address of a jump instruction in the lower 64K memory, which jumps to the real function. That way you can use 16-bit function pointers with larger memories.
> Given whatever it is you're doing, you're probably not too concerned
> with portability, so that's likely not to be an issue.  But there's no
> portable way in C to determine the size of a function, so you're more
> likely to get help somewhere other than comp.lang.c.
>
Mark Borgerson wrote:
> In article <pan.2010.01.23.05.08.12.672000@nowhere.com>,
> nobody@nowhere.com says...
>> On Fri, 22 Jan 2010 22:53:18 +0000, john wrote:
>>
>>> I need to know the size of a function or module because I need to
>>> temporarily relocate the function or module from flash into sram to
>>> do firmware updates.
>> Do you need to be able to run it from RAM? If so, simply memcpy()ing it
>> may not work. And you would also need to copy anything which the function
>> calls (just because there aren't any explicit function calls in the source
>> code, that doesn't mean that there aren't any in the resulting object code).
>>
>>
> At the expense of a few words of code and a parameter, you could do
>
>
> int MoveMe(...., bool findend){
>    if(!findend){
>
>    // do all the stuff the function is supposed to do
>
>    } else Markend();
>
> }
>
>
> Where Markend is a function that pulls the return
> address off the stack and stashes it somewhere
> convenient.  Markend may have to have some
> assembly code.  External code can then
> subtract the function address from the address
> stashed by Markend(), add a safety margin, and
> know how many bytes to move to RAM.
>
>
> Mark Borgerson
>
Anything that relies on the compiler being stupid, or deliberately crippled ("disable all optimisations") or other such nonsense is a bad solution.  It is conceivable that it might happen to work - /if/ you can get the compiler in question to generate bad enough code.  But it is highly dependent on the tools in question, and needs to be carefully checked at the disassembly level after any changes.

In this particular example of a highly risky solution, what happens when the compiler generates proper code?  The compiler is likely to generate the equivalent of :

    int MoveMe(..., bool findend) {
        if (findend) "jump" Markend();
        // do all the stuff
    }

Or perhaps it will inline Markend, MoveMe, or both.  Or maybe it will figure out that MoveMe is never called with "findend" set, and thus optimise away that branch.  All you can be sure of, is that there is no way you can demand that a compiler produces directly the code you apparently want it to produce - C is not assembly.
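For readers who want to try the technique under discussion, here is a minimal sketch of it, assuming GCC or Clang. `__builtin_return_address` and `__attribute__((noinline))` are extensions of those compilers, not standard C; the `int value` parameter, the placeholder body and `estimate_size` are invented for the illustration. It is not the code from the posts above, and it is subject to every objection just raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address recorded by Markend(); 0 means "not recorded yet". */
uintptr_t end_marker = 0;

/* Records the address this call will return to.
   __builtin_return_address is a GCC/Clang extension. */
__attribute__((noinline)) void Markend(void)
{
    end_marker = (uintptr_t)__builtin_return_address(0);
}

/* The routine to be measured; the work itself is a placeholder. */
__attribute__((noinline)) int MoveMe(int value, bool findend)
{
    if (!findend) {
        return value * 2;  /* do all the stuff the function is supposed to do */
    }
    Markend();             /* may become a tail jump, or be placed anywhere */
    return 0;
}

/* Naive estimate: distance from the entry point to the recorded return
   address.  Nothing guarantees that the result is positive, or that it
   covers the whole function. */
intptr_t estimate_size(void)
{
    assert(end_marker != 0);
    return (intptr_t)(end_marker - (uintptr_t)&MoveMe);
}
```

After one call of `MoveMe(0, true)`, `estimate_size()` returns some number. Whether that number has anything to do with the size of MoveMe depends on where the compiler chose to put the call, so the disassembly would still need checking after every change.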
Mark Borgerson wrote:
> In article <slrnhlmqth.1evp.willem@turtle.stack.nl>, willem@stack.nl
> says...
>> Mark Borgerson wrote:
>> ) In article <87tyucpp7x.fsf@blp.benpfaff.org>, blp@cs.stanford.edu
>> ) says...
>> )> Mark Borgerson <mborgerson@comcast.net> writes:
>> )>
>> )> > In article <87y6jopsl9.fsf@blp.benpfaff.org>, blp@cs.stanford.edu
>> )> > says...
>> )> >> Mark Borgerson <mborgerson@comcast.net> writes:
>> )> >> You seem to be assuming that the compiler emits machine code that
>> )> >> is in the same order as the corresponding C code, i.e. that the
>> )> >> call to Markend() will occur at the end of MoveMe().  This is not
>> )> >> a good assumption.
>> )> >
>> )> > This would certainly be a dangerous technique on a processor
>> )> > with multi-threading and possible out-of-order execution.
>> )> > I think it will work OK on the MSP430 that is the CPU where
>> )> > I am working on a flash-burning routine.
>> )>
>> )> Threading and out-of-order execution has little if anything to do
>> )> with it.  The issue is the order of the code emitted by compiler,
>> )> not the order of the code's execution.
>> )>
>> ) But wouldn't an optimizing compiler generating code for a
>> ) complex processor be more likely to optimize in
>> ) a way that changed the order of operations?  I think
>> ) that might apply particularly to a call to a function
>> ) that returns no result to be used in a specific
>> ) place inside the outer function.
>>

You get good and bad compilers for all sorts of processors, and even a half-decent one will be able to move code around if it improves the speed or size of the target - something that can apply on any size of processor.

<snip>

>
> In any of these instances, I would certainly review
> the assembly code to make sure the compiler was doing
> what I intended in the order I wanted.  Maybe programmers
> in comp.lang.c don't do that as often as programmers
> in comp.arch.embedded. ;-)
>
I don't know about typical "comp.lang.c" programmers, but typical "comp.arch.embedded" programmers use compilers that generate tight code, and they let the compiler do its job without trying to force the tools into their way of thinking. At least, that's the case for good embedded programmers - small and fast code means cheap and reliable microcontrollers in this line of work. And code that has to be disassembled and manually checked at every change is not reliable or quality code.
On 24 Jan, 21:44, David Brown <david.br...@hesbynett.removethisbit.no>
wrote:
...
> Anything that relies on the compiler being stupid, or deliberately
> crippled ("disable all optimisations") or other such nonsense is a bad
> solution.
I *think* Mark is aware of the limitations of his suggestion but there seems to be no C way to solve the OP's problem. It does sound like the problem only needs to be solved as a one-off in a particular environment.

That said, what about taking function pointers for all functions and sorting their values? It still wouldn't help with the size of the last function. Can we assume the data area would follow the code? I guess not.

James
On Sun, 24 Jan 2010 22:53:01 +0100, David Brown
<david.brown@hesbynett.removethisbit.no> wrote:

>Mark Borgerson wrote:
>> In article <slrnhlmqth.1evp.willem@turtle.stack.nl>, willem@stack.nl
>> says...
>>> Mark Borgerson wrote:
>>> ) In article <87tyucpp7x.fsf@blp.benpfaff.org>, blp@cs.stanford.edu
>>> ) says...
>>> )> Mark Borgerson <mborgerson@comcast.net> writes:
>>> )>
>>> )> > In article <87y6jopsl9.fsf@blp.benpfaff.org>, blp@cs.stanford.edu
>>> )> > says...
>>> )> >> Mark Borgerson <mborgerson@comcast.net> writes:
>>> )> >> You seem to be assuming that the compiler emits machine code that
>>> )> >> is in the same order as the corresponding C code, i.e. that the
>>> )> >> call to Markend() will occur at the end of MoveMe().  This is not
>>> )> >> a good assumption.
>>> )> >
>>> )> > This would certainly be a dangerous technique on a processor
>>> )> > with multi-threading and possible out-of-order execution.
>>> )> > I think it will work OK on the MSP430 that is the CPU where
>>> )> > I am working on a flash-burning routine.
>>> )>
>>> )> Threading and out-of-order execution has little if anything to do
>>> )> with it.  The issue is the order of the code emitted by compiler,
>>> )> not the order of the code's execution.
>>> )>
>>> ) But wouldn't an optimizing compiler generating code for a
>>> ) complex processor be more likely to optimize in
>>> ) a way that changed the order of operations?  I think
>>> ) that might apply particularly to a call to a function
>>> ) that returns no result to be used in a specific
>>> ) place inside the outer function.
>>>
>
>You get good and bad compilers for all sorts of processors, and even a
>half-decent one will be able to move code around if it improves the
>speed or size of the target - something that can apply on any size of
>processor.
>
><snip>
>
>>
>> In any of these instances, I would certainly review
>> the assembly code to make sure the compiler was doing
>> what I intended in the order I wanted.  Maybe programmers
>> in comp.lang.c don't do that as often as programmers
>> in comp.arch.embedded. ;-)
>>
>
>I don't know about typical "comp.lang.c" programmers, but typical
>"comp.arch.embedded" programmers use compilers that generate tight code,
>and they let the compiler do its job without trying to force the tools
>into their way of thinking.  At least, that's the case for good embedded
>programmers - small and fast code means cheap and reliable
>microcontrollers in this line of work.  And code that has to be
>disassembled and manually checked at every change is not reliable or
>quality code.
You give me a great way to segue into something.  There are cases where you simply have no other option than to do exactly that.  I'll provide one example.  There are others.

I was working on a project using the PIC18F252 processor and, at the time, the Microchip C compiler was in its roughly-v1.1 incarnation.  We'd spent about 4 months in development time and the project was nearing completion when we discovered an intermittent (very rarely occurring) problem in testing.  Once in a while, the program would emit strange outputs that we simply couldn't understand when closely examining and walking through the code that was supposed to generate that output.  It simply wasn't possible.  Specific ASCII characters were being generated that simply were not present in the code constants.

In digging through the problem, by closely examining the generated assembly output, I discovered one remarkable fact that led me to imagine a possibility that might explain things.  The Microchip C compiler was using static variables for compiler temporaries.  And it would _spill_ live variables that might be destroyed across a function call into them.  They would be labelled something like __temp0 and the like.

There was _no_ problem when the C compiler was doing that for calls made to functions within the same module, because they had anticipated that there might be more than one compiler temporary needed in nested calls and they added the extra code in the C compiler to observe if a descendant function, called by a parent, would also need to spill live variables and would then construct more __temp1... variables to cover that case.  Not unlike what good 8051 compilers might do when generating static variable slots for nested call parameters for efficiency (counting spills all the way down, so to speak.)

However, when calling functions in _other_ modules, where the C compiler had _no_ visibility about what it had already done over there on a separate compilation, it had no means to do that and, of course, there became a problem.  What was spilled into __temp0 in module-A was also spilled into __temp0 in module-B and, naturally, I just happened to have a case where that became a problem under the influence of interrupt processing.  I had completely saved _all_ registers at the moment of the interrupt code before attempting to call any C functions, of course.  That goes without saying.  But I'd had _no_ idea that I might have to save some statics which may, or may not, at the time be "live."

Worse, besides the fact that there was no way I could know in advance which naming the C compiler would use in any circumstance, the C compiler chose these names in such a way that they were NOT global or accessible either to C code or to assembly.  I had to actually _observe_ in the linker file the memory location where they resided and make sure that the interrupt routine protected them, as well.

This required me to document a procedure where every time we made a modification to the code that might _move_ the location of these compiler generated statics, we had to update a #define constant to reflect it, and then recompile again.

Got us by.  Whether it is _reliable_ or not would be another debate.  The resulting code was very reliable -- no problems at all.  However, the process/procedures we had to apply were not reliable, of course, because we might forget to apply the documented procedure before release.  So on that score, sure.

Life happens.  Oh, well.

Jon
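The shape of that workaround can be sketched in portable C, with loud caveats: this is not the code from the project described, the size `COMPILER_TEMP_BYTES` is invented, and the array `fake_compiler_temps` only stands in for the hidden area so the example runs on a host. On the target, `COMPILER_TEMP_AREA` would presumably be the hand-maintained #define pointing at the address read from the linker output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the compiler's hidden temporary area. */
#define COMPILER_TEMP_BYTES 4

/* Host stand-in for that area.  On the target this macro would instead
   expand to the address taken from the linker output, and would have to
   be re-checked by hand after every rebuild. */
uint8_t fake_compiler_temps[COMPILER_TEMP_BYTES];
#define COMPILER_TEMP_AREA (fake_compiler_temps)

/* Stand-in for C code called from the interrupt; it clobbers the
   temporaries, as separately compiled code might. */
void isr_body(void)
{
    memset(COMPILER_TEMP_AREA, 0xFF, COMPILER_TEMP_BYTES);
}

/* Interrupt entry: save the hidden temporaries, run the C code, then
   restore them so the interrupted code sees its spilled values intact. */
void isr_entry(void)
{
    uint8_t saved[COMPILER_TEMP_BYTES];

    memcpy(saved, COMPILER_TEMP_AREA, COMPILER_TEMP_BYTES);
    isr_body();
    memcpy(COMPILER_TEMP_AREA, saved, COMPILER_TEMP_BYTES);
}
```

The fragile part is exactly what the post says: nothing in the build ties the macro to where the compiler really put its temporaries, so the protection is only as good as the manual procedure around it.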
"James Harris" <james.harris.1@googlemail.com> wrote in message
news:c448f39c-2775-4ea5-b25a-7c8bfa0c6ded@b2g2000yqi.googlegroups.com...
> On 24 Jan, 21:44, David Brown <david.br...@hesbynett.removethisbit.no>
> wrote:
> ...
>> Anything that relies on the compiler being stupid, or deliberately
>> crippled ("disable all optimisations") or other such nonsense is a bad
>> solution.
>
> I *think* Mark is aware of the limitations of his suggestion but there
> seems to be no C way to solve the OP's problem. It does sound like the
> problem only needs to be solved as a one-off in a particular
> environment.
>
> That said, what about taking function pointers for all functions and
> sorting their values? It still wouldn't help with the size of the last
> function. Can we assume the data area would follow the code? I guess
> not.
You'd need to sort *all* the functions of an application (including non-global functions), and there would still be the possibility that some function or other stuff you don't know about resides between 'consecutive' functions f() and g().

Reading f() might be alright but overwriting it would be tricky.

--
Bartc
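As a rough illustration of the sorting idea and its limits, the following sketch orders the addresses of three placeholder functions (`fa`, `fb`, `fc`, all invented here) and reports the gap between neighbours. It assumes that converting a function pointer to `uintptr_t` yields something like a code address, which is implementation-defined and, as noted earlier in the thread, not true everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three placeholder functions whose addresses we want to order. */
int fa(int x) { return x + 1; }
int fb(int x) { return x * 2; }
int fc(int x) { return x - 3; }

/* qsort comparison for uintptr_t values, ascending. */
static int cmp_addr(const void *p, const void *q)
{
    uintptr_t a = *(const uintptr_t *)p;
    uintptr_t b = *(const uintptr_t *)q;
    return (a > b) - (a < b);
}

/* Fills out[0..2] with the three addresses in ascending order. */
void sorted_addresses(uintptr_t out[3])
{
    out[0] = (uintptr_t)&fa;
    out[1] = (uintptr_t)&fb;
    out[2] = (uintptr_t)&fc;
    qsort(out, 3, sizeof out[0], cmp_addr);
}

/* Gap from the function at sorted index i to the next known address.
   At best an upper bound on its size; there is no answer for the last
   entry, and unknown code or data may sit inside the gap. */
uintptr_t gap_after(const uintptr_t sorted[3], size_t i)
{
    assert(i < 2);
    return sorted[i + 1] - sorted[i];
}
```

The gap includes padding and anything else the linker placed between the two symbols, so it could serve as a generous size for copying but says little about what may safely be overwritten.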
On Sun, 24 Jan 2010 15:13:15 -0800 (PST), James Harris wrote:

>On 24 Jan, 21:44, David Brown <david.br...@hesbynett.removethisbit.no>
>wrote:
>...
>> Anything that relies on the compiler being stupid, or deliberately
>> crippled ("disable all optimisations") or other such nonsense is a bad
>> solution.
>
>I *think* Mark is aware of the limitations of his suggestion but there
>seems to be no C way to solve the OP's problem. It does sound like the
>problem only needs to be solved as a one-off in a particular
>environment.
>
>That said, what about taking function pointers for all functions and
>sorting their values? It still wouldn't help with the size of the last
>function. Can we assume the data area would follow the code? I guess
>not.
In general, no universally "good" assumptions exist.  Partly that is because the very idea of "moving a function" in memory at run-time has not yet been well defined by those talking about it here.

Any given function may have the following:

code --> Code is essentially strings of constants.  It may reside in a von-Neumann memory system or a Harvard one.  It therefore may be readable by other code, or not.  Many of the Harvard implementations include a special instruction or a special pointer register, perhaps, to allow access to the code space memory.  But not all do.  In general, it may not even be possible to read and move code.  Even in von-Neumann memory systems where, in theory there is no problem, the code may have been "distributed" in pieces.  An example here would be an implementation I saw with Metaware's C compiler where they had extended it to support a type of co-routine called an 'iterator.'  In this case, the body-block of a for-loop would be moved outside the function's code region into a separate function so that their implementation could call the for-loop body through their very excellently considered support mechanism for iterators.  You'd need to know where that part was, as well, to meaningfully move things.

constants --> A function may include instanced constants (which a smart compiler may "understand" from something like 'const int aa= 5;', if it also finds that some other code takes an address to 'aa'.)  These may also need to be moved.  Especially if one is trying to download an updated function into ram before flashing it for permanence as a "code update" procedure.  These constants may also be placed either in von-Neumann memory systems and be accessed via PC-relative or absolute memory locations -- itself a potential bag of worms -- or in Harvard code space if the processor supports accessing it or in Harvard data space, otherwise, especially if there is some of that which is non-volatile.

static initialized data --> A function may include instanced locations that must be initialized prior to main(), but where the actual values of these instances are located in some general collection place used by who-knows-what code in the crt0 library routine that does this job of pre-initing.  Once again, more issues to deal with and wonder about.

And that's just what trips off my tongue to start.  It's a tough problem to solve generally.  To do it right, the language semantics (and syntax, most likely, as well) itself would need to be expanded to support it.  That could be done, I suppose.  But I imagine a lot of gnashing of teeth along the way.

Jon
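A small C fragment may make the three categories concrete. All names (`scale`, `scale_ptr`, `call_count`, `scaled`, `calls_so_far`) are invented for the example, and the comments describe typical toolchain behaviour rather than anything the C standard promises.

```c
/* Constant whose address is taken, so the compiler has to give it real
   storage; whether that ends up in flash, code space or RAM is the
   toolchain's choice. */
const int scale = 5;
const int *const scale_ptr = &scale;

/* Initialized static: startup code typically copies the initial value
   from an image in non-volatile memory before main() runs. */
static int call_count = 1;

/* The instructions of this function refer to both objects above, so
   copying only the instructions elsewhere does not bring them along. */
int scaled(int x)
{
    call_count++;
    return x * (*scale_ptr);
}

/* Reads back the static so its effect is visible from outside. */
int calls_so_far(void)
{
    return call_count;
}
```

A relocated copy of `scaled()` would still reach `scale` and `call_count` at their original locations, assuming the compiler used absolute addressing; with PC-relative addressing the copy would look for them in the wrong place. That is the bag of worms mentioned above.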
On 22 Jan, 22:53, john <j...@nospam.com> wrote:

> I need to know the size of a function or module because I need to
> temporarily relocate the function or module from flash into sram to
> do firmware updates.
>
> How can I determine that at runtime? The
> sizeof( myfunction)
> generates an error: "size of function unknown".
...

On 24 Jan, 23:37, "bartc" <ba...@freeuk.com> wrote:
> "James Harris" <james.harri...@googlemail.com> wrote in message
...
> > there seems to be no C way to solve the OP's problem.
...
> > That said, what about taking function pointers for all functions and
> > sorting their values? It still wouldn't help with the size of the last
> > function. Can we assume the data area would follow the code? I guess
> > not.
>
> You'd need to sort *all* the functions of an application (include
> non-global functions), and there would still be the possibility that some
> function or other stuff you don't know about resides between 'consecutive'
> functions f() and g().
>
> Reading f() might be alright but overwriting it would be tricky.
Since you've commented, Bart, do you have any thoughts on making metadata about functions available in a programming language? Maybe you already do this in one of your languages.

The thread got me thinking that if a function is a first-class object perhaps some of its attributes should be transparent. Certainly its code size and maybe its data size too; possibly its location, maybe a signature for its input and output types. Then there are other attributes such as whether it is in byte code or native code, whether it is relocatable or not, what privilege it needs etc.

If portability is not needed a function object could also be decomposed to individual instruction or subordinate function objects.

I'm not saying I like this idea - portability is a key goal for me - but I'm just offering some ideas for comment. Any thoughts on what's hot and what's not?

Followups set to only comp.lang.misc.

James
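C has nothing of this kind built in, but a hand-maintained table gives a feel for what such metadata might look like. Everything in this sketch is hypothetical: the `func_info` fields, the `blink` and `update` placeholders, and the convention that a `code_size` of 0 means "unknown" until some external tool fills it in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One record of per-function metadata (fields are illustrative only). */
struct func_info {
    const char *name;        /* source-level name */
    void (*entry)(void);     /* location, as far as C can express it */
    size_t code_size;        /* 0 = unknown; a tool would have to supply it */
    int relocatable;         /* nonzero if the code may be moved */
};

/* Placeholder functions to describe. */
void blink(void) {}
void update(void) {}

/* The table itself; sizes are left unknown rather than guessed. */
const struct func_info func_table[] = {
    { "blink",  blink,  0, 0 },
    { "update", update, 0, 0 },
};

/* Looks a function up by name; returns NULL if it is not in the table. */
const struct func_info *find_func(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof func_table / sizeof func_table[0]; i++) {
        if (strcmp(func_table[i].name, name) == 0) {
            return &func_table[i];
        }
    }
    return NULL;
}
```

The interesting fields are exactly the ones the language cannot fill in by itself, which is the point of the question: the size and relocatability would have to come from the compiler or linker.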