Position independent code with position dependent data ?

Reply by D Yuniskis ● July 1, 2010

Hi Peter,

Peter Dickerson wrote:
> "D Yuniskis" <not.going.to.be@seen.com> wrote in message 
> news:i0gf3s$kou$1@speranza.aioe.org...
>> nono240 wrote:
>>
>>> My CPU has no MMU, very little RAM (8KB), and is running a modified
>>> FreeRTOS. I'd like to have the ability to "load" and run some code
>>> from USART/DATAFLASH to FLASH as a RTOS task. Of course, for
>>> convenience, the compiled code must be fully position independent.
>>> Using the -fPIC or -fpic option, looking at the assembler, the code
>>> seems OK : a dynamically computed offset is applied to every
>>> operations.
>>>
>>> BUT, looking deeply, both DATA and CODE are applied the base offset !
>>> While this is the expected behavior for CODE (running anywhere in
>>> FLASH), moving my CODE over the entire flash space doesn't mean moving
>>> my RAM ! This make only sense when executing everything from SDRAM !
>>>
>>> I'm looking for a solution to generate position independent *code*,
>>> but with position dependent *data* using GCC/LD... Any help ?
>> I find, in resource starved applications, using interpreters
>> is a big win.  If you're loading apps dynamically, I suspect
>> the speed penalty would be insignificant (esp with careful
>> choice of language)
> 
> Any suggestions for such interpreters, Don? Experiences?

Remember, this is c.a.e so, for the most part, you *know*
what the application is -- and what it will *remain*
(i.e., we're not looking at an environment where you have to
be able to handle infinite variety of applications).

In the past, I've written C-ish, PL/M-ish and BASIC-ish interpreters
along with Forth.  Note that you can use these as guidelines
for a pseudo-language without strictly complying with any
formal language definition.

E.g., you can opt to implement integer only math instead of
supporting "doubles", etc.  You can force limits to be defined
for string lengths (static memory allocation).  You can
discount recursion, etc.

The advantage of interpreters has always seemed to be coming
up with really tight representations of algorithms and
spending "ROM" instead of needing space in (loadable) RAM...

Reply by nono240 ● July 1, 2010

Hi there! Thank you for the replies!



>If this is the entire
>application and there is nothing else present, no operating
>system for example, then why do you care?

I'm running FreeRTOS. We need "dynamic task loading".



>Unless you want to download multiple tasks like this, and have them
>stored in arbitrary places in flash (and ram), then there is no need for
>position-independent data or code


IT IS my case. It's a (commercial) product that lets the user load
multiple (so-called) "tasklets" into FLASH, but only ONE is allowed to
run at a time. So, we need those "tasklets" to be CODE position
independent, but share DATA.


>However, with sram fixed in one place and flash in another
>place (I'm assuming that's the case as you point out there is
>no MMU), there is no question about the fact that there are
>at least two separate segments in your situation. The base
>address of the flash-located segment might be the PC register
>so that this flash block can be moved around freely and uses
>the PC register as a cheap way to figure out where it's own
>stuff is at (assuming the processor supports that), but that
>won't work for the sram data instance segment which is
>obviously located "elsewhere." Somehow, a base address for
>that region needs to be made available to your code and
>applied at run time. What mechanism is available for that?

Any RODATA is stored in FLASH, and the mechanism used for position
independence is a PC-relative offset: before any data access, the *real*
offset from the original link address is computed and added
automatically.

For example, the following code :

extern int myarray[];  // @0x4000 (DATA)
int foo()
{
    return myarray[0];
}

Gives:

80018196:	lddpc	    r6, 80018204 <---- R6 = 0x80014198 (Load PC relative)
80018198:	rsub	    r6,pc    <---- R6 = PC - R6 = 0x4000
8001819a:	ld.w	    r12,r6[0]    <---- R12= *(uint32_t *) R6
8001819e:	ret
....
80018204:
               .word      0x80014198

So, if I run my code from elsewhere, let's say 16KB farther, the
PIC-computed address for myarray is 0x8000, which is not what I want.

The same is applied for ROM constants (but it's OK in this case).
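
The arithmetic in the listing above can be checked on a host machine. The following is a minimal sketch that models the `lddpc`/`rsub` pair as plain integer arithmetic. The addresses are taken from the listing; the function name `pic_data_address` and the macro names are illustrative assumptions, not part of any toolchain.

```c
#include <stdint.h>

/* Literal-pool word fixed at link time:
 * (address of the rsub instruction) - (address of myarray). */
#define LINK_TIME_DELTA 0x80014198u

/* Address of the rsub instruction when the code runs where it was linked. */
#define LINKED_PC 0x80018198u

/* Models "rsub r6,pc": the data address the PIC sequence computes
 * after the code has been moved by load_offset bytes. */
static uint32_t pic_data_address(uint32_t load_offset)
{
    uint32_t pc = LINKED_PC + load_offset;
    return pc - LINK_TIME_DELTA;
}
```

With `load_offset` equal to 0 the function returns 0x4000, the intended RAM address. With a 16KB move (0x4000) it returns 0x8000, because the data address follows the code by exactly the same displacement.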

Reply by Andrew Jackson ● July 1, 2010

> For example, the following code :
>
> extern int myarray[];  // @0x4000 (DATA)
> int foo()
> {
>      return myarray[0];
> }
>
> Give :
>
> 80018196:	lddpc	    r6, 80018204<---- R6 = 0x80014198 (Load PC
> relative)
> 80018198:	rsub	    r6,pc<---- R6 = PC - R6 = 0x4000
> 8001819a:	ld.w	    r12,r6[0]<---- R12= *(uint32_t *) R6
> 8001819e:	ret
> ....
> 80018204:
>                 .word      0x80014198
>
> So, if I run my code from elsewhere, let's say 16KB farther, the PIC-
> computed address for m myarray is 0x8000, not what I want.
>
> The same is applied for ROM constants (but it's OK in this case).

Why don't you provide a function call to get the address of the shared 
data in your code and use that within a tasklet?  You might put the 
relevant data into a structure; ditto for ROM data.

	Andrew

Reply by Paul Keinanen ● July 1, 2010

On Thu, 1 Jul 2010 05:12:36 -0700 (PDT), nono240 <nono240@gmail.com>
wrote:

>>Unless you want to download multiple tasks like this, and have them
>>stored in arbitrary places in flash (and ram), then there is no need for
>>position-independent data or code
>
>
>IT IS my case. It's a (commercial) product, letting the user to load
>multiple (so named) "tasklets" into FLASH, but its only allowed to run
>ONE at a time. So, we need those "tasklets" to be CODE position
>independent, but share DATA.

If you can run only one task at a time, why do you need position
independent code ? Just link each program to the same fixed load
address. You need PIC code only when there are _multiple_ programs to
be loaded somewhere into the RAM.

If you want to share data between these programs, first link the data
area to a fixed address and then link each program to that address.

This is how it was done half a century ago. In FORTRAN, create a
COMMON area, install it into a fixed address (usually at the top of
the core) and then load each "transient program" into low memory,
since the whole program could not fit into the core at once. No
base/stack pointer relative addressing needed, since the data
addresses were known at compile time.

With modern processors with versatile addressing modes, why not
reserve one data area pointer at a known location (such as the first
or last address in RAM or ROM) and use this to access the shared
variables in each program ?

  Data = GetPersistentDataAreaPointer() ;
  ...
  Data->SharedVar1 = Data->SharedVar2 ;
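
A minimal sketch of the pointer-at-a-known-location idea, written so that it also runs on a host. The struct layout, `PublishPersistentDataArea` and the static stand-in variables are assumptions made for illustration; on the target the slot would sit at a fixed RAM address agreed between the kernel and the tasklets.

```c
#include <stdint.h>

/* Layout of the shared block; the field names follow the fragment above. */
typedef struct {
    int32_t SharedVar1;
    int32_t SharedVar2;
} SharedData;

/* Host stand-ins. On the target the slot would instead live at a fixed,
 * agreed RAM address (hypothetical), for example:
 *   #define SHARED_SLOT (*(SharedData * volatile *)0x00000004u)
 */
static SharedData  shared_block;
static SharedData *shared_slot;

/* Kernel side: publish the block once at start-up. */
static void PublishPersistentDataArea(void)
{
    shared_slot = &shared_block;
}

/* Tasklet side: one indirection, no link-time knowledge of where the data is. */
static SharedData *GetPersistentDataAreaPointer(void)
{
    return shared_slot;
}
```

The tasklet only ever dereferences the pointer it was handed, so the same tasklet binary keeps working if the shared block moves, as long as the slot itself stays put.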

Reply by Albert van der Horst ● July 1, 2010

In article <24ad21f6-9326-4fe8-8434-164dd88fdfd8@b35g2000yqi.googlegroups.com>,
nono240  <nono240@gmail.com> wrote:
>Hi there !
>
>My CPU has no MMU, very little RAM (8KB), and is running a modified
>FreeRTOS. I'd like to have the ability to "load" and run some code
>from USART/DATAFLASH to FLASH as a RTOS task. Of course, for
>convenience, the compiled code must be fully position independent.
>Using the -fPIC or -fpic option, looking at the assembler, the code
>seems OK : a dynamically computed offset is applied to every
>operations.
>
>BUT, looking deeply, both DATA and CODE are applied the base offset !
>While this is the expected behavior for CODE (running anywhere in
>FLASH), moving my CODE over the entire flash space doesn't mean moving
>my RAM ! This make only sense when executing everything from SDRAM !
>
>I'm looking for a solution to generate position independent *code*,
>but with position dependent *data* using GCC/LD... Any help ?

A basic concept in linking (the ld program) is the "section".
A section is an area of memory belonging together, such that
e.g. distances within the section can be fixed at link time.
A section may have a relocation table identifying the places
in the section that still need to be adjusted to the final
place where it will be used in the program.

Now you want different sections to behave differently with regard
to location.

What the linker (ld) does is combine sections from different object
modules together into larger sections with names like .bss, .text and
.data, possibly fixing up the relocation table.
From that point whatever went into such a section will
be treated in the same way, i.e. once you combined DATA and CODE
into one section, data and code will be either fixed at a position
or have a relocation table. The linker is blind to the difference
between code and data, the only information it gets is by naming
convention of input sections. This information is generated
by the compiler.

Now you have to understand which sections you have, then tell
the linker what to do with it.
Using the --verbose option to the linker you get the default
linker script, which details what the linker does.
What you want can be accomplished by adapting the linker script,
which is -- I admit -- not necessarily easy.
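
On the compiler side, the input section a variable ends up in can be chosen from C, which gives the linker script a name to match on. The sketch below uses the GCC `section` variable attribute; the section name `.tasklet_data`, the macro `TASKLET_DATA` and the sample variables are hypothetical. Note that naming the section does not by itself change how -fPIC code addresses the data; it only makes the data separately placeable.

```c
/* GCC/Clang extension; only meaningful on ELF targets, so it falls back
 * to a no-op elsewhere. */
#if defined(__GNUC__) && defined(__ELF__)
#define TASKLET_DATA __attribute__((section(".tasklet_data")))
#else
#define TASKLET_DATA
#endif

/* Every variable tagged this way lands in the input section
 * ".tasklet_data" (a made-up name), which a linker script can then
 * place independently of .text. */
TASKLET_DATA static int sample_count = 0;
TASKLET_DATA static int sample_limit = 8;

/* Returns the number of samples recorded so far, or -1 when full. */
static int record_sample(void)
{
    if (sample_count >= sample_limit)
        return -1;
    return ++sample_count;
}
```

Running `objdump -h` on the resulting object file should list `.tasklet_data` as its own section, which is the handle an adapted linker script would use.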

Groetjes Albert

-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Reply by nono240 ● July 2, 2010

Thanks for all your replies !

> If you can run only one task at a time, why do you need position
> independent code ? Just link each program to the same fixed load
> address. You need PIC code only when there are _multiple_ programs to
> be loaded somewhere into the RAM.


Because those "tasklets" are stored in different places in FLASH: we
ship the device with 4 embedded tasklets, but a dozen more are available
to download and can be uploaded into any of those 4 slots. We don't want
the user to have to care about a "link address"!
Moreover, if we move to a CPU with more FLASH, we don't want to maintain
multiple versions of each tasklet.


> If you want to share data between these programs, first link the data
> area to a fixed address and then link each program to that address.
>
> This is how it was done half a century ago. In FORTRAN, create a
> COMMON area, install it into a fixed address (usually at the top of
> the core) and then load each "transient program" into low memory,
> since the whole program could not fit into the core at once. No
> base/stack pointer relative addressing needed, since the data
> addresses were known at compile time.

Relocating a tasklet "on demand" to a "predefined fixed area" would
prematurely wear out the FLASH, since there's not enough RAM to run code
from...

> With modern processors with versatile addressing modes, why not
> reserve one data area pointer at a known location (such as the first
> or last address in RAM or ROM) and use this to access the shared
> variables in each program ?
>
>   Data = GetPersistentDataAreaPointer() ;
>   ...
>   Data->SharedVar1 = Data->SharedVar2 ;

Because we want the tasklets to be "RTOS" unaware. Our FreeRTOS is
running as an "hypervisor" (we did have an MPU though).

I just need a way to tell LD that my DATA section is ABSOLUTE, and not
relative to CODE.
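
One workaround that stays entirely in C is to reach the fixed data through integer-constant addresses instead of symbols, so that there is no symbol for the PIC machinery to offset. This is a sketch under assumptions: the base address 0x4000 is borrowed from the earlier example, the names `SHARED_DATA_BASE`, `ABS_INT` and `abs_int_address` are made up, and whether a given GCC port really emits an absolute load here should be confirmed in the disassembly.

```c
#include <stdint.h>

/* Assumed fixed RAM address of the shared data (0x4000 as in the
 * earlier listing). */
#define SHARED_DATA_BASE 0x00004000u

/* Builds an lvalue at an absolute address. The address is an integer
 * constant, not a symbol, so the linker has nothing to relocate. */
#define ABS_INT(offset) \
    (*(volatile int *)(uintptr_t)(SHARED_DATA_BASE + (offset)))

/* On the target, "int first = ABS_INT(0);" would read the word at 0x4000.
 * This helper only computes the address, so it is safe to run on a host. */
static uintptr_t abs_int_address(uint32_t offset)
{
    return (uintptr_t)&ABS_INT(offset);
}
```

The cost is that the data layout has to be maintained by hand (or generated into a shared header), since the linker no longer assigns the addresses.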

Reply by Andrew Jackson ● July 2, 2010

>> With modern processors with versatile addressing modes, why not
>> reserve one data area pointer at a known location (such as the first
>> or last address in RAM or ROM) and use this to access the shared
>> variables in each program ?
>>
>>    Data = GetPersistentDataAreaPointer() ;
>>    ...
>>    Data->SharedVar1 = Data->SharedVar2 ;
>
> Because we want the tasklets to be "RTOS" unaware. Our FreeRTOS is
> running as an "hypervisor" (we did have an MPU though).

I don't see that Paul's suggestion makes your tasklet RTOS aware. 
Furthermore, I would have thought that the tasklets do need to be RTOS 
aware because they are manipulating a common data area.

> I just need a way to tell LD that my DATA section is ABSOLUTE, and not
> relative from CODE..

What does your link script look like at present?

	Andrew

Reply by Peter Dickerson ● July 2, 2010

"D Yuniskis" <not.going.to.be@seen.com> wrote in message 
news:i0hm4l$c6m$1@speranza.aioe.org...
> [snip]
>
> The advantage of interpreters has always seemed to be coming
> up with really tight representations of algorithms and
> spend "ROM" instead of needing space in (loadable) RAM...

OK, different aim. In my case I have a scientific instrument that is making 
various low level measurements. Users, who are typically chemists or 
biochemists, want real answers not raw measurements. For this they apply 
"Methods" that turn instrumental measurements into stuff like 
concentrations. These methods are all pretty standard but there are lots of 
them, with the occasional new one turning up. I'd prefer the applications 
chemists to be able to implement the methods so that I can concentrate on 
measuring femtoamps. So, I'm looking for something scriptable.

Peter

Reply by D Yuniskis ● July 2, 2010

Hi Peter,

Peter Dickerson wrote:
> "D Yuniskis" <not.going.to.be@seen.com> wrote in message 
> news:i0hm4l$c6m$1@speranza.aioe.org...
> [snip]
> 
> OK, different aim. In my case I have a scientific instrument that is making 
> various low level measurements. Users, who are typically chemists or 
> biochemists, want real answers not raw measurements. For this they apply 
> "Methods" that turn instrumental measurements into stuff like 
> concentrations. These methods are all pretty standard but there are lots of 
> them, with the occasional new one turning up. I'd prefer the applications 
> chemists to be able to implement the methods so that I can concentrate on 
> measuring femtoamps. So, I'm looking for something scriptable.

Yes, we wrote/implemented a "QBASIC" for some of our instruments
for just this reason (blood assays).  Allowed the customer to
design new tests without having to contract with us to code
them.  I.e., we just provided a device that came up with
the raw data and let the customer come up with the means
of interpreting that data based on the reagents, etc. that
he was using in the assay.

Note that you can do this two different ways:
- *source* level interpreter in the instrument
- "bytecode" interpreter in the instrument with
   an external "compiler/parser".

(I'm talking *really* limited resources, here)

If you have a more fleshy implementation to work with,
look at Lua.  Lately I am doing a lot with Inferno/Limbo
(but would not suggest it for "end users")
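
To give a feel for the "bytecode" option, here is a minimal sketch of a stack-based interpreter with integer-only math and a statically sized stack, in the spirit of the restrictions discussed in this thread. The opcode set, the names and the stack depth are all invented for illustration; a real instrument language would need variables, branches and I/O hooks.

```c
#include <stdint.h>
#include <stddef.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };   /* hypothetical opcode set */

#define VM_STACK_DEPTH 8                      /* static limit, no heap */

/* Runs a program and returns the top of stack at OP_HALT.
 * Returns 0 with *ok == 0 on stack overflow/underflow, a bad opcode,
 * or a program that ends without OP_HALT. */
static int32_t vm_run(const uint8_t *code, size_t len, int *ok)
{
    int32_t stack[VM_STACK_DEPTH];
    size_t sp = 0, pc = 0;

    *ok = 0;
    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:                         /* one-byte immediate operand */
            if (pc >= len || sp >= VM_STACK_DEPTH) return 0;
            stack[sp++] = (int8_t)code[pc++];
            break;
        case OP_ADD:
            if (sp < 2) return 0;
            sp--; stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2) return 0;
            sp--; stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            if (sp < 1) return 0;
            *ok = 1;
            return stack[sp - 1];
        default:
            return 0;
        }
    }
    return 0;
}
```

The program itself is a `const` byte array, so it can stay in flash; only the small fixed stack needs RAM. An external "compiler/parser" would turn something like `(2 + 3) * 4` into the byte sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT.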

Reply by Peter Dickerson ● July 2, 2010

"D Yuniskis" <not.going.to.be@seen.com> wrote in message 
news:i0kg59$cqf$1@speranza.aioe.org...
> Hi Peter,
[snip]

> Yes, we wrote/implemented a "QBASIC" for some of our instruments
> for just this reason (blood assays).  Allowed the customer to
> design new tests without having to contract with us to code
> them.  I.e., we just provided a device that came up with
> the raw data and let the customer come up with the means
> of interpreting that data based on the reagents, etc. that
> he was using in the assay.
>
> Note that you can do this two different ways:
> - *source* level interpreter in the instrument
> - "bytecode" interpreter in the instrument with
>   an external "compiler/parser".

Yes, I'd go for source since that is conceptually the simplest. Otherwise I 
need a bytecode compiler somewhere in the machine or on a PC.

> (I'm talking *really* limited resources, here)
>
> If you have a more fleshy implementation to work with,
> look at Lua.  Lately I am doing a lot with Inferno/Limbo
> (but would not suggest it for "end users")

I did get Lua linked in but ran out of memory almost immediately. In 
particular I couldn't measure anything. The problem seems to be that a lot 
of stuff gets stored in RAM - dictionaries, strings etc. I'd prefer to be 
able to keep that stuff in Flash only even at the cost of performance.
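
As an illustration of that trade-off (flash instead of RAM, at some cost in speed), a small interpreter can keep its dictionary in a `const` table and search it linearly rather than building a hash table in RAM at start-up. This is a sketch only: the keyword names and ids are invented, and where `const` data actually ends up depends on the toolchain and linker script.

```c
#include <string.h>
#include <stddef.h>

typedef struct {
    char name[8];   /* stored inline, so the table holds no pointers */
    int  id;
} Keyword;

/* "static const" allows the toolchain to place the whole table in
 * read-only memory (typically flash on a microcontroller). */
static const Keyword keywords[] = {
    { "read",  1 },
    { "scale", 2 },
    { "print", 3 },
};

/* Linear search: slower than a RAM-resident hash table, but uses no RAM
 * beyond the loop counter. Returns -1 for unknown words. */
static int keyword_id(const char *word)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(keywords[i].name, word) == 0)
            return keywords[i].id;
    return -1;
}
```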

Peter