EmbeddedRelated.com
Forums

Load and run code on arm's sram

Started by Thales March 11, 2011
Hello,

I'm new to this group and I need some help.

I'm starting to learn the LPC2000 family of ARM MCUs. Can anyone explain to me how to load and run code in the ARM's SRAM?

Thanks

Here is a tutorial that covers getting a program to run in ram. However, it also loads the program to ram, not flash. It is not specifically for the LPC devices although somewhere on the net James Lynch has a tutorial specifically for the LPC2106 and LPC2148. It's harder to find.

http://www.sparkfun.com/tutorials/66

To keep the program in flash but run it from ram, you would still need to change the linker script so the sections are addressed to ram (already done in the *_ram scripts) but loaded into flash. This is a lot like the .data segment for programs that run out of flash, and there is a linker script for that as well: you can see how the .data segment is addressed in ram but loaded to flash. Then you would need to change the startup code to copy the sections from flash to ram and then branch to the ram code. This is the same thing the flash version already does for the .data segment (initialized ram), which is copied from flash to ram at startup.

But why would you want to do all this? Running out of ram isn't much faster, if at all, because the flash fetches are much wider, so fewer are required. Further, most devices have a LOT of flash and not nearly as much ram. 64k bytes of ram seems like a lot until you realize that it holds only 16k 32-bit ARM instructions, and the compilers generate a LOT of code.
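The 16k figure is just 64k bytes divided by 4 bytes per ARM instruction. A minimal sketch of that arithmetic in C follows; the function and constant names are made up for illustration, and the Thumb constant assumes the 16-bit Thumb encoding, which this thread does not discuss.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per instruction: 32-bit ARM state, 16-bit Thumb state (assumption:
 * the thread only talks about ARM state; Thumb is shown for comparison). */
enum { ARM_INSN_BYTES = 4, THUMB_INSN_BYTES = 2 };

/* How many instructions of the given width fit in ram_bytes of memory. */
size_t instruction_capacity(size_t ram_bytes, size_t insn_bytes)
{
    assert(insn_bytes != 0);
    return ram_bytes / insn_bytes;
}
```

Calling instruction_capacity(64 * 1024, ARM_INSN_BYTES) gives the 16k instructions mentioned above, and that budget is shared with data, heap and stack.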

You really don't want to do this...

Richard

Hello,
That tutorial looks like another one that I have, but there are some things in it
that I can't understand very well, such as the declarations and the .data
handling in the startup code. Could you explain each part of the startup code in
detail?

You asked why I want to run code from RAM.
I'll try to explain what I'm doing: I'm writing code for a "graphics card"
built around an ARM controller, so I need to transfer a lot of data, which will
be done in a loop. My guess is that this part of the code will run faster from
RAM than from flash.
The tutorial you sent is interesting, but I don't think it will help me run
just one function from RAM and the rest of the code from flash.

Thanks

Thales

On 14.03.2011 14:32, Thales Godoy wrote:

> I'll try to explain what I'm doing: I'm writing code for a "graphics card"
> built around an ARM controller, so I need to transfer a lot of data, which will
> be done in a loop. My guess is that this part of the code will run faster from
> RAM than from flash.

Doing "stupid" things like copying data from A to B is what DMA is made
for. Maybe look into that as well.

In general, the C startup code copies data or code from its load address to
its execution address and clears .bss.

--
42Bastian
+
| http://www.sciopta.com
| Fastest direct message passing kernel.
| IEC61508 certified.
+
> You asked why I want to run code from RAM.
> I'll try to explain what I'm doing: I'm writing code for a
> "graphics card" built around an ARM controller, so I need to transfer a
> lot of data, which will be done in a loop. My guess is that this part of
> the code will run faster from RAM than from flash.
> The tutorial you sent is interesting, but I don't think it will help me
> run just one function from RAM and the rest of the code from flash.

IIRC there is some GCC-specific attribute that places code in RAM (or
rather, in the data segment, which is copied from ROM to RAM as part of
the startup). For asm I have some macros in my http://www.voti.nl/mkt/
tool:

// start of a section of assembler code that can be placed in ROM
.macro mkt_code
   .text
   .align
   .arm
.endm

// start of a section for read/write,
// explicitly initialised data
.macro mkt_data
   .data
   .align
.endm

.macro mkt_uninitialized
   .section .uninitialized
   .align
.endm

// start of a section for read/write,
// 0-initialised data
.macro mkt_bss
   .bss
   .align
.endm

// start of a section for read-only, initialised data
.macro mkt_rodata
   .text
   .align
.endm

// this used to give problems with Insight, I have no idea why.
// but with the new gcc and gdb these seem to be solved
.macro mkt_code_separate_section, Name
   .text
   .section .text.\Name,"ax"
   .arm
   .align
.endm

// start of a subroutine:
// the label is put in front, and the label is made global
.macro mkt_subroutine, label
   mkt_code_separate_section \label
   .global \label
\label:
.endm

// start of a code part that must be in RAM
// the user is responsible for jumping to this code!
.macro mkt_code_in_ram
#if mkt_memory == mkt_ram
   mkt_code
#else
   mkt_code
   .data
#endif
   .align 4
.endm

// start of an ARM subroutine that must be in RAM
// a trampoline is inserted in ROM so the subroutine
// can be called with a standard BL instruction
.macro mkt_subroutine_in_ram, label, label_in_ram
   mkt_subroutine \label
   mkt_switch_to_ram \label_in_ram
.endm

// start of an ARM code part that must be in RAM
// the label and a jump to that label are inserted
.macro mkt_switch_to_ram, label
#if mkt_memory == mkt_rom
   ldr pc, =\label
   mkt_code_in_ram
   .section .data.\label
   .global \label
   .align 4
\label:
#endif
.endm

usage:

mkt_subroutine mkt_busy_wait_us

   // calculate the number of CPU cycles * 1e6 to wait in r2, r3
   ldr r1, =mkt_cclk
   umull r2, r3, r0, r1
   ldr r0, =(5 * 1000000) // the loop is 3 + 2 CPU cycles

   // switch to RAM to avoid unpredictable FLASH code fetch delays
   // does not work reliably :(
   mkt_switch_to_ram mkt_RAM_busy_wait_us

   // wovo
   // ldr pc, = joop
   // .section .data.joop // this is OK with -sections, but not without!
   //.data
   //.global joop
   //joop:

   // spend the calculated number of cycles
busy_wait_us_loop:

   subs r2, r2, r0
   sbcs r3, r3, #0
   bcs busy_wait_us_loop

   // LR contains the full 32-bit return address,
   // so no special actions needed here.
   mov pc, lr
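
For plain C, the GCC-specific attribute mentioned at the top of this message might look roughly like the sketch below. It rests on several assumptions: the section name .ramfunc, the RAMFUNC macro and both function names are made up for illustration, and the linker script would have to place that section in RAM with its load image in flash (for example inside the .data output section) so the startup copy picks it up. On a non-ARM host the macro expands to nothing, so the code still builds and runs there for testing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On an ARM GCC build, put the function in a dedicated section that the
 * linker script is assumed to locate in RAM (hypothetical name ".ramfunc").
 * long_call lets flash-resident code reach the distant RAM address;
 * noinline keeps the compiler from folding the body back into flash code. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline, long_call))
#else
#define RAMFUNC
#endif

/* Hypothetical hot loop: scale each sample (gain in 8.8 fixed point) and
 * write it to an output location, e.g. a port register feeding a DAC. */
RAMFUNC void scale_and_output(const uint16_t *src, size_t n,
                              volatile uint16_t *out, uint16_t gain)
{
    for (size_t i = 0; i < n; i++) {
        *out = (uint16_t)((src[i] * (uint32_t)gain) >> 8);
    }
}

/* Ordinary flash-resident caller: checks its arguments, then calls into RAM. */
void output_frame(const uint16_t *src, size_t n,
                  volatile uint16_t *out, uint16_t gain)
{
    assert(src != NULL && out != NULL);
    scale_and_output(src, n, out, gain);
}
```

Only the inner loop lives in RAM here; everything else, including the argument checks, stays in flash.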

--

Wouter van Ooijen

-- -------
Van Ooijen Technische Informatica: www.voti.nl
consultancy, development, PICmicro products
docent Hogeschool van Utrecht: www.voti.nl/hvu

Can anyone guide me on what I should know about LPCXpresso?

That's right, but I need to process the data and transfer it to an I/O port
where there will be a DAC. So I don't think I can use DMA for this.
Thales

--- In l..., Thales Godoy wrote:
>
> That's right, but I need to process the data and transfer it to an I/O port
> where there will be a DAC. So I don't think I can use DMA for this.
> Thales
>

You're making the ASSUMPTION that code will run faster out of RAM. I believe this has been discredited over the last few years but I'm not certain. The flash is WIDE - multiple instructions are fetched as a single access. NXP states that flash access can keep up with the core at maximum clock rates - at least on the chips I am using (LPC2106 and LPC2148). There is always the issue of executing just one instruction of the several that were fetched and then having a branch. But that's the same problem in SRAM. If you have to flush the pipeline, you have to flush the pipeline.

The neat thing about the ARM is that instructions can be conditional. Instead of branching around an instruction, a failed condition just causes the instruction to be ignored. It takes its turn through the pipeline but nothing is done. It's faster to execute two or three (essentially) NOPs than it is to flush and refill the pipeline.

OK, let's assume I can't talk you out of this insanity. You need to understand all there is to know about segments, how to declare them in code (from GCC manual) and how to place them (GNU LD manual). RTFM comes to mind...

In the normal course of events, all code (often including the startup code - sometimes this has its own segment called .startup) is placed in the .text segment. All initialized data is placed in the .data segment and all uninitialized data is allocated in the .bss segment. The .text segment is targeted to flash, unconditionally. The .data segment is targeted to ram but its initial values are placed in flash, and the .bss is just initialized to 0 in the startup code (this is REQUIRED by C, which doesn't force the user to initialize ram). Still, we only clear as much as needed. Stack space is NOT initialized.
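
To make the segment assignment concrete, here is a small C sketch showing where a typical GCC build would be expected to put each kind of object. All names are hypothetical, and the exact section names depend on the compiler and its options.

```c
#include <assert.h>
#include <stdint.h>

const char banner[] = "LPC2000";  /* read-only data: .rodata, stays in flash  */
int frame_count = 25;             /* initialized: .data, copied flash -> ram  */
uint8_t line_buffer[64];          /* no initializer: .bss, zeroed at startup  */

/* Code: .text, executed from flash. */
int next_frame(void)
{
    assert(frame_count >= 0);
    line_buffer[0] = (uint8_t)banner[0];
    return ++frame_count;
}
```

If the startup code skipped the .data copy, frame_count would hold whatever was in ram at power-up instead of 25, and without the .bss clear line_buffer would not start out as zeros.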

ENTRY(_startup)

This names the entry point; the _startup symbol itself is defined in the startup code.

We have a memory map which declares the location, type and size of each block of memory:

MEMORY
{
flash : ORIGIN = 0, LENGTH = 128K
ram_isp_low(A) : ORIGIN = 0x40000120, LENGTH = 223
ram : ORIGIN = 0x40000200, LENGTH = 64992
ram_isp_high(A) : ORIGIN = 0x4000FFE0, LENGTH = 32
}

I leave the isp areas unused. Maybe it isn't necessary but that's a choice I make. The only memory that will be allocated is flash and ram.
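
As a quick sanity check of the MEMORY block above, the origins and lengths can be put in a table and tested for overlap. This is only a host-side sketch; the struct and function names are made up for illustration, and the values are copied from the script (128K taken as 128 * 1024).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region { const char *name; uint32_t origin; uint32_t length; };

/* Values copied from the MEMORY block of the linker script. */
const struct region memory_map[] = {
    { "flash",        0x00000000u, 128u * 1024u },
    { "ram_isp_low",  0x40000120u, 223u },
    { "ram",          0x40000200u, 64992u },
    { "ram_isp_high", 0x4000FFE0u, 32u },
};
enum { REGION_COUNT = sizeof memory_map / sizeof memory_map[0] };

/* First address past the end of a region. */
uint32_t region_end(const struct region *r)
{
    assert(r != NULL);
    return r->origin + r->length;
}

/* Returns 1 if no two regions share an address, 0 otherwise. */
int regions_are_disjoint(const struct region *map, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (map[i].origin < region_end(&map[j]) &&
                map[j].origin < region_end(&map[i]))
                return 0;
    return 1;
}
```

With these numbers the ram region ends exactly where ram_isp_high begins (0x4000FFE0), so the four regions do not overlap.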

Now we place the various segments in memory:

SECTIONS
{
   . = 0; /* set location counter to address zero */

   startup : { *(.startup) } >flash /* startup code goes into FLASH */

   _start_text = .; /* we will need this to copy code to ram */

   .text : /* collect all sections that should go into FLASH */
   {
      *(.text) /* all .text sections (code) */
      *(.rodata) /* all .rodata sections (constants, strings, etc.) */
      *(.rodata*) /* all .rodata* sections (constants, strings, etc.) */
      *(.glue_7) /* all .glue_7 sections (no idea what these are) */
      *(.glue_7t) /* all .glue_7t sections (no idea what these are) */
      _etext = .; /* define a global symbol _etext just after the last code byte */
   } >flash /* put all the above into FLASH */

   .data : /* collect all initialized .data sections that go into RAM */
   {
      _data = .; /* create a global symbol marking the start of the .data section */
      *(.data) /* all .data sections */
      _edata = .; /* define a global symbol marking the end of the .data section */
   } >ram AT >flash /* put all the above into RAM (but load the LMA copy into FLASH) */

   .bss : /* collect all uninitialized .bss sections that go into RAM */
   {
      _bss_start = .; /* define a global symbol marking the start of the .bss section */
      *(.bss) /* all .bss sections */
   } >ram /* put all the above in RAM (it will be cleared in the startup code) */

   . = ALIGN(8); /* advance location counter to the next 8-byte boundary */
   _bss_end = . ; /* define a global symbol marking the end of the .bss section */
}
_end = .; /* define a global symbol marking the end of application RAM */
PROVIDE (end = .); /* for sbrk */

The .startup segment is the first thing targeted for flash. It MUST be the first code in flash.

You can see how the .text segment is placed in flash, and you can contrast that with the way the .data segment is targeted to ram but actually placed in flash (>ram AT >flash).

You can see various symbols defined that help the startup code:
_data and _edata are the beginning and end of the data segment and that's how much is copied from flash to ram.

_bss_start and _bss_end define the limits of uninitialized ram and this area needs to be zeroed in the startup code.

I added _start_text so that the code to copy is between _start_text and _etext.

Look at your startup code and you will see how it deals with these segments.
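
For reference, the copy and clear steps can be sketched in C as below. This is not the tutorial's startup code (that is normally a few lines of assembler); the helper names are made up for illustration, and the sketch assumes word-aligned sections and that the load image of .data starts at _etext in flash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy words from the load address (flash) to the run address (ram)
 * until dst reaches dst_end. */
void copy_section(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = *src++;
}

/* Fill the range [start, end) with zeros, as required for .bss. */
void zero_section(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end)
        *start++ = 0;
}

/* On the target, with the symbols from the linker script above, the calls
 * would look roughly like this (assumption: .data image sits at _etext):
 *
 *   extern uint32_t _etext, _data, _edata, _bss_start, _bss_end;
 *   copy_section(&_data, &_edata, &_etext);
 *   zero_section(&_bss_start, &_bss_end);
 */
```

Copying code to ram would be one more copy_section call with the run and load addresses of the code section, made before anything in that section is called.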

I suppose you could just target .text (and friends) to >ram AT >flash and have the startup code copy this before it starts copying .data.

You REALLY don't want to do this...

Richard

> You're making the ASSUMPTION that code will run faster out of RAM. I
> believe this has been discredited over the last few years but I'm not
> certain. The flash is WIDE - multiple instructions are fetched as a
> single access. NXP states that flash access can keep up with the core at
> maximum clock rates - at least on the chips I am using (LPC2106 and
> LPC2148). There is always the issue of executing just one instruction of
> the several that were fetched and then having a branch. But that's the
> same problem in SRAM. If you have to flush the pipeline, you have to
> flush the pipeline.

On the LPC2106/2148 code will always run at full speed from RAM, but not
always from flash (depending on wait states, the MAM setting, and - for some
MAM settings - how well the accelerator works).

--

Wouter van Ooijen

-- -------
Van Ooijen Technische Informatica: www.voti.nl
consultancy, development, PICmicro products
docent Hogeschool van Utrecht: www.voti.nl/hvu

Hello,
I'm just starting to learn about ARM controllers and the LPC family. I read in
some manual that the flash access time of the LPC controllers is 200 ns, so the
MAM is needed to run an LPC controller at full speed (I don't know whether it
still works like this, because my reference is outdated). If that's true, then
when the program branches, the first instruction after the branch will take
200 ns to fetch. Since that doesn't happen in RAM, running from RAM is faster
than running from flash...
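
To put a number on that, the cost of one un-accelerated flash fetch in CPU clocks is the access time multiplied by the core clock, rounded up. The sketch below only does that arithmetic; the function name is made up, the 200 ns figure is the one quoted above from an outdated reference, and the 60 MHz clock in the usage note is an assumed value, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Whole CPU clocks needed to cover one flash access:
 * ceil(access_time_ns * cclk_hz / 1e9). */
uint32_t flash_access_clocks(uint32_t access_time_ns, uint32_t cclk_hz)
{
    assert(cclk_hz > 0);
    uint64_t product = (uint64_t)access_time_ns * cclk_hz;  /* ns * Hz */
    return (uint32_t)((product + 999999999u) / 1000000000u);
}
```

For example, flash_access_clocks(200, 60000000) comes out at 12 clocks per fetch, which is the penalty a taken branch would pay if the target is not already in a MAM buffer.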