Re: PicBASIC speed and code efficiency

Started by xob_jt August 10, 2005


Why not look at Proton? It covers all the Pic ranges from 12 to 18
series. It even comes with an optimiser.

Tim Box

--- In piclist@picl..., Eirik Karlsen <eikarlse@o...> wrote:
> I'm about to start on a major program, which I
> expect to be some 32K in assembly.
> To save me getting some additional gray hairs
> I'm sniffing at BASIC for this project.
> But I need -SPEED- can't afford to lose much
> of it compared to a well-written assembler program.
> Can anyone say something about PicBASIC's
> execution speed and code efficiency, compared
> to manually (well)written assembly programs ?
>
> --
> *******************************************
> VISIT MY HOME PAGE:
> <http://home.online.no/~eikarlse/index.htm>
> LAST UPDATED: 23/08/2003
> *******************************************
> Regards
> Eirik Karlsen


--- In piclist@picl..., "xob_jt" <tim@t...> wrote:
>
>
> Why not look at Proton? It covers all the Pic ranges from 12 to 18
> series. It even comes with an optimiser.
>
> Tim Box

That's an interesting comment: "comes with an optimiser"

Does this mean the optimiser runs as a separate stage?

Regards
Sergio Masci

http://www.xcprod.com/titan/XCSB - optimising PIC compiler
FREE for personal non-commercial use




Hi Sergio

As you well know, writing a compiler for a 16 series PIC is
a nightmare. Any jump could take you over a page boundary, so
you have to set PCLATH before the jump even if you're already
in the same page, because you have no idea in advance what page
the label is in. Unless you have a bespoke assembler, as you do,
or you adapt your code to use the linker in Mpasm, you have
to carry these overheads.

The 18 series is better as the memory is page-less, but you can
still gain from using Bra as opposed to Goto to shave the code
used down to a minimum.
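
To make the Bra versus Goto choice concrete, here is a minimal Python sketch of the range check a tool would need. It relies on the documented PIC18 behaviour that BRA and RCALL take an 11-bit signed word offset relative to the next instruction; the function name can_use_bra is made up for illustration and is not part of Proton.

```python
def can_use_bra(src, dst):
    """Return True if a BRA/RCALL at byte address src can reach dst.

    PIC18 instructions sit on even byte addresses. BRA n sets
    PC = PC + 2 + 2n, with n a signed 11-bit value (-1024..1023).
    """
    offset_words = (dst - (src + 2)) // 2
    return -1024 <= offset_words <= 1023


if __name__ == "__main__":
    # A nearby target can use the short form, a distant one needs GOTO.
    print(can_use_bra(0x1000, 0x1040))   # True
    print(can_use_bra(0x1000, 0x3000))   # False
```

Anything outside that window has to stay as a full GOTO or CALL.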

What Proton does on the 16 series is set markers etc. in the code and
let Mpasm do its thing. Once Mpasm has produced the lst file, Proton
analyses the addresses and markers to work out where the end labels
are and rewrites the code so that it only messes with PCLATH where a
jump actually needs it. On the 18 series it does the same kind of
thing and replaces Goto with Bra etc.
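
As a rough illustration of that second pass (a Python sketch only, not Proton's actual algorithm), the snippet below takes resolved addresses and reports which jumps really cross a 2K-word page on a 14-bit core. The names page_of and jumps_needing_pclath are hypothetical, and it makes a simplifying assumption: PCLATH is taken to match the page of the jump instruction itself, which a real tool would have to verify, for example after a cross-page call returns.

```python
PAGE_SIZE = 0x800  # 2K-word program memory page on 14-bit core devices


def page_of(addr):
    """Page number of a program memory word address."""
    return addr // PAGE_SIZE


def jumps_needing_pclath(jumps, labels):
    """Return the jumps whose target lies in a different page.

    jumps  : list of (source word address, target label name)
    labels : dict mapping label name -> resolved word address
    Assumes PCLATH currently points at the page of the jump itself.
    """
    return [(src, name) for src, name in jumps
            if page_of(src) != page_of(labels[name])]


if __name__ == "__main__":
    labels = {"loop": 0x0010, "far_end": 0x0805}
    jumps = [(0x0020, "loop"), (0x07F0, "far_end"), (0x0900, "far_end")]
    for src, name in jumps_needing_pclath(jumps, labels):
        print(f"jump at {src:#06x} to {name} needs a page select")
```

Only the jump at 0x07F0 crosses a boundary here; the other two could drop their page-select instructions. Removing instructions moves later addresses, so a real pass would have to repeat until nothing changes.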

That was a very simple explanation of how it handles the paging
issues. I was told more, but it gets quite complex in detail; it is
not an easy subject. It does not stop at playing with PCLATH: there
are a number of tricks you can use to create compact code, from not
reloading the Wreg with data it already holds to removing
consecutive CALL, RETURN mnemonics and replacing them with a single
GOTO mnemonic.

As the code is being compiled there is always a level of
optimisation going on. This also includes a complex bank management
system. Other compilers get round this problem by resetting the bank
to 0 after every label. There are so many ways that Proton saves
code; one last example:

When printing or serially transmitting quoted strings of characters
on devices that can access their own code memory, Proton places the
ASCII text in code memory as a string and then uses a label print
system, so you don't get the Movlw "H", Call dataout,
Movlw "E", Call dataout etc. sequence you would otherwise have.
Basically, if the string is more than 7 chars it's worth doing.

Re levels, the best thing I think is to let the compiler writer
explain it all. This is taken from the manual.

USING the OPTIMISER 12 14 16

The underlying assembler code produced by the compiler is the single
most important element to a good language, because compact assembler
not only means more can be squeezed into the tight confines of the
PICmicro, but also the code runs faster which allows more complex
operations to be performed. And even though the compiler already
produces good underlying assembler mnemonics, there is always room
for improvement and that improvement is achieved by a separate
optimising pass.
The optimiser is enabled by issuing the DECLARE: -

OPTIMISER_LEVEL = n

Where n is the level of optimisation required.

The DECLARE should be placed at the top of the BASIC program, but
anywhere in the code is actually acceptable because once the
optimiser is enabled it cannot be disabled later in the same program.

As of version 3.1 of the compiler, the optimiser has 6 levels, 7 if
you include OFF as a level.

Level 0 disables the optimiser.
Level 1 Chooses the appropriate branching mnemonics when using a
16-bit core (18F) device, and actively chooses the appropriate page
switching mnemonics when using a 14-bit core (16F) device.

This is the single most important optimising pass for larger
PICmicro™ devices, especially 14-bit core (16F) types because page
switching is a very code hungry operation. For 16-bit core (18F)
types it will replace CALL with RCALL and GOTO with BRA whenever
appropriate, saving 1 word (2 bytes) of code space every time.

Level 2 Re-arranges some branching operations on both 14 and 16-bit
core devices. Again, this is an important optimising pass because a
single program can implement many decision making mnemonics.
Level 3 Removes consecutive CALL, RETURN mnemonics and replaces them
with a single GOTO mnemonic.
Level 4 Looks for a MOVF VAR,W mnemonic followed by a MOVWF VAR
mnemonic, each using the same variable or register. If found, the
MOVWF VAR mnemonic is removed because both the WREG and the variable
are already loaded and the following mnemonic is not required.

This sequence of mnemonics is not common in programs, but does
happen from time to time. Therefore level 4 optimisation may not
show any decrease in code size.

Level 5 Looks for a MOVWF VAR mnemonic followed by a MOVF VAR,W
mnemonic, each using the same variable or register. If found, the
MOVF VAR,W mnemonic is removed because both the WREG and the
variable are already loaded and the following mnemonic is not
required.

As with level 4, this sequence of mnemonics is not common in
programs, but does happen from time to time. Therefore level 5
optimisation may not show any decrease in code size.

Level 6 Looks for an ANDLW CONSTANT mnemonic followed by another
ANDLW CONSTANT mnemonic. If found, the first ANDLW CONSTANT mnemonic
is removed because the second mnemonic will override the first
anyway.

As with level 4 and 5, this sequence of mnemonics is not common in
programs, very rare in fact, but does happen from time to time.
Therefore level 6 optimisation may not show any decrease in code
size.

Each optimiser level uses the previous one, so level 3 implements
level 1 and level 2 as well as level 3.
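
The pattern matching described for levels 3, 4 and 5 can be sketched in a few lines of Python. This is an illustration of the peephole technique only, not the code Proton uses; the function name peephole is hypothetical. It assumes the input is a run of plain instructions with no labels or comments between them, and it ignores the fact that MOVF also updates the Z flag.

```python
def peephole(lines):
    """Apply three peephole rules to a list of assembler lines.

    Assumption: no labels between the instructions, so nothing can
    jump into the middle of a matched pair.
    """
    out = []  # list of (mnemonic, operand) tuples, normalised to upper case
    for line in lines:
        op, _, arg = line.strip().partition(" ")
        op = op.upper()
        arg = arg.replace(" ", "").upper()
        if out:
            prev_op, prev_arg = out[-1]
            # Level 3: CALL x followed by RETURN becomes GOTO x
            if op == "RETURN" and prev_op == "CALL":
                out[-1] = ("GOTO", prev_arg)
                continue
            # Level 4: MOVF v,W followed by MOVWF v, drop the MOVWF
            if op == "MOVWF" and prev_op == "MOVF" and prev_arg == arg + ",W":
                continue
            # Level 5: MOVWF v followed by MOVF v,W, drop the MOVF
            if op == "MOVF" and prev_op == "MOVWF" and arg == prev_arg + ",W":
                continue
        out.append((op, arg))
    return [f"{op} {arg}".strip() for op, arg in out]


if __name__ == "__main__":
    code = ["MOVF COUNT,W", "MOVWF COUNT", "CALL Send", "RETURN",
            "MOVWF TEMP", "MOVF TEMP , W"]
    for line in peephole(code):
        print(line)
```

Six instructions shrink to three in this contrived input. As the manual text says, real programs rarely contain the level 4 and 5 patterns, so the saving is usually much smaller.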

You must be aware that optimising code, especially paged code found
in the larger 14-bit core (16F) devices can, in some circumstances,
have a detrimental effect on a program if it misses a page boundary,
this is true of all optimisation on all compilers and is something
that you should take into account.

Always try to write and test your program without the optimiser
pass. Then once it's working as expected, enable the optimiser a
level at a time. However, this is not always possible with larger
programs that will not fit within the PICmicro™ without
optimisation. In this circumstance, choose level 1 optimisation
whenever the code is reaching the limits of the PICmicro, testing
the code as you go along.

Caveats
Of course there's no such thing as a free lunch, and there are some
features that cannot be used when implementing the optimiser.

The main one is that the optimiser is not supported with 12-bit core
devices.

Also, the ORG directive is not allowed with 14-bit core (16F)
devices when using the optimiser, but can be used with 16-bit core
(18F) devices.

When using 16-bit core devices, do not use the MOVFW macro as this
will cause problems within the ASM listing; use the correct
mnemonic of MOVF VAR , W.

On all devices, do not use the assembler LIST and NOLIST directives,
as the optimiser uses these to sculpt the final ASM used.


--- In piclist@picl..., "xob_jt" <tim@t...> wrote:
>
> Hi sergio
>
> As you well know, writing a compiler for a 16 series PIC is
> a nightmare. Any jump could take you over a page boundary, so
> you have to set PCLATH before the jump even if you're already
> in the same page, because you have no idea in advance what page
> the label is in. Unless you have a bespoke assembler, as you do,
> or you adapt your code to use the linker in Mpasm, you have
> to carry these overheads.

Hi Tim,

The bespoke assembler of which you speak, called XCASM, performs a
phenomenal job of optimising RAM bank and code page management
because it profiles the executable during each pass and inserts select
instructions only where they are absolutely needed. It tracks the
active page and bank through subroutine calls, jumps, skips etc. It
looks at all possible execution paths through an instruction and
determines if that instruction is safe to execute without inserting
bank and page select instructions before it.
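
A much simplified Python sketch of the "insert a select only where needed" idea follows. It is not how XCASM works internally: it handles only straight-line code and conservatively forgets the active bank at every label, whereas the analysis described above follows every execution path. The function name insert_bank_selects and the COUNT variable are hypothetical; BANKSEL is the standard MPASM directive.

```python
def insert_bank_selects(program, bank_of):
    """Emit BANKSEL only when the known active bank differs.

    program : list of ("label", name) or ("op", mnemonic, register)
    bank_of : dict mapping register name -> RAM bank number
    A label resets the known bank to None (unknown), because this
    sketch does not analyse which paths can arrive there.
    """
    current = None
    out = []
    for item in program:
        if item[0] == "label":
            current = None
            out.append(f"{item[1]}:")
            continue
        _, mnemonic, reg = item
        if bank_of[reg] != current:
            out.append(f"    BANKSEL {reg}")
            current = bank_of[reg]
        out.append(f"    {mnemonic} {reg}")
    return out


if __name__ == "__main__":
    banks = {"PORTB": 0, "TRISB": 1, "COUNT": 0}
    prog = [("op", "CLRF", "PORTB"), ("op", "CLRF", "COUNT"),
            ("op", "CLRF", "TRISB"), ("label", "loop"),
            ("op", "MOVWF", "COUNT")]
    print("\n".join(insert_bank_selects(prog, banks)))
```

Four register accesses need three selects here instead of four. Tracking the bank through labels, calls and skips, as described above, is what would remove the select after loop when every path into it already has the right bank active.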

A side effect of this is that the assembler can output a profile as
part of the assembly listing which shows all paths through an
instruction, so it is actually possible for the user to work backwards
and find obscure paths and consequently locate bugs more easily.

Real world tests (not contrived benchmarks) show that increasing the
size of a program from a single code page to multiple code pages adds
a page select overhead of between just 3 and 4%.

>
> The 18 series is better as the memory is page-less, but you can
> still gain from using Bra as opposed to Goto to shave the code
> used down to a minimum.
>
> What Proton does on the 16 series is set markers etc. in the code and
> let Mpasm do its thing. Once Mpasm has produced the lst file, Proton
> analyses the addresses and markers to work out where the end labels
> are and rewrites the code so that it only messes with PCLATH where a
> jump actually needs it. On the 18 series it does the same kind of
> thing and replaces Goto with Bra etc.

Yes, the 18 series is a DIFFERENT processor to the 16 series, so yes,
you can optimise a few bits here and there, BUT these processors
optimise in different ways. If you write a compiler for the 16 series
with a dedicated optimiser, tweaking it a bit will NOT produce optimal
18 series executables. You need a dedicated 18 series optimiser for that.

>
> That was a very simple explanation of how it handles the paging
> issues. I was told more, but it gets quite complex in detail; it is
> not an easy subject. It does not stop at playing with PCLATH: there
> are a number of tricks you can use to create compact code, from
> not reloading the Wreg with data it already holds

XCSB does not do this because it optimises the way data is moved
through the W register and it organises instruction sequences so that
partial results stay in the W register as long as possible. If you write:

A = B + C + D

the compiler will generate

movf B,w
addwf C,w
addwf D,w
movwf A

> to removing
> consecutive CALLs, RETURN mnemonics and replace them with a single
> GOTO mnemonic.

XCSB does not do this. Instead it performs other optimisations which
give much greater savings. For example it looks at the way functions
are used, how their parameters and return results are used. It will
actually remove some parameters and implant the variables passed
directly into the function that is being called if it is safe to do
so. It will promote the return result if it is safe to do so removing
return overheads. Some optimisations are very complicated and lead the
compiler to generate the type of monolithic lump of code that an
expert would produce to remove high level overheads.

So by breaking your code down into manageable, well written functions
you do NOT incur many of the penalties you would with other compilers.

I thank you for your input on this. You may like to look at a
discussion that was held in a newsgroup some time ago comparing
the output of the Hitech C compiler and the XCSB compiler

http://groups.google.co.uk/group/sci.electronics.basics/browse_thread/thread/32e180c24bba32c0/5ee1af94d6a54312?lnk=st&q=sergio+masci+crc&rnum=1&hl=en#5ee1af94d6a54312

In this thread it became apparent that the XCSB compiler could do with
a further loop optimisation which would further reduce the size of the
generated executable. This has now been done and the XCSB generated
executable is now actually 4 instructions shorter. So whereas it
kicked a*s before, it does so with steel toe cap shoes now :-)

Regards
Sergio Masci

http://www.xcprod.com/titan/XCSB - optimising PIC compiler
FREE for personal non-commercial use
