Folk,
I'm doing a little project, my first with Arduinos. I had
assumed that the choice to use C++ meant that the APIs would
be fancy object-oriented APIs that generate inline assembly
code for performance. I normally do more bare-metal stuff,
including building C++ APIs for the peripherals of the
MC68HC11 more than a decade ago, so I was keen to see what can
be achieved using more modern C++ compilers.
To say I've been disappointed is an understatement. The
standard of the code is simply awful. The g++ compiler is
fantastic, but the Arduino APIs just don't use that power.
As an example, "digitalWrite" takes over 50 cycles, compared
to the expected 2. I know that there are libraries that work
faster, but why are the default libraries so bad? Even calling
these methods takes at least *three* times the code space
that's required. I drilled in to see what's going on, but
that's not the topic here. I wanted to show how things could
be better, and to see if anyone here is interested in making
it happen (personally I actually want to do this for ST's ARM
range, but will assist if someone wants to do AVR versions).
Using template metaprogramming, we can get nice object-oriented
APIs that also map directly to the hardware instructions.
Unfortunately it's not easy to use the existing Arduino port
definitions as template parameters, which might mean having to
redefine some of the #defines of the low-level hardware (more
below). So here's a minimal example that works, and shows what
could be achieved by following this route:
template <int Port, uint8_t Mask>
class Pin
{
public:
    Pin& operator=(bool b)
    {
        if (b)
            *(volatile uint8_t*)Port |= Mask;
        else
            *(volatile uint8_t*)Port &= ~Mask;
        return *this;
    }
};
Pin<0x25, 0x01> portBp0;
Note that the 0x25 is the memory-mapped address of PORTB (its
I/O address is 0x05, but memory-mapping adds an offset of
0x20, if I understand the AVR hardware correctly).
Now, when I write "portBp0 = 1;" I get exactly one instruction
emitted ("sbi"), which takes the expected 2 cycles (1 on XMEGA).
Same deal for "portBp0 = 0;", the instruction is "cbi". Both
are single-word instructions, whereas a call to digitalWrite
takes three or four words of code space.
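The address arithmetic above can be written down as a small compile-time check. This is only a sketch: the helper names (kSfrOffset, mem_to_io, sbi_reachable) are made up for illustration, and the only hardware facts assumed are the 0x20 offset mentioned above and that sbi/cbi reach just the lowest 32 I/O registers.

```cpp
#include <cassert>
#include <cstdint>

// Offset between an AVR I/O address and its data-space (memory-mapped) address.
constexpr std::uint16_t kSfrOffset = 0x20;

// Convert a data-space address (0x25 for PORTB) to the I/O address used by
// in/out/sbi/cbi (0x05 for PORTB). Hypothetical helper, not an avr-libc call.
constexpr std::uint16_t mem_to_io(std::uint16_t mem_addr)
{
    return static_cast<std::uint16_t>(mem_addr - kSfrOffset);
}

// sbi/cbi can only address the lowest 32 I/O registers (I/O 0x00..0x1F).
constexpr bool sbi_reachable(std::uint16_t mem_addr)
{
    return mem_addr >= kSfrOffset && mem_to_io(mem_addr) < 0x20;
}

static_assert(mem_to_io(0x25) == 0x05, "PORTB I/O address");
static_assert(sbi_reachable(0x25), "PORTB is within sbi/cbi range");

// Runtime spot-check of the same arithmetic (no hardware needed).
inline void check_address_helpers()
{
    assert(mem_to_io(0x25) == 0x05);
    assert(!sbi_reachable(0x60));  // beyond the I/O space reachable by sbi/cbi
}
```

A check like this could sit next to a Pin instantiation so that a register outside the sbi/cbi range is rejected at compile time rather than silently producing longer code.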
Note that I would have preferred to define the template like
this:
template <volatile uint8_t* Port, uint8_t Mask>
class Pin
{...};
That would remove the casts on each use of Port, but
instantiating the template would then require a cast:
Pin<PORTB, 0x01> portBp0;
which translates roughly to:
Pin<(volatile uint8_t*)0x25, 0x01> portBp0;
... and that's not valid as a template argument, because a
cast from an integer to a pointer is not a constant expression.
The only method I know that does work is to declare the port
variable as extern, in a particular section, and use the linker
script or the linker option --just-symbols to define its
location. This means we can also use a C++ reference instead of
a pointer:
extern volatile uint8_t PortB; // address provided to the linker
template <volatile uint8_t& Port, uint8_t Mask>
class Pin
{...};
Pin<PortB, 0x01> portBp1;
It's quite a lot of fiddling to use a linker script, but
using --just-symbols is easy enough; either way you can't
use the standard AVR header files for the values :(.
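The reference-parameter form can be tried out on a desktop compiler by letting an ordinary global stand in for the linker-located register. This is a sketch under that assumption only: on the target, PortB would be declared extern and placed by the linker, and the conversion to bool is an addition for the sake of the demonstration.

```cpp
#include <cassert>
#include <cstdint>

// On the target this would be "extern" with its address supplied to the
// linker; here it is an ordinary global so the sketch runs on a host.
volatile std::uint8_t PortB = 0;

template <volatile std::uint8_t& Port, std::uint8_t Mask>
class Pin
{
public:
    Pin& operator=(bool b)
    {
        if (b)
            Port = static_cast<std::uint8_t>(Port | Mask);
        else
            Port = static_cast<std::uint8_t>(Port & ~Mask);
        return *this;
    }

    // Read back the current state of the pin (added for illustration).
    operator bool() const { return (Port & Mask) != 0; }
};

Pin<PortB, 0x01> portBp0;
Pin<PortB, 0x02> portBp1;

// Exercise both pins and confirm that only the addressed bit changes.
inline void check_pins()
{
    portBp0 = true;
    portBp1 = true;
    assert(PortB == 0x03);
    portBp0 = false;
    assert(PortB == 0x02);
}
```

No casts are needed inside the class because the template argument is already a typed reference; the cost is that the address is no longer visible to the compiler.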
One option might be to define a structure for all the
registers in a given AVR variant (and just locate the
structure using --just-symbols), e.g.
extern struct {
    ...
    volatile uint8_t PortB; // ... at address 0x25 in the structure.
    ...
} CPU;
void clear_B()
{
    CPU.PortB = 0;
}
The other advantage of using templates is that we can
specialise them to set up the port correctly, and to check for
collisions in port usage:
template <volatile uint8_t& Port, uint8_t Mask>
class OutputPin : public Pin<Port, Mask>
{
public:
    OutputPin()
    {
        // (Check with a pin registry that this pin
        // isn't already assigned to something else?)
        // Set up port direction...
    }
    ...
};
This also means that you can dynamically assign port pins just
by defining a local variable in a function, and the pin will
be set up for you when you hit that function.
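A rough sketch of how the set-up and the collision check might look. The extra Ddr and Claimed template parameters, the one-bit-per-pin registry, and the destructor that releases the pin are all assumptions added for illustration; ordinary globals stand in for the registers so it runs on a host.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the data and direction registers (plain globals on a host).
volatile std::uint8_t PortB = 0;
volatile std::uint8_t DdrB  = 0;

// Hypothetical registry: one bit per pin of port B that is currently claimed.
std::uint8_t claimedB = 0;

template <volatile std::uint8_t& Port, volatile std::uint8_t& Ddr,
          std::uint8_t& Claimed, std::uint8_t Mask>
class OutputPin
{
public:
    OutputPin()
    {
        // Collision check; compiled out when NDEBUG is defined.
        assert((Claimed & Mask) == 0 && "pin already in use");
        Claimed = static_cast<std::uint8_t>(Claimed | Mask);
        Ddr = static_cast<std::uint8_t>(Ddr | Mask);    // make the pin an output
    }

    ~OutputPin()
    {
        Ddr = static_cast<std::uint8_t>(Ddr & ~Mask);   // back to an input
        Claimed = static_cast<std::uint8_t>(Claimed & ~Mask);
    }

    // A pin object owns its claim, so copying is disabled.
    OutputPin(const OutputPin&) = delete;
    OutputPin& operator=(const OutputPin&) = delete;

    OutputPin& operator=(bool b)
    {
        if (b)
            Port = static_cast<std::uint8_t>(Port | Mask);
        else
            Port = static_cast<std::uint8_t>(Port & ~Mask);
        return *this;
    }
};

// The pin is configured on entry and released when the function returns.
void pulse_led()
{
    OutputPin<PortB, DdrB, claimedB, 0x20> led;
    led = true;
    led = false;
}
```

A run-time registry like this costs a few instructions per pin; a compile-time check would be cheaper but needs every pin assignment to be visible in one place.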
With more work, you could set up templates for whole ports, or
for ranges of pins on the same port:
template <volatile uint8_t& Port, uint8_t Mask, int Shift>
class PinRange
{
public:
    operator int()
    {
        return (Port&Mask) >> Shift;
    }
    PinRange& operator=(int val)
    {
        Port = (Port&~Mask) | ((val << Shift)&Mask);
        return *this;
    }
    PinRange& operator++()
    {
        *this = (int)*this + 1;
        return *this;
    }
    // ..., etc
};
PinRange<CPU.PortB, 0x1C, 2> portBpins23and4;
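As a usage sketch of the PinRange idea, the variant below derives the mask from a shift and a width so that the two cannot disagree. That change of parameters, and the plain global standing in for the register, are assumptions made so that the example is self-contained and runs on a host.

```cpp
#include <cassert>
#include <cstdint>

volatile std::uint8_t PortB = 0;  // stand-in for the real register

template <volatile std::uint8_t& Port, int Shift, int Width>
class PinRange
{
    static_assert(Shift >= 0 && Width > 0 && Shift + Width <= 8,
                  "range must fit in an 8-bit port");

public:
    // Width consecutive one-bits starting at bit Shift.
    static constexpr std::uint8_t Mask =
        static_cast<std::uint8_t>(((1u << Width) - 1u) << Shift);

    operator int() const { return (Port & Mask) >> Shift; }

    PinRange& operator=(int val)
    {
        // Replace only the bits in the range; the other pins are preserved.
        Port = static_cast<std::uint8_t>((Port & ~Mask) | ((val << Shift) & Mask));
        return *this;
    }

    PinRange& operator++()
    {
        *this = static_cast<int>(*this) + 1;  // wraps around within the range
        return *this;
    }
};

PinRange<PortB, 2, 3> portBpins2to4;  // pins 2, 3 and 4, i.e. mask 0x1C

// Write a value into the range and confirm the neighbouring bit survives.
inline void check_range()
{
    PortB = 0x01;
    portBpins2to4 = 5;
    assert(PortB == 0x15);
}
```

Note that the read-modify-write in operator= is several instructions, so on real hardware it is not atomic with respect to interrupts that touch the same port.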
The G++ compiler is theoretically quite capable of turning all
these templates and meta-programming into the most efficient
possible inline assembly code, with none of the downsides of
the Arduino approach.
I spent a few hours playing with this approach, and when you
go to "extern" definitions with the address provided to the
linker, the compiler no longer recognises that it can
substitute "sbi" for "lds", "ori" and "sts", so you get
long-form code.
I tried to force the issue using inline "asm" calls to the SBI
instruction, but then gcc won't coerce the (unknown, but
possibly 16-bit) address into the 6-bit field, even when I try
various ways to force it. I think that Atmel have hacked gcc
just enough to work for the cases they care about.
The upshot of that is I can't make proper use of a "struct"
(because I can't locate it in memory). The address must be
a constant whose value is visible to the compiler, not just
constant at link time.
I.e. I can't see any way to use SBI/CBI instructions on
registers in this struct:
struct __attribute__((packed)) AvrIOPort {
    uint8_t pin;
    uint8_t ddr;
    uint8_t data;
};
extern volatile AvrIOPort PortB; // Address set by a linker option
or the low 0x20 bytes of my much larger "CPU" structure (which
maps the entire 0xFF block).
Here is the code which fails:
template <volatile AvrIOPort& Port, uint8_t Number>
class Pin
{
public:
    Pin& operator=(bool b)
    {
        if (b)
            // Port.data |= (1<<Number);
            asm volatile(
                " sbi %[portdata],%[portbit]\n"
                : // Output Operands
                : // Input Operands
                  [portdata] "I" (&Port.data),
                  [portbit] "I" (Number) // sbi takes a bit number, not a mask
                : // Clobbers
            );
        else
            // Port.data &= ~(1<<Number);
            asm volatile(
                " cbi %[portdata],%[portbit]\n"
                : // Output Operands
                : // Input Operands
                  [portdata] "I" (&Port.data),
                  [portbit] "I" (Number) // cbi takes a bit number, not a mask
                : // Clobbers
            );
        return *this;
    }
    void output() { }
};
Pin<PortB, 0> portBp0;
The compiler can't see that the (external) address of
"Port.data" fits into a 6-bit field (required by the "I"
constraint), so it complains "impossible constraint". The
long-form code (the commented-out C++ statements) is still
faster than calling digitalWrite, but not much smaller.
I can still make this all work using #defines for all the
register addresses, but it's a lot uglier than using structs.
I hope I don't have the same problem with the ARM version of
gcc.
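One possible middle ground between #defines and a located struct is a struct of integer constants per port, passed as a template type parameter. This is a sketch only: the names PortBRegs, reg() and PortBPin3 are hypothetical, the addresses assume an ATmega328P, and I haven't verified that the generated code collapses to sbi/cbi, although the addresses are visible to the compiler as in the first example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one port: data-space addresses as integer
// constants, so the compiler can see the values (ATmega328P port B assumed).
struct PortBRegs
{
    static constexpr std::uint16_t pin  = 0x23;
    static constexpr std::uint16_t ddr  = 0x24;
    static constexpr std::uint16_t data = 0x25;
};

template <typename Regs, std::uint8_t Number>
class Pin
{
    static_assert(Number < 8, "bit number out of range");

    // Turn an integer address into a register reference at the point of use.
    static volatile std::uint8_t& reg(std::uint16_t addr)
    {
        return *reinterpret_cast<volatile std::uint8_t*>(
            static_cast<std::uintptr_t>(addr));
    }

public:
    static constexpr std::uint8_t mask =
        static_cast<std::uint8_t>(1u << Number);

    Pin& operator=(bool b)
    {
        if (b)
            reg(Regs::data) = static_cast<std::uint8_t>(reg(Regs::data) | mask);
        else
            reg(Regs::data) = static_cast<std::uint8_t>(reg(Regs::data) & ~mask);
        return *this;
    }

    // Configure the pin as an output by setting its direction bit.
    void output()
    {
        reg(Regs::ddr) = static_cast<std::uint8_t>(reg(Regs::ddr) | mask);
    }
};

using PortBPin3 = Pin<PortBRegs, 3>;

// Host-side check of the constants only; the register accessors are meant
// for the target and must not be called on a desktop machine.
inline void check_constants()
{
    assert(PortBRegs::data == 0x25);
    assert(PortBPin3::mask == 0x08);
}
```

The register names stay grouped per port, as with the struct, but every address remains a compile-time constant.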
Anyhow, I hope I've piqued someone's interest. Your comments
would be welcome.
Clifford Heath.