A new benchmark suitable for small systems: stdcbench

Started by Philipp Klaus Krause February 7, 2018
raimond.dragomir@gmail.com writes:
> For example, it seems that the STM8S 16MHz performs better than the C8051
> at 100MHz... an 8051 is almost unbeatable for small control applications
> of under 8K program size and max. 256 bytes of internal ram.
I've always had the impression the 8051 was not well suited for commonly used C coding styles and datatypes. It was always intended to be programmed in assembler, has good support for single-bit operations but not much for 16 bit (usual C int type), etc.

The STM8 is interesting. I found out about it fairly recently and got some of the small STM8S103F3 boards for various purposes. In small cheap 8 bitters it's often quite attractive compared with AVR and the like. I'm not sure what else is out there that's comparable, except maybe PIC.

I see from the colecovision site that Philipp Klaus Krause did most of the SDCC back end, so thanks Philipp!
On 2018-02-11, Philipp Klaus Krause <pkk@spth.de> wrote:
> Here is a small comparison of STM8 results with various current
> compilers (all done on the STM8AF5288).
>
> SDCC 3.7.0 RC1 with optimization for code size (-mstm8 --opt-code-size
> --max-allocs-per-node 100000), binary size 20953 B:
>
> stdcbench 0.3
>
> stdcbench c90base score: 106
> stdcbench c90lib score: 87
> stdcbench final score: 193
Are bigger scores better or worse?

--
Grant
On 12.02.2018 at 00:29, Grant Edwards wrote:
> On 2018-02-11, Philipp Klaus Krause <pkk@spth.de> wrote:
>> Here is a small comparison of STM8 results with various current
>> compilers (all done on the STM8AF5288).
>>
>> SDCC 3.7.0 RC1 with optimization for code size (-mstm8 --opt-code-size
>> --max-allocs-per-node 100000), binary size 20953 B:
>>
>> stdcbench 0.3
>>
>> stdcbench c90base score: 106
>> stdcbench c90lib score: 87
>> stdcbench final score: 193
>
> Are bigger scores better or worse?
>
> --
> Grant
Bigger scores are better.

Philipp
On 09.02.2018 at 22:28, Paul Rubin wrote:
> Philipp Klaus Krause <pkk@spth.de> writes:
>> Output for a 98 MHz C8051F120 (compiled via sdcc -mmcs51 --model-large
>
> Was that really supposed to say 98 MHz?
>
> Can you say the code size for the different compiler outputs?
>
> Could you do the AVR8 and the MSP430 with gcc, if you happen to have
> those available? Would the ARM Cortex M0 be getting outside the
> intended range of this benchmark?
>
> Thanks!
Here's a first result from an ARM Cortex-M0 (the STM32F051R8 at 48 MHz) compiled using GCC 6.3.1 with -O2 and using newlib-nano:

stdcbench 0.3
stdcbench c90base score: 1141
stdcbench c90lib score: 651
stdcbench final score: 1792

Per clock cycle, this Cortex-M0 with GCC gets about twice the score compared to an STM8 with IAR.

The large difference between the c90base and the c90lib score is interesting. I guess newlib-nano is optimized for code size at the expense of speed (even though it is surprising to see that much of a difference vs. the situation for the STM8).

Philipp
On Tuesday, 20 February 2018, 15:00:41 UTC+2, Philipp Klaus Krause wrote:
> On 09.02.2018 at 22:28, Paul Rubin wrote:
> > Philipp Klaus Krause <pkk@spth.de> writes:
> >> Output for a 98 MHz C8051F120 (compiled via sdcc -mmcs51 --model-large
> >
> > Was that really supposed to say 98 MHz?
> >
> > Can you say the code size for the different compiler outputs?
> >
> > Could you do the AVR8 and the MSP430 with gcc, if you happen to have
> > those available? Would the ARM Cortex M0 be getting outside the
> > intended range of this benchmark?
> >
> > Thanks!
> >
>
> Here's a first result from an ARM Cortex-M0 (the STM32F051R8 at 48 MHz)
> compiled using GCC 6.3.1 with -O2 and using newlib-nano:
>
> stdcbench 0.3
> stdcbench c90base score: 1141
> stdcbench c90lib score: 651
> stdcbench final score: 1792
>
> Per clock cycle, this Cortex-M0 with GCC gets about twice the score
> compared to an STM8 with IAR.
> Interesting is the large difference between the c90base and the c90lib
> score. I guess newlib-nano is optimized for code size at the expense of
> speed (even though it is surprising to see that much of a difference vs.
> the situation for the STM8).
>
> Philipp
You can try an -Os variant to see what difference you get. It would in fact be quite interesting. I always thought that the speed gain of -O2 over -Os is not much, and so I always use -Os...

Can you do some (8-bit) AVR tests?
On 20.02.2018 at 14:13, raimond.dragomir@gmail.com wrote:
> On Tuesday, 20 February 2018, 15:00:41 UTC+2, Philipp Klaus Krause wrote:
>> On 09.02.2018 at 22:28, Paul Rubin wrote:
>>> Philipp Klaus Krause <pkk@spth.de> writes:
>>>> Output for a 98 MHz C8051F120 (compiled via sdcc -mmcs51 --model-large
>>>
>>> Was that really supposed to say 98 MHz?
>>>
>>> Can you say the code size for the different compiler outputs?
>>>
>>> Could you do the AVR8 and the MSP430 with gcc, if you happen to have
>>> those available? Would the ARM Cortex M0 be getting outside the
>>> intended range of this benchmark?
>>>
>>> Thanks!
>>>
>>
>> Here's a first result from an ARM Cortex-M0 (the STM32F051R8 at 48 MHz)
>> compiled using GCC 6.3.1 with -O2 and using newlib-nano:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 1141
>> stdcbench c90lib score: 651
>> stdcbench final score: 1792
>>
>> Per clock cycle, this Cortex-M0 with GCC gets about twice the score
>> compared to an STM8 with IAR.
>> Interesting is the large difference between the c90base and the c90lib
>> score. I guess newlib-nano is optimized for code size at the expense of
>> speed (even though it is surprising to see that much of a difference vs.
>> the situation for the STM8).
>>
>> Philipp
>
> You can try an -Os variant to see what difference you get.
Not much:

stdcbench 0.3
stdcbench c90base score: 1047
stdcbench c90lib score: 607
stdcbench final score: 1654
> It would be in fact quite interesting. I always thought that
> -O2 speed gain is not much than the -Os, and so always use -Os ...
>
> Can you do some (8bit) AVR tests?
Don't have one yet, but I intend to do a Cortex-M3 and maybe a Z80 test later this week. At some point, I should also put a list of results on http://stdcbench.org/

Philipp
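To put "not much" into numbers, here is a small sketch that compares the two Cortex-M0 runs posted above (GCC -O2 vs. -Os on the STM32F051R8 at 48 MHz). The scores are copied from the thread; the helper name `percent_gain` is just an illustrative choice, not part of stdcbench.

```python
# Scores copied from the two Cortex-M0 (STM32F051R8, 48 MHz) runs in this thread.
SCORES_O2 = {"c90base": 1141, "c90lib": 651, "final": 1792}
SCORES_OS = {"c90base": 1047, "c90lib": 607, "final": 1654}


def percent_gain(higher: float, lower: float) -> float:
    """Return by how many percent `higher` exceeds `lower`."""
    return (higher / lower - 1.0) * 100.0


# Same clock and same chip, so the raw scores can be compared directly.
for name in SCORES_O2:
    gain = percent_gain(SCORES_O2[name], SCORES_OS[name])
    print(f"{name}: -O2 scores {gain:.1f}% above -Os")
```

By this arithmetic -O2 comes out roughly 7 to 9 percent ahead of -Os on each sub-score for this particular chip and toolchain.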
On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>
> Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
> later this week. At some point, I should also put a list of results on
> http://stdcbench.org/
>
> Philipp
Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but not with the internal oscillator, and my board doesn't have a crystal), using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:

stdcbench 0.3
stdcbench c90base score: 1693
stdcbench c90lib score: 864
stdcbench final score: 2557

Looks only 15% faster per clock cycle than the Cortex-M0.

Philipp
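For readers who want to redo the per-clock comparison, a minimal sketch follows. It simply divides each final score posted in this thread by the core clock in MHz; the function and dictionary names are illustrative assumptions. Note that the figure depends on which Cortex-M0 run is taken as the baseline: with these numbers the Cortex-M4 comes out about 7% ahead of the -O2 run and about 16% ahead of the -Os run.

```python
# (final score, core clock in MHz), copied from the results posted in this thread.
RUNS = {
    "Cortex-M0, GCC -O2": (1792, 48),
    "Cortex-M0, GCC -Os": (1654, 48),
    "Cortex-M4, GCC -O2": (2557, 64),
}


def score_per_mhz(score: float, mhz: float) -> float:
    """Normalize a stdcbench score by the core clock."""
    return score / mhz


def relative_gain_percent(new: float, base: float) -> float:
    """Return by how many percent `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0


m4 = score_per_mhz(*RUNS["Cortex-M4, GCC -O2"])
for name in ("Cortex-M0, GCC -O2", "Cortex-M0, GCC -Os"):
    base = score_per_mhz(*RUNS[name])
    print(f"Cortex-M4 vs {name}: {relative_gain_percent(m4, base):+.1f}% per MHz")
```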
On Tuesday, 20 February 2018 15:27:28 UTC+1, Philipp Klaus Krause wrote:
> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
> >
> > Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
> > later this week. At some point, I should also put a list of results on
> > http://stdcbench.org/
> >
> > Philipp
> >
>
> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
> not with the internal oscillator, and my board doesn't have a crystal),
> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>
> stdcbench 0.3
> stdcbench c90base score: 1693
> stdcbench c90lib score: 864
> stdcbench final score: 2557
>
> Looks only 15% faster per clock cycle than the Cortex-M0.
>
> Philipp
According to the "Definitive Guide to ARM CortexM0-M0+", the performance figures of the various Cortex-M cores are:

Features                 Cortex-M0  Cortex-M0+  Cortex-M3  Cortex-M4  Cortex-M7
Dhrystone 2.1 (per MHz)  0.9        0.95        1.25       1.25       2.14
CoreMark 1.0 (per MHz)   2.33       2.46        3.34       3.40       5.01

So maybe there is something that the M4 doesn't like too much, or gcc doesn't optimize very well for the M4 (with the options used).

Bye Jack
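As a rough cross-check of that gap, the sketch below compares the Cortex-M4/Cortex-M0 ratio implied by the published per-MHz figures in the table with the ratio of the stdcbench final scores per MHz measured in this thread (both -O2 runs). All names are illustrative, and the comparison assumes the published figures and the stdcbench runs are comparable at all, which different flash wait states or libraries could easily undermine.

```python
# Published per-MHz figures from the table above (Cortex-M0 and Cortex-M4 columns).
PUBLISHED = {
    "Dhrystone 2.1": {"M0": 0.9, "M4": 1.25},
    "CoreMark 1.0": {"M0": 2.33, "M4": 3.40},
}

# stdcbench final score per MHz from the -O2 runs in this thread.
MEASURED = {"M0": 1792 / 48, "M4": 2557 / 64}


def m4_over_m0(figures: dict) -> float:
    """Return the Cortex-M4 to Cortex-M0 ratio for one benchmark."""
    return figures["M4"] / figures["M0"]


for benchmark, figures in PUBLISHED.items():
    print(f"{benchmark}: published M4/M0 ratio {m4_over_m0(figures):.2f}")
print(f"stdcbench 0.3: measured M4/M0 ratio {m4_over_m0(MEASURED):.2f}")
```

The published figures imply a ratio of roughly 1.4 to 1.5, while the stdcbench runs here give a ratio close to 1.07.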
On 20.02.2018 at 16:14, Jack wrote:
> On Tuesday, 20 February 2018 15:27:28 UTC+1, Philipp Klaus Krause wrote:
>> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>>>
>>> Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
>>> later this week. At some point, I should also put a list of results on
>>> http://stdcbench.org/
>>>
>>> Philipp
>>>
>>
>> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
>> not with the internal oscillator, and my board doesn't have a crystal),
>> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 1693
>> stdcbench c90lib score: 864
>> stdcbench final score: 2557
>>
>> Looks only 15% faster per clock cycle than the Cortex-M0.
>>
>> Philipp
>
> Following the "Definitive Guide to ARM CortexM0-M0+" the performance of various Cortex M are:
>
> Features                 Cortex-M0  Cortex-M0+  Cortex-M3  Cortex-M4  Cortex-M7
> Dhrystone 2.1 (per MHz)  0.9        0.95        1.25       1.25       2.14
> CoreMark 1.0 (per MHz)   2.33       2.46        3.34       3.40       5.01
>
> So maybe there is something that the M4 doesn't like too much or gcc doesn't optimize very well for the M4 (with the option used).
>
> Bye Jack
I probably won't find time for testing until later this week, but for now I suspect it is the flash: The flash in the STM32F051 (Cortex-M0) at 48 MHz needs 1 wait state, while the flash in the STM32F302 (Cortex-M4) at 64 MHz needs 2 wait states. Both devices have a prefetch buffer that is supposed to somewhat reduce the effect of the wait states on program execution.

Philipp
On 20.2.18 15:00, Philipp Klaus Krause wrote:
> On 09.02.2018 at 22:28, Paul Rubin wrote:
>> Philipp Klaus Krause <pkk@spth.de> writes:
>>> Output for a 98 MHz C8051F120 (compiled via sdcc -mmcs51 --model-large
>>
>> Was that really supposed to say 98 MHz?
>>
>> Can you say the code size for the different compiler outputs?
>>
>> Could you do the AVR8 and the MSP430 with gcc, if you happen to have
>> those available? Would the ARM Cortex M0 be getting outside the
>> intended range of this benchmark?
>>
>> Thanks!
>>
>
> Here's a first result from an ARM Cortex-M0 (the STM32F051R8 at 48 MHz)
> compiled using GCC 6.3.1 with -O2 and using newlib-nano:
>
> stdcbench 0.3
> stdcbench c90base score: 1141
> stdcbench c90lib score: 651
> stdcbench final score: 1792
>
> Per clock cycle, this Cortex-M0 with GCC gets about twice the score
> compared to an STM8 with IAR.
> Interesting is the large difference between the c90base and the c90lib
> score. I guess newlib-nano is optimized for code size at the expense of
> speed (even though it is surprising to see that much of a difference vs.
> the situation for the STM8).
>
> Philipp
GCC -Os does wonders compared to the standard compiled newlib.

--
-TV