Reply by Philipp Klaus Krause March 5, 2018
On 20.02.2018 at 17:30, Philipp Klaus Krause wrote:
>
> I probably won't find time for testing until later this week, but for
> now I suspect it is the flash: The flash in the STM32F051 (Cortex-M0) at
> 48 MHz needs 1 wait state, while the flash in the STM32F302 (Cortex-M4)
> at 64 MHz needs 2 wait states. Both devices have a prefetch buffer that
> is supposed to somewhat reduce the effect of the wait states on program
> execution.
>
> Philipp
>
Flash wait states indeed make a big difference on the STM32s: with stdcbench 0.4, the STM32F103 (Cortex-M3) at 36 MHz with the prefetch buffer enabled scores 22% higher with 1 wait state than with 2 wait states.

Philipp
Reply by David Brown February 21, 2018
On 21/02/18 08:37, raimond.dragomir@gmail.com wrote:
> On Tuesday, 20 February 2018, 21:58:12 UTC+2, Philipp Klaus Krause wrote:
>> On 20.02.2018 at 15:27, Philipp Klaus Krause wrote:
>>> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>>>>
>>>> 'Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
>>>> later this week. At some point, I should also put a list of results on
>>>> http://stdcbench.org/
>>>>
>>>> Philipp
>>>>
>>>
>>> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
>>> not with the internal oscillator, and my board doesn't have a crystal),
>>> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>>>
>>> stdcbench 0.3
>>> stdcbench c90base score: 1693
>>> stdcbench c90lib score: 864
>>> stdcbench final score: 2557
>>>
>>> Looks only 15% faster per clock cycle than the Cortex-M0.
>>>
>>> Philipp
>>>
>>
>> And from the opposite end of the performance spectrum, a Cypress EZ-USB
>> FX2LP at 48 MHz, compiled using SDCC 3.7.0 RC2, sdcc -mmcs51
>> --model-large --stack-auto --code-loc 0x0000 --code-size 0x3500
>> --xram-loc 0x3500 --xram-size 0x0b00 --opt-code-speed
>> --max-allocs-per-node 10000:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 12
>> stdcbench final score: 12
>>
>> Philipp
>
> Is it 12clk/instruction or something? In this case 48MHz is misleading.
> Same for PICs which are 4clks/instr. Microchip usually give another
> number, the Mips, for example 64MHz/16Mips.
>
> So the Cypress chip is 48MHz/4Mips maybe?
>
> When I worked with 12clks/instr 8051 I always talked about it as 1Mips cpu.
>
It is common for modern 8051 implementations to have 4 oscillator clocks per instruction clock. The original 8051 had 12 oscillator clocks per instruction clock. Most single-byte register-to-register operations take 1 instruction clock. Memory accesses add to that, as do multi-byte instructions, jumps, calls, etc.
Reply by Philipp Klaus Krause February 21, 2018
On 21.02.2018 at 08:37, raimond.dragomir@gmail.com wrote:
>> And from the opposite end of the performance spectrum, a Cypress EZ-USB
>> FX2LP at 48 MHz, compiled using SDCC 3.7.0 RC2, sdcc -mmcs51
>> --model-large --stack-auto --code-loc 0x0000 --code-size 0x3500
>> --xram-loc 0x3500 --xram-size 0x0b00 --opt-code-speed
>> --max-allocs-per-node 10000:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 12
>> stdcbench final score: 12
>>
>> Philipp
>
> Is it 12clk/instruction or something? In this case 48MHz is misleading.
> Same for PICs which are 4clks/instr. Microchip usually give another
> number, the Mips, for example 64MHz/16Mips.
>
> So the Cypress chip is 48MHz/4Mips maybe?
>
> When I worked with 12clks/instr 8051 I always talked about it as 1Mips cpu.
>
The Cypress EZ-USB can execute most 1-byte instructions in 4 clock cycles. Most 2-byte instructions take 8 clock cycles. Branch instructions tend to take 16 clock cycles. A few instructions take even longer. I gave the 48 MHz figure mostly so that the results can be reproduced (though the port can be found in the examples for stdcbench now anyway).

Philipp
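To put numbers on the clocks-per-instruction figures in the post above, here is a minimal Python sketch that converts them into a peak instruction rate at 48 MHz. The function name `instructions_per_second` and the `CLOCKS` table are made up for illustration; the cycle counts are the ones quoted in the post. Real code mixes instruction classes, so these are per-class upper bounds, not a measured MIPS figure.

```python
def instructions_per_second(clock_hz, clocks_per_instruction):
    """Peak rate if every instruction took the same number of clocks."""
    return clock_hz / clocks_per_instruction


CLOCK_HZ = 48_000_000  # EZ-USB FX2LP clock used in this thread

# Clocks per instruction class, as stated in the post above.
CLOCKS = {"1-byte": 4, "2-byte": 8, "branch": 16}

# Peak rate per instruction class, in millions of instructions per second.
rates_mips = {
    kind: instructions_per_second(CLOCK_HZ, clocks) / 1e6
    for kind, clocks in CLOCKS.items()
}

for kind, mips in rates_mips.items():
    print(f"{kind:7s} instructions: {mips:.0f} MIPS peak")
```

By the same arithmetic, a classic 12-clock 8051 running at 12 MHz comes out at 1 MIPS peak.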
Reply by February 21, 2018
On Tuesday, 20 February 2018, 21:37:33 UTC+2, Tauno Voipio wrote:
> On 20.2.18 15:00, Philipp Klaus Krause wrote:
>> On 09.02.2018 at 22:28, Paul Rubin wrote:
>>> Philipp Klaus Krause <pkk@spth.de> writes:
>>>> Output for a 98 MHz C8051F120 (compiled via sdcc -mmcs51 --model-large
>>>
>>> Was that really supposed to say 98 MHz?
>>>
>>> Can you say the code size for the different compiler outputs?
>>>
>>> Could you do the AVR8 and the MSP430 with gcc, if you happen to have
>>> those available? Would the ARM Cortex M0 be getting outside the
>>> intended range of this benchmark?
>>>
>>> Thanks!
>>>
>>
>> Here's a first result from an ARM Cortex-M0 (the STM32F051R8 at 48 MHz)
>> compiled using GCC 6.3.1 with -O2 and using newlib-nano:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 1141
>> stdcbench c90lib score: 651
>> stdcbench final score: 1792
>>
>> Per clock cycle, this Cortex-M0 with GCC gets about twice the score
>> compared to an STM8 with IAR.
>> Interesting is the large difference between the c90base and the c90lib
>> score. I guess newlib-nano is optimized for code size at the expense of
>> speed (even though it is surprising to see that much of a difference vs.
>> the situation for the STM8).
>>
>> Philipp
>>
>
> GCC -Os does wonders compared to the standard compiled newlib.
>
> --
> -TV
Speaking of that, we always take newlib/newlib-nano/whatever-lib for granted. An interesting possibility would be to build the same library benchmark without library calls (that is, another version of the program doing exactly the same thing, but with no calls into the library), just to see where we stand. It would be quite a good indicator of the performance of the library.
Reply by February 21, 2018
On Tuesday, 20 February 2018, 21:58:12 UTC+2, Philipp Klaus Krause wrote:
> On 20.02.2018 at 15:27, Philipp Klaus Krause wrote:
>> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>>>
>>> 'Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
>>> later this week. At some point, I should also put a list of results on
>>> http://stdcbench.org/
>>>
>>> Philipp
>>>
>>
>> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
>> not with the internal oscillator, and my board doesn't have a crystal),
>> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 1693
>> stdcbench c90lib score: 864
>> stdcbench final score: 2557
>>
>> Looks only 15% faster per clock cycle than the Cortex-M0.
>>
>> Philipp
>>
>
> And from the opposite end of the performance spectrum, a Cypress EZ-USB
> FX2LP at 48 MHz, compiled using SDCC 3.7.0 RC2, sdcc -mmcs51
> --model-large --stack-auto --code-loc 0x0000 --code-size 0x3500
> --xram-loc 0x3500 --xram-size 0x0b00 --opt-code-speed
> --max-allocs-per-node 10000:
>
> stdcbench 0.3
> stdcbench c90base score: 12
> stdcbench final score: 12
>
> Philipp
Is it 12 clocks/instruction or something? In that case 48 MHz is misleading. Same for PICs, which are 4 clocks/instruction. Microchip usually gives another number, the MIPS, for example 64 MHz/16 MIPS.

So the Cypress chip is 48 MHz/4 MIPS maybe?

When I worked with a 12 clocks/instruction 8051, I always talked about it as a 1 MIPS CPU.
Reply by Philipp Klaus Krause February 20, 2018
On 20.02.2018 at 15:27, Philipp Klaus Krause wrote:
> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>>
>> 'Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
>> later this week. At some point, I should also put a list of results on
>> http://stdcbench.org/
>>
>> Philipp
>>
>
> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
> not with the internal oscillator, and my board doesn't have a crystal),
> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>
> stdcbench 0.3
> stdcbench c90base score: 1693
> stdcbench c90lib score: 864
> stdcbench final score: 2557
>
> Looks only 15% faster per clock cycle than the Cortex-M0.
>
> Philipp
>
And from the opposite end of the performance spectrum, a Cypress EZ-USB FX2LP at 48 MHz, compiled using SDCC 3.7.0 RC2, sdcc -mmcs51 --model-large --stack-auto --code-loc 0x0000 --code-size 0x3500 --xram-loc 0x3500 --xram-size 0x0b00 --opt-code-speed --max-allocs-per-node 10000:

stdcbench 0.3
stdcbench c90base score: 12
stdcbench final score: 12

Philipp
Reply by Tauno Voipio February 20, 2018
On 20.2.18 15:00, Philipp Klaus Krause wrote:
> On 09.02.2018 at 22:28, Paul Rubin wrote:
>> Philipp Klaus Krause <pkk@spth.de> writes:
>>> Output for a 98 MHz C8051F120 (compiled via sdcc -mmcs51 --model-large
>>
>> Was that really supposed to say 98 MHz?
>>
>> Can you say the code size for the different compiler outputs?
>>
>> Could you do the AVR8 and the MSP430 with gcc, if you happen to have
>> those available? Would the ARM Cortex M0 be getting outside the
>> intended range of this benchmark?
>>
>> Thanks!
>>
>
> Here's a first result from an ARM Cortex-M0 (the STM32F051R8 at 48 MHz)
> compiled using GCC 6.3.1 with -O2 and using newlib-nano:
>
> stdcbench 0.3
> stdcbench c90base score: 1141
> stdcbench c90lib score: 651
> stdcbench final score: 1792
>
> Per clock cycle, this Cortex-M0 with GCC gets about twice the score
> compared to an STM8 with IAR.
> Interesting is the large difference between the c90base and the c90lib
> score. I guess newlib-nano is optimized for code size at the expense of
> speed (even though it is surprising to see that much of a difference vs.
> the situation for the STM8).
>
> Philipp
>
GCC -Os does wonders compared to the standard compiled newlib.

--
-TV
Reply by Philipp Klaus Krause February 20, 2018
On 20.02.2018 at 16:14, Jack wrote:
> On Tuesday, 20 February 2018 15:27:28 UTC+1, Philipp Klaus Krause wrote:
>> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>>>
>>> 'Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
>>> later this week. At some point, I should also put a list of results on
>>> http://stdcbench.org/
>>>
>>> Philipp
>>>
>>
>> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
>> not with the internal oscillator, and my board doesn't have a crystal),
>> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>>
>> stdcbench 0.3
>> stdcbench c90base score: 1693
>> stdcbench c90lib score: 864
>> stdcbench final score: 2557
>>
>> Looks only 15% faster per clock cycle than the Cortex-M0.
>>
>> Philipp
>
> Following the "Definitive Guide to ARM CortexM0-M0+" the performance of various Cortex M are:
>
> Features                  Cortex-M0  Cortex-M0+  Cortex-M3  Cortex-M4  Cortex-M7
> Dhrystone 2.1 (per MHz)   0.9        0.95        1.25       1.25       2.14
> CoreMark 1.0 (per MHz)    2.33       2.46        3.34       3.40       5.01
>
> So maybe there is something that the M4 doesn't like too much or gcc doesn't optimize very well for the M4 (with the option used).
>
> Bye Jack
>
I probably won't find time for testing until later this week, but for now I suspect it is the flash: the flash in the STM32F051 (Cortex-M0) at 48 MHz needs 1 wait state, while the flash in the STM32F302 (Cortex-M4) at 64 MHz needs 2 wait states. Both devices have a prefetch buffer that is supposed to somewhat reduce the effect of the wait states on program execution.

Philipp
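As a rough illustration of the wait-state rule in the post above, the following Python sketch maps a system clock to a number of flash wait states. The thresholds (0 wait states up to 24 MHz, 1 up to 48 MHz, 2 up to 72 MHz) are an assumption based on the flash latency settings documented for STM32F0/F1/F3 parts and should be checked against the reference manual for the exact device; the function name `flash_wait_states` is made up for this example.

```python
def flash_wait_states(sysclk_mhz):
    """Assumed flash wait states for an STM32F0/F1/F3-style part.

    Thresholds are an assumption for illustration; check the reference
    manual of the exact device. STM32F0 parts top out at 48 MHz, so the
    2-wait-state range only applies to the faster families.
    """
    if sysclk_mhz <= 0:
        raise ValueError("clock must be positive")
    if sysclk_mhz <= 24:
        return 0
    if sysclk_mhz <= 48:
        return 1
    if sysclk_mhz <= 72:
        return 2
    raise ValueError("clock above the 72 MHz range covered by this sketch")


# The two configurations discussed in the post above.
for part, mhz in [("STM32F051", 48), ("STM32F302", 64)]:
    print(f"{part} at {mhz} MHz: {flash_wait_states(mhz)} wait state(s)")
```

Under these assumed thresholds the sketch agrees with the figures in the post: 1 wait state at 48 MHz and 2 at 64 MHz.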
Reply by Jack February 20, 2018
On Tuesday, 20 February 2018 15:27:28 UTC+1, Philipp Klaus Krause wrote:
> On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>>
>> 'Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
>> later this week. At some point, I should also put a list of results on
>> http://stdcbench.org/
>>
>> Philipp
>>
>
> Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but
> not with the internal oscillator, and my board doesn't have a crystal),
> using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:
>
> stdcbench 0.3
> stdcbench c90base score: 1693
> stdcbench c90lib score: 864
> stdcbench final score: 2557
>
> Looks only 15% faster per clock cycle than the Cortex-M0.
>
> Philipp
According to the "Definitive Guide to ARM CortexM0-M0+", the performance figures for the various Cortex-M cores are:

Features                  Cortex-M0  Cortex-M0+  Cortex-M3  Cortex-M4  Cortex-M7
Dhrystone 2.1 (per MHz)   0.9        0.95        1.25       1.25       2.14
CoreMark 1.0 (per MHz)    2.33       2.46        3.34       3.40       5.01

So maybe there is something that the M4 doesn't like too much, or GCC doesn't optimize very well for the M4 (with the options used).

Bye Jack
Reply by Philipp Klaus Krause February 20, 2018
On 20.02.2018 at 14:38, Philipp Klaus Krause wrote:
>
> 'Don't have one yet, but intend to do a Cortex-M3 and maybe a Z80 test
> later this week. At some point, I should also put a list of results on
> http://stdcbench.org/
>
> Philipp
>
Here is a Cortex-M4 (the STM32F302R8 at 64 MHz - it could do 72 MHz, but not with the internal oscillator, and my board doesn't have a crystal), using GCC -O2 -mcpu=cortex-m4 -mthumb with newlib-nano:

stdcbench 0.3
stdcbench c90base score: 1693
stdcbench c90lib score: 864
stdcbench final score: 2557

Looks only 15% faster per clock cycle than the Cortex-M0.

Philipp
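The per-clock comparison can be reproduced by dividing each score by the clock frequency. Below is a minimal Python sketch that uses only scores posted in this thread (the Cortex-M0 numbers come from the earlier STM32F051R8 post); the names `RESULTS`, `per_mhz` and `ratio_per_mhz` are made up for illustration. The ratio differs between the three scores, so the sketch prints all of them rather than a single figure.

```python
# stdcbench 0.3 scores and clock frequencies as posted in this thread.
RESULTS = {
    "M0": {"mhz": 48, "c90base": 1141, "c90lib": 651, "final": 1792},
    "M4": {"mhz": 64, "c90base": 1693, "c90lib": 864, "final": 2557},
}


def per_mhz(result, key):
    """Score normalized by clock frequency."""
    return result[key] / result["mhz"]


def ratio_per_mhz(a, b, key):
    """Per-MHz score of `a` divided by the per-MHz score of `b`."""
    return per_mhz(a, key) / per_mhz(b, key)


m0, m4 = RESULTS["M0"], RESULTS["M4"]
for key in ("c90base", "c90lib", "final"):
    print(
        f"{key:8s} M0: {per_mhz(m0, key):6.2f}/MHz  "
        f"M4: {per_mhz(m4, key):6.2f}/MHz  "
        f"M4/M0: {ratio_per_mhz(m4, m0, key):.3f}"
    )
```

Normalizing by clock frequency alone ignores flash wait states and library build options, so the ratios describe these two board configurations rather than the cores themselves.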