$0.03 microcontroller

Started by Clifford Heath October 9, 2018
On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
gnuarm.deletethisbit@gmail.com wrote:

>On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>> > On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>> >> On 12.10.2018 at 01:08, Paul Rubin wrote:
>> >>> upsidedown@downunder.com writes:
>> >>>> There are a lot of operations that will update memory locations, so why
>> >>>> would you need a lot of CPU registers.
>> >>>
>> >>> Being able to (say) add register to register saves traffic through the
>> >>> accumulator and therefore instructions.
>> >>>
>> >>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>> >>>> assembly program listing.
>> >>>
>> >>> It would be nice to have a C compiler, and registers help with that.
>> >>>
>> >>
>> >> Looking at the instruction set, it should be possible to make a backend
>> >> for this in SDCC; the architecture looks more C-friendly than the
>> >> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>> >> or z80.
>> >> Reentrant functions will be inefficient: no registers, and no sp-relative
>> >> addressing mode. One would want to reserve a few memory locations as
>> >> pseudo-registers to help with that, but that only goes so far.
>> >
>> > CPUs like this (and others that aren't like this) should be
>> > programmed in Forth. It's a great tool for small MCUs and many times can be hosted
>> > on the target although not likely in this case. Still, you can bring
>> > enough functionality onto the MCU to allow direct downloads and many
>> > debugging features without an ICE.
>> >
>> > Rick C.
>> >
>>
>> Forth is a good language for very small devices, but there are details
>> that can make a huge difference in how efficient it is. To make Forth
>> work well on a small chip you need a Forth-specific instruction set to
>> target the stack processing. For example, adding two numbers in this
>> chip is two instructions - load accumulator from memory X, add
>> accumulator to memory Y. In a Forth cpu, you'd have a single
>> instruction that does "pop two numbers, add them, push the result".
>> That gives a very efficient and compact instruction set. But it is hard
>> to get the same results from a chip that doesn't have this kind of
>> stack-based instruction set.
>
>Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?
>
>I believe others have said the instruction set is memory oriented with no registers.
Depending on how you look at it, you could claim that it has 64 registers and no RAM. It is a quite orthogonal single-address architecture. You can do practically all single-operand instructions (like inc/dec, shift/rotate etc.) either in the accumulator or equally well in any of the 64 "registers". For two-operand instructions (such as add/sub, and/or etc.), either the source or the destination can be in the memory "register".

Both Acc = Acc Op Memory and Memory = Acc Op Memory are valid.

Thus the accumulator is needed only for two-operand instructions, not for single-operand instructions.
>I think that means in general the CPU will be slow compared to a register based design.
What is the difference? You have either 64 on-chip RAM bytes or 64 single-byte on-chip registers. The situation would have been different with on-chip registers and off-chip RAM, with the memory bottleneck.

Of course, there were odd architectures like the TI 9900 with a set of sixteen 16-bit general purpose registers in RAM. The set could be switched fast in interrupts, but that slowed down any general purpose register access.
>That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.
For a stack computer you need a pointer register, preferably with autoincrement/decrement support. This processor has indirect access and single-instruction increment or decrement support without disturbing the accumulator. Thus not so bad after all for stack computing.
On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>> On 12.10.2018 at 01:08, Paul Rubin wrote:
>>> upsidedown@downunder.com writes:
>>>> There is a lot of operations that will update memory locations, so why
>>>> would you need a lot of CPU registers.
>>>
>>> Being able to (say) add register to register saves traffic through the
>>> accumulator and therefore instructions.
>>>
>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>> assembly program listing.
The data-sheet describes the OTP program memory as "1KW", probably meaning 1024 instructions. The length of an instruction is not defined, as far as I could see.
>>> It would be nice to have a C compiler, and registers help with that.
The data-sheet mentions something they call "Mini-C".
>> Looking at the instruction set, it should be possible to make a backend
>> for this in SDCC; the architecture looks more C-friendly than the
>> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>> or z80.
>> Reentrant functions will be inefficient: no registers, and no sp-relative
>> addressing mode. One would want to reserve a few memory locations as
>> pseudo-registers to help with that, but that only goes so far.
>
> CPUs like this (and others that aren't like this) should be programmed
> in Forth.
I don't think that an interpreted Forth is feasible for this particular MCU. Where would the Forth program (= list of pointers to "words") be stored? I found no instructions for reading data from the OTP program memory, and the 64-byte RAM will not hold a non-trivial program together with the data for that program.

Moreover, there is no indirect jump instruction -- "jump to a computed address". The closest is "pcadd a", which can be used to implement a 256-entry case statement. You would be limited to a total of 256 words.

Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving a 16-bit RAM address, although for this MCU a 6-bit address would be enough. Apparently the same architecture has implementations with more RAM and 16-bit RAM addresses.

That said, one could perhaps implement a compiled Forth for this machine.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .
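As an illustration of the "256-entry case statement" idea mentioned above, here is a minimal Python sketch of a token-threaded dispatch. It assumes that "pcadd a" adds the 8-bit accumulator to the program counter so that a table of jumps placed right after it selects one handler per value; that reading, and the names `dispatch`, `WORDS`, `word_dup`, `word_add` and `run`, are assumptions made for the example and not taken from the data-sheet. It does not address the storage problem raised in the post (where the token list would live).

```python
# Toy model of a "pcadd a" style dispatch table; not real chip semantics.
# Assumption: the accumulator is used as an index into a table of jumps,
# so at most 256 distinct targets ("words") can be reached this way.

def dispatch(table, acc):
    """Return the handler selected by an 8-bit accumulator value."""
    if not 0 <= acc <= 0xFF:
        raise ValueError("accumulator is 8 bits wide")
    if acc >= len(table):
        raise IndexError("no table entry for this accumulator value")
    return table[acc]


# Hypothetical Forth-like words; each one is just a Python callable here.
def word_dup(stack):
    stack.append(stack[-1])


def word_add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & 0xFF)  # keep results to 8 bits


WORDS = [word_dup, word_add]  # an 8-bit index allows at most 256 entries


def run(program, stack):
    """Interpret a list of word indices, one dispatch per token."""
    for token in program:
        dispatch(WORDS, token)(stack)
    return stack


result = run([0, 1], [21])  # token 0 = dup, token 1 = add
```

The one-byte token is what caps the vocabulary at 256 words in this model; a larger vocabulary would need a wider index or a second level of dispatch.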
On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
>> Forth is a good language for very small devices, but there are
>> details that can make a huge difference in how efficient it is. To
>> make Forth work well on a small chip you need a Forth-specific
>> instruction set to target the stack processing. For example,
>> adding two numbers in this chip is two instructions - load
>> accumulator from memory X, add accumulator to memory Y. In a Forth
>> cpu, you'd have a single instruction that does "pop two numbers,
>> add them, push the result". That gives a very efficient and compact
>> instruction set. But it is hard to get the same results from a
>> chip that doesn't have this kind of stack-based instruction set.
>
> Your point is what exactly? You are comparing running forth on some
> other chip to running forth on this chip. How is that useful? There
> are many other chips that run very fast. So?
My point is that /this/ CPU is not a good match for Forth, though many other very cheap CPUs are. Whether or not you think that matches "CPUs like this should be programmed in Forth" depends on what you mean by "CPUs like this", and what you think the benefits of Forth are.
>
> I believe others have said the instruction set is memory oriented
> with no registers. I think that means in general the CPU will be
> slow compared to a register based design. That actually means it is
> easier to have a fast Forth implementation compared to other
> compilers since there won't be a significant penalty for using a
> stack.
>
It has a single register, not unlike the "W" register in small PIC devices. Yes, I expect it is going to be slower than you would get from having a few more registers. But it is missing (AFAICS) auto-increment and decrement modes, and has only load/store operations with indirect access.

So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y;        // 1 clock
    add x, a;        // 1 clock

If you have a data stack pointer "dsp", and want a standard Forth "+" operation, you have:

    idxm a, dsp;     // 2 clock
    mov temp, a;     // 1 clock
    dec dsp;         // 1 clock
    idxm a, dsp;     // 2 clock
    add a, temp;     // 1 clock
    idxm dsp, a;     // 2 clock

That is 9 clocks instead of 2, and 6 instructions instead of 2.

Of course you could make a Forth compiler for the device - but you would have to make an optimising Forth compiler that avoids needing a data stack, just as you do on many other small microcontrollers (and just as a C compiler would do). This is /not/ a processor that fits well with Forth or that would give a clear translation from Forth to assembly, as is the case on some very small microcontrollers.
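The instruction and clock totals above can be re-derived mechanically. The following Python sketch only tallies the per-instruction clock counts exactly as quoted in the post; the mnemonic strings are labels, nothing is assembled or executed, and the names `DIRECT_ADD`, `STACK_ADD` and `cost` are invented for the example.

```python
# Per-instruction clock counts as given in the post above.
# The mnemonics are plain labels; this is bookkeeping, not a simulator.
DIRECT_ADD = [
    ("mov a, y", 1),
    ("add x, a", 1),
]

STACK_ADD = [
    ("idxm a, dsp", 2),
    ("mov temp, a", 1),
    ("dec dsp", 1),
    ("idxm a, dsp", 2),
    ("add a, temp", 1),
    ("idxm dsp, a", 2),
]


def cost(sequence):
    """Return (instruction count, total clocks) for a listed sequence."""
    return len(sequence), sum(clocks for _, clocks in sequence)


direct = cost(DIRECT_ADD)
stacked = cost(STACK_ADD)
print("direct:", direct, "stack-based:", stacked)
```

On these figures the stack-based addition costs three times the instructions and four and a half times the clocks of the direct form.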
On 13/10/18 17:00, upsidedown@downunder.com wrote:
> On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
> gnuarm.deletethisbit@gmail.com wrote:
>
>> I believe others have said the instruction set is memory oriented with no registers.
>
> Depending how you look at it, you could claim that it has 64 registers
> and no RAM. It is a quite orthogonal single address architecture. You
> can do practically all single operand instructions (like inc/dec,
> shift/rotate etc.) either in the accumulator but equally well in any
> of the 64 "registers". For two operand instructions (such as add/sub,
> and/or etc,), either the source or destination can be in the memory
> "register".
Not quite, no. Only the first 16 memory addresses are directly accessible for most instructions, with the first 32 addresses being available for word-based instructions. So you could liken it to a device with 16 registers and indirect memory access to the rest of ram.
>
> Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory are
> valid.
>
> Thus the accumulator is needed only for two operand instructions, but
> not for single operand instructions.
>
>> I think that means in general the CPU will be slow compared to a register based design.
>
> What is the difference, you have 64 on chip RAM bytes or 64 single
> byte on chip registers. The situation would have been different with
> on-chip registers and off chip RAM, with the memory bottleneck.
>
> Of course, there were odd architectures like the TI 9900 with a set of
> sixteen 16 bit general purpose register in RAM. The set could be
> switched fast in interrupts, but slowed down any general purpose
> register access.
>
>> That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.
>
> For a stack computer you need a pointer register with preferably
> autoincrement/decrement support. This processor has indirect access
> and single instruction increment or decrement support without
> disturbing the accumulator. Thus not so bad after all for stack
> computing.
>
But you can't use the indirect memory accesses for any ALU instructions - only for loading or saving the accumulator. So all indirect accesses need to go via the accumulator - and if you want to operate on two indirect accesses (like adding the top two elements on the stack), you have to use another "register" address to store one element temporarily. Yes, it would be bad for stack computing.
On 18-10-13 18:31 , Niklas Holsti wrote:

> I don't think that an interpreted Forth is feasible for this particular
> MCU.
...
> Moreover, there is no indirect jump instruction -- "jump to a computed
> address".
Ok, before anyone else notices, I admit I forgot about implementing an indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .
And one more iteration (sorry...)

On 18-10-13 19:46 , Niklas Holsti wrote:
> On 18-10-13 18:31 , Niklas Holsti wrote:
>
>> I don't think that an interpreted Forth is feasible for this particular
>> MCU.
...
>> Moreover, there is no indirect jump instruction -- "jump to a computed
>> address".
>
> Ok, before anyone else notices, I admit I forgot about implementing an
> indirect jump by pushing the target address on the stack and executing a
> return instruction. That would work for this machine.
Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

What would work, as an indirect jump, is to set the Stack Pointer (sp) to point at a RAM word that contains the target address, and then execute a return. But then one has lost the actual Stack Pointer value.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .
On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>On 13/10/18 17:00, upsidedown@downunder.com wrote:
>> Depending how you look at it, you could claim that it has 64 registers
>> and no RAM. It is a quite orthogonal single address architecture. You
>> can do practically all single operand instructions (like inc/dec,
>> shift/rotate etc.) either in the accumulator but equally well in any
>> of the 64 "registers". For two operand instructions (such as add/sub,
>> and/or etc,), either the source or destination can be in the memory
>> "register".
>
>Not quite, no. Only the first 16 memory addresses are directly
>accessible for most instructions, with the first 32 addresses being
>available for word-based instructions. So you could liken it to a
>device with 16 registers and indirect memory access to the rest of ram.
Really? In the manual:
> M.n Only addressed in 0~0xF (0~15) is allowed
The M.n notation is for bit operations, in which M is the byte address and n is the bit number within the byte. Restricting M to 4 bits makes sense, since n requires 3 bits, so the total address size for bit operations would be 7 bits.

I couldn't find any statement that the restriction on M also applies to byte access. Where is it?
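The 4 + 3 = 7 bit argument can be made concrete with a small Python sketch. The field order used here (M in the upper bits, n in the lower bits) is purely an assumption for illustration; the thread notes that the actual bit allocation of the instruction fields could not be found, and the function names are hypothetical.

```python
# Illustrative packing of an "M.n" bit operand into one 7-bit field.
# Assumption: M occupies the upper 4 bits and n the lower 3 bits; the
# real instruction encoding is not given in the thread.
M_BITS = 4  # byte address, 0..15
N_BITS = 3  # bit number within the byte, 0..7


def pack_bit_operand(m, n):
    """Combine byte address m and bit number n into a 7-bit value."""
    if not 0 <= m < (1 << M_BITS):
        raise ValueError("M must fit in 4 bits (0..15)")
    if not 0 <= n < (1 << N_BITS):
        raise ValueError("n must fit in 3 bits (0..7)")
    return (m << N_BITS) | n


def unpack_bit_operand(field):
    """Split a 7-bit operand field back into (m, n)."""
    return field >> N_BITS, field & ((1 << N_BITS) - 1)


largest = pack_bit_operand(15, 7)  # highest addressable bit operand
print(largest.bit_length())        # number of bits the field needs
```

With byte addresses above 15 the field would need more than 7 bits, which is consistent with the restriction quoted from the manual applying to bit operations.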
On Sat, 13 Oct 2018 18:31:25 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>>> On 12.10.2018 at 01:08, Paul Rubin wrote:
>>>> upsidedown@downunder.com writes:
>>>>> There is a lot of operations that will update memory locations, so why
>>>>> would you need a lot of CPU registers.
>>>>
>>>> Being able to (say) add register to register saves traffic through the
>>>> accumulator and therefore instructions.
>>>>
>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>>> assembly program listing.
>
>The data-sheet describes the OTP program memory as "1KW", probably
>meaning 1024 instructions. The length of an instruction is not defined,
>as far as I could see.
Yes, I misread the data sheet. It is really 1 kW.

The nice feature about Harvard architecture is that the data and instruction sizes can be different. I have tried to locate the bit allocation of the various fields (opcode, address etc.) but no luck.
gnuarm.deletethisbit@gmail.com writes:
> That actually means it is easier to have a fast Forth implementation
> compared to other compilers since there won't be a significant penalty
> for using a stack.
I think this chip is too small for traditional Forth implementation methods. Just 64 bytes of ram and no registers. If you have 16 bit cells and 8 levels of return and data stacks, half the ram is already used by the stacks.

An F18 processor (GA144 node for those not familiar) has around 3x as much ram including the stacks, and it doesn't pretend to be a complete MCU (you usually split your application across multiple nodes). Plus it has that very efficient 5-bit instruction encoding. On the other hand, you have to use ram as program memory.

You might be able to concoct some usable Forth dialect compiled with an optimizing compiler and using 8-bit data when possible, but it doesn't seem that useful for a chip like this.
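The "half the ram" figure follows from simple arithmetic, spelled out in this short Python sketch. The numbers are the ones given in the post (64 bytes of RAM, 16-bit cells, 8 levels, one return stack and one data stack); the function and variable names are made up for the example.

```python
# RAM budget for a traditional two-stack Forth, using the post's figures.
RAM_BYTES = 64    # total RAM on the chip
CELL_BYTES = 2    # 16-bit cells
STACK_DEPTH = 8   # levels per stack
NUM_STACKS = 2    # return stack + data stack


def stack_budget(ram=RAM_BYTES, cell=CELL_BYTES,
                 depth=STACK_DEPTH, stacks=NUM_STACKS):
    """Return (bytes used by stacks, bytes left, fraction of RAM used)."""
    used = cell * depth * stacks
    return used, ram - used, used / ram


used, left, fraction = stack_budget()
print(used, left, fraction)

# The 8-bit-data variant mentioned in the post halves the stack cost.
used_8bit, left_8bit, _ = stack_budget(cell=1)
```

Dropping to 8-bit cells frees another 16 bytes, which is the saving the last paragraph of the post hints at.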
On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>And one more iteration (sorry...)
>
>On 18-10-13 19:46 , Niklas Holsti wrote:
>> On 18-10-13 18:31 , Niklas Holsti wrote:
>>
>>> I don't think that an interpreted Forth is feasible for this particular
>>> MCU.
...
>>> Moreover, there is no indirect jump instruction -- "jump to a computed
>>> address".
>>
>> Ok, before anyone else notices, I admit I forgot about implementing an
>> indirect jump by pushing the target address on the stack and executing a
>> return instruction. That would work for this machine.
>
>Except that one can only "push" the accumulator and flag registers,
>combined, and the flag register cannot be set directly, and has only 4
>working bits.
>
>What would work, as an indirect jump, is to set the Stack Pointer (sp)
>to point at a RAM word that contains the target address, and then
>execute a return. But then one has lost the actual Stack Pointer value.
Just call a "Jumper" routine; the call pushes the return address on the stack. In "Jumper", read SP from the I/O address space, indirectly modify the return address on the stack as needed, and perform a ret instruction. This causes a jump to the modified return address and also restores SP to the value it had before the call.
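The "Jumper" trick can be checked on a toy model. The Python sketch below is not a simulator of the real chip: it assumes a stack that grows upward, a 2-byte return address stored low byte first, and an arbitrary initial stack pointer, and the class and function names are invented for the example. It only demonstrates the two properties claimed above, namely that execution ends up at the chosen target and that SP returns to its value before the call.

```python
# Toy model of the "Jumper" indirect jump. Assumptions (not from the
# data-sheet): the stack grows upward, "call" pushes a 2-byte return
# address low byte first, and "ret" pops it into the program counter.

class ToyCpu:
    def __init__(self):
        self.ram = [0] * 64  # 64 bytes of RAM
        self.sp = 0x30       # arbitrary initial stack pointer
        self.pc = 0

    def call(self, target):
        """Push the return address, then jump to target."""
        return_address = self.pc + 1
        self.ram[self.sp] = return_address & 0xFF
        self.ram[self.sp + 1] = return_address >> 8
        self.sp += 2
        self.pc = target

    def ret(self):
        """Pop the return address into the program counter."""
        self.sp -= 2
        self.pc = self.ram[self.sp] | (self.ram[self.sp + 1] << 8)


def jumper(cpu, target):
    """Body of the hypothetical Jumper routine."""
    slot = cpu.sp - 2                 # where call() stored the return address
    cpu.ram[slot] = target & 0xFF     # overwrite it with the jump target
    cpu.ram[slot + 1] = target >> 8
    cpu.ret()                         # "return" to the chosen target


def indirect_jump(cpu, target, jumper_address=0x200):
    cpu.call(jumper_address)          # pushes a return address we then replace
    jumper(cpu, target)


cpu = ToyCpu()
cpu.pc = 0x010
sp_before = cpu.sp
indirect_jump(cpu, 0x123)
print(hex(cpu.pc), cpu.sp == sp_before)
```

Unlike pointing SP directly at a RAM word holding the target, this variant leaves the stack pointer intact, at the cost of the extra call and the clocks spent inside the routine.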