$0.03 microcontroller

Reply by ●October 13, 2018

On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
gnuarm.deletethisbit@gmail.com wrote:

>On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>> > On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>> >> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
>> >>> upsidedown@downunder.com writes:
>> >>>> There is a lot of operations that will update memory locations, so why
>> >>>> would you need a lot of CPU registers.
>> >>>
>> >>> Being able to (say) add register to register saves traffic through the
>> >>> accumulator and therefore instructions.
>> >>>
>> >>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>> >>>> assembly program listing.
>> >>>
>> >>> It would be nice to have a C compiler, and registers help with that.
>> >>>
>> >>
>> >> Looking at the instruction set, it should be possible to make a backend
>> >> for this in SDCC; the architecture looks more C-friendly than the
>> >> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>> >> or z80.
>> >> reentrant functions will be inefficent: No registers, and no sp-relative
>> >> adressing mode. On would want to reserve a few memory locations as
>> >> pseudo-registers to help with that, but that only goes so far.
>> > 
>> > CPUs like this (and others that aren't like this) should be
>> > programmed  in Forth. It's a great tool for small MCUs and many times can be hosted
>> > on the target although not likely in this case. Still, you can bring
>> > enough functionality onto the MCU to allow direct downloads and many
>> > debugging features without an ICE.
>> > 
>> > Rick C.
>> > 
>> 
>> Forth is a good language for very small devices, but there are details 
>> that can make a huge difference in how efficient it is.  To make Forth 
>> work well on a small chip you need a Forth-specific instruction set to 
>> target the stack processing.  For example, adding two numbers in this 
>> chip is two instructions - load accumulator from memory X, add 
>> accumulator to memory Y.  In a Forth cpu, you'd have a single 
>> instruction that does "pop two numbers, add them, push the result". 
>> That gives a very efficient and compact instruction set.  But it is hard 
>> to get the same results from a chip that doesn't have this kind of 
>> stack-based instruction set.
>
>Your point is what exactly?  You are comparing running forth on some other chip to running forth on this chip.  How is that useful?  There are many other chips that run very fast.  So? 
>
>I believe others have said the instruction set is memory oriented with no registers.  

Depending on how you look at it, you could claim that it has 64 registers
and no RAM. It is a quite orthogonal single-address architecture. You
can do practically all single-operand instructions (inc/dec,
shift/rotate, etc.) either in the accumulator or equally well in any
of the 64 "registers". For two-operand instructions (add/sub,
and/or, etc.), either the source or the destination can be the memory
"register".

Both  Acc = Acc Op Memory or alternatively  Memory = Acc Op Memory are
valid. 

Thus the accumulator is needed only for two operand instructions, but
not for single operand instructions.
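
To make the two forms concrete, here is a minimal sketch of a
single-address accumulator machine. It is an illustration only: the class
and method names are hypothetical and are not the vendor's mnemonics, and
8-bit wraparound is assumed.

```python
# Toy model of a single-address accumulator machine (illustrative only;
# names are hypothetical, not the chip's real mnemonics).
class AccMachine:
    def __init__(self, ram_size=64):
        self.acc = 0
        self.ram = [0] * ram_size

    def add_to_acc(self, addr):
        # Acc = Acc + Memory
        self.acc = (self.acc + self.ram[addr]) & 0xFF

    def add_to_mem(self, addr):
        # Memory = Acc + Memory
        self.ram[addr] = (self.acc + self.ram[addr]) & 0xFF

    def inc_mem(self, addr):
        # Single-operand instruction: works on memory, accumulator untouched
        self.ram[addr] = (self.ram[addr] + 1) & 0xFF


m = AccMachine()
m.ram[3] = 10
m.acc = 5
m.add_to_acc(3)   # result lands in the accumulator
m.add_to_mem(3)   # result lands in the memory "register"
m.inc_mem(3)      # memory changes, accumulator does not
```

The last call is the point of the argument: a single-operand operation
never has to pass through the accumulator.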

>I think that means in general the CPU will be slow compared to a register based design.

What is the difference between having 64 on-chip RAM bytes and 64
single-byte on-chip registers? The situation would be different with
on-chip registers and off-chip RAM, where memory becomes the bottleneck.

Of course, there were odd architectures like the TI 9900, which kept its
set of sixteen 16-bit general-purpose registers in RAM. The set could be
switched quickly on interrupts, but keeping it in RAM slowed down every
general-purpose register access.

>That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack. 

For a stack computer you need a pointer register, preferably with
autoincrement/decrement support. This processor has indirect access
and single-instruction increment or decrement support without
disturbing the accumulator. Thus it is not so bad after all for stack
computing.

Reply by Niklas Holsti ●October 13, 2018

On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
>>> upsidedown@downunder.com writes:
>>>> There is a lot of operations that will update memory locations, so why
>>>> would you need a lot of CPU registers.
>>>
>>> Being able to (say) add register to register saves traffic through the
>>> accumulator and therefore instructions.
>>>
>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>> assembly program listing.

The data-sheet describes the OTP program memory as "1KW", probably 
meaning 1024 instructions. The length of an instruction is not defined, 
as far as I could see.

>>> It would be nice to have a C compiler, and registers help with that.

The data-sheet mentions something they call "Mini-C".

>> Looking at the instruction set, it should be possible to make a backend
>> for this in SDCC; the architecture looks more C-friendly than the
>> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>> or z80.
>> reentrant functions will be inefficent: No registers, and no sp-relative
>> adressing mode. On would want to reserve a few memory locations as
>> pseudo-registers to help with that, but that only goes so far.
>
> CPUs like this (and others that aren't like this) should be programmed
> in Forth.

I don't think that an interpreted Forth is feasible for this particular 
MCU. Where would the Forth program (= list of pointers to "words") be 
stored? I found no instructions for reading data from the OTP program 
memory, and the 64-byte RAM will not hold a non-trivial program together 
with the data for that program.

Moreover, there is no indirect jump instruction -- "jump to a computed 
address". The closest is "pcadd a", which can be used to implement a 
256-entry case statement. You would be limited to a total of 256 words.
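
As a rough illustration of that limit, the sketch below models a
token-dispatched inner interpreter in Python. It does not model the real
semantics of "pcadd a"; it only assumes that an 8-bit value selects one
entry of a jump table. The function and word names are hypothetical.

```python
def make_dispatcher(handlers):
    """Return a dispatch function over a table of at most 256 handlers."""
    if len(handlers) > 256:
        raise ValueError("an 8-bit index can select at most 256 entries")

    def dispatch(token, stack):
        # Stand-in for a jump-table dispatch: the token picks one entry.
        handlers[token](stack)

    return dispatch


def word_dup(stack):
    stack.append(stack[-1])


def word_add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & 0xFF)


DUP, ADD = 0, 1
dispatch = make_dispatcher([word_dup, word_add])

data = [21]
for token in (DUP, ADD):   # the "program" is a list of 8-bit tokens
    dispatch(token, data)
```

With one token per word, the dictionary cannot grow past 256 entries,
whatever the size of the program memory.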

Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving 
a 16-bit RAM address, although for this MCU a 6-bit address would be 
enough. Apparently the same architecture has implementations with more 
RAM and 16-bit RAM addresses.

That said, one could perhaps implement a compiled Forth for this machine.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by David Brown ●October 13, 2018

On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
> wrote:
>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
>>> Krause wrote:
>>>> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
>>>>> upsidedown@downunder.com writes:
>>>>>> There is a lot of operations that will update memory
>>>>>> locations, so why would you need a lot of CPU registers.
>>>>> 
>>>>> Being able to (say) add register to register saves traffic
>>>>> through the accumulator and therefore instructions.
>>>>> 
>>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
>>>>>> commented assembly program listing.
>>>>> 
>>>>> It would be nice to have a C compiler, and registers help
>>>>> with that.
>>>>> 
>>>> 
>>>> Looking at the instruction set, it should be possible to make a
>>>> backend for this in SDCC; the architecture looks more
>>>> C-friendly than the existing pic14 and pic16 backends. But it
>>>> surely isn't as nice as stm8 or z80. reentrant functions will
>>>> be inefficent: No registers, and no sp-relative adressing mode.
>>>> On would want to reserve a few memory locations as 
>>>> pseudo-registers to help with that, but that only goes so far.
>>> 
>>> CPUs like this (and others that aren't like this) should be 
>>> programmed  in Forth. It's a great tool for small MCUs and many
>>> times can be hosted on the target although not likely in this
>>> case. Still, you can bring enough functionality onto the MCU to
>>> allow direct downloads and many debugging features without an
>>> ICE.
>>> 
>>> Rick C.
>>> 
>> 
>> Forth is a good language for very small devices, but there are
>> details that can make a huge difference in how efficient it is.  To
>> make Forth work well on a small chip you need a Forth-specific
>> instruction set to target the stack processing.  For example,
>> adding two numbers in this chip is two instructions - load
>> accumulator from memory X, add accumulator to memory Y.  In a Forth
>> cpu, you'd have a single instruction that does "pop two numbers,
>> add them, push the result". That gives a very efficient and compact
>> instruction set.  But it is hard to get the same results from a
>> chip that doesn't have this kind of stack-based instruction set.
> 
> Your point is what exactly?  You are comparing running forth on some
> other chip to running forth on this chip.  How is that useful?  There
> are many other chips that run very fast.  So?

My point is that /this/ CPU is not a good match for Forth, though many
other very cheap CPUs are.  Whether or not you think that matches "CPUs
like this should be programmed in Forth" depends on what you mean by
"CPUs like this", and what you think the benefits of Forth are.

> 
> I believe others have said the instruction set is memory oriented
> with no registers.  I think that means in general the CPU will be
> slow compared to a register based design.  That actually means it is
> easier to have a fast Forth implementation compared to other
> compilers since there won't be a significant penalty for using a
> stack.
> 

It has a single register, not unlike the "W" register in small PIC 
devices.  Yes, I expect it is going to be slower than you would get from 
having a few more registers.  But it is missing (AFAICS) auto-increment 
and decrement modes, and has only load/store operations with indirect 
access.

So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

	mov a, y;	// 1 clock
	add x, a;	// 1 clock

If you have a data stack pointer "dsp", and want a standard Forth "+" 
operation, you have:

	idxm a, dsp;	// 2 clock
	mov temp, a;	// 1 clock
	dec dsp;	// 1 clock
	idxm a, dsp;	// 2 clock
	add a, temp;	// 1 clock
	idxm dsp, a;	// 2 clock

That is 9 clocks instead of 2, and 6 instructions instead of 2.
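
The totals can be checked mechanically. The sketch below just tallies the
two listings above, using the per-instruction clock counts given in their
comments; nothing else about the chip is assumed.

```python
# (mnemonic, clocks) pairs copied from the two listings above
direct = [("mov a, y", 1), ("add x, a", 1)]
stack_add = [
    ("idxm a, dsp", 2),
    ("mov temp, a", 1),
    ("dec dsp", 1),
    ("idxm a, dsp", 2),
    ("add a, temp", 1),
    ("idxm dsp, a", 2),
]


def cost(seq):
    """Return (instruction count, total clocks) for a listing."""
    return len(seq), sum(clocks for _, clocks in seq)


direct_cost = cost(direct)
stack_cost = cost(stack_add)
print("direct:", direct_cost, "stack:", stack_cost)
```

The direct version comes to 2 instructions and 2 clocks, the stack version
to 6 instructions and 9 clocks.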

Of course you could make a Forth compiler for the device - but you would 
have to make an optimising Forth compiler that avoids needing a data 
stack, just as you do on many other small microcontrollers (and just as a 
C compiler would do).  This is /not/ a processor that fits well with 
Forth or that would give a clear translation from Forth to assembly, as 
is the case on some very small microcontrollers.

Reply by David Brown ●October 13, 2018

On 13/10/18 17:00, upsidedown@downunder.com wrote:
> On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
> gnuarm.deletethisbit@gmail.com wrote:
> 
>> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
>>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>>>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>>>>> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
>>>>>> upsidedown@downunder.com writes:
>>>>>>> There is a lot of operations that will update memory locations, so why
>>>>>>> would you need a lot of CPU registers.
>>>>>>
>>>>>> Being able to (say) add register to register saves traffic through the
>>>>>> accumulator and therefore instructions.
>>>>>>
>>>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>>>>> assembly program listing.
>>>>>>
>>>>>> It would be nice to have a C compiler, and registers help with that.
>>>>>>
>>>>>
>>>>> Looking at the instruction set, it should be possible to make a backend
>>>>> for this in SDCC; the architecture looks more C-friendly than the
>>>>> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>>>>> or z80.
>>>>> reentrant functions will be inefficent: No registers, and no sp-relative
>>>>> adressing mode. On would want to reserve a few memory locations as
>>>>> pseudo-registers to help with that, but that only goes so far.
>>>>
>>>> CPUs like this (and others that aren't like this) should be
>>>> programmed  in Forth. It's a great tool for small MCUs and many times can be hosted
>>>> on the target although not likely in this case. Still, you can bring
>>>> enough functionality onto the MCU to allow direct downloads and many
>>>> debugging features without an ICE.
>>>>
>>>> Rick C.
>>>>
>>>
>>> Forth is a good language for very small devices, but there are details
>>> that can make a huge difference in how efficient it is.  To make Forth
>>> work well on a small chip you need a Forth-specific instruction set to
>>> target the stack processing.  For example, adding two numbers in this
>>> chip is two instructions - load accumulator from memory X, add
>>> accumulator to memory Y.  In a Forth cpu, you'd have a single
>>> instruction that does "pop two numbers, add them, push the result".
>>> That gives a very efficient and compact instruction set.  But it is hard
>>> to get the same results from a chip that doesn't have this kind of
>>> stack-based instruction set.
>>
>> Your point is what exactly?  You are comparing running forth on some other chip to running forth on this chip.  How is that useful?  There are many other chips that run very fast.  So?
>>
>> I believe others have said the instruction set is memory oriented with no registers.
> 
> Depending how you look at it, you could claim that it has 64 registers
> and no RAM. It is a quite orthogonal single address architecture. You
> can do practically all single operand instructions (like inc/dec,
> shift/rotate etc.) either in the accumulator but equally well in any
> of the 64 "registers". For two operand instructions (such as add/sub,
> and/or etc,), either the source or destination can be in the memory
> "register".

Not quite, no.  Only the first 16 memory addresses are directly 
accessible for most instructions, with the first 32 addresses being 
available for word-based instructions.  So you could liken it to a 
device with 16 registers and indirect memory access to the rest of ram.

> 
> Both  Acc = Acc Op Memory or alternatively  Memory = Acc Op Memory are
> valid.
> 
> Thus the accumulator is needed only for two operand instructions, but
> not for single operand instructions.
> 
>> I think that means in general the CPU will be slow compared to a register based design.
> 
> What is the difference, you have 64 on chip RAM bytes or 64 single
> byte on chip registers. The situation would have been different with
> on-chip registers and off chip RAM, with the memory bottleneck.
> 
> Of course, there were odd architectures like the TI 9900 with a set of
> sixteen 16 bit general purpose register in RAN. The set could be
> switched fast in interrupts, but slowed down any general purpose
> register access.
>    
>> That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.
> 
> For a stack computer you need a pointer register with preferably
> autoincrement/decrement support. This processor has indirect access
> and single instruction increment or decrement support without
> disturbing the accumulator.Thus not so bad after all for stack
> computing.
> 

But you can't use the indirect memory accesses for any ALU instructions 
- only for loading or saving the accumulator.  So all indirect accesses 
need to go via the accumulator - and if you want to operate on two 
indirect accesses (like adding the top two elements on the stack), you 
have to use another "register" address to store one element temporarily. 
  Yes, it would be bad for stack computing.
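
A small functional model shows where the temporary cell comes in. This is
a sketch under simplifying assumptions: the stack pointer is modelled as a
single RAM cell rather than a 16-bit word, the data stack grows upward, and
the addresses chosen for dsp and temp are arbitrary.

```python
def forth_add(ram, dsp_addr, temp_addr):
    """Forth '+' using only accumulator-mediated indirect access."""
    acc = ram[ram[dsp_addr]]              # idxm a, dsp : load top of stack
    ram[temp_addr] = acc                  # mov temp, a : park it in a direct cell
    ram[dsp_addr] -= 1                    # dec dsp     : pop
    acc = ram[ram[dsp_addr]]              # idxm a, dsp : load new top of stack
    acc = (acc + ram[temp_addr]) & 0xFF   # add a, temp : ALU op needs a direct operand
    ram[ram[dsp_addr]] = acc              # idxm dsp, a : store the result


DSP, TEMP = 0, 1          # hypothetical direct-addressable cells
ram = [0] * 64
ram[16], ram[17] = 7, 5   # two stack elements, top at address 17
ram[DSP] = 17
forth_add(ram, DSP, TEMP)
```

Every indirect access goes through the accumulator, so the first operand
has to be parked in temp before the second one can be fetched.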

Reply by Niklas Holsti ●October 13, 2018

On 18-10-13 18:31 , Niklas Holsti wrote:

> I don't think that an interpreted Forth is feasible for this particular
> MCU. ...
> Moreover, there is no indirect jump instruction -- "jump to a computed
> address".

Ok, before anyone else notices, I admit I forgot about implementing an 
indirect jump by pushing the target address on the stack and executing a 
return instruction. That would work for this machine.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by Niklas Holsti ●October 13, 2018

And one more iteration (sorry...)

On 18-10-13 19:46 , Niklas Holsti wrote:
> On 18-10-13 18:31 , Niklas Holsti wrote:
>
>> I don't think that an interpreted Forth is feasible for this particular
>> MCU. ...
>> Moreover, there is no indirect jump instruction -- "jump to a computed
>> address".
>
> Ok, before anyone else notices, I admit I forgot about implementing an
> indirect jump by pushing the target address on the stack and executing a
> return instruction. That would work for this machine.

Except that one can only "push" the accumulator and flag registers, 
combined, and the flag register cannot be set directly, and has only 4 
working bits.

What would work, as an indirect jump, is to set the Stack Pointer (sp) 
to point at a RAM word that contains the target address, and then 
execute a return. But then one has lost the actual Stack Pointer value.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by ●October 13, 2018

On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>On 13/10/18 17:00, upsidedown@downunder.com wrote:
>> On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
>> gnuarm.deletethisbit@gmail.com wrote:
>> 
>>> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
>>>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>>>>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>>>>>> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
>>>>>>> upsidedown@downunder.com writes:
>>>>>>>> There is a lot of operations that will update memory locations, so why
>>>>>>>> would you need a lot of CPU registers.
>>>>>>>
>>>>>>> Being able to (say) add register to register saves traffic through the
>>>>>>> accumulator and therefore instructions.
>>>>>>>
>>>>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>>>>>> assembly program listing.
>>>>>>>
>>>>>>> It would be nice to have a C compiler, and registers help with that.
>>>>>>>
>>>>>>
>>>>>> Looking at the instruction set, it should be possible to make a backend
>>>>>> for this in SDCC; the architecture looks more C-friendly than the
>>>>>> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>>>>>> or z80.
>>>>>> reentrant functions will be inefficent: No registers, and no sp-relative
>>>>>> adressing mode. On would want to reserve a few memory locations as
>>>>>> pseudo-registers to help with that, but that only goes so far.
>>>>>
>>>>> CPUs like this (and others that aren't like this) should be
>>>>> programmed  in Forth. It's a great tool for small MCUs and many times can be hosted
>>>>> on the target although not likely in this case. Still, you can bring
>>>>> enough functionality onto the MCU to allow direct downloads and many
>>>>> debugging features without an ICE.
>>>>>
>>>>> Rick C.
>>>>>
>>>>
>>>> Forth is a good language for very small devices, but there are details
>>>> that can make a huge difference in how efficient it is.  To make Forth
>>>> work well on a small chip you need a Forth-specific instruction set to
>>>> target the stack processing.  For example, adding two numbers in this
>>>> chip is two instructions - load accumulator from memory X, add
>>>> accumulator to memory Y.  In a Forth cpu, you'd have a single
>>>> instruction that does "pop two numbers, add them, push the result".
>>>> That gives a very efficient and compact instruction set.  But it is hard
>>>> to get the same results from a chip that doesn't have this kind of
>>>> stack-based instruction set.
>>>
>>> Your point is what exactly?  You are comparing running forth on some other chip to running forth on this chip.  How is that useful?  There are many other chips that run very fast.  So?
>>>
>>> I believe others have said the instruction set is memory oriented with no registers.
>> 
>> Depending how you look at it, you could claim that it has 64 registers
>> and no RAM. It is a quite orthogonal single address architecture. You
>> can do practically all single operand instructions (like inc/dec,
>> shift/rotate etc.) either in the accumulator but equally well in any
>> of the 64 "registers". For two operand instructions (such as add/sub,
>> and/or etc,), either the source or destination can be in the memory
>> "register".
>
>Not quite, no.  Only the first 16 memory addresses are directly 
>accessible for most instructions, with the first 32 addresses being 
>available for word-based instructions.  So you could liken it to a 
>device with 16 registers and indirect memory access to the rest of ram.

Really?

In the manual

> M.n   Only addressed in 0~0xF (0~15) is allowed 

The M.n notation is for bit operations, in which M is the byte address
and n is the bit number in byte. Restricting M to 4 bits makes sense,
since n requires 3 bits, thus the total address size for bit
operations would be 7 bits.
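
The arithmetic can be sketched as a pack/unpack pair. The field widths come
from the reasoning above; the order of the two fields inside the 7 bits is
purely an assumption, since the instruction encoding is not documented, and
the function names are hypothetical.

```python
M_BITS = 4   # byte address 0..15, per the manual line quoted above
N_BITS = 3   # bit number 0..7 within the byte


def pack_bit_operand(m, n):
    """Pack an M.n operand into a 7-bit field (field order is assumed)."""
    if not 0 <= m < (1 << M_BITS):
        raise ValueError("byte address does not fit in 4 bits")
    if not 0 <= n < (1 << N_BITS):
        raise ValueError("bit number does not fit in 3 bits")
    return (m << N_BITS) | n


def unpack_bit_operand(field):
    """Split a 7-bit field back into (byte address, bit number)."""
    return field >> N_BITS, field & ((1 << N_BITS) - 1)


largest = pack_bit_operand(15, 7)
print(largest, largest.bit_length())
```

Address 16 would need a fifth address bit, which is consistent with the
0~0xF restriction applying to M.n operands.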

I couldn't find a reference saying that the restriction on M also applies
to byte access. Where is it?

Reply by ●October 13, 2018

On Sat, 13 Oct 2018 18:31:25 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>>> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
>>>> upsidedown@downunder.com writes:
>>>>> There is a lot of operations that will update memory locations, so why
>>>>> would you need a lot of CPU registers.
>>>>
>>>> Being able to (say) add register to register saves traffic through the
>>>> accumulator and therefore instructions.
>>>>
>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>>> assembly program listing.
>
>The data-sheet describes the OTP program memory as "1KW", probably 
>meaning 1024 instructions. The length of an instruction is not defined, 
>as far as I could see.

Yes, I misread the data sheet. It is really 1 kW. 

The nice feature of a Harvard architecture is that the data and
instruction sizes can be different.

I have tried to locate the bit allocation of the various fields (opcode,
address, etc.), but no luck.

Reply by Paul Rubin ●October 13, 2018

gnuarm.deletethisbit@gmail.com writes:
> That actually means it is easier to have a fast Forth implementation
> compared to other compilers since there won't be a significant penalty
> for using a stack.

I think this chip is too small for traditional Forth implementation
methods.  Just 64 bytes of ram and no registers.  If you have 16 bit
cells and 8 levels of return and data stacks, half the ram is already
used by the stacks.
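
Spelled out as a quick calculation, using only the figures from the
paragraph above:

```python
RAM_BYTES = 64     # total data RAM on the chip
CELL_BYTES = 2     # 16-bit Forth cells
STACK_DEPTH = 8    # levels per stack
NUM_STACKS = 2     # data stack + return stack

stack_bytes = CELL_BYTES * STACK_DEPTH * NUM_STACKS
fraction = stack_bytes / RAM_BYTES
print(stack_bytes, "bytes of", RAM_BYTES, "=", fraction)
```

That leaves 32 bytes for variables, temporaries and everything else.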

An F18 processor (GA144 node for those not familiar) has around 3x as
much ram including the stacks, and it doesn't pretend to be a complete
MCU (you usually split your application across multiple nodes).  Plus it
has that very efficient 5-bit instruction encoding.  On the other hand,
you have to use ram as program memory.

You might be able to concoct some usable Forth dialect compiled with an
optimizing compiler and using 8-bit data when possible, but it doesn't
seem that useful for a chip like this.

Reply by ●October 13, 2018

On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>And one more iteration (sorry...)
>
>On 18-10-13 19:46 , Niklas Holsti wrote:
>> On 18-10-13 18:31 , Niklas Holsti wrote:
>>
>>> I don't think that an interpreted Forth is feasible for this particular
>>> MCU. ...
>>> Moreover, there is no indirect jump instruction -- "jump to a computed
>>> address".
>>
>> Ok, before anyone else notices, I admit I forgot about implementing an
>> indirect jump by pushing the target address on the stack and executing a
>> return instruction. That would work for this machine.
>
>Except that one can only "push" the accumulator and flag registers, 
>combined, and the flag register cannot be set directly, and has only 4 
>working bits.
>
>What would work, as an indirect jump, is to set the Stack Pointer (sp) 
>to point at a RAM word that contains the target address, and then 
>execute a return. But then one has lost the actual Stack Pointer value.

Just call a "Jumper" routine; the call pushes the return address on the
stack. In "Jumper", read SP from the I/O address space, indirectly modify
the return address on the stack as needed, and perform a ret instruction.
That causes a jump to the modified return address and also restores
SP to the value it had before the call.
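
A toy model of the trick, to check that SP really does come back
unchanged. The assumptions are mine: a stack that grows upward in RAM,
2-byte return addresses stored low byte first, and made-up addresses. The
class and function names are hypothetical.

```python
class Machine:
    def __init__(self):
        self.ram = [0] * 64
        self.sp = 32      # arbitrary initial stack pointer
        self.pc = 0

    def call(self, target, return_addr):
        # A call pushes the 2-byte return address, then jumps.
        self.ram[self.sp] = return_addr & 0xFF
        self.ram[self.sp + 1] = return_addr >> 8
        self.sp += 2
        self.pc = target

    def ret(self):
        # A return pops the 2-byte address into the program counter.
        self.sp -= 2
        self.pc = self.ram[self.sp] | (self.ram[self.sp + 1] << 8)


def jumper(m, target):
    """Overwrite the pushed return address, then return to it."""
    m.ram[m.sp - 2] = target & 0xFF
    m.ram[m.sp - 1] = target >> 8
    m.ret()


m = Machine()
sp_before = m.sp
m.call(target=0x100, return_addr=0x011)   # 0x100: made-up address of "Jumper"
jumper(m, 0x2A5)                          # 0x2A5: made-up computed target
```

After the return, the program counter holds the computed target and SP is
back at its pre-call value, at the cost of a call plus a few indirect
stores per jump.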