$0.03 microcontroller

Reply by Philipp Klaus Krause ●October 14, 2018

On 14.10.2018 at 03:20, gnuarm.deletethisbit@gmail.com wrote:
> 
> How fast are instructions that access memory?  Most MCUs will perform
> register operations in a single cycle.  Even though RAM may be on
> chip, it typically is not as fast as registers because it is usually
> not multiported.  DSP chips are an exception with dual and even
> triple ported on chip RAM.

All instructions except for jumps are 1 cycle. Jumps if taken are 2
cycles, 1 otherwise.

Philipp

Reply by Philipp Klaus Krause ●October 14, 2018

On 14.10.2018 at 08:53, Philipp Klaus Krause wrote:
> On 14.10.2018 at 03:20, gnuarm.deletethisbit@gmail.com wrote:
>>
>> How fast are instructions that access memory?  Most MCUs will perform
>> register operations in a single cycle.  Even though RAM may be on
>> chip, it typically is not as fast as registers because it is usually
>> not multiported.  DSP chips are an exception with dual and even
>> triple ported on chip RAM.
> 
> All instructions except for jumps are 1 cycle. Jumps if taken are 2
> cycles, 1 otherwise.
> 
> Philipp
> 

idxm and ldxm seem to be 2 cycles, too.

Philipp

Reply by ●October 14, 2018

On Sat, 13 Oct 2018 21:47:48 +0200, Philipp Klaus Krause <pkk@spth.de>
wrote:

>On 12.10.2018 at 22:45, upsidedown@downunder.com wrote:
>> On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
>> wrote:
>> 
>>> On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:
>>>>
>>>> The real issue would be the small RAM size.
>>>
>>> Devices with this architecture go up to 256 B of RAM (but they then cost
>>> a few cent more).
>>>
>>> Philipp
>> 
>> Did you find the binary encoding of various instruction formats, i.e.
>> how many bits are allocated to the operation code and how many for the
>> address field?
>> 
>> My initial guess was that the instruction word is simple 8 bit opcode
>> + 8 bit address, but the bit and word address limits for the smaller
>> models would suggest that for some op-codes, the op-code field might
>> be wider than 8 bits and address fields narrower than 8 bits (e.g. bit
>> and word addressing).
>> 
>
>People have tried before (https://www.mikrocontroller.net/topic/449689,
>https://stackoverflow.com/questions/49842256/reverse-engineer-assembler-which-probably-encrypts-code).
>Apparently, even with access to the tools it is not obvious.
>
>However, a Chinese manual contains these examples:
>
>5E0A MOV A BB1
>1B21 COMP A #0x21
>2040 T0SN CF
>5C0B MOV BB2 A
>C028 GOTO 0x28
>0030 WDRESET
>1F00 MOV A #0x0
>0082 MOV SP A
>
>Philipp

Interesting, this at least confirms that the instruction word is 16
bits. In a Harvard architecture, the word length could have been
13-17 bits, with some dirty encodings in the 13 bit case, but a cleaner
encoding with 14-17 bit instruction words.
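
As a quick check of the 16 bit reading, the example words from the manual can be split into a high and a low byte. This is only a sketch of the naive 8+8 guess discussed earlier in the thread, not the actual Padauk encoding; the helper name `split_8_8` is made up for illustration.

```python
# Opcode words and mnemonics as quoted from the Chinese manual in this thread.
EXAMPLES = [
    (0x5E0A, "MOV A BB1"),
    (0x1B21, "COMP A #0x21"),
    (0x2040, "T0SN CF"),
    (0x5C0B, "MOV BB2 A"),
    (0xC028, "GOTO 0x28"),
    (0x0030, "WDRESET"),
    (0x1F00, "MOV A #0x0"),
    (0x0082, "MOV SP A"),
]


def split_8_8(word):
    """Split a 16-bit word into (high byte, low byte): the naive 8+8 guess."""
    return (word >> 8) & 0xFF, word & 0xFF


# Widest example word, in bits; a lower bound on the instruction word length.
widest = max(word.bit_length() for word, _ in EXAMPLES)

for word, text in EXAMPLES:
    hi, lo = split_8_8(word)
    print(f"{word:04X}  hi={hi:02X} lo={lo:02X}  {text}")
print("widest example word:", widest, "bits")
```

In the two immediate examples (COMP A #0x21, MOV A #0x0) and in GOTO 0x28 the low byte equals the operand, and the largest word (C028) needs all 16 bits. That says nothing yet about where the opcode/address boundary lies for the memory and bit instructions.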

Assuming one would like to make an encoding for exactly 1024 code
words and 64 bytes of data memory, a tighter encoding would be possible.
Of course, for a manufacturer with smaller and larger processors, it would
make sense to use the same encoding for all processors, which is slightly
inefficient for the smaller models.

Anyway, for the 1 kW / 64 bytes case, the following code points would be required:

2048 =  2 x 1024      call, goto
1792 =  7 x  256      Immediate data (8 bit)
2304 = 36 x   64      M-reference (6 bit)
1024 =  8 x  128      Bit ref (M and IO, 3+4 bits)
               others

This might barely fit into 13 bits, with some nasty encoding.

Even when limiting the M-reference to 4 bits (0-15), you still can't fit
into a 12 bit instruction length.

So with a 16 bit word length, I do not understand why the word reference is
limited to 4-5 bits. The bit address limit makes more sense, so that it
would not consume 4096 code points.
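
The code point count above can be re-checked with a few lines of Python. This is a rough sketch under the same assumptions as the table (1 kW code, 64 bytes data, operand-less "others" ignored); the function names are invented for the example.

```python
# Instruction classes from the table above: name -> (number of opcodes, operand bits).
# "others" (operand-less instructions) are left out, as in the table.
CLASSES = {
    "call/goto": (2, 10),        # 10-bit code address for 1 kW
    "immediate": (7, 8),         # 8-bit immediate data
    "M-reference": (36, 6),      # 6-bit data address for 64 bytes
    "bit reference": (8, 7),     # 4-bit byte address + 3-bit bit number
}


def code_points(classes):
    """Total number of distinct instruction words the classes need."""
    return sum(count << bits for count, bits in classes.values())


def min_word_bits(points):
    """Smallest word width (in bits) that offers at least `points` encodings."""
    return (points - 1).bit_length()


full = code_points(CLASSES)
narrow = code_points({**CLASSES, "M-reference": (36, 4)})  # M limited to 0-15

print(full, min_word_bits(full))      # 7168 code points, 13 bits
print(narrow, min_word_bits(narrow))  # 5440 code points, still 13 bits
```

So the full set leaves only 1024 of the 8192 points of a 13 bit word for everything else, and even the 4 bit M-reference variant overflows the 4096 points of a 12 bit word.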

Reply by Theo ●October 14, 2018

Tim <cpldcpu+usenet@gmail.com> wrote:
> This is quite curious. I wonder
> 
>    - Has anyone actually received the devices they ordered? The cheaper 
> variants seem to be sold out.

I think they've sold out since they went viral.  EEVblog did a video showing
550 in stock - that's only $16 worth of parts, not hard to imagine they've
been bought up.

The other option is they're some kind of EOL part and 3c is the 'reduced to
clear' price - which they have done, very successfully.

Theo

Reply by Michael Kellett ●October 14, 2018

On 13/10/2018 21:06, Paul Rubin wrote:
> Michael Kellett <mk@mkesc.co.uk> writes:
>> If you want a hardware minimal processor the Maxim 32660 looks like fun
>> 3mm square, 24 pin Cortex M4, 96MHz, 256k flash, 96k RAM, £1.16 (10 off).
> 
> That's not minimal ;).  More practically, the 3mm square package sounds
> like a WLCSP which I think requires specialized ($$$) board fab
> facilities (it can't be hand soldered or done with normal reflow
> processes).  Part of the Padauk part's attraction is the 6-pin SOT23
> package.
> 
> Here's a complete STM8 board for 0.77 USD shipped:
> 
>    https://www.aliexpress.com/item//32527571163.html
> 
> It has 8k of program flash and 1k of ram and can run a resident Forth
> interpreter.  I think they also make a SOIC-8 version of the cpu.  I
> bought a few of those boards for around 0.50 each last year so I guess
> they have gotten a bit more expensive since then.
> 

No - the BGA part is 1.6mm square (0.3mm pitch) - the 3mm is for 0.4mm 
pitch QFN and there is a 0.5mm pitch QFN part at 4mm square.
The QFNs are reasonably prototype-able - needing only 0.15mm track and 
gap design rules and no filled vias in pads or other horrors.
The point about the 32660 is that it is HARDWARE minimal but not 
constrained in software. At low volumes cost of the parts is nothing - a 
day of effort is $500 or more, in that context the difference between a 
free processor and a $2 processor is invisible.

MK

Reply by David Brown ●October 14, 2018

On 13/10/18 19:50, upsidedown@downunder.com wrote:
> On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
> <david.brown@hesbynett.no> wrote:
> 
>> On 13/10/18 17:00, upsidedown@downunder.com wrote:
>>> On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
>>> gnuarm.deletethisbit@gmail.com wrote:
>>>
>>>> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
>>>>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>>>>>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
>>>>>>> On 12.10.2018 at 01:08, Paul Rubin wrote:
>>>>>>>> upsidedown@downunder.com writes:
>>>>>>>>> There is a lot of operations that will update memory locations, so why
>>>>>>>>> would you need a lot of CPU registers.
>>>>>>>>
>>>>>>>> Being able to (say) add register to register saves traffic through the
>>>>>>>> accumulator and therefore instructions.
>>>>>>>>
>>>>>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>>>>>>> assembly program listing.
>>>>>>>>
>>>>>>>> It would be nice to have a C compiler, and registers help with that.
>>>>>>>>
>>>>>>>
>>>>>>> Looking at the instruction set, it should be possible to make a backend
>>>>>>> for this in SDCC; the architecture looks more C-friendly than the
>>>>>>> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>>>>>>> or z80.
>>>>>>> reentrant functions will be inefficient: No registers, and no sp-relative
>>>>>>> addressing mode. One would want to reserve a few memory locations as
>>>>>>> pseudo-registers to help with that, but that only goes so far.
>>>>>>
>>>>>> CPUs like this (and others that aren't like this) should be
>>>>>> programmed  in Forth. It's a great tool for small MCUs and many times can be hosted
>>>>>> on the target although not likely in this case. Still, you can bring
>>>>>> enough functionality onto the MCU to allow direct downloads and many
>>>>>> debugging features without an ICE.
>>>>>>
>>>>>> Rick C.
>>>>>>
>>>>>
>>>>> Forth is a good language for very small devices, but there are details
>>>>> that can make a huge difference in how efficient it is.  To make Forth
>>>>> work well on a small chip you need a Forth-specific instruction set to
>>>>> target the stack processing.  For example, adding two numbers in this
>>>>> chip is two instructions - load accumulator from memory X, add
>>>>> accumulator to memory Y.  In a Forth cpu, you'd have a single
>>>>> instruction that does "pop two numbers, add them, push the result".
>>>>> That gives a very efficient and compact instruction set.  But it is hard
>>>>> to get the same results from a chip that doesn't have this kind of
>>>>> stack-based instruction set.
>>>>
>>>> Your point is what exactly?  You are comparing running forth on some other chip to running forth on this chip.  How is that useful?  There are many other chips that run very fast.  So?
>>>>
>>>> I believe others have said the instruction set is memory oriented with no registers.
>>>
>>> Depending how you look at it, you could claim that it has 64 registers
>>> and no RAM. It is a quite orthogonal single address architecture. You
>>> can do practically all single operand instructions (like inc/dec,
>>> shift/rotate etc.) either in the accumulator but equally well in any
>>> of the 64 "registers". For two operand instructions (such as add/sub,
>>> and/or etc,), either the source or destination can be in the memory
>>> "register".
>>
>> Not quite, no.  Only the first 16 memory addresses are directly
>> accessible for most instructions, with the first 32 addresses being
>> available for word-based instructions.  So you could liken it to a
>> device with 16 registers and indirect memory access to the rest of ram.
> 
> Really ?
> 
> In the manual
> 
>> M.n   Only addressed in 0~0xF (0~15) is allowed
> 
> The M.n notation is for bit operations, in which M is the byte address
> and n is the bit number in byte. Restricting M to 4 bits makes sense,
> since n requires 3 bits, thus the total address size for bit
> operations would be 7 bits.
> 
> I couldn't find a reference that the restriction on M also applies to
> byte access. Where is it ?
> 

My interpretation of the manual was that you only had access to the 
first 16 addresses with the M instructions.  But it is entirely possible 
that I am wrong and your interpretation is right.  I haven't tried the 
devices, or the IDE, and the manual does not have details of things like 
instruction format.

Certainly it would be nicer for the chip if you are right!

Reply by David Brown ●October 14, 2018

On 14/10/18 03:20, gnuarm.deletethisbit@gmail.com wrote:
> On Saturday, October 13, 2018 at 11:00:30 AM UTC-4,
> upsid...@downunder.com wrote:
>> On Sat, 13 Oct 2018 05:06:23 -0700 (PDT), 
>> gnuarm.deletethisbit@gmail.com wrote:
>> 
>>> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
>>> wrote:
>>>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>>>>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp
>>>>> Klaus Krause wrote:
>>>>>> On 12.10.2018 at 01:08, Paul Rubin wrote:
>>>>>>> upsidedown@downunder.com writes:
>>>>>>>> There is a lot of operations that will update memory
>>>>>>>> locations, so why would you need a lot of CPU
>>>>>>>> registers.
>>>>>>> 
>>>>>>> Being able to (say) add register to register saves
>>>>>>> traffic through the accumulator and therefore
>>>>>>> instructions.
>>>>>>> 
>>>>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages
>>>>>>>> of commented assembly program listing.
>>>>>>> 
>>>>>>> It would be nice to have a C compiler, and registers help
>>>>>>> with that.
>>>>>>> 
>>>>>> 
>>>>>> Looking at the instruction set, it should be possible to
>>>>>> make a backend for this in SDCC; the architecture looks
>>>>>> more C-friendly than the existing pic14 and pic16 backends.
>>>>>> But it surely isn't as nice as stm8 or z80. reentrant
>>>>>> functions will be inefficient: No registers, and no
>>>>>> sp-relative addressing mode. One would want to reserve a few
>>>>>> memory locations as pseudo-registers to help with that, but
>>>>>> that only goes so far.
>>>>> 
>>>>> CPUs like this (and others that aren't like this) should be 
>>>>> programmed  in Forth. It's a great tool for small MCUs and
>>>>> many times can be hosted on the target although not likely in
>>>>> this case. Still, you can bring enough functionality onto the
>>>>> MCU to allow direct downloads and many debugging features
>>>>> without an ICE.
>>>>> 
>>>>> Rick C.
>>>>> 
>>>> 
>>>> Forth is a good language for very small devices, but there are
>>>> details that can make a huge difference in how efficient it is.
>>>> To make Forth work well on a small chip you need a
>>>> Forth-specific instruction set to target the stack processing.
>>>> For example, adding two numbers in this chip is two
>>>> instructions - load accumulator from memory X, add accumulator
>>>> to memory Y.  In a Forth cpu, you'd have a single instruction
>>>> that does "pop two numbers, add them, push the result". That
>>>> gives a very efficient and compact instruction set.  But it is
>>>> hard to get the same results from a chip that doesn't have this
>>>> kind of stack-based instruction set.
>>> 
>>> Your point is what exactly?  You are comparing running forth on
>>> some other chip to running forth on this chip.  How is that
>>> useful?  There are many other chips that run very fast.  So?
>>> 
>>> I believe others have said the instruction set is memory oriented
>>> with no registers.
>> 
>> Depending how you look at it, you could claim that it has 64
>> registers and no RAM. It is a quite orthogonal single address
>> architecture. You can do practically all single operand
>> instructions (like inc/dec, shift/rotate etc.) either in the
>> accumulator but equally well in any of the 64 "registers". For two
>> operand instructions (such as add/sub, and/or etc,), either the
>> source or destination can be in the memory "register".
>> 
>> Both  Acc = Acc Op Memory or alternatively  Memory = Acc Op Memory
>> are valid.
>> 
>> Thus the accumulator is needed only for two operand instructions,
>> but not for single operand instructions.
> 
> How fast are instructions that access memory?  Most MCUs will perform
> register operations in a single cycle.  Even though RAM may be on
> chip, it typically is not as fast as registers because it is usually
> not multiported.  DSP chips are an exception with dual and even
> triple ported on chip RAM.

Single cycle, according to the manual.  Instructions involving 16-bit 
values are two cycles, the conditional branch instructions may be one or 
two cycles, and everything else is one cycle.

It is not so hard to make the RAM dual ported when there are only 64 
bytes of it.  Or perhaps the core is clocked on both falling and rising 
edges, so that the instructions are effectively 2 or 4 clocks rather 
than 1 or 2.  We can only guess.

Reply by David Brown ●October 14, 2018

On 14/10/18 03:32, gnuarm.deletethisbit@gmail.com wrote:
> On Saturday, October 13, 2018 at 12:21:51 PM UTC-4, David Brown wrote:
>> On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
>>> On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
>>> wrote:
>>>> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
>>>>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
>>>>> Krause wrote:
>>>>>> On 12.10.2018 at 01:08, Paul Rubin wrote:
>>>>>>> upsidedown@downunder.com writes:
>>>>>>>> There is a lot of operations that will update memory
>>>>>>>> locations, so why would you need a lot of CPU registers.
>>>>>>>
>>>>>>> Being able to (say) add register to register saves traffic
>>>>>>> through the accumulator and therefore instructions.
>>>>>>>
>>>>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
>>>>>>>> commented assembly program listing.
>>>>>>>
>>>>>>> It would be nice to have a C compiler, and registers help
>>>>>>> with that.
>>>>>>>
>>>>>>
>>>>>> Looking at the instruction set, it should be possible to make a
>>>>>> backend for this in SDCC; the architecture looks more
>>>>>> C-friendly than the existing pic14 and pic16 backends. But it
>>>>>> surely isn't as nice as stm8 or z80. reentrant functions will
>>>>>> be inefficient: No registers, and no sp-relative addressing mode.
>>>>>> One would want to reserve a few memory locations as
>>>>>> pseudo-registers to help with that, but that only goes so far.
>>>>>
>>>>> CPUs like this (and others that aren't like this) should be
>>>>> programmed  in Forth. It's a great tool for small MCUs and many
>>>>> times can be hosted on the target although not likely in this
>>>>> case. Still, you can bring enough functionality onto the MCU to
>>>>> allow direct downloads and many debugging features without an
>>>>> ICE.
>>>>>
>>>>> Rick C.
>>>>>
>>>>
>>>> Forth is a good language for very small devices, but there are
>>>> details that can make a huge difference in how efficient it is.  To
>>>> make Forth work well on a small chip you need a Forth-specific
>>>> instruction set to target the stack processing.  For example,
>>>> adding two numbers in this chip is two instructions - load
>>>> accumulator from memory X, add accumulator to memory Y.  In a Forth
>>>> cpu, you'd have a single instruction that does "pop two numbers,
>>>> add them, push the result". That gives a very efficient and compact
>>>> instruction set.  But it is hard to get the same results from a
>>>> chip that doesn't have this kind of stack-based instruction set.
>>>
>>> Your point is what exactly?  You are comparing running forth on some
>>> other chip to running forth on this chip.  How is that useful?  There
>>> are many other chips that run very fast.  So?
>>
>> My point is that /this/ CPU is not a good match for Forth, though many
>> other very cheap CPUs are.  Whether or not you think that matches "CPUs
>> like this should be programmed in Forth" depends on what you mean by
>> "CPUs like this", and what you think the benefits of Forth are.
>>
>>>
>>> I believe others have said the instruction set is memory oriented
>>> with no registers.  I think that means in general the CPU will be
>>> slow compared to a register based design.  That actually means it is
>>> easier to have a fast Forth implementation compared to other
>>> compilers since there won't be a significant penalty for using a
>>> stack.
>>>
>>
>> It has a single register, not unlike the "W" register in small PIC
>> devices.  Yes, I expect it is going to be slower than you would get from
>> having a few more registers.  But it is missing (AFAICS) auto-increment
>> and decrement modes, and has only load/store operations with indirect
>> access.
>>
>> So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:
>>
>> 	mov a, y;	// 1 clock
>> 	add x, a;	// 1 clock
> 
> Keep the TOS in the accumulator and I think you end up with
> 
>   	add a, x; 	// 1 clock
>   	inc DSTKPTR; 	// adjust stack pointer - 1 clock?
> 
> Does that work?  Reading below, I guess not.
> 
> 
>> If you have a data stack pointer "dsp", and want a standard Forth "+"
>> operation, you have:
>>
>> 	idxm a, dsp;	// 2 clock
>> 	mov temp, a;	// 1 clock
>> 	dec dsp;	// 1 clock
>> 	idxm a, dsp;	// 2 clock
>> 	add a, temp;	// 1 clock
>> 	idxm dsp, a;	// 2 clock
>>
>> That is 9 clocks, instead of 2, and 6 instructions instead of 2.
> 
> What does idxm do? Looks like an indirect load? Can this address
> mode be combined with any operations? Are operations limited in the
> addressing modes? This seems like a very, very simple CPU, but for the
> money, I guess I get it.

"idxm" is an indirect load or store (depending on the order of the 
operands).  No, there are no other operations that can be combined with 
indirect accesses.

If you want to keep the TOS in the accumulator, then Forth "+" becomes:

	mov temp, a;	// 1 clock
	dec dsp;	// 1 clock
	idxm a, dsp;	// 2 clock
	add a, temp;	// 1 clock

5 clocks is a good deal better than 9 clocks, but still a good deal 
worse than 2 clocks.
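
The clock counts for the three variants can be tallied mechanically. The sketch below assumes only the timings stated in this thread (idxm takes 2 cycles, the other instructions used here take 1); the list and function names are made up for the example.

```python
# Cycle costs as stated in this thread: idxm takes 2 cycles, the other
# instructions used here take 1. Jumps are not needed for these sequences.
CYCLES = {"idxm": 2}


def cost(sequence):
    """Return (instruction count, total cycles) for a list of mnemonics."""
    return len(sequence), sum(CYCLES.get(op, 1) for op in sequence)


direct_add = ["mov", "add"]                                  # x += y
forth_plus = ["idxm", "mov", "dec", "idxm", "add", "idxm"]   # both operands on the data stack
forth_plus_tos = ["mov", "dec", "idxm", "add"]               # TOS kept in the accumulator

for name, seq in [
    ("direct", direct_add),
    ("forth +", forth_plus),
    ("forth + (TOS in a)", forth_plus_tos),
]:
    print(name, cost(seq))
```

That gives 2 instructions / 2 clocks for the direct add, 6 / 9 for the plain Forth "+", and 4 / 5 with the TOS cached in the accumulator.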

> 
> 
>> Of course you could make a Forth compiler for the device - but you would
>> have to make an optimising Forth compiler that avoids needing a data
>> stack, just as you do on many other small microcontrollers (and just as a
>> C compiler would do).  This is /not/ a processor that fits well with
>> Forth or that would give a clear translation from Forth to assembly, as
>> is the case on some very small microcontrollers.
> 
> OK
> 

A stack-based system is often a good choice for very small cpus - it is 
certainly popular for 4-bit microcontrollers.  But it seems that the 
designers of this device simply haven't considered support for 
Forth-style coding to be important.

Reply by David Brown ●October 14, 2018

On 13/10/18 18:59, Niklas Holsti wrote:
> And one more iteration (sorry...)
> 
> On 18-10-13 19:46 , Niklas Holsti wrote:
>> On 18-10-13 18:31 , Niklas Holsti wrote:
>>
>>> I don't think that an interpreted Forth is feasible for this particular
>>> MCU. ...
>>> Moreover, there is no indirect jump instruction -- "jump to a computed
>>> address".
>>
>> Ok, before anyone else notices, I admit I forgot about implementing an
>> indirect jump by pushing the target address on the stack and executing a
>> return instruction. That would work for this machine.
> 
> Except that one can only "push" the accumulator and flag registers, 
> combined, and the flag register cannot be set directly, and has only 4 
> working bits.
> 
> What would work, as an indirect jump, is to set the Stack Pointer (sp) 
> to point at a RAM word that contains the target address, and then 
> execute a return. But then one has lost the actual Stack Pointer value.
> 

Or you could read the SP, put that address into a different word memory 
location, and use that for indirect access to write to the stack.

It is all possible, but not particularly efficient.
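
A toy model of that idea, to show that the stack pointer survives the jump. Everything about the machine here is an assumption for illustration (stack growing upward, 16-bit little-endian return addresses, SP readable as well as writable); the manual examples quoted in this thread only show that MOV SP A exists. `ToyCpu` and `indirect_jump` are invented names.

```python
class ToyCpu:
    """Toy model only. Assumed: stack grows upward, return addresses are
    16-bit little-endian words, and `ret` pops the word just below SP."""

    def __init__(self, ram_size=64):
        self.ram = bytearray(ram_size)
        self.sp = 0
        self.pc = 0

    def ret(self):
        # Pop a 16-bit word from the stack into the program counter.
        self.sp -= 2
        self.pc = self.ram[self.sp] | (self.ram[self.sp + 1] << 8)


def indirect_jump(cpu, target):
    """Jump to `target` by writing it above the stack top and returning."""
    ptr = cpu.sp                              # copy SP into a pointer location
    cpu.ram[ptr] = target & 0xFF              # indirect store of the low byte
    cpu.ram[ptr + 1] = (target >> 8) & 0xFF   # indirect store of the high byte
    cpu.sp = ptr + 2                          # account for the "pushed" word
    cpu.ret()                                 # pops target into PC, SP back to ptr


cpu = ToyCpu()
cpu.sp = 0x10
indirect_jump(cpu, 0x0123)
print(hex(cpu.pc), hex(cpu.sp))
```

In this model the jump lands at the target and SP ends up where it started, at the price of several extra instructions on the real chip (two indirect stores plus the SP adjustment).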

Reply by Philipp Klaus Krause ●October 14, 2018

On 14.10.2018 at 14:37, David Brown wrote:
> 
> A stack-based system is often a good choice for very small cpus - it is
> certainly popular for 4-bit microcontrollers.  But it seems that the
> designers of this device simply haven't considered support for
> Forth-style coding to be important.

Efficient stack access is important for C, too. Putting local variables
on the stack makes functions reentrant (not so important for small
devices), and also saves memory (very important for small devices).

The STM8 and S08 with their efficient sp-relative addressing and the Z80
with the index registers thus make better targets for C compilers than
the MCS-51 and HC08.

Philipp