$0.03 microcontroller

Reply by Niklas Holsti ● October 13, 2018

On 18-10-13 21:31 , upsidedown@downunder.com wrote:
> On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti
> <niklas.holsti@tidorum.invalid> wrote:
>
>> And one more iteration (sorry...)
>>
>> On 18-10-13 19:46 , Niklas Holsti wrote:
>>> On 18-10-13 18:31 , Niklas Holsti wrote:
>>>
>>>> I don't think that an interpreted Forth is feasible for this particular
>>>> MCU. ...
>>>> Moreover, there is no indirect jump instruction -- "jump to a computed
>>>> address".
>>>
>>> Ok, before anyone else notices, I admit I forgot about implementing an
>>> indirect jump by pushing the target address on the stack and executing a
>>> return instruction. That would work for this machine.
>>
>> Except that one can only "push" the accumulator and flag registers,
>> combined, and the flag register cannot be set directly, and has only 4
>> working bits.
>>
>> What would work, as an indirect jump, is to set the Stack Pointer (sp)
>> to point at a RAM word that contains the target address, and then
>> execute a return. But then one has lost the actual Stack Pointer value.
>
> Just call a "Jumper" routine, the call pushes the return address on
> stack. In "Jumper" read SP from IO address space, indirectly modify
> the return address on stack as needed and perform a ret instruction,
> causing a jump to the modified return address and it also restores the
> SP to the value before the call.

Right, that sounds possible. But wow what a circumlocution.
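
The "Jumper" idea can be modelled in a few lines of Python. This is only a sketch of the control flow, not code for the actual MCU: the `TinyMachine` class, the upward-growing stack, the little-endian 16-bit return address and all the addresses used are assumptions made for illustration.

```python
class TinyMachine:
    """Toy model: RAM-resident stack, SP readable and writable by software."""

    def __init__(self, ram_size=64, sp=0x20):
        self.ram = [0] * ram_size
        self.sp = sp      # assumption: stack grows upward, sp = next free byte
        self.pc = 0

    def call(self, target, return_addr):
        # 'call' pushes the 16-bit return address (assumed little-endian).
        self.ram[self.sp] = return_addr & 0xFF
        self.ram[self.sp + 1] = return_addr >> 8
        self.sp += 2
        self.pc = target

    def ret(self):
        # 'ret' pops the address on top of the stack into pc.
        self.sp -= 2
        self.pc = self.ram[self.sp] | (self.ram[self.sp + 1] << 8)


def jumper(m, computed_target):
    """Body of the hypothetical Jumper routine: patch the return address, then ret."""
    m.ram[m.sp - 2] = computed_target & 0xFF
    m.ram[m.sp - 1] = computed_target >> 8
    m.ret()


JUMPER_ADDR = 0x100                      # hypothetical address of Jumper
m = TinyMachine()
sp_before = m.sp
m.call(JUMPER_ADDR, return_addr=0x011)   # caller executes "call Jumper"
jumper(m, computed_target=0x2A5)         # Jumper overwrites the return address
print(hex(m.pc), m.sp == sp_before)
```

In this model the program counter ends up at the computed target and the stack pointer is back at its value from before the call, which is the property the circumlocution buys.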

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by Philipp Klaus Krause ● October 13, 2018

Am 13.10.2018 um 18:59 schrieb Niklas Holsti:
> 
> Except that one can only "push" the accumulator and flag registers,
> combined, and the flag register cannot be set directly, and has only 4
> working bits.

It seems unclear to me which of acc and flags is pushed first.
But if acc is pushed first, one could do

pushaf;
mov a, sp;
inc a;
mov sp, a;

to push any desired byte onto the stack.

Philipp

Reply by Philipp Klaus Krause ● October 13, 2018

Am 12.10.2018 um 22:45 schrieb upsidedown@downunder.com:
> On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
> wrote:
> 
>> Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:
>>>
>>> The real issue would be the small RAM size.
>>
>> Devices with this architecture go up to 256 B of RAM (but they then cost
>> a few cent more).
>>
>> Philipp
> 
> Did you find the binary encoding of the various instruction formats, i.e.
> how many bits are allocated to the operation code and how many to the
> address field?
> 
> My initial guess was that the instruction word is simple 8 bit opcode
> + 8 bit address, but the bit and word address limits for the smaller
> models would suggest that for some op-codes, the op-code field might
> be wider than 8 bits and address fields narrower than 8 bits (e.g. bit
> and word addressing).
> 

People have tried before (https://www.mikrocontroller.net/topic/449689,
https://stackoverflow.com/questions/49842256/reverse-engineer-assembler-which-probably-encrypts-code).
Apparently, even with access to the tools it is not obvious.

However, a Chinese manual contains these examples:

5E0A MOV A BB1
1B21 COMP A #0x21
2040 T0SN CF
5C0B MOV BB2 A
C028 GOTO 0x28
0030 WDRESET
1F00 MOV A #0x0
0082 MOV SP A
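
As a quick check of the "8 bit opcode + 8 bit address" guess against these examples, the following Python sketch splits each word into a high and a low byte and compares the low byte with any numeric literal visible in the mnemonic. The 8/8 split is only the guess from the quoted question, not a known encoding, and the helper names are made up for this example.

```python
# (word, mnemonic) pairs copied from the manual examples listed above.
EXAMPLES = [
    (0x5E0A, "MOV A BB1"),
    (0x1B21, "COMP A #0x21"),
    (0x2040, "T0SN CF"),
    (0x5C0B, "MOV BB2 A"),
    (0xC028, "GOTO 0x28"),
    (0x0030, "WDRESET"),
    (0x1F00, "MOV A #0x0"),
    (0x0082, "MOV SP A"),
]


def split_8_8(word):
    """Split a 16-bit word under the ASSUMED 8-bit opcode + 8-bit operand layout."""
    return word >> 8, word & 0xFF


def visible_operand(mnemonic):
    """Return the numeric literal written in the mnemonic, or None if there is none."""
    for token in mnemonic.replace("#", " ").split():
        if token.lower().startswith("0x"):
            return int(token, 16)
    return None


rows = []
for word, text in EXAMPLES:
    high, low = split_8_8(word)
    literal = visible_operand(text)
    matches = None if literal is None else (literal == low)
    rows.append((text, high, low, matches))
    print(f"{word:04X}  high={high:02X} low={low:02X}  {text:14s} literal==low: {matches}")
```

For the three examples that show a numeric operand (COMP, GOTO and the immediate MOV), the low byte equals that operand. That agrees with the operand sitting in the low bits, but it does not settle where the opcode field ends, and eight examples are far too few to rule out wider or narrower fields for other instructions.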

Philipp

Reply by Niklas Holsti ● October 13, 2018

On 18-10-13 22:24 , Philipp Klaus Krause wrote:
> Am 13.10.2018 um 18:59 schrieb Niklas Holsti:
>>
>> Except that one can only "push" the accumulator and flag registers,
>> combined, and the flag register cannot be set directly, and has only 4
>> working bits.
>
> It seems unclear to me which of acc and flags is pushed first.
> But if acc is pushed first, one could do
>
> pushaf;
> mov a, sp;
> inc a;
> mov sp, a;
>
> to push any desired byte onto the stack.

There's also a rule that the sp must always contain an even address, at 
least if interrupts are enabled, as I understand it.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

Reply by Paul Rubin ● October 13, 2018

Michael Kellett <mk@mkesc.co.uk> writes:
> If you want a hardware minimal processor the Maxim 32660 looks like fun
> 3mm square, 24 pin Cortex M4, 96MHz, 256k flash, 96k RAM, £1.16 (10 off).

That's not minimal ;).  More practically, the 3mm square package sounds
like a WLCSP which I think requires specialized ($$$) board fab
facilities (it can't be hand soldered or done with normal reflow
processes).  Part of the Padauk part's attraction is the 6-pin SOT23
package.

Here's a complete STM8 board for 0.77 USD shipped:

  https://www.aliexpress.com/item//32527571163.html

It has 8k of program flash and 1k of ram and can run a resident Forth
interpreter.  I think they also make a SOIC-8 version of the cpu.  I
bought a few of those boards for around 0.50 each last year so I guess
they have gotten a bit more expensive since then.

Reply by Tim ● October 13, 2018

On 10/10/2018 03:05 AM, Clifford Heath wrote:
> <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf> 
> 


This is quite curious. I wonder

   - Has anyone actually received the devices they ordered? The cheaper 
variants seem to be sold out.
   - Any success in setting up a programmer?

Reply by ● October 13, 2018

On Saturday, October 13, 2018 at 11:00:30 AM UTC-4, upsid...@downunder.com wrote:
> On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
> gnuarm.deletethisbit@gmail.com wrote:
> 
> >On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
> >> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
> >> > On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
> >> >> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
> >> >>> upsidedown@downunder.com writes:
> >> >>>> There is a lot of operations that will update memory locations, so why
> >> >>>> would you need a lot of CPU registers.
> >> >>>
> >> >>> Being able to (say) add register to register saves traffic through the
> >> >>> accumulator and therefore instructions.
> >> >>>
> >> >>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
> >> >>>> assembly program listing.
> >> >>>
> >> >>> It would be nice to have a C compiler, and registers help with that.
> >> >>>
> >> >>
> >> >> Looking at the instruction set, it should be possible to make a backend
> >> >> for this in SDCC; the architecture looks more C-friendly than the
> >> >> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
> >> >> or z80.
> >> >> reentrant functions will be inefficient: No registers, and no sp-relative
> >> >> addressing mode. One would want to reserve a few memory locations as
> >> >> pseudo-registers to help with that, but that only goes so far.
> >> > 
> >> > CPUs like this (and others that aren't like this) should be
> >> > programmed  in Forth. It's a great tool for small MCUs and many times can be hosted
> >> > on the target although not likely in this case. Still, you can bring
> >> > enough functionality onto the MCU to allow direct downloads and many
> >> > debugging features without an ICE.
> >> > 
> >> > Rick C.
> >> > 
> >> 
> >> Forth is a good language for very small devices, but there are details 
> >> that can make a huge difference in how efficient it is.  To make Forth 
> >> work well on a small chip you need a Forth-specific instruction set to 
> >> target the stack processing.  For example, adding two numbers in this 
> >> chip is two instructions - load accumulator from memory X, add 
> >> accumulator to memory Y.  In a Forth cpu, you'd have a single 
> >> instruction that does "pop two numbers, add them, push the result". 
> >> That gives a very efficient and compact instruction set.  But it is hard 
> >> to get the same results from a chip that doesn't have this kind of 
> >> stack-based instruction set.
> >
> >Your point is what exactly?  You are comparing running forth on some other chip to running forth on this chip.  How is that useful?  There are many other chips that run very fast.  So? 
> >
> >I believe others have said the instruction set is memory oriented with no registers.  
> 
> Depending how you look at it, you could claim that it has 64 registers
> and no RAM. It is a quite orthogonal single address architecture. You
> can do practically all single operand instructions (like inc/dec,
> shift/rotate etc.) either in the accumulator but equally well in any
> of the 64 "registers". For two operand instructions (such as add/sub,
> and/or etc,), either the source or destination can be in the memory
> "register". 
> 
> Both  Acc = Acc Op Memory or alternatively  Memory = Acc Op Memory are
> valid. 
> 
> Thus the accumulator is needed only for two operand instructions, but
> not for single operand instructions.

How fast are instructions that access memory?  Most MCUs will perform register operations in a single cycle.  Even though RAM may be on chip, it typically is not as fast as registers because it is usually not multiported.  DSP chips are an exception with dual and even triple ported on chip RAM. 


> >I think that means in general the CPU will be slow compared to a register based design.
> 
> What is the difference, you have 64 on chip RAM bytes or 64 single
> byte on chip registers. The situation would have been different with
> on-chip registers and off chip RAM, with the memory bottleneck. 
> 
> Of course, there were odd architectures like the TI 9900 with a set of
> sixteen 16 bit general purpose registers in RAM. The set could be
> switched fast in interrupts, but slowed down any general purpose
> register access.

Yeah, I'm familiar with the 9900.  In the 990 it worked well because the CPU was TTL and not so fast.  Once the CPU was on a single chip the external RAM was not fast enough to keep up really and instruction timings were dominated by the memory. 

 
> >That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack. 
> 
> For a stack computer you need a pointer register with preferably
> autoincrement/decrement support. This processor has indirect access
> and single instruction increment or decrement support without
> disturbing the accumulator. Thus not so bad after all for stack
> computing.

The stack in memory is usually a bottleneck because memory is typically slow, so optimizations would be done to keep operands in registers.  In this chip no optimizations are possible, but likely it wouldn't be too bad as long as the stack operations are flexible enough.  But then I don't think you said this CPU has the sort of addressing that allows an operand in memory to be used and popped off the stack in one opcode as many higher-level CPUs do.  So adding the two numbers on the stack would involve keeping the top of stack in the accumulator, adding the next item on the stack from memory to the accumulator, then another instruction to adjust the stack pointer which is also in memory.  So two instructions?  How many clock cycles?

What happens when there is a change in the instruction pointer of the Forth virtual machine?  Calling a new word would require saving the current value of the Forth IP on the return stack (separate from the data stack) and loading a new value into the Forth IP?  This is a piece of code typically called "next".  It varies a bit between indirect and direct threaded code.  Then there is subroutine threaded code that just uses the CPU IP as the Forth IP and each address is actually a CPU call instruction. 
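
The role of "next" and the return stack can be illustrated with a small Python model. This is a sketch of the general threaded-code idea only, not of any implementation for this MCU: a colon word is held as a list of word names instead of addresses, and the class and word names are invented for the example.

```python
class ForthVM:
    """Minimal threaded-code model: a colon word is a list of word names."""

    def __init__(self):
        self.data = []      # data stack
        self.rstack = []    # return stack: saved (thread, ip) pairs
        self.primitives = {
            "dup": lambda: self.data.append(self.data[-1]),
            "swap": lambda: self.data.extend([self.data.pop(), self.data.pop()]),
            "*": lambda: self.data.append(self.data.pop() * self.data.pop()),
            "+": lambda: self.data.append(self.data.pop() + self.data.pop()),
        }
        self.colon_words = {}

    def define(self, name, body):
        self.colon_words[name] = body.split()

    def run(self, name):
        thread, ip = self.colon_words[name], 0
        while True:
            if ip == len(thread):                  # end of thread acts as EXIT
                if not self.rstack:
                    return
                thread, ip = self.rstack.pop()     # restore the caller's IP
                continue
            word = thread[ip]                      # "next": fetch the word at IP
            ip += 1                                # and advance IP
            if word in self.primitives:
                self.primitives[word]()
            else:                                  # calling a colon word:
                self.rstack.append((thread, ip))   # save IP on the return stack
                thread, ip = self.colon_words[word], 0


vm = ForthVM()
vm.define("square", "dup *")
vm.define("sum-of-squares", "square swap square +")
vm.data = [3, 4]
vm.run("sum-of-squares")
print(vm.data)
```

With subroutine threading the explicit `thread`/`ip` pair and the `rstack` handling disappear, because the CPU's own call and return instructions do that bookkeeping.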

Rick C.

Reply by ● October 13, 2018

On Saturday, October 13, 2018 at 11:31:30 AM UTC-4, Niklas Holsti wrote:
> On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
> > On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
> >> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
> >>> upsidedown@downunder.com writes:
> >>>> There is a lot of operations that will update memory locations, so why
> >>>> would you need a lot of CPU registers.
> >>>
> >>> Being able to (say) add register to register saves traffic through the
> >>> accumulator and therefore instructions.
> >>>
> >>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
> >>>> assembly program listing.
> 
> The data-sheet describes the OTP program memory as "1KW", probably 
> meaning 1024 instructions. The length of an instruction is not defined, 
> as far as I could see.
> 
> >>> It would be nice to have a C compiler, and registers help with that.
> 
> The data-sheet mentions something they call "Mini-C".
> 
> >> Looking at the instruction set, it should be possible to make a backend
> >> for this in SDCC; the architecture looks more C-friendly than the
> >> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
> >> or z80.
> >> reentrant functions will be inefficient: No registers, and no sp-relative
> >> addressing mode. One would want to reserve a few memory locations as
> >> pseudo-registers to help with that, but that only goes so far.
> >
> > CPUs like this (and others that aren't like this) should be programmed
> > in Forth.
> 
> I don't think that an interpreted Forth is feasible for this particular 
> MCU. Where would the Forth program (= list of pointers to "words") be 
> stored? I found no instructions for reading data from the OTP program 
> memory, and the 64-byte RAM will not hold a non-trivial program together 
> with the data for that program.
> 
> Moreover, there is no indirect jump instruction -- "jump to a computed 
> address". The closest is "pcadd a", which can be used to implement a 
> 256-entry case statement. You would be limited to a total of 256 words.

For programs on such a small MCU, 256 words is likely overkill.  But you don't need the above features for Forth.  Subroutine threading uses call and return instructions instead of an address list.


> Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving 
> a 16-bit RAM address, although for this MCU a 6-bit address would be 
> enough. Apparently the same architecture has implementations with more 
> RAM and 16-bit RAM addresses.
> 
> That said, one could perhaps implement a compiled Forth for this machine.

Yeah, I'm pretty sure it is too small for a resident Forth, so a host would be required and a Forth can be compiled and subroutine threaded.
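
A host-side compiler for subroutine threading can be very small. The Python sketch below turns a colon definition into a label, one `call` per word, and a `ret`. The output format and the primitive labels (`forth_dup` and so on) are hypothetical; a real tool would have to emit whatever the target assembler accepts.

```python
def compile_colon(source, known_words):
    """Translate ': name w1 w2 ... ;' into a label, one call per word, and ret."""
    tokens = source.split()
    if len(tokens) < 3 or tokens[0] != ":" or tokens[-1] != ";":
        raise ValueError("expected ': name ... ;'")
    name, body = tokens[1], tokens[2:-1]
    lines = [f"{name}:"]
    for word in body:
        if word not in known_words:
            raise ValueError(f"unknown word: {word}")
        lines.append(f"    call {known_words[word]}")
    lines.append("    ret")
    known_words[name] = name      # the new word is callable by its own label
    return lines


# Hypothetical labels of hand-written primitive routines on the target.
words = {"dup": "forth_dup", "*": "forth_mul", "+": "forth_add"}
listing = compile_colon(": square dup * ;", words)
listing += compile_colon(": quad square square ;", words)
print("\n".join(listing))
```

No address list and no inner interpreter end up on the target; the cost is one call instruction per word reference in program memory.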

Rick C.

Reply by ● October 13, 2018

On Saturday, October 13, 2018 at 12:21:51 PM UTC-4, David Brown wrote:
> On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
> > On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
> > wrote:
> >> On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
> >>> On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
> >>> Krause wrote:
> >>>> Am 12.10.2018 um 01:08 schrieb Paul Rubin:
> >>>>> upsidedown@downunder.com writes:
> >>>>>> There is a lot of operations that will update memory
> >>>>>> locations, so why would you need a lot of CPU registers.
> >>>>> 
> >>>>> Being able to (say) add register to register saves traffic
> >>>>> through the accumulator and therefore instructions.
> >>>>> 
> >>>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
> >>>>>> commented assembly program listing.
> >>>>> 
> >>>>> It would be nice to have a C compiler, and registers help
> >>>>> with that.
> >>>>> 
> >>>> 
> >>>> Looking at the instruction set, it should be possible to make a
> >>>> backend for this in SDCC; the architecture looks more
> >>>> C-friendly than the existing pic14 and pic16 backends. But it
> >>>> surely isn't as nice as stm8 or z80. reentrant functions will
> >>>> be inefficient: No registers, and no sp-relative addressing mode.
> >>>> One would want to reserve a few memory locations as
> >>>> pseudo-registers to help with that, but that only goes so far.
> >>> 
> >>> CPUs like this (and others that aren't like this) should be 
> >>> programmed  in Forth. It's a great tool for small MCUs and many
> >>> times can be hosted on the target although not likely in this
> >>> case. Still, you can bring enough functionality onto the MCU to
> >>> allow direct downloads and many debugging features without an
> >>> ICE.
> >>> 
> >>> Rick C.
> >>> 
> >> 
> >> Forth is a good language for very small devices, but there are
> >> details that can make a huge difference in how efficient it is.  To
> >> make Forth work well on a small chip you need a Forth-specific
> >> instruction set to target the stack processing.  For example,
> >> adding two numbers in this chip is two instructions - load
> >> accumulator from memory X, add accumulator to memory Y.  In a Forth
> >> cpu, you'd have a single instruction that does "pop two numbers,
> >> add them, push the result". That gives a very efficient and compact
> >> instruction set.  But it is hard to get the same results from a
> >> chip that doesn't have this kind of stack-based instruction set.
> > 
> > Your point is what exactly?  You are comparing running forth on some
> > other chip to running forth on this chip.  How is that useful?  There
> > are many other chips that run very fast.  So?
> 
> My point is that /this/ CPU is not a good match for Forth, though many
> other very cheap CPUs are.  Whether or not you think that matches "CPUs
> like this should be programmed in Forth" depends on what you mean by
> "CPUs like this", and what you think the benefits of Forth are.
> 
> > 
> > I believe others have said the instruction set is memory oriented
> > with no registers.  I think that means in general the CPU will be
> > slow compared to a register based design.  That actually means it is
> > easier to have a fast Forth implementation compared to other
> > compilers since there won't be a significant penalty for using a
> > stack.
> > 
> 
> It has a single register, not unlike the "W" register in small PIC 
> devices.  Yes, I expect it is going to be slower than you would get from 
> having a few more registers.  But it is missing (AFAICS) auto-increment 
> and decrement modes, and has only load/store operations with indirect 
> access.
> 
> So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:
> 
> 	mov a, y;	// 1 clock
> 	add x, a;	// 1 clock

Keep the TOS in the accumulator and I think you end up with 

 	add a, x; 	// 1 clock
 	inc DSTKPTR; 	// adjust stack pointer - 1 clock? 

Does that work?  Reading below, I guess not. 


> If you have a data stack pointer "dsp", and want a standard Forth "+" 
> operation, you have:
> 
> 	idxm a, dsp;	// 2 clock
> 	mov temp, a;	// 1 clock
> 	dec dsp;	// 1 clock
> 	idxm a, dsp;	// 2 clock
> 	add a, temp;	// 1 clock
> 	idxm dsp, a;	// 2 clock
> 
> That is 9 clocks, instead of 2, and 6 instructions instead of 2.

What does idxm do?  Looks like an indirect load?  Can this address mode be combined with any operations?  Are operations limited in the addressing modes?  This seems like a very, very simple CPU, but for the money, I guess I get it. 
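
The two quoted sequences can be replayed on a toy Python model to check the clock totals and the result. The semantics here are assumptions: `idxm` is treated as an indirect load or store through a RAM-resident pointer, the per-instruction clocks are taken from the quoted comments, and the addresses and values are arbitrary.

```python
def run(program, mem):
    """Run (op, dst, src, clocks) tuples on a one-accumulator toy model.

    Returns the total clock count; 'mem' is updated in place.
    """
    acc, clocks = 0, 0
    for op, dst, src, cost in program:
        if op == "idxm" and dst == "a":      # a = mem[mem[src]] (indirect load)
            acc = mem[mem[src]]
        elif op == "idxm":                   # mem[mem[dst]] = a (indirect store)
            mem[mem[dst]] = acc
        elif op == "mov" and dst == "a":     # a = mem[src]
            acc = mem[src]
        elif op == "mov":                    # mem[dst] = a
            mem[dst] = acc
        elif op == "dec":                    # mem[dst] -= 1
            mem[dst] = (mem[dst] - 1) & 0xFF
        elif op == "add" and dst == "a":     # a += mem[src], 8-bit wraparound
            acc = (acc + mem[src]) & 0xFF
        elif op == "add":                    # mem[dst] += a, 8-bit wraparound
            mem[dst] = (mem[dst] + acc) & 0xFF
        else:
            raise ValueError(f"unknown op: {op}")
        clocks += cost
    return clocks


NATIVE_ADD = [("mov", "a", "y", 1), ("add", "x", "a", 1)]
FORTH_PLUS = [
    ("idxm", "a", "dsp", 2),
    ("mov", "temp", "a", 1),
    ("dec", "dsp", None, 1),
    ("idxm", "a", "dsp", 2),
    ("add", "a", "temp", 1),
    ("idxm", "dsp", "a", 2),
]

mem_native = {"x": 3, "y": 4}
native_clocks = run(NATIVE_ADD, mem_native)

# Data stack assumed to grow upward: 5 at 0x10, 7 (top of stack) at 0x11.
mem_forth = {"dsp": 0x11, "temp": 0, 0x10: 5, 0x11: 7}
forth_clocks = run(FORTH_PLUS, mem_forth)
print(native_clocks, mem_native["x"], forth_clocks, mem_forth[0x10])
```

Under these assumptions the direct version takes 2 instructions and 2 clocks, while the stack version takes 6 instructions and 9 clocks and leaves the sum in the new top-of-stack cell.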


> Of course you could make a Forth compiler for the device - but you would 
> have to make an optimising Forth compiler that avoids needing a data 
> stack, just as you do on many other small microcontrollers (and just as a 
> C compiler would do).  This is /not/ a processor that fits well with 
> Forth or that would give a clear translation from Forth to assembly, as 
> is the case on some very small microcontrollers.

OK

Rick C.

Reply by Paul Rubin ● October 13, 2018

gnuarm.deletethisbit@gmail.com writes:
> Keep the TOS in the accumulator

Do you mean you want a Forth with 8-bit data cells?  What about the
cells on the return stack, if there is one?

> What does idxm do?  Looks like an indirect load?

Yes.

> Can this address mode be combined with any operations?

No.  Just load or store.