Reply by Klaus Kragelund December 26, 2018
An STM8 or other 8-bit device can be had for about 15 cents in volume, with 8 kB of flash.

Development of a 1-bit device, with NRE and low volume since you will be the only customer, will cost much more.

Cheers

Klaus
Reply by Philipp Klaus Krause November 11, 2018
On 12.10.18 22:45, upsidedown@downunder.com wrote:
> On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de> > wrote: > >> Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com: >>> >>> The real issue would be the small RAM size. >> >> Devices with this architecture go up to 256 B of RAM (but they then cost >> a few cent more). >> >> Philipp > > Did you find the binary encoding of various instruction formats, i.e > how many bits allocated to the operation code and how many for the > address field ? > > My initial guess was that the instruction word is simple 8 bit opcode > + 8 bit address, but the bit and word address limits for the smaller > models would suggest that for some op-codes, the op-code field might > be wider than 8 bits and address fields narrower than 8 bits (e.g. bit > and word addressing). >
It is more complicated. Apparently the encoding changed from a 16-bit instruction word used by older types (https://www.mikrocontroller.net/topic/461002#5616813) to a 14-bit instruction word used by newer types (https://www.mikrocontroller.net/topic/461002#5616603). Padauk also dropped and added various instructions at some points (e.g. ldtabh, ldtabl, mul, pushw, popw). Philipp
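Just to make the trade-off concrete: with a fixed-width instruction word, every bit given to the opcode is taken away from the operand field. A toy decoder in C (the 6-bit opcode / 8-bit operand split below is purely an assumption for illustration; the real Padauk encodings use different widths for different instruction groups):

#include <stdint.h>
#include <stdio.h>

/* Toy decode of a 14-bit instruction word. The 6/8 field split is an
   assumption for illustration only, not the documented Padauk encoding. */
static void decode14(uint16_t iw)
{
    iw &= 0x3FFF;                            /* 14-bit instruction word */
    uint8_t opcode  = (uint8_t)(iw >> 8);    /* upper 6 bits */
    uint8_t operand = (uint8_t)(iw & 0xFF);  /* lower 8 bits: RAM/IO address or immediate */
    printf("opcode 0x%02X, operand 0x%02X\n", opcode, operand);
}

int main(void)
{
    decode14(0x2F55);
    return 0;
}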
Reply by Philipp Klaus Krause November 9, 2018
On 8.11.18 23:35, upsidedown@downunder.com wrote:
>>>> And the linker would have to analyze the call graph to always >>>> call the correct function for each thread. >>> >>> Linker for such small target ? >> >> Of course. The support routines the compiler uses reside in some >> library, the linker links them in if necessary. Also, the larger >> variants are not that small, with up to 256 B of RAM and 8 KB of ROM. >> One might want to e.g. have one .c file for handling I²C, one for the >> soft UART, etc. > > A linker is required, if the libraries are (for copyright reasons) > delivered as binary object code only. > > However, if the library are delivered as source files and the > compiler/assembler has even a rudimentary #include mechanism, just > include those library files you need. With a include or macro > processor with parameter passing, just invoke same include file or > macro twice with different parameters for different static variable > instances. > > Of course, linkers are also needed, if very primitive compilation > machines are used, such as floppy based Intellecs or Exorcisers. It > could take a day to compile a large program all the way from sources, > with multiple floppy changes to get the final absolute file to a > single floppy, ready to be burnt into EPROMS for an additional hour or > two. In such environment compiling, linking and burning only the > source file changed would speed up program development a lot. > > When using a modern PC for compilation, there are no such issues. >
Separate compilation and then linking is the normal thing to do, and a common workflow for small devices. This is e.g. how most people use SDCC, a mainstream free compiler targeting various 8-bit architectures. That doesn't mean it is the only way (and since SDCC does not have link-time optimization, it might not be the optimal way either). But it is something people use and expect to work reasonably well. So for anyone designing an architecture it would be wise not to put too many obstacles into that workflow. Philipp
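For illustration, the usual SDCC flow on such a target: each module is compiled to a .rel object file and everything is linked at the end. File and function names below are invented, and STM8 is used only as an example target since that is a backend SDCC already has:

/* soft_uart.c -- one translation unit among several; compiled separately
   and linked afterwards, e.g.:
       sdcc -mstm8 -c soft_uart.c        (produces soft_uart.rel)
       sdcc -mstm8 main.c soft_uart.rel  (links, produces main.ihx)
   All names here are made up for the example. */
#include <stdint.h>

static uint8_t tx_busy;      /* module-local state */

void uart_putc(uint8_t c)
{
    tx_busy = 1;
    /* ... bit-bang the byte out on a GPIO pin here ... */
    (void)c;
    tx_busy = 0;
}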
Reply by upsidedown@downunder.com November 8, 2018
On Thu, 8 Nov 2018 21:56:16 +0100, Philipp Klaus Krause <pkk@spth.de>
wrote:

>Am 08.11.18 um 20:52 schrieb upsidedown@downunder.com: >>> >>> But static memory allocation would require one copy of each function per >>> thread. >> >> For a foreground/background monitor, the worst case would be two >> copies of static data, if both threads use the same rubroutine. >> >>> And the linker would have to analyze the call graph to always >>> call the correct function for each thread. >> >> Linker for such small target ? > >Of course. The support routines the compiler uses reside in some >library, the linker links them in if necessary. Also, the larger >variants are not that small, with up to 256 B of RAM and 8 KB of ROM. >One might want to e.g. have one .c file for handling I²C, one for the >soft UART, etc.
A linker is required if the libraries are (for copyright reasons) delivered as binary object code only.

However, if the libraries are delivered as source files and the compiler/assembler has even a rudimentary #include mechanism, just include those library files you need. With an include or macro processor with parameter passing, just invoke the same include file or macro twice with different parameters to get different static variable instances.

Of course, linkers are also needed if very primitive compilation machines are used, such as floppy-based Intellecs or Exorcisers. It could take a day to compile a large program all the way from sources, with multiple floppy changes to get the final absolute file onto a single floppy, ready to be burnt into EPROMs for an additional hour or two. In such an environment, compiling, linking and burning only the source file that changed would speed up program development a lot.

When using a modern PC for compilation, there are no such issues.
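A rough sketch of that include/macro trick in C, with the "library" routine expanded once per instance under a different name prefix. All names below are invented for the example, and CAT is just a token-pasting helper:

/* Imagine soft_timer.inc is the vendor-supplied source:
 *     static unsigned char CAT(NAME,_ticks);
 *     static void CAT(NAME,_tick)(void) { CAT(NAME,_ticks)++; }
 * Including it once per NAME gives each thread its own static instance.
 * Shown here expanded inline so the example is self-contained. */
#define CAT2(a, b) a##b
#define CAT(a, b)  CAT2(a, b)

#define NAME fg
static unsigned char CAT(NAME, _ticks);
static void CAT(NAME, _tick)(void) { CAT(NAME, _ticks)++; }
#undef NAME

#define NAME bg
static unsigned char CAT(NAME, _ticks);
static void CAT(NAME, _tick)(void) { CAT(NAME, _ticks)++; }
#undef NAME

int main(void)
{
    fg_tick();   /* foreground thread's private copy */
    bg_tick();   /* background thread's private copy */
    return 0;
}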
Reply by Philipp Klaus Krause November 8, 2018
On 8.11.18 20:52, upsidedown@downunder.com wrote:
>> >> But static memory allocation would require one copy of each function per >> thread. > > For a foreground/background monitor, the worst case would be two > copies of static data, if both threads use the same rubroutine. > >> And the linker would have to analyze the call graph to always >> call the correct function for each thread. > > Linker for such small target ?
Of course. The support routines the compiler uses reside in some library, the linker links them in if necessary. Also, the larger variants are not that small, with up to 256 B of RAM and 8 KB of ROM. One might want to e.g. have one .c file for handling I²C, one for the soft UART, etc.
> > With such small processor, just track any dependencies manually.
See above.
> >> Function pointers get complicated. > > Do you really insist of using function pointer with such small > targets? >
I want to have C, and function pointers are part of it.
>> >> Unfortunately, reentrancy becomes even harder with >> hardware-multithreading: > > With two hardware threads, you would need at most two copies of static > data.
Padauk still makes one chip with 8 hardware threads (and it looks to me as if there were more in the past; though they are not currently listed on their website, one can still find them e.g. in their IDE).
> >> TO access the stack, one has to construct a >> pointer to the stack location in a memory location. > > Why would you want to access the stack ?
For reentrancy, so I can use one function implementation for all threads. It would also be useful to be able to dynamically assign threads to hardware threads (so no thread is tied to specific hardware, and some OS schedules them).
> > The stack is usable for handling return addresses, but I guess that a > hardware thread must have its own return address stack pointer.
Each hardware thread has its own flag register (4 bits), accumulator (8 bits), PC (12 bits) and stack pointer (8 bits).
> >> That memory location >> (as any pseudo-registers) is then shared among all running instances of >> the function. So it needs to be protected (e.g. with a spinlock), making >> access even more inefficient. And that spinlock will cause issues with >> interrupts (a solution might be to heavily restrict interrupt routines, >> essentially allowing not much more than setting some global variables). > > Disabling all interrupts for the duration of some critical operations > is often enough, but of course, the number of instructions executed > during interrupt disabled should be minimized.
Disabling interrupts any time a spinlock is held, or a thread is waiting for one, might be too much, especially if there are many threads, so that the spinlock is held often. Philipp
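To make the concern concrete, a sketch of such a spinlock around a shared pseudo-register block. atomic_tas() stands in for whatever atomic test-and-set the hardware would actually offer and is purely hypothetical, as are all the other names:

#include <stdint.h>

/* Shared pseudo-registers used by a reentrant helper: they sit at a fixed
   RAM address, so every hardware thread calling the helper competes for
   them, and access is guarded by a spinlock. atomic_tas() is a placeholder
   for a real atomic test-and-set (returns the previous value of *p and
   sets it to 1); it is not an actual Padauk or SDCC intrinsic. */
static volatile uint8_t lock;            /* 0 = free, 1 = taken */
static uint8_t scratch[2];               /* the shared pseudo-registers */

extern uint8_t atomic_tas(volatile uint8_t *p);   /* hypothetical */

static void spin_lock(volatile uint8_t *l)
{
    while (atomic_tas(l))                /* spin; interrupts stay enabled */
        ;
}

static void spin_unlock(volatile uint8_t *l)
{
    *l = 0;
}

uint8_t helper(uint8_t x)
{
    uint8_t r;
    spin_lock(&lock);
    scratch[0] = x;                         /* an ISR grabbing the same lock */
    scratch[1] = (uint8_t)(x + 1);          /* here would deadlock - hence the */
    r = (uint8_t)(scratch[0] + scratch[1]); /* restrictions on interrupt code */
    spin_unlock(&lock);
    return r;
}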
Reply by upsidedown@downunder.com November 8, 2018
On Thu, 8 Nov 2018 13:53:48 +0100, Philipp Klaus Krause <pkk@spth.de>
wrote:

>Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com: >> On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de> >> wrote: >> >>> Am 10.10.2018 um 03:05 schrieb Clifford Heath: >>>> <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> >>>> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf> >>>> >>>> >>>> OTP, no SPI, UART or I²C, but still... >>>> >>>> Clifford Heath >>> >>> They even make dual-core variants (the part where the first digit in the >>> part number is '2'). It seems program counter, stack pointer, flag >>> register and accumulator are per-core, while the rest, including the ALU >>> is shared. In particular, the I/O registers are also shared, which means >>> some multiplier registers would also be - but currently all variants >>> with integrated multiplier are single-core. >>> Use of the ALU is shared byt he two cores, alternating by clock cycle. >>> >>> Philipp >> >> >> Interesting, that would make it easy to run a multitasking RTOS >> (foreground/background) monitor, which might justify the use of some >> reentrant library routines :-). But in reality, the available memory >> (ROM/RAM) is so small so that you could easily manage this with static >> memory allocations. >> >> > >But static memory allocation would require one copy of each function per >thread.
For a foreground/background monitor, the worst case would be two copies of static data, if both threads use the same subroutine.
>And the linker would have to analyze the call graph to always >call the correct function for each thread.
A linker for such a small target? With such a small processor, just track any dependencies manually.
>Function pointers get complicated.
Do you really insist on using function pointers with such small targets?
> >Unfortunately, reentrancy becomes even harder with >hardware-multithreading:
With two hardware threads, you would need at most two copies of static data.
>TO access the stack, one has to construct a >pointer to the stack location in a memory location.
Why would you want to access the stack? The stack is usable for handling return addresses, but I guess that a hardware thread must have its own return address stack pointer. In fact, many minicomputers from the 1960s did not even have a stack at all. The calling program just stored the return address in the first word of the subroutine and, at the end of the subroutine, performed an indirect jump through the first word of the subroutine to return to the calling program. Of course, this is not re-entrant, and in those days one did not have to worry about multiple CPUs accessing the same routines :-). BTW, who needs a program counter (PC)? Many microprograms run without a PC, with the next instruction address stored at the end of the long instruction word :-)
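A toy model of that linkage in C, for anyone who has not seen it; mem[] stands in for core memory, and the addresses and "program" are invented:

#include <stdio.h>

/* Toy model of "return address in the first word of the subroutine"
   (PDP-8 JMS style). The call writes the return address into mem[SUB];
   the subroutine returns with an indirect jump through that word. */
#define SUB 10                    /* first word of the subroutine */
static int mem[32];

static void call_sub(int return_addr)
{
    mem[SUB] = return_addr;       /* store where to come back to */
    /* ... subroutine body, nominally starting at SUB + 1 ... */
    printf("return via indirect jump through mem[%d] = %d\n", SUB, mem[SUB]);
}

int main(void)
{
    call_sub(100);                /* a second, overlapping call (interrupt or
                                     another CPU) would overwrite mem[SUB],
                                     which is why this is not reentrant */
    return 0;
}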
>That memory location >(as any pseudo-registers) is then shared among all running instances of >the function. So it needs to be protected (e.g. with a spinlock), making >access even more inefficient. And that spinlock will cause issues with >interrupts (a solution might be to heavily restrict interrupt routines, >essentially allowing not much more than setting some global variables).
Disabling all interrupts for the duration of some critical operations is often enough, but of course the number of instructions executed with interrupts disabled should be minimized. In MACRO-11 assembler, the standard practice was to start the comment field with a semicolon, with two semicolons when task switching was disabled and with three semicolons when interrupts were disabled; that made it visually easy to see where interrupts were off and not to mess too much with such code sections.
> >The there is the trade-off of using one such memory location per >function vs. per program (the latter reducing memroy usage, but >resulting in less paralellism). > >The pseudo-registers one would want to use are not so much a problem for >interrupt routines (they would just need saving and thus increase >interrupt overhead a bit), but for hardware parallelism. Essentially all >access to them would again have to be protected by a spinlock. > >All these problems could have relatively easily been avoided by >providing an efficient stack-pointer-relative addressing mode. Having a >few general-purpose or index registers would have somewhat helped as well. > >Philipp
Reply by Philipp Klaus Krause November 8, 2018
On 8.11.18 14:08, Tauno Voipio wrote:
> > > And you'll end up with a low-end Cortex ... >
A low-end Cortex would still be far heavier than a Padauk variant with an sp-relative addressing mode or a few registers added. I think even a more multithreading-friendly variant of the Padauk would still be simpler than an STM8. But one could surely create a nice STM8-like processor (with a few STM8 weaknesses fixed) with hardware multithreading. Philipp
Reply by Tauno Voipio November 8, 2018
On 8.11.18 14:53, Philipp Klaus Krause wrote:
> Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com: >> On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de> >> wrote: >> >>> Am 10.10.2018 um 03:05 schrieb Clifford Heath: >>>> <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> >>>> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf> >>>> >>>> >>>> OTP, no SPI, UART or I²C, but still... >>>> >>>> Clifford Heath >>> >>> They even make dual-core variants (the part where the first digit in the >>> part number is '2'). It seems program counter, stack pointer, flag >>> register and accumulator are per-core, while the rest, including the ALU >>> is shared. In particular, the I/O registers are also shared, which means >>> some multiplier registers would also be - but currently all variants >>> with integrated multiplier are single-core. >>> Use of the ALU is shared byt he two cores, alternating by clock cycle. >>> >>> Philipp >> >> >> Interesting, that would make it easy to run a multitasking RTOS >> (foreground/background) monitor, which might justify the use of some >> reentrant library routines :-). But in reality, the available memory >> (ROM/RAM) is so small so that you could easily manage this with static >> memory allocations. >> >> > > But static memory allocation would require one copy of each function per > thread. And the linker would have to analyze the call graph to always > call the correct function for each thread. Function pointers get > complicated. > > Unfortunately, reentrancy becomes even harder with > hardware-multithreading: TO access the stack, one has to construct a > pointer to the stack location in a memory location. That memory location > (as any pseudo-registers) is then shared among all running instances of > the function. So it needs to be protected (e.g. with a spinlock), making > access even more inefficient. And that spinlock will cause issues with > interrupts (a solution might be to heavily restrict interrupt routines, > essentially allowing not much more than setting some global variables). > > The there is the trade-off of using one such memory location per > function vs. per program (the latter reducing memroy usage, but > resulting in less paralellism). > > The pseudo-registers one would want to use are not so much a problem for > interrupt routines (they would just need saving and thus increase > interrupt overhead a bit), but for hardware parallelism. Essentially all > access to them would again have to be protected by a spinlock. > > All these problems could have relatively easily been avoided by > providing an efficient stack-pointer-relative addressing mode. Having a > few general-purpose or index registers would have somewhat helped as well. > > Philipp
And you'll end up with a low-end Cortex ...

--
-TV
Reply by Philipp Klaus Krause November 8, 2018
On 12.10.18 20:39, upsidedown@downunder.com wrote:
> On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de> > wrote: > >> Am 10.10.2018 um 03:05 schrieb Clifford Heath: >>> <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> >>> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf> >>> >>> >>> OTP, no SPI, UART or I²C, but still... >>> >>> Clifford Heath >> >> They even make dual-core variants (the part where the first digit in the >> part number is '2'). It seems program counter, stack pointer, flag >> register and accumulator are per-core, while the rest, including the ALU >> is shared. In particular, the I/O registers are also shared, which means >> some multiplier registers would also be - but currently all variants >> with integrated multiplier are single-core. >> Use of the ALU is shared byt he two cores, alternating by clock cycle. >> >> Philipp > > > Interesting, that would make it easy to run a multitasking RTOS > (foreground/background) monitor, which might justify the use of some > reentrant library routines :-). But in reality, the available memory > (ROM/RAM) is so small so that you could easily manage this with static > memory allocations. > >
But static memory allocation would require one copy of each function per thread. And the linker would have to analyze the call graph to always call the correct function for each thread. Function pointers get complicated.

Unfortunately, reentrancy becomes even harder with hardware-multithreading: to access the stack, one has to construct a pointer to the stack location in a memory location. That memory location (as any pseudo-registers) is then shared among all running instances of the function. So it needs to be protected (e.g. with a spinlock), making access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).

Then there is the trade-off of using one such memory location per function vs. per program (the latter reducing memory usage, but resulting in less parallelism).

The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase interrupt overhead a bit), but for hardware parallelism. Essentially all access to them would again have to be protected by a spinlock.

All these problems could have relatively easily been avoided by providing an efficient stack-pointer-relative addressing mode. Having a few general-purpose or index registers would have somewhat helped as well. Philipp
Reply by Philipp Klaus Krause November 5, 2018
On 12.10.2018 09:44, David Brown wrote:
> On 12/10/18 08:50, Philipp Klaus Krause wrote: >> Am 12.10.2018 um 01:08 schrieb Paul Rubin: >>> upsidedown@downunder.com writes: >>>> There is a lot of operations that will update memory locations, so why >>>> would you need a lot of CPU registers. >>> >>> Being able to (say) add register to register saves traffic through the >>> accumulator and therefore instructions. >>> >>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented >>>> assembly program listing. >>> >>> It would be nice to have a C compiler, and registers help with that. >>> >> >> Looking at the instruction set, it should be possible to make a backend >> for this in SDCC; the architecture looks more C-friendly than the >> existing pic14 and pic16 backends. But it surely isn't as nice as stm8 >> or z80. >> reentrant functions will be inefficent: No registers, and no sp-relative >> adressing mode. On would want to reserve a few memory locations as >> pseudo-registers to help with that, but that only goes so far. >> > > It looks like the lowest 16 memory addresses could be considered > pseudo-registers - they are the ones that can be used for direct memory > access rather than needing indirect access. >
Considering the multi-core variants of the Padauk µCs: those addresses are shared across all cores. Each core only has its own A, SP, F and PC. How do we handle local variables?

Option 1: Make functions non-reentrant. This requires duplication of code (we need per-thread copies of functions) and link-time analysis to ensure that each thread only calls the function implementation meant for it. Function pointers get complicated.

Option 2: Use an inefficient combination of thread-local storage and stack.

Since this is a small µC, we need a lot of support functions, which the compiler inserts (e.g. for multiplication); of course those are affected by the same problems. Philipp
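A sketch of what Option 2 could look like for a compiler-inserted helper, with one slot of thread-local storage per core selected by a core index. padauk_core_id() is a stand-in for however the running core would identify itself; it and the other names are hypothetical:

#include <stdint.h>

#define NUM_CORES 2

extern uint8_t padauk_core_id(void);    /* hypothetical: returns 0 or 1 */

/* Per-core scratch slot for the helper's local state ("thread-local
   storage" selected by core index). */
static uint8_t mul_tmp[NUM_CORES];

/* 8x8 -> 8-bit multiply, the kind of support routine a compiler inserts
   on a target without a hardware multiplier. */
uint8_t mul8(uint8_t a, uint8_t b)
{
    uint8_t id = padauk_core_id();
    uint8_t r = 0;

    mul_tmp[id] = a;                    /* this core's private copy of a */
    while (b--)
        r = (uint8_t)(r + mul_tmp[id]);
    return r;
}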