
$0.03 microcontroller

Started by Clifford Heath October 9, 2018
On Sunday, October 21, 2018 at 12:51:34 PM UTC-5, gnuarm.del...@gmail.com wrote:
> On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
> > On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
> > > On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
> > > > On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
> > > > > On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
> > > > > <no.email@nospam.invalid> wrote:
> > > > >
> > > > > > Clifford Heath <no.spam@please.net> writes:
> > > > > > > <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
> > > > > > > <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
> > > > > > > OTP, no SPI, UART or I²C, but still...
> > > > > >
> > > > > > That is impressive! Seems to be an 8-bit RISC with no registers, just
> > > > > > an accumulator, a cute concept. 1K of program OTP and 64 bytes of RAM,
> > > > > > enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
> > > > > > I like that it's in a 6-pin SOT23 package since there aren't many other
> > > > > > MCUs that small.
> > > > >
> > > > > Slightly OT, but I have often wondered how primitive a computer
> > > > > architecture can be and still do some useful work. In the
> > > > > tube/discrete/SSI times, there were quite a lot of 1-bit processors.
> > > > > There were at least two types. The PLC (Programmable Logic Controller)
> > > > > type replaced relay logic; these typically had at least AND, OR, NOT,
> > > > > (XOR) instructions. The other group was used as truly serial computers,
> > > > > with the same instructions as the PLC type but also at least 1-bit SUB
> > > > > (and ADD) instructions to implement all mathematical functions.
> > > > >
> > > > > However, in the LSI era, there don't seem to be many implementations.
> > > > >
> > > > > One that immediately comes to mind is the MC14500B PLC building block,
> > > > > from the 1970s, which requires quite a lot of support chips (code
> > > > > memory, PC, I/O chips) to do some useful work.
> > > > >
> > > > > After much searching, I found the (NI) National Instruments SBA
> > > > > (Serial Boolean Analyser)
> > > > > http://www.wass.net/othermanuals/GI%20SBA.pdf
> > > > > from the same era, with 1024 instruction words (8 bit) of ROM, four
> > > > > banks of 30 _bits_ of data memory, and 30 I/O pins in a 40-pin package.
> > > > > For the re-entrance enthusiasts, it contains stack-pointer-relative
> > > > > addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803
> > > > > Darlington buffers may be needed to drive loads typically found in a PLC
> > > > > environment.
> > > > >
> > > > > Anyone seen more modern 1-bit chips, either for relay replacement or
> > > > > for truly serial computers?
> > > >
> > > > ]> Anyone seen more modern 1 bit chips either for relay replacement or
> > > > ]> for truly serial computers ?
> > > >
> > > > LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose
> > > > (Logic Emulation Machine) https://opencores.org/project/lem1_9min
> > > >
> > > > Jim Brakefield
> > >
> > > It is hard for me to imagine applications where a 1-bit processor would
> > > be useful. A useful N-bit processor can be built in a small number of
> > > LUTs. I've built a 16-bit processor in just 600 LUTs and I've seen
> > > processors in a bit less.
> > >
> > > I discussed this with someone once and he imagined apps where the
> > > processing speed requirement was quite low and you can save LUTs with a
> > > bit-serial processor. I just don't know how many or why it would matter.
> > > Even the smallest FPGAs have thousands of LUTs. It's hard to picture an
> > > application where you couldn't spare a few hundred LUTs.
> > >
> > > Rick C.
> >
> > ]> It's hard to picture an application where you couldn't spare a few hundred LUTs.
> >
> > There are advantages to using several soft-core processors, each sized
> > and customized to the need.
> >
> > ]> I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.
> >
> > There are many under 600 LUTs, including 32-bit. Had hoped the
> > full-featured LEM design would be under 100 LUTs.
> > Have done some rough research of what's available for under 600 LUTs:
> > https://opencores.org/project/up_core_list/downloads
> > select: "By Performance Metric"
> >
> > A big rationale for small soft-core processors is that they replace LUTs
> > (slow-speed logic) with block RAM (instructions). And they are completely
> > deterministic, as opposed to doing the same by time-slicing an ASIC (ARM)
> > processor.
>
> I won't argue a bit that soft cores, and especially *customizable*
> soft-core CPUs, aren't useful. I was talking about there being at best a
> very tiny region of utility for 1-bit processors.
>
> My 600-LUT processor didn't trade off much for performance. It would run
> pretty fast and was pretty capable. In addition, the word size was
> independent of the instruction set. That said, there are apps where a much
> less powerful processor would do fine and saving a few more LUTs would be
> useful.
>
> Rick C.

]> there being at best a very tiny region of utility for 1-bit processors

There are a small number of examples:
Bit-serial processors such as the DEC PDP-8/L, and early vacuum-tube and
drum machines, for example the Bendix G-15.
Bit-serial CORDIC.

Also telling is that the 4-bit processors in calculators have been replaced
by 8-bit processors.

My inspiration was EDIF, which was/is output from VHDL & Verilog compilers,
e.g. use EDIF as a machine language. In the context of logic simulation,
greater FPGA capacity is possible for slow logic.

This effort also led to a theoretical insight for brain modelling: there is
greater information content in the wiring than in the logic. The human brain
has 2^36+ neurons, requiring 36 bits of information for each connection and
only 16 or so bits for the state/configuration of each synapse. Also, an
FPGA requires 60+ bits to route each LUT input (assuming all LUT inputs are
in use), whereas each possible input can be specified in 20 bits or less
(1M-LUT FPGA).

Of course, optimizing simulators convert the EDIF to an existing machine
language. Likewise for industrial automation (ladder logic, ...).

Jim Brakefield
<jim.brakefield@ieee.org> wrote:
> [snip]
>
> This effort also led to a theoretical insight for brain modelling: there
> is greater information content in the wiring than in the logic. The human
> brain has 2^36+ neurons, requiring 36 bits of information for each
> connection and only 16 or so bits for the state/configuration of each
> synapse. Also, an FPGA requires 60+ bits to route each LUT input (assuming
> all LUT inputs are in use), whereas each possible input can be specified
> in 20 bits or less (1M-LUT FPGA).

The clock speed is quite low, 2 Hz? So the wetware is not quite impossible
to emulate with current tech. Raising a baby and training the resultant
adult to do a task is still many orders of magnitude cheaper. ;)
On Sun, 21 Oct 2018 16:27:31 +0300, upsidedown@downunder.com wrote:

>[snip]
>
>Anyone seen more modern 1 bit chips either for relay replacement or
>for truly serial computers ?
Circa 1985-1993, Thinking Machines Connection Machine. Circa 1987-1996,
MasPar MP series.

The CM-1, 2, 2a, and 200 all were SIMD parallel using 1-bit serial
integer-only CPUs. Sizes ranged from 8K CPUs at the low end to 64K CPUs at
the high end. Each CPU had 4K *bits* of private RAM, and the CPUs were
connected in a multidimensional hypercube network. The CM-2, 2a, and 200
were augmented with 32-bit FPUs (1 per 32 CPUs), and the 200 featured a
higher clock speed.

The MP-1 was SIMD parallel using 4-bit serial integer-only CPUs in sizes
from 1K to 16K CPUs. It also had 32-bit FPUs, but I don't remember how many
/ what ratio. I remember that it had an accumulator register rather than
going memory->memory like the CM.

[I can't find much information now about the MP-1 ... unfortunately MasPar
didn't last very long in the marketplace. The Wikipedia article has some
information about the MP-2, but the MP-2 was a later full 32-bit design,
very different from the MP-1.]

My college had both an 8K CM-2 and a 1K MP-1, accessible to those who took
various parallel processing electives. I never got to use the MP-1 much -
it was new at the end of my time and I only ever played with it a bit. But
I spent 2 semesters working with the CM-2.

Even though the CM's clock speed was only ~8MHz, the performance was
amazing IF the problem was a good fit to the architecture. E.g., at that
time, I owned a 66MHz (dx2) i486. Converted for the CM-2 architecture,
O(n^4) array processing on the i486 became O(n) on the CM-2. I had a
physics simulation that took over 3 hours on my i486 that ran in ~10
minutes on the CM.

George
On 14.10.2018 at 11:55, Theo wrote:
> Tim <cpldcpu+usenet@gmail.com> wrote:
>> This is quite curious. I wonder
>>
>> - Has anyone actually received the devices they ordered? The cheaper
>> variants seem to be sold out.
>
> I think they've sold out since they went viral. EEVblog did a video showing
> 550 in stock - that's only $16 worth of parts, not hard to imagine they've
> been bought up.
>
> The other option is they're some kind of EOL part and 3c is the 'reduced to
> clear' price - which they have done, very successfully.
>
> Theo
They're back in stock, though the price rose by 21% to $0.046. Also, LCSC
now seems to be stocking more Padauk parts, including more dual-core
devices. Unfortunately, the programmer seems to be out of stock, and they
have neither the flash nor the DIP variants.

Philipp
On 12.10.2018 at 09:44, David Brown wrote:
> On 12/10/18 08:50, Philipp Klaus Krause wrote:
>> On 12.10.2018 at 01:08, Paul Rubin wrote:
>>> upsidedown@downunder.com writes:
>>>> There is a lot of operations that will update memory locations, so why
>>>> would you need a lot of CPU registers.
>>>
>>> Being able to (say) add register to register saves traffic through the
>>> accumulator and therefore instructions.
>>>
>>>> 1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
>>>> assembly program listing.
>>>
>>> It would be nice to have a C compiler, and registers help with that.
>>
>> Looking at the instruction set, it should be possible to make a backend
>> for this in SDCC; the architecture looks more C-friendly than the
>> existing pic14 and pic16 backends. But it surely isn't as nice as stm8
>> or z80.
>> Reentrant functions will be inefficient: no registers, and no sp-relative
>> addressing mode. One would want to reserve a few memory locations as
>> pseudo-registers to help with that, but that only goes so far.
>
> It looks like the lowest 16 memory addresses could be considered
> pseudo-registers - they are the ones that can be used for direct memory
> access rather than needing indirect access.
Considering the multi-core variants of the Padauk µCs: those addresses are
shared across all cores. Each core only has its own A, SP, F, PC.

How do we handle local variables?

Option 1: Make functions non-reentrant. Requires duplication of code (we
need per-thread copies of functions) and link-time analysis to ensure that
each thread only calls the function implementation meant for it. Function
pointers get complicated.

Option 2: Use an inefficient combination of thread-local storage and stack.

Since this is a small µC, we need a lot of support functions, which the
compiler inserts (e.g. for multiplication); of course those are affected by
the same problems.

Philipp
On 12.10.18 at 20:39, upsidedown@downunder.com wrote:
> On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
> wrote:
>
>> On 10.10.2018 at 03:05, Clifford Heath wrote:
>>> <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
>>> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
>>>
>>> OTP, no SPI, UART or I²C, but still...
>>>
>>> Clifford Heath
>>
>> They even make dual-core variants (the parts where the first digit in the
>> part number is '2'). It seems program counter, stack pointer, flag
>> register and accumulator are per-core, while the rest, including the ALU,
>> is shared. In particular, the I/O registers are also shared, which means
>> some multiplier registers would also be - but currently all variants
>> with an integrated multiplier are single-core.
>> Use of the ALU is shared by the two cores, alternating by clock cycle.
>>
>> Philipp
>
> Interesting, that would make it easy to run a multitasking RTOS
> (foreground/background monitor), which might justify the use of some
> reentrant library routines :-). But in reality, the available memory
> (ROM/RAM) is so small that you could easily manage this with static
> memory allocations.
But static memory allocation would require one copy of each function per
thread. And the linker would have to analyze the call graph to always call
the correct function for each thread. Function pointers get complicated.

Unfortunately, reentrancy becomes even harder with hardware multithreading:
to access the stack, one has to construct a pointer to the stack location
in a memory location. That memory location (as any pseudo-registers) is
then shared among all running instances of the function. So it needs to be
protected (e.g. with a spinlock), making access even more inefficient. And
that spinlock will cause issues with interrupts (a solution might be to
heavily restrict interrupt routines, essentially allowing not much more
than setting some global variables).

Then there is the trade-off of using one such memory location per function
vs. per program (the latter reducing memory usage, but resulting in less
parallelism).

The pseudo-registers one would want to use are not so much a problem for
interrupt routines (they would just need saving and thus increase interrupt
overhead a bit), but for hardware parallelism. Essentially all access to
them would again have to be protected by a spinlock.

All these problems could have relatively easily been avoided by providing
an efficient stack-pointer-relative addressing mode. Having a few
general-purpose or index registers would have helped somewhat as well.

Philipp
On 8.11.18 14:53, Philipp Klaus Krause wrote:
> On 12.10.18 at 20:39, upsidedown@downunder.com wrote:
> [snip]
>
> All these problems could have relatively easily been avoided by
> providing an efficient stack-pointer-relative addressing mode. Having a
> few general-purpose or index registers would have somewhat helped as well.
>
> Philipp

And you'll end up with a low-end Cortex ...

-- 
-TV
On 08.11.18 at 14:08, Tauno Voipio wrote:
>
> And you'll end up with a low-end Cortex ...
A low-end Cortex would still be far heavier than a Padauk variant with an
sp-relative addressing mode or a few registers added. I think a more
multithreading-friendly variant of the Padauk would still be simpler than
an STM8. But one could surely create a nice STM8-like processor (with a few
STM8 weaknesses fixed) with hardware multithreading.

Philipp
On Thu, 8 Nov 2018 13:53:48 +0100, Philipp Klaus Krause <pkk@spth.de>
wrote:

>On 12.10.18 at 20:39, upsidedown@downunder.com wrote:
>[snip]
>
>But static memory allocation would require one copy of each function per
>thread.
For a foreground/background monitor, the worst case would be two copies of
static data, if both threads use the same subroutine.
>And the linker would have to analyze the call graph to always
>call the correct function for each thread.
A linker, for such a small target? With such a small processor, just track
any dependencies manually.
>Function pointers get complicated.
Do you really insist on using function pointers with such small targets?
>
>Unfortunately, reentrancy becomes even harder with
>hardware-multithreading:
With two hardware threads, you would need at most two copies of static data.
>To access the stack, one has to construct a
>pointer to the stack location in a memory location.
Why would you want to access the stack? The stack is usable for handling
return addresses, but I guess that a hardware thread must have its own
return address stack pointer.

In fact, many minicomputers from the 1960s did not even have a stack at
all. The calling program just stored the return address in the first word
of the subroutine, and at the end of the subroutine, performed an indirect
jump through the first word of the subroutine to return to the calling
program. Of course, this is not re-entrant, and in those days one did not
have to worry about multiple CPUs accessing the same routines :-).

BTW, who needs a program counter (PC)? Many microprograms run without a
PC, with the next-instruction address stored at the end of the long
instruction word :-)
>That memory location
>(as any pseudo-registers) is then shared among all running instances of
>the function. So it needs to be protected (e.g. with a spinlock), making
>access even more inefficient. And that spinlock will cause issues with
>interrupts (a solution might be to heavily restrict interrupt routines,
>essentially allowing not much more than setting some global variables).
Disabling all interrupts for the duration of some critical operations is
often enough, but of course, the number of instructions executed with
interrupts disabled should be minimized.

In MACRO-11 assembler, the standard practice was to start the comment field
with one semicolon normally, with two semicolons when task switching was
disabled, and with three semicolons when interrupts were disabled. It was
thus visually easy to detect when interrupts were disabled and not mess too
much with such code sections.
On 08.11.18 at 20:52, upsidedown@downunder.com wrote:
>> But static memory allocation would require one copy of each function per
>> thread.
>
> For a foreground/background monitor, the worst case would be two
> copies of static data, if both threads use the same subroutine.
>
>> And the linker would have to analyze the call graph to always
>> call the correct function for each thread.
>
> Linker for such small target ?
Of course. The support routines the compiler uses reside in some library,
and the linker links them in if necessary. Also, the larger variants are
not that small, with up to 256 B of RAM and 8 KB of ROM. One might want to
e.g. have one .c file for handling I²C, one for the soft UART, etc.
>
> With such small processor, just track any dependencies manually.
See above.
>> Function pointers get complicated.
>
> Do you really insist on using function pointers with such small
> targets?
I want to have C; function pointers are part of it.
>> Unfortunately, reentrancy becomes even harder with
>> hardware-multithreading:
>
> With two hardware threads, you would need at most two copies of static
> data.
Padauk still makes one chip with 8 hardware threads (and it looks to me as
if there were more in the past; though they are not currently listed on
their website, one can find them e.g. in their IDE).
>> To access the stack, one has to construct a
>> pointer to the stack location in a memory location.
>
> Why would you want to access the stack ?
For reentrancy, so I can use one function implementation for all threads.
It would also be useful to dynamically assign threads to hardware threads
(so no thread is tied to specific hardware, and some OS schedules them).
> The stack is usable for handling return addresses, but I guess that a
> hardware thread must have its own return address stack pointer.
Each hardware thread has its own flag register (4 bits), accumulator
(8 bits), PC (12 bits), and stack pointer (8 bits).
>> That memory location
>> (as any pseudo-registers) is then shared among all running instances of
>> the function. So it needs to be protected (e.g. with a spinlock), making
>> access even more inefficient. And that spinlock will cause issues with
>> interrupts (a solution might be to heavily restrict interrupt routines,
>> essentially allowing not much more than setting some global variables).
>
> Disabling all interrupts for the duration of some critical operations
> is often enough, but of course, the number of instructions executed
> during interrupt disabled should be minimized.
Disabling interrupts any time a spinlock is held, or while a thread is
waiting for one, might be too much, especially if there are many threads,
so that the spinlock is held often.

Philipp