Integrated TFT controller in PIC MCUs

Reply by Simon Clubley ● January 11, 2015

On 2015-01-10, Anders.Montonen@kapsi.spam.stop.fi.invalid <Anders.Montonen@kapsi.spam.stop.fi.invalid> wrote:
>
> As far as I can tell from the header files and compiler source code, the 
> PIC32MM could be a replacement/follow-up for the PIC32MX1xx/2xx. There's 
> no DSP ASE, and no shadow registers, so it's clearly not a high- 
> performance chip, and it doesn't seem like it has any special 
> peripherals either. Using microMIPS at the low end makes sense, as you 
> can fit more code in a smaller flash. I don't know how much silicon area 
> is saved by having only the one instruction set, but that kind of makes 
> sense for a low-end chip as well.
>

In that case, is there any information yet about whether the PIC32MM will
still have PDIP variants and what the expected capabilities are for
the PIC32MM ?

(Simon may have just found himself a new toy to look forward to. :-))

Thanks,

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Simon Clubley ● January 11, 2015

On 2015-01-11, Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
> On 2015-01-10, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>>
>> http://tgi-sci.com/misc/hvst0q.gif
>>
>> This is not VPA, just plain 68k (well, CF) assembly, so it is
                               ^^^
>> far from being taken from my own world.
>>

[snip]
>
>> I just wonder how hopeless things must have become to question
>> the viability of doing something that basic.
>>
>
> Different set of tradeoffs. The higher level language code can
> potentially be reused on multiple architectures (or if that's not
> possible in a specific case, can at least be used as the starting point
> for another driver); your PowerPC specific assembly language code cannot
> be reused in such a way.
>

Re-reading the above shows that this example is 68k, not the PowerPC
architecture we were talking about. However, the same comments still
apply.

Simon.


Reply by Dimiter_Popoff ● January 11, 2015

On 11.1.2015 13:16, Simon Clubley wrote:
> On 2015-01-10, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>>
>> I am still somewhat amazed at what was said.
>
> That just shows the disconnect between what you do and what the rest
> of the world does. :-)

I suppose so. Explains much of why I am that much more efficient than
the rest of the world, too, I suppose.

>> What on earth is there to stop people from doing something
>> *that* simple - here are two IRQ handlers on a small MCU, I used
>> an mcf52211 a couple of months back to make a HV source - it
>> does the PWM/regulating, overcurrent protection/limiting, serial
>> communication etc., all in all 4 tasks, several IRQs. Took me
>> about 2 weeks to program (I had hoped it would take 2 days but
>> I had completely forgotten the insides of the 52211 so I had to
>> recall a lot which is where the 2 weeks went). A total of about
>> 250k sources, the object code being almost 9 kilobytes.
>>
>> So here are two IRQ handlers where hopefully it is obvious how
>> only what is needed is saved and restored, pretty basic stuff:
>>
>> http://tgi-sci.com/misc/hvst0q.gif
>>
>> This is not VPA, just plain 68k (well, CF) assembly, so it is
>> far from being taken from my own world.
>>
>
> While the actual wrapper around the device specific interrupt handler
> (ie: the generic IRQ code which runs before you get to the device
> specific handler itself) is generally still assembly language, most
> people don't write the actual device specific handler in assembly
> language any more, but use a higher level language such as C instead.

Writing an IRQ handler in C is outright poor programming but OK,
let us assume someone is just not qualified to use assembly to
do it properly (though I can hardly see how someone qualified
to write a whole project in C will find it difficult to write
a few - or a few tens of - lines in assembly).

> Once you do that, the IRQ wrapper needs to save all the registers the
> C compiler could potentially use, including all the temporary registers,
> before the wrapper calls the device specific handler.

This means just poor compiler work. Since the compiler knows what
registers it will use there is no problem communicating that up/down
the line such that only those used need to be saved. If all compilers are
that stupid, well, their users get what they deserve.

>> I just wonder how hopeless things must have become to question
>> the viability of doing something that basic.
>>
> Different set of tradeoffs. The higher level language code can
> potentially be reused on multiple architectures (or if that's not
> possible in a specific case, can at least be used as the starting point
> for another driver);

OK, I have yet to see that really work for someone without major
rework but let us assume the cliche is correct.

> your PowerPC specific assembly language code cannot
> be reused in such a way.

Not at all, VPA stands for "virtual processor assembly". The sources
can be compiled for any architecture (using a compiler for it,
obviously). I have used two so far - 68k and power (much of the DPS
code prior to 2000 or so is written for 68k (CPU32) and compiled
for power nowadays). I also did something close to it for a TI
DSP, the 5420, but it was not completely VPA, would have been
impractical, many of the 54xx registers are too specific and
are used all the time - yet bears a huge resemblance.

> What works for you in your restricted environment doesn't work when
> you need your code to work in a generic environment across a wide range
> of architectures.

You simply don't know what you are talking about here.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/

Reply by Simon Clubley ● January 11, 2015

On 2015-01-11, Dimiter_Popoff <dp@tgi-sci.com> wrote:
> On 11.1.2015 13:16, Simon Clubley wrote:
>> Different set of tradeoffs. The higher level language code can
>> potentially be reused on multiple architectures (or if that's not
>> possible in a specific case, can at least be used as the starting point
>> for another driver);
>
> OK, I have yet to see that really work for someone without major
> rework but let us assume the cliche is correct.
>

We are obviously not going to agree on many things, so just an
observation:

Have a look at the Linux source code and see how much common code
there is between various devices (such as USB host controllers)
which are used on a _wide_ range of architectures.

The Linux kernel is very nicely modular and reusable in that regard.

Now imagine having to write assembly language modules for each
architecture those drivers are going to be used on instead of just
writing it once in a higher level language and letting the toolchain
generate the architecture specific code for each target for those drivers.
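
As a purely illustrative (and untested on real hardware!) sketch of what "write it once" means here: the register layout, the UART_TX_READY bit and uart_putc() below are all invented for this example and are not taken from the Linux source.

```c
#include <stdint.h>

/* Invented register block for an imaginary UART; not a real device. */
struct uart_regs {
    volatile uint32_t status;
    volatile uint32_t data;
};

#define UART_TX_READY 0x1u   /* invented status bit */

/* Written once in C; the toolchain for each target generates the
 * architecture-specific instructions.
 * Returns 0 on success, -1 if the transmitter is busy. */
int uart_putc(struct uart_regs *uart, char c)
{
    if ((uart->status & UART_TX_READY) == 0)
        return -1;
    uart->data = (uint8_t)c;
    return 0;
}
```

On a real board the pointer would be the base address of the peripheral; on a host it can simply point at an ordinary struct, which is enough to exercise the logic.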

>> your PowerPC specific assembly language code cannot
>> be reused in such a way.
>
> Not at all, VPA stands for "virtual processor assembly". The sources
> can be compiled for any architecture (using a compiler for it,
> obviously). I have used two so far - 68k and power (much of the DPS
> code prior to 2000 or so is written for 68k (CPU32) and compiled
> for power nowadays). I also did something close to it for a TI
> DSP, the 5420, but it was not completely VPA, would have been
> impractical, many of the 54xx registers are too specific and
> are used all the time - yet bears a huge resemblance.
>
>> What works for you in your restricted environment doesn't work when
>> you need your code to work in a generic environment across a wide range
>> of architectures.
>
> You simply don't know what you are talking about here.
>

It sounds like your VPA is some kind of assembly language templating
infrastructure.

What does the syntax look like and do you have a language reference
manual which can be downloaded ?

What level of abstraction is possible ?

For example, can you implement high level language control structures
or is it just some pseudo assembly language ?

Can you implement abstract data types in VPA ?

How would something like the following (untested!) look in VPA ?

=========================================================================
#include <stdio.h>

struct sample_t
	{
	unsigned long int var1;
	unsigned char char1;
	};

struct sample_t search;

#define SAMPLE_SIZE	100
struct sample_t samples[SAMPLE_SIZE];

int main(int argc, char *argv[])
	{
	unsigned long int i;

	/* code to populate samples and search would be placed here */

	for(i=0; i<SAMPLE_SIZE; i++)
		{
		if((search.var1 == samples[i].var1) &&
			(search.char1 == samples[i].char1))
			{
			printf("Sample found at position %lu\n", i);
			}
		}

	return 0;
	}
=========================================================================

_If_ it is essentially a high level language but just with assembly
language syntax, then I am interested in knowing what factors caused
you to design and implement VPA instead of just using a higher level
language such as C or a Wirth style language (or designing your own)
and then having most of the architecture specific assembly code
generated by the toolchain itself.

Many people want their own OS environment (for various valid reasons)
that they have complete control over, but they generally either design
their own high level architecture neutral language (such as Oberon) or
use a language such as C/C++ or another existing language.

Simon.


Reply by David Brown ● January 11, 2015

On 11/01/15 12:16, Simon Clubley wrote:
> On 2015-01-10, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>>
>> I am still somewhat amazed at what was said.
>
> That just shows the disconnect between what you do and what the rest
> of the world does. :-)
>
>> What on earth is there to stop people from doing something
>> *that* simple - here are two IRQ handlers on a small MCU, I used
>> an mcf52211 a couple of months back to make a HV source - it
>> does the PWM/regulating, overcurrent protection/limiting, serial
>> communication etc., all in all 4 tasks, several IRQs. Took me
>> about 2 weeks to program (I had hoped it would take 2 days but
>> I had completely forgotten the insides of the 52211 so I had to
>> recall a lot which is where the 2 weeks went). A total of about
>> 250k sources, the object code being almost 9 kilobytes.
>>
>> So here are two IRQ handlers where hopefully it is obvious how
>> only what is needed is saved and restored, pretty basic stuff:
>>
>> http://tgi-sci.com/misc/hvst0q.gif
>>
>> This is not VPA, just plain 68k (well, CF) assembly, so it is
>> far from being taken from my own world.
>>
>
> While the actual wrapper around the device specific interrupt handler
> (ie: the generic IRQ code which runs before you get to the device
> specific handler itself) is generally still assembly language, most
> people don't write the actual device specific handler in assembly
> language any more, but use a higher level language such as C instead.
>
> Once you do that, the IRQ wrapper needs to save all the registers the
> C compiler could potentially use, including all the temporary registers,
> before the wrapper calls the device specific handler.
>

I think there is a serious misunderstanding going on here.

A compiler with good support for interrupts on the given target will 
/not/ save or restore more registers than it has to, when it knows all 
about the code in use.  When it does not know everything about the code, 
it must save and restore registers according to assumptions about their 
usage - possibly saving and restoring /everything/, if it cannot make 
reasonable assumptions.  And this save and restore /will/ take more time 
and space if you have more registers - but the relevance and 
significance of that will depend on the circumstances.

/Exactly the same applies in C and assembly./

If you are writing an interrupt function that calls other code, you will 
have to assume that it may change any "volatile" registers - and thus 
you will have to save and restore them around the function call.  That 
applies in C and assembly - but in assembly you might have your own 
non-standard ABI that affects which registers are "volatile".  In 
assembly, you may also know exactly which registers the called function 
uses, and use that in optimisation.  In C, the compiler may also have 
this knowledge (if the definition of the called function is in the same 
module, or you are using link-time optimisation), and it can take 
advantage of it.

If you are writing an interrupt function that is self-contained, then 
either an assembly programmer or a C compiler will only save and restore 
registers that are needed by the function.

It is perfectly reasonable to expect a C compiler to generate interrupt 
code of the type posted by Dimiter.  Equally, it is perfectly reasonable 
to expect an assembly programmer to write interrupt code that stacks a 
range of registers if it has to call arbitrary external code.
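
A rough sketch of the two cases, in plain C so it can be tried on a host. All names here (tick_isr, uart_isr, rx_callback, store_byte) are hypothetical, and the toolchain-specific interrupt attribute or pragma is deliberately left out because it varies by compiler.

```c
#include <stdint.h>

/* Hypothetical names throughout. On a real target each handler would also
 * carry the toolchain's interrupt attribute or pragma, omitted here so the
 * sketch builds on a host. */

volatile uint32_t tick_count;

/* Case 1: self-contained handler. The compiler sees everything the body
 * touches, so it only needs to preserve the few registers used here. */
void tick_isr(void)
{
    tick_count++;
}

/* Case 2: handler calling code that is not known at compile time.
 * The callee may clobber any call-clobbered ("volatile") register, so all
 * of those have to be preserved around the call. */
void (*rx_callback)(uint8_t byte);   /* set elsewhere at run time */

void uart_isr(uint8_t received)
{
    if (rx_callback)
        rx_callback(received);
}

/* Example callback, only so the sketch can be exercised. */
uint8_t last_byte;
void store_byte(uint8_t byte)
{
    last_byte = byte;
}
```

The same split applies to hand-written assembly: the first handler can get away with saving almost nothing, the second cannot unless the author knows exactly what the callee uses.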

Reply by ● January 11, 2015

Simon Clubley <clubley@remove_me.eisner.decus.org-earth.ufp> wrote:
> On 2015-01-10, Anders.Montonen@kapsi.spam.stop.fi.invalid <Anders.Montonen@kapsi.spam.stop.fi.invalid> wrote:
>>
>> As far as I can tell from the header files and compiler source code, the 
>> PIC32MM could be a replacement/follow-up for the PIC32MX1xx/2xx. There's 
>> no DSP ASE, and no shadow registers, so it's clearly not a high- 
>> performance chip, and it doesn't seem like it has any special 
>> peripherals either. Using microMIPS at the low end makes sense, as you 
>> can fit more code in a smaller flash. I don't know how much silicon area 
>> is saved by having only the one instruction set, but that kind of makes 
>> sense for a low-end chip as well.
> In that case, is there any information yet about whether the PIC32MM will
> still have PDIP variants and what the expected capabilities are for
> the PIC32MM ?

I haven't seen any info that would reveal the packaging, but going by 
the compiler header files, the smallest chip (PIC32MM0016GPL020) has 
only 16 GPIOs, up to 28 in the largest (PIC32MM0064GPL036). Flash size 
ranges from 16 to 64K, RAM from 4 to 8K. Contrary to what I wrote above, 
in other support files two register sets are mentioned. No hints on 
performance, but there doesn't seem to be a prefetch cache so I would 
guess it will run at similar speeds to the 1xx/2xx.

All the SFR definitions are included in the processor header files in 
XC32 1.34, if you want to look at what's there.

-a

Reply by David Brown ● January 11, 2015

On 11/01/15 08:10, Dimiter_Popoff wrote:
> On 10.1.2015 20:07, David Brown wrote:
>> On 09/01/15 23:04, Dimiter_Popoff wrote:
>>> On 09.1.2015 11:47, David Brown wrote:
>>>> On 09/01/15 00:22, Dimiter_Popoff wrote:
>>>>> On 09.1.2015 00:53, Wouter van Ooijen wrote:
>>>>>> Dimiter_Popoff wrote on 08-Jan-15 at 11:18 PM:
>>>>>>> On 08.1.2015 23:25, David Brown wrote:
>>>>>>>> ...  (Just as "cpus should
>>>>>>>> have more than 16 core registers" is a good reason for disliking
>>>>>>>> ARM's,
>>>>>>>> if that is your opinion.)
>>>>>>>
>>>>>>> Results of arithmetic calculations are not exactly what I would
>>>>>>> call an
>>>>>>> opinion.
>>>>>>> 16 registers - one of which being reserved for the PC - are too few
>>>>>>> for
>>>>>>> a load/store machine.
>>>>>>
>>>>>> 16 is a fact, but the rest is nothing but opinion.
>>>>>>
>>>>>>   > Clearly it will work but under equal conditions
>>>>>>> will be slower than if it had 32 registers, sometimes much slower.
>>>>>>
>>>>>> More registers can be slower too.
>>>>>
>>>>> Yes, 32 is about the optimum I suppose. But I have not really analyzed
>>>>> that, what I have - and demonstrated by an example which you chose to
>>>>> ignore - is the comparison 32 vs. 16.
>>>>
>>>> Certainly it is possible to pick examples where 32 registers is more
>>>> effective than 16 - but equally we can pick examples where 16 registers
>>>> is more efficient (such as context switching, or code with a lot of
>>>> small functions).  Examples are illustrative, but not proof of a
>>>> general
>>>> rule.
>>>
>>> You have yet to prove this point. Context switching is not a valid
>>> example, as I explained in my former post which you must have read
>>> prior to replying to (for those who have not, context switching is
>>> responsible for a fraction of a percent of CPU time, consequently
>>> halving or even completely eliminating that can bring a fraction of
>>> a percent improvement, i.e. it is negligible).
>>
>> I think it would be wrong to talk about /proving/ points here
>  > ...
>
> OK, I really meant "make" your point. Though in technical terms making
> a point which cannot be proven is fairly pointless...

We are having a discussion and exchange of ideas - there is no need to 
prove anything unless someone feels there must be a "winner" here.

>
>> Context switching is not a valid example for /you/, based on the figures
>> /you/ gave.
>
> So you understand that these figures are correct - but imply that
> the example applies just to me. I thought we could agree at least
> on the meaning of numbers.

Your numbers apply to your example - different numbers apply to 
different situations.

>
>> I have written systems that had timer interrupts at 100,000
>> times per second
>
> Which has *nothing* to do with context switching, if you do that and
> save all registers instead of the minimum you have to you just don't
> know what you are doing.

The system I had with that rate of timer interrupt had to do quite a lot 
of work in each interrupt - most registers were saved, because most were 
used.
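
To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 100,000 interrupts per second comes from the system described above; the 100 MHz clock and the one cycle per register saved or restored are made-up figures for illustration, and save_restore_percent() is a hypothetical helper.

```c
/* Percentage of CPU time spent saving and restoring registers in an
 * interrupt handler. Every input is an assumption supplied by the caller;
 * no measured data is involved. */
double save_restore_percent(double irq_per_sec, unsigned regs,
                            double cycles_per_reg, double cpu_hz)
{
    double cycles_per_irq = 2.0 * regs * cycles_per_reg;  /* save + restore */
    return 100.0 * irq_per_sec * cycles_per_irq / cpu_hz;
}
```

With those assumed figures, 32 registers at 100,000 interrupts per second comes to about 6.4% of the CPU and 16 registers to about 3.2%. At 1,000 interrupts per second both drop below 0.1%, which is why the answer depends so much on the system in question.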

>
> I know from threads from years past that you tend to mix up
> task scheduling and interrupt processing but please understand
> that there is a world of a difference between interrupt processing
> and a task switch initiated by an interrupt.

I am well aware of the difference between task switching and interrupts. 
  But when you have a larger interrupt function (i.e., not a small one 
that can use a minimal number of registers), it is perfectly reasonable 
to call it a context switch - you switch from the current task context 
to the interrupt context, and then (depending on the type of system and 
the interrupt function) you switch back to the same task, or a different 
one.

Perhaps you think that you only need to save lots of registers during a 
task context switch?  Usually an interrupt (without a task switch) does 
not involve saving /all/ the registers, but it can still involve saving 
many of them.

>
> You have not made a point - context switching is *not* a case
> where 32 registers can be worse off than 16 in a non-negligible way
> (negligible meaning performance cost within say 0.1%, latency-wise
> same as 16 or better).
>
> You have yet to give a valid example for what you claim.
>
>> In general, in bigger systems (and PowerPC cores tend to be used in
>> bigger systems than most of the embedded cores we see here) you try to
>> avoid many interrupts, and prefer DMA and more sophisticated peripherals
>> to keep the interrupt rate low.
>
> This is wrong, it is not true that on larger systems interrupts must
> be avoided.

Large processors are optimised for steady throughput with few unexpected 
changes of flow - small processors in microcontrollers are optimised for 
faster and more consistent timings on changes.  There is a reason why 
some processors are made with long pipelines and multi-layer caches, 
while others are made with very short pipelines and no caches - or why 
there are devices made that combine a Cortex M3/M4 core alongside one or 
more Cortex A cores.

>
> You should understand that there is no such animal as "in general" in
> engineering. Things we make have to *work*, so we have to go down to
> the details.

I have written many hundreds of systems - I have no doubt you have also 
written vast numbers, as have others here.  I have used something like a 
dozen different processor architectures in embedded systems.  There is 
no way to give details of everything - so some generalisations are 
unavoidable.  But it is certainly the case that details vary wildly 
between systems - thus particular examples can be interesting, but do 
not necessarily show common behaviour.

(I've snipped the rest - not because it was not relevant or interesting, 
but because I don't have the time to give a full response, and I think 
we are going around in circles.  I also feel this discussion looks like 
we are arguing opposites, when in fact I agree with a fair number of 
your points - I am almost certainly expressing myself rather poorly, and 
don't want to continue doing so.)

Reply by Dimiter_Popoff ● January 11, 2015

On 11.1.2015 17:24, David Brown wrote:
> On 11/01/15 12:16, Simon Clubley wrote:
>> On 2015-01-10, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>>>
>>> I am still somewhat amazed at what was said.
>>
>> That just shows the disconnect between what you do and what the rest
>> of the world does. :-)
>>
>>> What on earth is there to stop people from doing something
>>> *that* simple - here are two IRQ handlers on a small MCU, I used
>>> an mcf52211 a couple of months back to make a HV source - it
>>> does the PWM/regulating, overcurrent protection/limiting, serial
>>> communication etc., all in all 4 tasks, several IRQs. Took me
>>> about 2 weeks to program (I had hoped it would take 2 days but
>>> I had completely forgotten the insides of the 52211 so I had to
>>> recall a lot which is where the 2 weeks went). A total of about
>>> 250k sources, the object code being almost 9 kilobytes.
>>>
>>> So here are two IRQ handlers where hopefully it is obvious how
>>> only what is needed is saved and restored, pretty basic stuff:
>>>
>>> http://tgi-sci.com/misc/hvst0q.gif
>>>
>>> This is not VPA, just plain 68k (well, CF) assembly, so it is
>>> far from being taken from my own world.
>>>
>>
>> While the actual wrapper around the device specific interrupt handler
>> (ie: the generic IRQ code which runs before you get to the device
>> specific handler itself) is generally still assembly language, most
>> people don't write the actual device specific handler in assembly
>> language any more, but use a higher level language such as C instead.
>>
>> Once you do that, the IRQ wrapper needs to save all the registers the
>> C compiler could potentially use, including all the temporary registers,
>> before the wrapper calls the device specific handler.
>>
>
> I think there is a serious misunderstanding going on here.
>
> A compiler with good support for interrupts on the given target will
> /not/ save or restore more registers than it has to, when it knows all
> about the code in use. ...
 > ....

Thanks for clarifying that, David. Seemed like the obvious thing to
expect to me but being told the opposite I had begun to question my
state - am I in some dream or what.... :-)

Dimiter

Reply by Simon Clubley ● January 11, 2015

On 2015-01-12, Dimiter_Popoff <dp@tgi-sci.com> wrote:
> On 11.1.2015 17:24, David Brown wrote:
>>
>> I think there is a serious misunderstanding going on here.
>>
>> A compiler with good support for interrupts on the given target will
>> /not/ save or restore more registers than it has to, when it knows all
>> about the code in use. ...
> > ....
>
> Thanks for clarifying that, David. Seemed like the obvious thing to
> expect to me but being told the opposite I had begun to question my
> state - am I in some dream or what.... :-)
>

You need to read the rest of David's comment as well. David's talking
above about the specific case when a compiler knows what registers it
uses _and_ can generate _all_ the interrupt handling code as well.

I've been talking about the general case when you don't have that
knowledge or your initial IRQ interrupt handling code is part of a
general framework.

For example, you could have a hand-written assembly language IRQ
infrastructure which does things like priority nesting and then
dispatches to a device specific C language handler, the address of
which has been entered into a generic interrupt dispatch table during
startup.

There's no way that generic assembly language wrapper is going to know
what registers to save so it saves all the ones the C language handler
could potentially use and which can't be otherwise preserved by (say)
switching to some other execution mode to execute the C language handler.
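
To make that concrete, a minimal (untested on real hardware!) sketch of the C side of such a framework. NUM_IRQS, irq_register(), irq_dispatch() and timer_handler() are made-up names, and the assembly language wrapper itself is not shown.

```c
#include <stddef.h>

#define NUM_IRQS 8               /* hypothetical number of interrupt sources */

typedef void (*irq_handler_t)(void);

static irq_handler_t irq_table[NUM_IRQS];   /* filled in during startup */

/* Register a device specific handler; returns 0 on success, -1 on bad IRQ. */
int irq_register(unsigned irq, irq_handler_t handler)
{
    if (irq >= NUM_IRQS)
        return -1;
    irq_table[irq] = handler;
    return 0;
}

/* Called by the generic wrapper after it has saved every register a C
 * handler might clobber. The wrapper cannot know which handler sits behind
 * the pointer, hence the conservative save. Returns 0 if a handler ran. */
int irq_dispatch(unsigned irq)
{
    if (irq >= NUM_IRQS || irq_table[irq] == NULL)
        return -1;               /* spurious or unregistered interrupt */
    irq_table[irq]();
    return 0;
}

/* Example device handler (hypothetical). */
static unsigned timer_hits;
void timer_handler(void)
{
    timer_hits++;
}
```

The handler address only becomes known at run time, when irq_register() is called during startup, so nothing upstream of irq_dispatch() can tailor the register save to a particular handler.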

Simon.


Reply by Simon Clubley ● January 11, 2015

On 2015-01-11, Anders.Montonen@kapsi.spam.stop.fi.invalid <Anders.Montonen@kapsi.spam.stop.fi.invalid> wrote:
> Simon Clubley <clubley@remove_me.eisner.decus.org-earth.ufp> wrote:
>> In that case, is there any information yet about whether the PIC32MM will
>> still have PDIP variants and what the expected capabilities are for
>> the PIC32MM ?
>
> I haven't seen any info that would reveal the packaging, but going by 
> the compiler header files, the smallest chip (PIC32MM0016GPL020) has 
> only 16 GPIOs, up to 28 in the largest (PIC32MM0064GPL036). Flash size 
> ranges from 16 to 64K, RAM from 4 to 8K. Contrary to what I wrote above, 
> in other support files two register sets are mentioned. No hints on 
> performance, but there doesn't seem to be a prefetch cache so I would 
> guess it will run at similar speeds to the 1xx/2xx.
>

Thanks.

I was hoping they were going to go in the other direction for the
PIC32MM PDIP packages with more onboard memory resources and hence
continue the trend they have started with the recent PIC32MX PDIP
packages.

OTOH, if the above MCUs are priced at the (say) Cortex-M0/8-bit MCU
price points, along with (say) a PDIP 20 pin package, then I can see
myself using them in situations I wouldn't normally consider using
the PIC32MX for.

Simon.
