EmbeddedRelated.com
The 2026 Embedded Online Conference

Integrated TFT controller in PIC MCUs

Started by pozz January 7, 2015
On Sat, 10 Jan 2015, David Brown wrote:

> On 09/01/15 16:30, Vladimir Ivanov wrote:
>>
>> On Fri, 9 Jan 2015, David Brown wrote:
>>
>>> On 09/01/15 10:54, Vladimir Ivanov wrote:
>>>>
>>>> On Fri, 9 Jan 2015, David Brown wrote:
>>>>
>>>>> For microcontrollers, such as the Cortex M devices, I think 16
>>>>> registers is a good balance for a lot of typical code.
>>>>
>>>> In Thumb2 you work directly with 8 GP registers, indirectly with few
>>>> like PC and SP, and accessing the rest of the GPRs is different
>>>> and/or has penalties.
>>>
>>> As far as I understand it, accessing the other registers means 32-bit
>>> instructions rather than the short 16-bit instructions. So accessing
>>> them has penalties compared to accessing the faster registers, but
>>> not compared to normal ARM 32-bit instructions.
>>
>> Yes, longer code sequences, and most likely very limited instruction
>> forms. The latter leads to shuffling of data between the regular 8
>> GPRs and the other, "unregular" GPRs.
>>
>
> That was needed for Thumb, but not for Thumb2 - you simply use the
> 32-bit instructions and have access to the same registers as you would
> with 32-bit ARM codes. If you like, you can think of Thumb2 as being
> mostly the same as 32-bit ARM (losing a little barrel shifter and
> conditional execution capability) with the addition of 16-bit
> "short-cuts" for the most commonly used instructions.
Did they make everything orthogonal and only a matter of instruction size? I have to recheck this, have forgotten most of it already.
>> What I am trying to communicate, is that the CPU core with all the
>> blocks is there. Thumb2 is more or less a decoder, just like the ARM
>> mode is. Same with MIPS32 and MIPS16e. Why would one cripple something
>> by removing one of the decoders? The power savings are negligible.
>>
>> ARM7TDMI was more balanced in that regard.
>
> No, the original Thumb instruction set only gave access to some of the
> cpu and let you write significantly slower but more compact code than
> full ARM. That's why they had to keep the ARM decoder too - if you
> needed fast code, you had to use the full instruction set. And no one
> considered the mix of two instruction sets to be "balance" - polite
> people called it a pain in the neck.
:-)
> Thumb2 lets you write code that is about 60% of the size of ARM code,
> and is often /faster/ than 32-bit ARM code, since you can get almost all
> of the functionality while being more efficient on your memory bandwidth
> and caches.
Is this still valid for the big OoO/superscalar cores?
>>> With the original Thumb, ARM kept the normal 32-bit ARM ISA as well
>>> because for some types of code it could be significantly faster. But
>>> with Thumb2, there is almost no code for which the full 32-bit ARM
>>> instructions would beat the Thumb2, taking into account the memory
>>> bandwidth benefits of Thumb2.
>>
>> Any pointers to data showing this? Never heard of it so far, and does
>> not reflect my experience.
>>
>> Why'd they include ARM mode at all in the Cortex-A series? :-)
>
> For backwards compatibility. In Cortex M applications, code is
> generally compiled specifically for the target - so there is no need for
> binary compatibility. But for Cortex A systems, you regularly have
> pre-compiled code from many sources, and binary compatibility with older
> devices is essential.
That's an interesting angle. Thanks for the comments, I will investigate some more.
On 16/01/15 20:30, Vladimir Ivanov wrote:
> On Sat, 10 Jan 2015, David Brown wrote:
>
>> On 09/01/15 16:30, Vladimir Ivanov wrote:
>>>
>>> On Fri, 9 Jan 2015, David Brown wrote:
>>>
>>>> On 09/01/15 10:54, Vladimir Ivanov wrote:
>>>>>
>>>>> On Fri, 9 Jan 2015, David Brown wrote:
>>>>>
>>>>>> For microcontrollers, such as the Cortex M devices, I think 16
>>>>>> registers is a good balance for a lot of typical code.
>>>>>
>>>>> In Thumb2 you work directly with 8 GP registers, indirectly with few
>>>>> like PC and SP, and accessing the rest of the GPRs is different and/or
>>>>> has penalties.
>>>>
>>>> As far as I understand it, accessing the other registers means 32-bit
>>>> instructions rather than the short 16-bit instructions. So accessing
>>>> them has penalties compared to accessing the faster registers, but not
>>>> compared to normal ARM 32-bit instructions.
>>>
>>> Yes, longer code sequences, and most likely very limited instruction
>>> forms. The latter leads to shuffling of data between the regular 8 GPRs
>>> and the other, "unregular" GPRs.
>>>
>>
>> That was needed for Thumb, but not for Thumb2 - you simply use the
>> 32-bit instructions and have access to the same registers as you would
>> with 32-bit ARM codes. If you like, you can think of Thumb2 as being
>> mostly the same as 32-bit ARM (losing a little barrel shifter and
>> conditional execution capability) with the addition of 16-bit
>> "short-cuts" for the most commonly used instructions.
>
> Did they make everything orthogonal and only a matter of instruction
> size? I have to recheck this, have forgotten most of it already.
No, it is not entirely orthogonal. In particular, common combinations will have 16-bit Thumb2 instructions, while less common combinations will have 32-bit Thumb2 instructions.

For example, in the ARM, like in most RISC architectures, there is not a dedicated stack register - you simply use one of the general registers along with appropriate post- and pre-increment and decrement addressing modes. But by convention, and codified in the ABI, one of the registers (r13 on the ARM, IIRC) is always used as the stack pointer. Instructions using r13 for these sorts of addressing modes will be common in the 16-bit Thumb2 encodings, but use of the same modes with other registers probably needs 32-bit Thumb2 encodings. (I haven't confirmed the details of this with the ARM documentation, but the principle is accurate.) The same thing applies to similar shortened encodings on other processors.
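To make the 16-bit/32-bit split concrete, here is a minimal Python sketch of two encoding rules from the ARMv7-M architecture: a Thumb2 instruction is 32 bits wide when the top five bits of its first halfword are 0b11101, 0b11110 or 0b11111, and the 16-bit PUSH (which uses SP implicitly) can only list r0-r7 and LR. The function names `thumb_insn_size` and `push_fits_16bit` are made up for illustration and are not part of any toolchain.

```python
# Sketch only: helper names are hypothetical, the bit patterns follow the
# ARMv7-M Thumb instruction encoding rules.

WIDE_PREFIXES = (0b11101, 0b11110, 0b11111)


def thumb_insn_size(first_halfword):
    """Return the instruction size in bytes, given its first 16-bit halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in WIDE_PREFIXES else 2


def push_fits_16bit(regs):
    """True if a PUSH of these register numbers fits the 16-bit encoding.

    The short form can only list r0-r7 plus LR (r14); anything else,
    such as r8-r12, needs the 32-bit encoding.
    """
    return len(regs) > 0 and all(r < 8 or r == 14 for r in regs)


if __name__ == "__main__":
    # 0xB510 is the 16-bit "push {r4, lr}"; 0xE92D starts the 32-bit push.
    for hw in (0xB510, 0xE92D, 0xF000, 0xE000):
        print(hex(hw), thumb_insn_size(hw), "bytes")
    print(push_fits_16bit({4, 14}), push_fits_16bit({4, 8}))
```

So a function prologue that only saves low registers and LR stays in the short form, while one that also saves r8 or above pays for a 32-bit instruction.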
>>> What I am trying to communicate, is that the CPU core with all the
>>> blocks is there. Thumb2 is more or less a decoder, just like the ARM
>>> mode is. Same with MIPS32 and MIPS16e. Why would one cripple something
>>> by removing one of the decoders? The power savings are negligible.
>>>
>>> ARM7TDMI was more balanced in that regard.
>>
>> No, the original Thumb instruction set only gave access to some of the
>> cpu and let you write significantly slower but more compact code than
>> full ARM. That's why they had to keep the ARM decoder too - if you
>> needed fast code, you had to use the full instruction set. And no one
>> considered the mix of two instruction sets to be "balance" - polite
>> people called it a pain in the neck.
>
> :-)
>
>> Thumb2 lets you write code that is about 60% of the size of ARM code,
>> and is often /faster/ than 32-bit ARM code, since you can get almost
>> all of the functionality while being more efficient on your memory
>> bandwidth and caches.
>
> Is this still valid for the big OoO/superscalar cores?
Yes. The actual balance between size ratios and speed ratios varies a little depending on the type of code being run, but I gather that ARM and Thumb2 encodings are typically within a few percent of the same speed, while Thumb2 code is between 60% and 80% of the size of the equivalent ARM code. For bigger cpus, the processing speed exceeds the memory speed by a greater amount - so they are likely to gain more overall speed from Thumb2 than smaller cpus do.
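As a rough illustration of why the size ratio matters for caches, the following Python sketch applies the 60% to 80% range quoted above to a piece of code and checks whether the result fits in an instruction cache. The 48 KiB code size and 32 KiB cache size are invented numbers chosen only to show the effect, and `thumb2_size_range` is a hypothetical helper, not a measurement tool.

```python
# Sketch only: the ratios come from the post above, the sizes are made up.

def thumb2_size_range(arm_bytes, low=0.60, high=0.80):
    """Estimated Thumb2 size range in bytes for code of arm_bytes in ARM encoding."""
    return round(arm_bytes * low), round(arm_bytes * high)


def fits_in_cache(code_bytes, cache_bytes):
    """True if the code's footprint is no larger than the cache."""
    return code_bytes <= cache_bytes


if __name__ == "__main__":
    arm_bytes = 48 * 1024      # hypothetical hot code size in ARM encoding
    icache_bytes = 32 * 1024   # hypothetical instruction cache size

    best, worst = thumb2_size_range(arm_bytes)
    print("ARM encoding fits:", fits_in_cache(arm_bytes, icache_bytes))
    print("Thumb2 best case fits:", fits_in_cache(best, icache_bytes))
    print("Thumb2 worst case fits:", fits_in_cache(worst, icache_bytes))
```

With these made-up numbers the ARM version overflows the cache, the best-case Thumb2 version fits, and the worst-case one does not, so the actual benefit depends heavily on the code in question.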
>>>> With the original Thumb, ARM kept the normal 32-bit ARM ISA as well
>>>> because for some types of code it could be significantly faster. But
>>>> with Thumb2, there is almost no code for which the full 32-bit ARM
>>>> instructions would beat the Thumb2, taking into account the memory
>>>> bandwidth benefits of Thumb2.
>>>
>>> Any pointers to data showing this? Never heard of it so far, and does
>>> not reflect my experience.
>>>
>>> Why'd they include ARM mode at all in the Cortex-A series? :-)
>>
>> For backwards compatibility. In Cortex M applications, code is
>> generally compiled specifically for the target - so there is no need
>> for binary compatibility. But for Cortex A systems, you regularly
>> have pre-compiled code from many sources, and binary compatibility
>> with older devices is essential.
>
> That's an interesting angle. Thanks for the comments, I will investigate
> some more.
It is also possible that some types of code will be noticeably faster in ARM encoding than Thumb2. On a Cortex-A cpu, the ARM decoder takes up a tiny percentage of the die, and can be powered down when not in use - it is therefore cheap to add if people want to use it. For smaller devices like Cortex-M microcontrollers, an ARM decoder (in addition to Thumb2) would be a much bigger percentage of the die size, and therefore a bigger fraction of the cost.
On 18.1.15 19:19, David Brown wrote:
>>> That was needed for Thumb, but not for Thumb2 - you simply use the
>>> 32-bit instructions and have access to the same registers as you would
>>> with 32-bit ARM codes. If you like, you can think of Thumb2 as being
>>> mostly the same as 32-bit ARM (losing a little barrel shifter and
>>> conditional execution capability) with the addition of 16-bit
>>> "short-cuts" for the most commonly used instructions.
>>
>> Did they make everything orthogonal and only a matter of instruction
>> size? I have to recheck this, have forgotten most of it already.
>
> No, it is not entirely orthogonal. In particular, common combinations
> will have 16-bit Thumb2 instructions, while less common combinations
> will have 32-bit Thumb2 instructions. For example, in the ARM, like in
> most RISC architectures, there is not a dedicated stack register - you
> simply use one of the general registers along with appropriate post and
> pre increment and decrement addressing modes. But by convention, and
> codified in the ABI, one of the registers (r13 on the ARM, IIRC) is
> always used as the stack pointer. Instructions using r13 for these
> sorts of addressing modes will be common in the 16-bit Thumb2 encodings,
> but use of the same modes with other registers probably needs 32-bit
> Thumb2 encodings. (I haven't confirmed the details of this with the ARM
> documentation, but the principle is accurate.) The same thing applies
> to similar shorted encodings on other processors.
There is more to this: In Cortex M3 and M4, the hardware uses r13 as stack pointer to save the processor state in exception handling. The register can be mirrored to use separate stacks for thread code and exception (and system) code.

--
-TV
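The hardware stacking mentioned above can be sketched in Python as follows. On exception entry a Cortex-M3/M4 pushes eight words onto the active stack, which sit in memory from the new SP upwards as r0, r1, r2, r3, r12, lr, pc, xPSR, and bit 2 of the EXC_RETURN value placed in LR tells which of the two stack pointers (MSP or PSP) holds that frame. This sketch assumes the basic 32-byte frame only, with no floating-point context and no alignment padding word; the function names are hypothetical.

```python
# Sketch only: models the basic Cortex-M3/M4 exception frame in a byte
# buffer; helper names are invented for this example.
import struct

FRAME_REGS = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr")
FRAME_BYTES = 4 * len(FRAME_REGS)  # 32 bytes for the basic frame


def decode_basic_frame(mem):
    """Decode the eight little-endian words found at the stacked SP."""
    if len(mem) < FRAME_BYTES:
        raise ValueError("need at least 32 bytes of stack memory")
    words = struct.unpack("<8I", mem[:FRAME_BYTES])
    return dict(zip(FRAME_REGS, words))


def stack_holding_frame(exc_return):
    """Return which stack pointer the frame was pushed on.

    Bit 2 of EXC_RETURN is 1 for the process stack (PSP) and 0 for the
    main stack (MSP).
    """
    return "PSP" if exc_return & 0x4 else "MSP"


if __name__ == "__main__":
    # Made-up register values standing in for a memory dump of the stack.
    dump = struct.pack("<8I", 1, 2, 3, 4, 12,
                       0x08000123, 0x08000456, 0x01000000)
    frame = decode_basic_frame(dump)
    print(hex(frame["pc"]), stack_holding_frame(0xFFFFFFFD))
```

A fault handler typically does the same thing in C or assembly: test bit 2 of LR, pick MSP or PSP, and read the stacked pc from offset 24 of the frame.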