EmbeddedRelated.com

DSP like MCUs, or MCU like DSPs?

Started by Rick C December 21, 2022
I don't recall the TI designator, but they make some DSP parts that have peripherals like MCUs. I know that some time back, ARM made a push into DSP territory by adding some DSPish instructions to, I believe, the CM3 devices, or maybe CM4.

Anyone here use these crossover devices?  What sort of apps?  Why did you pick that device over others? 

-- 

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209
On 21/12/2022 18:30, Rick C wrote:
> I don't recall the TI designator, but they make some DSP parts that have peripherals like MCUs. I know that some time back, ARM made a push into DSP territory by adding some DSPish instructions to I believe it was the CM3 devices, or maybe CM4.
>
> Anyone here use these crossover devices? What sort of apps? Why did you pick that device over others?
You are maybe thinking of the TMS320F family of DSP/MCU's from TI. These have a traditional DSP-style processor core - 16-bit "char" (no 8-bit byte access at all), gruesome assembly where each instruction does several different things in a single cycle, multiple memory buses for simultaneous accesses, hardware support for cyclic buffers, FFT twiddling, etc. It lets you make very efficient DSP-style algorithms but is a pain for more microcontroller-style control code. The chips have typical microcontroller-style peripherals such as timers, UARTs, CAN controllers, etc. So they are a hybrid. They are popular for high-temperature electronics, as they are one of the few families of microcontrollers that are available for 175 °C and above.

These days, true DSP's are much less common. On the one side, once FPGA's started having multiplier blocks they could outcompete DSP's in parallel and pipelined MAC-based algorithms, and have much more flexibility for memory and operand organisation. On the other side, microcontrollers and processors gained single-cycle MAC instructions and SIMD instructions, giving them similar performance to DSP's for many algorithms while being far easier to use in other situations. True DSP's are now usually found only in very specialised systems, or so deeply embedded that you never see their programmability (i.e., you buy a "video converter" chip and don't care how its insides work).

The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - MACs in various formats, saturating arithmetic, and 8-bit and 16-bit SIMD instructions (within 32-bit registers). They don't have all the features of DSP's, but they have enough to make common DSP algorithms quite efficient, and ARM provides optimised libraries. The latest Cortex-M55 core has additional vector/SIMD instructions, but I don't know if any microcontrollers are available yet.
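
To make "saturating arithmetic" and "16-bit SIMD within 32-bit registers" a bit more concrete, here is a rough Python model of the arithmetic, in the spirit of a lane-wise saturating add such as the Cortex-M4's QADD16. It is only a sketch of the idea, not how you would write it on the target (there you would use compiler intrinsics or a library), and the helper names `pack16`, `to_signed16`, `saturate16` and `qadd16` are made up for illustration.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def saturate16(x):
    """Clamp x to the signed 16-bit range instead of letting it wrap."""
    return max(-32768, min(32767, x))


def pack16(hi, lo):
    """Pack two signed 16-bit values into one 32-bit word (hi lane, lo lane)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def qadd16(a, b):
    """Add two 32-bit words lane by lane, saturating each 16-bit lane."""
    lo = saturate16(to_signed16(a) + to_signed16(b))
    hi = saturate16(to_signed16(a >> 16) + to_signed16(b >> 16))
    return pack16(hi, lo)


# Both lanes overflow here: 30000 + 10000 and -30000 + -10000.
result = qadd16(pack16(30000, -30000), pack16(10000, -10000))
print(hex(result))  # 0x7fff8000: hi lane clamps to 32767, lo lane to -32768
```

With plain wrapping arithmetic the same sums would flip sign, which in an audio or control loop is usually far worse than clipping.
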
As for anyone using them, I think you'll have a very hard job finding anyone who does embedded development with microcontrollers that has /not/ used Cortex-M4 devices. They are everywhere. And as for why I pick a given device for a given project, it will depend entirely on the project - as well as other projects I have done and other projects other colleagues have done. There are thousands of Cortex-M4 devices available, not including variations of memory sizes, chip packages, or speeds. The common reasons are the same as for any other type of chip - price, support, familiarity, peripherals, package, etc. The biggest reason for any choice these days, however, is availability - many designs start off by asking what microcontrollers our suppliers have in stock with the given minimum requirements, because we rarely have time to wait for 52 week lead times.
On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:

> You are maybe thinking of the TMS320F family of DSP/MCU's from TI. These have a traditional DSP-style processor core - 16-bit "char" (no 8-bit byte access at all), gruesome assembly where each instruction does several different things in a single cycle, multiple memory buses for simultaneous accesses, hardware support for cyclic buffers, FFT twiddling, etc.
IIRC, branches were also delayed.

The later 320's (C30/C40 and on) were all 32-bit (in C: char, int, long int, float, double were all "one byte" which contained 32 bits). And the floating point format wasn't IEEE.

That combination made supporting byte-oriented serial protocols that used IEEE FP extra fun.

The dev tools from TI were a bit clunky, but worked OK and were available for Solaris (including the in-circuit emulators).

But, compared to what else was available 20+ years ago, they were damn fast (especially for the price).

-- 
Grant
On 12/22/2022 15:36, David Brown wrote:
> ...
> The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - MACs in various formats, saturating arithmetic, and 8-bit and 16-bit SIMD instructions (within 32-bit registers). ...
> ...

Just a word of caution for Rick re this portion. Make sure that a 32-bit accumulator will be enough for what you are doing; it can easily fall short in many cases. "Normal" DSPs have 40 or so bits for this reason. Alternatively, you can pick a processor with 64-bit FPU MAC ability; a 32-bit FPU will fall a lot shorter even than the 32-bit integer regs David is mentioning.

David said it all, I am just cautioning because this is the kind of "oh shit" factor which comes at the end of the project (a friend once told me of that "oh shit", you either say it at the beginning or at the end :).
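
As a rough illustration of the accumulator-width caution, the Python sketch below models a multiply-accumulate loop with a two's-complement accumulator of a chosen width. The helper names `wrap_signed` and `mac` and the test data are made up for illustration; the numbers are simply full-scale 16-bit samples and coefficients, the worst case for a short filter.

```python
def wrap_signed(x, bits):
    """Wrap x into a signed two's-complement value of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mac(samples, coeffs, acc_bits):
    """Sum of products with an accumulator that wraps at acc_bits."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc = wrap_signed(acc + s * c, acc_bits)
    return acc


# Eight taps of full-scale 16-bit data: each product is just under 2**30.
samples = [32767] * 8
coeffs = [32767] * 8

exact = sum(s * c for s, c in zip(samples, coeffs))
acc32 = mac(samples, coeffs, 32)
acc40 = mac(samples, coeffs, 40)

print(exact)  # 8589410312
print(acc32)  # -524280: the 32-bit accumulator has wrapped
print(acc40)  # 8589410312: 8 guard bits are enough here
```

Each 16x16 product needs up to 31 bits, so a 32-bit accumulator can overflow after as few as two or three full-scale terms, while a 40-bit one has headroom for a couple of hundred.
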
On Thursday, December 22, 2022 at 12:46:00 PM UTC-5, Dimiter wrote:
> On 12/22/2022 15:36, David Brown wrote:
>> ...
>> The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - MACs in various formats, saturating arithmetic, and 8-bit and 16-bit SIMD instructions (within 32-bit registers). ...
>> ...
>
> Just a word of caution for Rick re this portion. Make sure that a 32 bit accumulator will be enough for what you are doing; it can easily fall short in many cases. "Normal" DSPs have 40 or so bits for this reason; or, you can pick some processor with 64 bit FPU MAC ability, 32 bit FPU will fall a lot shorter even than the 32 bit integer regs David is mentioning.
> David said it all, I am just cautioning because this is the kind of "oh shit" factor which comes at the end of the project (a friend once told me of that "oh shit", you either say it at the beginning or at the end :).
I'm not selecting a DSP part. I typically use FPGAs for what I do. Not because they are required for speed, but because they work well and have complete flexibility. I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore. The new design will still use an FPGA.

If I need an MCU in the design, it will be a custom design in the FPGA. I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs. Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt. ~600 LUTs for 8 processors running at 20 MIPS each. Not bad.

I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".

-- 

Rick C.

+ Get 1,000 miles of free Supercharging
+ Tesla referral code - https://ts.la/richard11209
On 12/22/2022 21:57, Rick C wrote:
> On Thursday, December 22, 2022 at 12:46:00 PM UTC-5, Dimiter wrote:
>> On 12/22/2022 15:36, David Brown wrote:
>>> ...
>>> The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - MACs in various formats, saturating arithmetic, and 8-bit and 16-bit SIMD instructions (within 32-bit registers). ...
>>> ...
>>
>> Just a word of caution for Rick re this portion. Make sure that a 32 bit accumulator will be enough for what you are doing; it can easily fall short in many cases. "Normal" DSPs have 40 or so bits for this reason; or, you can pick some processor with 64 bit FPU MAC ability, 32 bit FPU will fall a lot shorter even than the 32 bit integer regs David is mentioning.
>> David said it all, I am just cautioning because this is the kind of "oh shit" factor which comes at the end of the project (a friend once told me of that "oh shit", you either say it at the beginning or at the end :).
>
> I'm not selecting a DSP part. I typically use FPGAs for what I do. Not because they are required for speed, but because they work well and have complete flexibility. I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore. The new design will still use an FPGA. If I need an MCU in the design, it will be a custom design in the FPGA. I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs. Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt. ~600 LUTs for 8 processors running at 20 MIPS each. Not bad.
>
> I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".
I have used a "real" DSP just once, 20+ years ago: the TI 5420, on which I did our first DSP-based MCA module. The 5420 had two cores clocked at 100 MHz, some dual access RAM (meaning an address can be accessed twice in one clock cycle) and multiple serial ADC interfaces, *very* flexible ones, which allowed me to serially push an (almost) 10 Msps 16-bit wide stream sequentially using 3 of these (one had just 1/3 the speed I needed). A CPLD was doing the serialization, and the 3 streams were getting into the DSP memory in a large FIFO, in the correct sequence; all this could be just programmed into their serial interfaces. Then one core had just one job, to detect an event and pass it to the other core, which would do the filtering etc. processing; there was a nice FIFO connecting the two cores on chip.

A decade or so later I did the same - with some more sophistication though - using a 400 MHz Power Architecture part with DDRAM, single core. The sampling rate was half that of the former version (which had been somewhat overkill) and it was all done by the processor using 64-bit FP for the filtering (2 cycles per MAC; it was hard to get to that, but this is another story, it did work once I figured out how to do it). And this uses up to half the CPU resources under real load, so it still manages to maintain the user interface, support VNC over TCP/IP etc.

Like David said, with processors getting faster the need for a "real" DSP goes down and down.

As for those other, mixed sort of TI DSP/MCU, I have no experience; I never even needed to consider any of them.

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 22/12/2022 16:54, Grant Edwards wrote:
> On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:
>
>> You are maybe thinking of the TMS320F family of DSP/MCU's from TI. These have a traditional DSP-style processor core - 16-bit "char" (no 8-bit byte access at all), gruesome assembly where each instruction does several different things in a single cycle, multiple memory buses for simultaneous accesses, hardware support for cyclic buffers, FFT twiddling, etc.
>
> IIRC, branches were also delayed.
If you say so - I don't remember. (Delayed branches are not uncommon in processors designed for single-cycle instruction throughput - they are also found in several RISC architectures.)
> The later 320's (C30/C40 and on) were all 32-bit (in C: char, int, long int, float, double were all "one byte" which contained 32-bits). And the floating point format wasn't IEEE.
I did not know they were part of the TMS320F family, though I know Texas Instruments made other DSP's with 32-bit "char".
> That combination made supporting byte-oriented serial protocols that used IEEE FP extra fun.
I had enough fun with a byte-oriented UART protocol on a 16-bit TMS320 with very little RAM (so little that I could not afford to waste it on unpacked buffers). Combine that with a UART peripheral that didn't actually work correctly (the "receive" flag was never set) and a toolchain with plenty of "undocumented features" (and some barely documented critical non-conformances). I did not pick the device for any other projects.
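
For readers who have not met a machine where the smallest addressable unit is 16 bits, this is roughly what a "packed buffer" means: two protocol octets are stored per 16-bit word, and every access needs a shift and mask. The Python below is only a model of the idea; the function names and the choice of putting the first octet in the high half of the word are assumptions for illustration, not details from the project described above.

```python
def pack_octets(octets):
    """Store two 8-bit octets per 16-bit word (first octet in the high half)."""
    words = []
    for i in range(0, len(octets), 2):
        hi = octets[i] & 0xFF
        # Pad with zero when the message has an odd number of octets.
        lo = octets[i + 1] & 0xFF if i + 1 < len(octets) else 0
        words.append((hi << 8) | lo)
    return words


def get_octet(words, index):
    """Read octet number `index` back out of the packed word buffer."""
    word = words[index // 2]
    return (word >> 8) & 0xFF if index % 2 == 0 else word & 0xFF


message = bytes([0x01, 0x02, 0x03])
packed = pack_octets(message)
print([hex(w) for w in packed])  # ['0x102', '0x300']
print(get_octet(packed, 2))      # 3
```

An unpacked buffer (one octet per 16-bit word) avoids the shifting but uses twice the RAM, which is the trade-off being described.
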
> The dev tools from TI were a bit clunky, but worked OK and were available for Solaris (including the in-circuit emulators).
>
> But, compared to what else was available 20+ years ago, they were damn fast (especially for the price).
On 22/12/2022 20:57, Rick C wrote:

> I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".
I don't know exactly how you are defining a "crossover" part. But if it is "a DSP with microcontroller features", then the answer so far is "yes". Both Grant and I have used TMS320F parts - but I would not choose to use one again if I could avoid it. (I can't answer for Grant there.) I have also used a "DSP with microcontroller features" from Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned that at all.

And if you mean "a microcontroller with DSP features", then as I said almost everyone who works with embedded software has used Cortex-M4 devices. I have lost count of the number of different ones I have used (plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP features).

So I don't quite see how you could have interpreted the posts as "no".
On Thursday, December 22, 2022 at 4:03:29 PM UTC-5, David Brown wrote:
> On 22/12/2022 20:57, Rick C wrote:
>
>> I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".
>
> I don't know exactly how you are defining a "crossover" part.
Please read the first post in this thread for that.
> But if it is "a DSP with microcontroller features", then the answer so far is "yes". Both Grant and I have used TMS320F parts - but I would not choose to use one again if I could avoid it. (I can't answer for Grant there.) I have also used a "DSP with microcontroller features" from Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned that at all.
>
> And if you mean "a microcontroller with DSP features", then as I said almost everyone who works with embedded software has used Cortex-M4 devices. I have lost count of the number of different ones I have used (plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP features).
>
> So I don't quite see how you could have interpreted the posts as "no".
I was looking for some insight into their experiences with such devices for DSP work, and I'm counting both DSP like MCUs and MCU like DSPs. I don't see in your post that you talk about any particular experience, rather offer a 10,000 foot overview of the state of the market. Thanks for that, but this is not new to me. So your post was pretty much a "no", to me.

I guess I was not quite explicit enough in my initial post. I was asking about specific experiences where a crossover part was chosen for a project with a significant DSP content, which would have required a DSP chip, if these devices were not available.

I am fully aware that MCUs are getting faster and more capable, but that doesn't mean DSPs are not needed. It simply means they are used in other applications that require more horsepower. Sometimes, it's not even the horsepower, but the performance to power consumption ratio. There are application specific DSPs for hearing aids that run on very low power, much better than any MCU could do.

Years ago DSP split into two categories based on the cell phone market. The high performance devices needed their own power plants, but cranked out some serious MIPS/MFLOPS. The much smaller, lower power, fixed point devices gained in speed, without sucking all the juice from mobile batteries, while serving in hand sets. Now the hand sets have dedicated CPU chips with built in DSP sections for the front end processing of cell phones, rather than separate DSP chips. There's no shortage of DSP cores in the world, we just don't see all of them because they are part of system chips.

-- 

Rick C.

-- Get 1,000 miles of free Supercharging
-- Tesla referral code - https://ts.la/richard11209
On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:
> On 22/12/2022 16:54, Grant Edwards wrote:
>> On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:
>>
>>> You are maybe thinking of the TMS320F family of DSP/MCU's from TI. These have a traditional DSP-style processor core - 16-bit "char" (no 8-bit byte access at all), gruesome assembly where each instruction does several different things in a single cycle, multiple memory buses for simultaneous accesses, hardware support for cyclic buffers, FFT twiddling, etc.
>>
>> IIRC, branches were also delayed.
>
> If you say so - I don't remember. (Delayed branches are not uncommon in processors designed for single-cycle instruction throughput - they are also found in several RISC architectures.)
>
>> The later 320's (C30/C40 and on) were all 32-bit (in C: char, int, long int, float, double were all "one byte" which contained 32-bits). And the floating point format wasn't IEEE.
>
> I did not know they were part of the TMS320F family, though I know Texas Instruments made other DSP's with 32-bit "char".
Ah, I overlooked the "F" in your original post. I don't remember any F parts. Interestingly the Wikipedia page on TMS320 doesn't mention the F parts at all.

I did find this page about the TMS320F28335, but it's a 32-bit part also:

https://www.ti.com/product/TMS320F28335

-- 
Grant
