DSP like MCUs, or MCU like DSPs?

I don't recall the TI designator, but they make some DSP parts that have peripherals like MCUs.  I know that some time back, ARM made a push into DSP territory by adding some DSP-ish instructions to, I believe, the CM3 devices, or maybe the CM4.  

Anyone here use these crossover devices?  What sort of apps?  Why did you pick that device over others? 

-- 

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209

Reply by David Brown ●December 22, 2022

On 21/12/2022 18:30, Rick C wrote:
> I don't recall the TI designator, but they make some DSP parts that
> have peripherals like MCUs.  I know that some time back, ARM made a
> push into DSP territory by adding some DSPish instructions to I
> believe it was the CM3 devices, or maybe CM4.
> 
> Anyone here use these crossover devices?  What sort of apps?  Why did
> you pick that device over others?
> 

You are maybe thinking of the TMS320F family of DSP/MCU's from TI. 
These have a traditional DSP-style processor core - 16-bit "char" (no 
8-bit byte access at all), gruesome assembly where each instruction does 
several different things in a single cycle, multiple memory buses for 
simultaneous accesses, hardware support for cyclic buffers, FFT 
twiddling, etc.  It lets you make very efficient DSP-style algorithms 
but is a pain for more microcontroller-style control code.  The chips 
have typical microcontroller-style peripherals such as timers, UARTs, 
CAN controllers, etc.
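
For readers who have not met hardware cyclic buffers, here is a minimal sketch in portable C of the index bookkeeping that a DSP address generator does for free on every access. It is only an illustration under stated assumptions: the names `fir_t`, `fir_step` and `FIR_TAPS` are made up for this example and are not taken from any TI header or library, and the tap count is assumed to be a power of two so that wrapping is a simple mask.

```c
#include <assert.h>
#include <stdint.h>

#define FIR_TAPS 4u   /* hypothetical size; power of two so wrap is a mask */

typedef struct {
    int16_t  delay[FIR_TAPS];  /* circular delay line */
    unsigned head;             /* index of the newest sample */
} fir_t;

/* Push one sample and return the filter output.
 * coeff[0] multiplies the newest sample, coeff[FIR_TAPS - 1] the oldest.
 * The accumulator is 64 bits wide so the sum cannot overflow here. */
int64_t fir_step(fir_t *f, const int16_t coeff[FIR_TAPS], int16_t x)
{
    assert(f->head < FIR_TAPS);

    /* Advance and wrap the write index in software. */
    f->head = (f->head + 1u) & (FIR_TAPS - 1u);
    f->delay[f->head] = x;

    int64_t acc = 0;
    unsigned idx = f->head;
    for (unsigned k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)coeff[k] * f->delay[idx];
        idx = (idx - 1u) & (FIR_TAPS - 1u);   /* walk back, wrapping */
    }
    return acc;
}
```

On a general-purpose core the two masking operations cost instructions in the inner loop; with hardware cyclic addressing the wrap happens as part of the memory access.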

So they are a hybrid.  They are popular for high-temperature 
electronics, as they are one of the few families of microcontrollers 
that are available for 175 °C and above.

These days, true DSP's are much less common.  On the one side, once 
FPGA's started having multiplier blocks they could outcompete DSP's in 
parallel and pipelined MAC-based algorithms, and have much more 
flexibility for memory and operand organisation.  On the other side, 
microcontrollers and processors gained single-cycle MAC instructions and 
SIMD instructions, giving them similar performance to DSP's for many 
algorithms while being far easier to use in other situations.  True 
DSP's are now usually found only in very specialised systems, or so 
deeply embedded that you never see their programmability (i.e., you buy 
a "video converter" chip and don't care how its insides work).

The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - 
MACs in various formats, saturating arithmetic, and 8-bit and 16-bit 
SIMD instructions (within 32-bit registers).  They don't have all the 
features of DSP's, but they have enough to make common DSP algorithms 
quite efficient, and ARM provides optimised libraries.  The latest 
Cortex-M55 core has additional vector/SIMD instructions, but I don't 
know if any microcontrollers are available yet.
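
To make "saturating arithmetic" and "16-bit SIMD within 32-bit registers" concrete, here is a small portable C model of a dual 16-bit saturating add, the kind of operation the Cortex-M4 performs in one instruction (QADD16). This is a sketch for illustration only: `lane16`, `sat16` and `qadd16_model` are hypothetical names, and the code models the arithmetic rather than using any ARM intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Extract one signed 16-bit lane (shift 0 = low half, 16 = high half). */
int32_t lane16(uint32_t w, unsigned shift)
{
    assert(shift == 0u || shift == 16u);
    uint32_t u = (w >> shift) & 0xFFFFu;
    return (u < 0x8000u) ? (int32_t)u : (int32_t)u - 65536;
}

/* Clamp a wide intermediate result to the int16_t range. */
int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Two independent saturating 16-bit adds packed in one 32-bit word. */
uint32_t qadd16_model(uint32_t a, uint32_t b)
{
    int16_t lo = sat16(lane16(a, 0u)  + lane16(b, 0u));
    int16_t hi = sat16(lane16(a, 16u) + lane16(b, 16u));
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
```

The point of saturation is that an overflowing lane clips to full scale instead of wrapping round to the opposite sign, which is usually the less harmful failure in signal processing.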

As for anyone using them, I think you'll have a very hard job finding 
anyone who does embedded development with microcontrollers that has 
/not/ used Cortex-M4 devices.  They are everywhere.

And as for why I pick a given device for a given project, it will depend 
entirely on the project - as well as other projects I have done and 
other projects other colleagues have done.  There are thousands of 
Cortex-M4 devices available, not including variations of memory sizes, 
chip packages, or speeds.  The common reasons are the same as for any 
other type of chip - price, support, familiarity, peripherals, package, etc.

The biggest reason for any choice these days, however, is availability - 
many designs start off by asking what microcontrollers our suppliers 
have in stock with the given minimum requirements, because we rarely 
have time to wait for 52 week lead times.

Reply by Grant Edwards ●December 22, 2022

On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:

> You are maybe thinking of the TMS320F family of DSP/MCU's from TI.
> These have a traditional DSP-style processor core - 16-bit "char"
> (no 8-bit byte access at all), gruesome assembly where each
> instruction does several different things in a single cycle,
> multiple memory buses for simultaneous accesses, hardware support
> for cyclic buffers, FFT twiddling, etc.

IIRC, branches were also delayed.  The later 320's (C30/C40 and on)
were all 32-bit (in C: char, int, long int, float, double were all
"one byte" which contained 32-bits). And the floating point format
wasn't IEEE.

That combination made supporting byte-oriented serial protocols that
used IEEE FP extra fun.

The dev tools from TI were a bit clunky, but worked OK and were
available for Solaris (including the in-circuit emulators).

But, compared to what else was available 20+ years ago, they were damn
fast (especially for the price).

--
Grant

Reply by Dimiter_Popoff ●December 22, 2022

On 12/22/2022 15:36, David Brown wrote:
> ...
> The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - 
> MACs in various formats, saturating arithmetic, and 8-bit and 16-bit 
> SIMD instructions (within 32-bit registers). ...
> ...

Just a word of caution for Rick re this portion.
Make sure that a 32 bit accumulator will be enough for what you are
doing; it can easily fall short in many cases. "Normal" DSPs have
40 or so bits for this reason. Alternatively, you can pick a processor
with 64 bit FPU MAC ability; a 32 bit FPU will fall even shorter than
the 32 bit integer regs David is mentioning.
David said it all, I am just cautioning because this is the kind of
"oh shit" factor which comes at the end of the project (a friend once
told me of that "oh shit", you either say it at the beginning or at
the end :).
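
The headroom problem is easy to check on paper or in a few lines of C. The sketch below assumes Q15 samples and coefficients (so every product is Q30, up to about 2^30 in magnitude) and uses a 64-bit accumulator purely as a reference to see whether the final sum would have fitted in a narrower one. The function names `dot_q15` and `fits_signed` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 dot product accumulated in 64 bits; each product is Q30. */
int64_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Would the final sum fit a signed accumulator that is `bits` wide?
 * (Checks the final value only, not every partial sum.) */
int fits_signed(int64_t v, unsigned bits)
{
    assert(bits >= 2u && bits <= 63u);
    int64_t lim = (int64_t)1 << (bits - 1u);
    return v >= -lim && v < lim;
}
```

With full-scale inputs, two or three products are already enough to exceed 32 bits, whereas the 8 guard bits of a 40-bit accumulator allow a couple of hundred. In practice the inputs are rarely all full scale, but the worst case is what bites at the end of the project.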

Reply by Rick C ●December 22, 2022

On Thursday, December 22, 2022 at 12:46:00 PM UTC-5, Dimiter wrote:
> On 12/22/2022 15:36, David Brown wrote: 
> > ...
> > The Cortex-M4 is basically a Cortex-M3 with DSP instructions added - 
> > MACs in various formats, saturating arithmetic, and 8-bit and 16-bit
> > SIMD instructions (within 32-bit registers). ... 
> > ... 
> 
> Just a word of caution for Rick re this portion. 
> Make sure that a 32 bit accumulator will be enough for what you are 
> doing; it can easily fall short in many cases. "Normal" DSPs have 
> 40 or so bits for this reason; or, you can pick some processor with 
> 64 bit FPU MAC ability, 32 bit FPU will fall a lot shorter even than 
> the 32 bit integer regs David is mentioning. 
> David said it all, I am just cautioning because this is the kind of 
> "oh shit" factor which comes at the end of the project (a friend once 
> told me of that "oh shit", you either say it at the beginning or at 
> the end :).

I'm not selecting a DSP part.  I typically use FPGAs for what I do.  Not because they are required for speed, but because they work well and have complete flexibility.  I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore.  The new design will still use an FPGA.  If I need an MCU in the design, it will be a custom design in the FPGA.  I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs.  Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt.  ~600 LUTs for 8 processors running at 20 MIPS each.  Not bad. 

I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts.  So far, the answer has been "no". 
-- 

Rick C.

+ Get 1,000 miles of free Supercharging
+ Tesla referral code - https://ts.la/richard11209

Reply by Dimiter_Popoff ●December 22, 2022

On 12/22/2022 21:57, Rick C wrote:
> On Thursday, December 22, 2022 at 12:46:00 PM UTC-5, Dimiter wrote:
>> On 12/22/2022 15:36, David Brown wrote:
>>> ...
>>> The Cortex-M4 is basically a Cortex-M3 with DSP instructions added -
>>> MACs in various formats, saturating arithmetic, and 8-bit and 16-bit
>>> SIMD instructions (within 32-bit registers). ...
>>> ...
>>
>> Just a word of caution for Rick re this portion.
>> Make sure that a 32 bit accumulator will be enough for what you are
>> doing; it can easily fall short in many cases. "Normal" DSPs have
>> 40 or so bits for this reason; or, you can pick some processor with
>> 64 bit FPU MAC ability, 32 bit FPU will fall a lot shorter even than
>> the 32 bit integer regs David is mentioning.
>> David said it all, I am just cautioning because this is the kind of
>> "oh shit" factor which comes at the end of the project (a friend once
>> told me of that "oh shit", you either say it at the beginning or at
>> the end :).
> 
> I'm not selecting a DSP part.  I typically use FPGAs for what I do.  Not because they are required for speed, but because they work well and have complete flexibility.  I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore.  The new design will still use an FPGA.  If I need an MCU in the design, it will be a custom design in the FPGA.  I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs.  Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt.  ~600 LUTs for 8 processors running at 20 MIPS each.  Not bad.
> 
> I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts.  So far, the answer has been "no".

I have used a "real" DSP just once, 20+ years ago: the TI 5420,
for our first DSP based MCA module.
The 5420 had two cores clocked at 100 MHz, some dual access RAM
(meaning an address can be accessed twice in one clock cycle) and
multiple serial ADC interfaces, *very* flexible ones, allowed me
to serially push an (almost) 10Msps 16 bit wide stream sequentially
using 3 of these (one had just 1/3 the speed I needed). A CPLD
was doing the serialization, the 3 streams were getting into the
DSP memory in a large FIFO, in the correct sequence, all this
could be just programmed into their serial interfaces.
Then one core had just one job, to detect an event and pass it
to the other core which would do the filtering etc. processing,
there was a nice FIFO connecting the two cores on chip.
A decade or so later I did the same - with some more sophistication
though - using a 400 MHz Power Architecture part with DDRAM,
single core. The sampling rate was half that of the former version
(had been somewhat overkill) and it was all done by the processor
using 64 bit FP for the filtering (2 cycles per MAC, was hard to
get at that but this is another story, it did work once I figured
out how to do it). And this uses up to half the CPU resources
under real load so it still manages to maintain the user interface,
support VNC over tcp/ip etc.
Like David said, with processors getting faster the need for a
"real" DSP goes down and down.
As for those other, mixed sort of TI DSP/MCU I have no experience,
never even needed to consider any of them.

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/

Reply by David Brown ●December 22, 2022

On 22/12/2022 16:54, Grant Edwards wrote:
> On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:
> 
>> You are maybe thinking of the TMS320F family of DSP/MCU's from TI.
>> These have a traditional DSP-style processor core - 16-bit "char"
>> (no 8-bit byte access at all), gruesome assembly where each
>> instruction does several different things in a single cycle,
>> multiple memory buses for simultaneous accesses, hardware support
>> for cyclic buffers, FFT twiddling, etc.
> 
> IIRC, branches were also delayed.  

If you say so - I don't remember.  (Delayed branches are not uncommon in 
processors designed for single-cycle instruction throughput - they are 
also found in several RISC architectures.)

> The later 320's (C30/C40 and on)
> were all 32-bit (in C: char, int, long int, float, double were all
> "one byte" which contained 32-bits). And the floating point format
> wasn't IEEE.

I did not know they were part of the TMS320F family, though I know Texas 
Instruments made other DSP's with 32-bit "char".

> 
> That combination made supporting byte-oriented serial protocols that
> used IEEE FP extra fun.
> 

I had enough fun with a byte-oriented UART protocol on a 16-bit TMS320 
with very little ram (so little that I could not afford to waste it on 
unpacked buffers).  Combine that with a UART peripheral that didn't 
actually work correctly (the "receive" flag was never set) and a 
toolchain with plenty of "undocumented features" (and some barely 
documented critical non-conformances).  I did not pick the device for 
any other projects.
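
For anyone wondering what "packed buffers" means on a machine with no 8-bit access: the usual approach is to store two protocol octets in each 16-bit word and shift and mask them in and out by hand. What follows is a minimal sketch of that idea, not the code from the project described above. The helper names `put_octet` and `get_octet` are hypothetical, and putting the even-indexed octet in the low half of the word is an arbitrary choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store octet number i of a byte stream into a buffer of 16-bit words.
 * Even-numbered octets go in the low half, odd-numbered in the high half. */
void put_octet(uint16_t *buf, size_t i, unsigned v)
{
    assert(v <= 0xFFu);
    unsigned shift = (i & 1u) ? 8u : 0u;
    uint16_t w = buf[i >> 1];
    w = (uint16_t)((w & ~(0xFFu << shift)) | (v << shift));
    buf[i >> 1] = w;
}

/* Fetch octet number i from the packed buffer. */
unsigned get_octet(const uint16_t *buf, size_t i)
{
    unsigned shift = (i & 1u) ? 8u : 0u;
    return (buf[i >> 1] >> shift) & 0xFFu;
}
```

Every access costs a shift and a mask, and a read-modify-write on stores, which is the price of not wasting half the RAM on unpacked buffers.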

> The dev tools from TI were a bit clunky, but worked OK and were
> available for Solaris (including the in-circuit emulators).
> 
> But, compared to what else was available 20+ years ago, they were damn
> fast (especially for the price).
>

Reply by David Brown ●December 22, 2022

On 22/12/2022 20:57, Rick C wrote:

> I was just curious about what people have used for DSP applications,
> but in particular if anyone had used one of the "crossover" parts.
> So far, the answer has been "no".

I don't know exactly how you are defining a "crossover" part.  But if it 
is "a DSP with microcontroller features", then the answer so far is 
"yes".  Both Grant and I have used TMS320F parts - but I would not 
choose to use one again if I could avoid it.  (I can't answer for Grant 
there.)  I have also used a "DSP with microcontroller features" from 
Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned 
that at all.

And if you mean "a microcontroller with DSP features", then as I said 
almost everyone who works with embedded software has used Cortex-M4 
devices.  I have lost count of the number of different ones I have used 
(plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP 
features).

So I don't quite see how you could have interpreted the posts as "no".

Reply by Rick C ●December 22, 2022

On Thursday, December 22, 2022 at 4:03:29 PM UTC-5, David Brown wrote:
> On 22/12/2022 20:57, Rick C wrote: 
> 
> > I was just curious about what people have used for DSP applications, 
> > but in particular if anyone had used one of the "crossover" parts. 
> > So far, the answer has been "no".
> I don't know exactly how you are defining a "crossover" part. 

Please read the first post in this thread for that. 

> But if it 
> is "a DSP with microcontroller features", then the answer so far is 
> "yes". Both Grant and I have used TMS320F parts - but I would not 
> choose to use one again if I could avoid it. (I can't answer for Grant 
> there.) I have also used a "DSP with microcontroller features" from 
> Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned 
> that at all. 
> 
> And if you mean "a microcontroller with DSP features", then as I said 
> almost everyone who works with embedded software has used Cortex-M4 
> devices. I have lost count of the number of different ones I have used 
> (plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP 
> features). 
> 
> So I don't quite see how you could have interpreted the posts as "no".

I was looking for some insight into their experiences with such devices for DSP work, and I'm counting both DSP like MCUs and MCU like DSPs.  I don't see in your post that you talk about any particular experience, rather offer a 10,000 foot overview of the state of the market.  Thanks for that, but this is not new to me.  So your post was pretty much a "no", to me.  

I guess I was not quite explicit enough in my initial post.  I was asking about specific experiences where a crossover part was chosen for a project with a significant DSP content, which would have required a DSP chip, if these devices were not available.  

I am fully aware that MCUs are getting faster and more capable, but that doesn't mean DSPs are not needed.  It simply means they are used in other applications that require more horsepower.  Sometimes, it's not even the horsepower, but the performance to power consumption ratio.  There are application specific DSPs for hearing aids that run on very low power, much better than any MCU could do.  

Years ago DSP split into two categories based on the cell phone market.  The high performance devices needed their own power plants, but cranked out some serious MIPS/MFLOPS.  The much smaller, lower power, fixed point devices gained in speed, without sucking all the juice from mobile batteries, while serving in hand sets.  Now the hand sets have dedicated CPU chips with built in DSP sections for the front end processing of cell phones, rather than separate DSP chips.  

There's no shortage of DSP cores in the world, we just don't see all of them because they are part of system chips. 
-- 

Rick C.

-- Get 1,000 miles of free Supercharging
-- Tesla referral code - https://ts.la/richard11209

Reply by Grant Edwards ●December 22, 2022

On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:
> On 22/12/2022 16:54, Grant Edwards wrote:
>> On 2022-12-22, David Brown <david.brown@hesbynett.no> wrote:
>> 
>>> You are maybe thinking of the TMS320F family of DSP/MCU's from TI.
>>> These have a traditional DSP-style processor core - 16-bit "char"
>>> (no 8-bit byte access at all), gruesome assembly where each
>>> instruction does several different things in a single cycle,
>>> multiple memory buses for simultaneous accesses, hardware support
>>> for cyclic buffers, FFT twiddling, etc.
>> 
>> IIRC, branches were also delayed.  
>
> If you say so - I don't remember.  (Delayed branches are not uncommon in 
> processors designed for single-cycle instruction throughput - they are 
> also found in several RISC architectures.)
>
>> The later 320's (C30/C40 and on) were all 32-bit (in C: char, int,
>> long int, float, double were all "one byte" which contained
>> 32-bits). And the floating point format wasn't IEEE.
>
> I did not know they were part of the TMS320F family, though I know Texas 
> Instruments made other DSP's with 32-bit "char".

Ah, I overlooked the "F" in your original post. I don't remember any F
parts. Interestingly the Wikipedia page on TMS320 doesn't mention the
F parts at all. I did find this page about the TMS320F28335, but it's
a 32-bit part also:

   https://www.ti.com/product/TMS320F28335

--
Grant
