EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

newbie: FIFOs in C for DSP

Started by sebastian August 17, 2004
Hi,

I'm a newbie in DSP and embedded programming, and I have some questions that
I hope some of you could kindly answer.

I have to do some profiling between DSP and ASIC/FPGA solutions. I'm an
FPGA guy, so I'm lost with this DSP thing.

1)
I'd like to know what you think about the code I wrote, because I
feel it's not very "DSP optimized". I need to implement some sort of
shift register or FIFO, because I have to calculate a correlation, so
here's how I do it (like I'd do it in VHDL...).

The input coming from an ADC is 10 bits wide, so it'll be stored in a
"short".

#include....

#define FifoSize(x) ((x) + 1)

// FIFO declarations
short FifoN[FifoSize(gN)];
short *FifoN_ptr_write = &FifoN[0];
short *FifoN_ptr_read = &FifoN[1];
short *FifoN_end_ptr = &FifoN[gN];

short input;
int multN;

// check for the end of FIFO (write), if yes, then wrap around
if (FifoN_ptr_write == FifoN_end_ptr)
{
    FifoN_ptr_write = &FifoN[0];
}

// write to the FIFO
*(FifoN_ptr_write++) = input;

// check for the end of FIFO (read), if yes, then wrap around
if (FifoN_ptr_read == FifoN_end_ptr)
{
    FifoN_ptr_read = &FifoN[0];
}

// calculate input * FIFO_output
multN = input * (*FifoN_ptr_read++);

....

EchR = EchR + multN - (*FifoD1_ptr_read++);


Is this the best way to do it, or is there a better way to implement
the FIFOs?
Should I "fix" for 20 bits, discarding the first 12? (I guess that's not
necessary.)
Will the memory used by the FIFO effectively be stored in the cache?
Can I use three operands to the right of the equals sign, or should I split
the expression?

2)
How about the ADC doing a DMA transfer to RAM, and then the DSP reading
a whole chunk of data while the ADC performs the next DMA? It is
possible, I guess.

3)
Are there online tutorials or coding style guidelines for C for DSP?

Thanks in advance
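
The scheme in question 2 is usually called ping-pong (double) buffering. A minimal sketch in portable C follows, assuming a hypothetical block length and with the DMA engine replaced by a plain copy function (`fake_dma_fill`, `process_block` and `run_pingpong` are illustrative names, not part of any vendor library). On real hardware the fill would run concurrently and signal completion with an interrupt; here the two steps simply alternate.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 4  /* samples per DMA block (hypothetical, tiny for illustration) */

/* Two buffers: the "DMA" fills one while the CPU processes the other. */
static int16_t pingpong[2][BLOCK_LEN];

/* Stand-in for the DMA engine: copies one block of "ADC" samples. */
static void fake_dma_fill(int16_t *dst, const int16_t *adc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = adc[i];
    }
}

/* Example per-block processing: sum of the samples in the block. */
static int32_t process_block(const int16_t *blk, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += blk[i];
    }
    return acc;
}

/* Processes nblocks blocks of input, alternating between the two buffers.
 * Returns the sum over all processed blocks. */
int32_t run_pingpong(const int16_t *adc, size_t nblocks)
{
    int32_t total = 0;
    int fill = 0;                        /* buffer the "DMA" writes next */

    if (nblocks == 0) {
        return 0;
    }
    fake_dma_fill(pingpong[fill], adc, BLOCK_LEN);   /* prime the first buffer */

    for (size_t b = 0; b < nblocks; b++) {
        int ready = fill;                /* buffer that has just been filled */
        fill ^= 1;                       /* the "DMA" moves to the other buffer */
        if (b + 1 < nblocks) {
            /* On hardware this transfer would overlap with the processing below. */
            fake_dma_fill(pingpong[fill], adc + (b + 1) * BLOCK_LEN, BLOCK_LEN);
        }
        total += process_block(pingpong[ready], BLOCK_LEN);
    }
    return total;
}
```

The point of the arrangement is that the processing code only ever touches a buffer the transfer has finished with, so the per-block work must complete within one block period.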
sebastian wrote:

> [snip: original post quoted in full]
1. The FIFO write is probably as good as you can get for a single-element write.

The correlation calculation is what DSPs are built for, should be done using a MAC instruction, and may well need to be done in assembly to take advantage of the MAC. Your DSP tools should come with library functions to do this (but I haven't found any that I like, so I always write that part by hand in assembly. One day...).

Whether a cache will be effective with the FIFO depends on your software and your hardware. If you're doing integer computations, do you have a cache at all? If it's vital to get that correlation computation done pronto it may be a good idea to lock that section of the cache.

2. Sure -- but you're just looking for a way to sneak an FPGA in there, aren't you?

3. I don't know. I have always just used good C coding practice, with the recognition that a compiler will not write your innermost loop for you (others will disagree, but that's my experience).

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
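
For reference, the multiply-accumulate pattern mentioned in point 1 looks like this in plain C. This is only a sketch of the inner loop for a single correlation lag; `correlate_mac` is a hypothetical name, and whether a given compiler maps the loop onto the hardware MAC is exactly the open question in this thread. With 10-bit samples each product needs about 20 bits, so a 32-bit accumulator leaves roughly 11 bits of headroom before overflow becomes a concern.

```c
#include <stddef.h>
#include <stdint.h>

/* Correlation of x[0..n-1] against h[0..n-1] at one lag (a dot product).
 * Each iteration is one multiply-accumulate. */
int32_t correlate_mac(const int16_t *x, const int16_t *h, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product is formed in 32 bits. */
        acc += (int32_t)x[i] * h[i];
    }
    return acc;
}
```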
Tim Wescott <tim@wescottnospamdesign.com> wrote in message 
> 1. The FIFO write is probably as good as you can get for a
> single-element write.
>
> The correlation calculation is what DSP's are built for, should be done
> using a MAC instruction, and may well need to be done in assembly to
> take advantage of the MAC. Your DSP tools should come with library
> functions to do this (but I haven't found any that I like, so I always
> write that part by hand in assembly. One day...).
Yeah, that's what I thought too, but TI's documentation speaks wonders of their optimizing compiler, and so far I haven't been able to find as many assembly examples as C ones.
But to use the "circular addressing" mode I'd need to write in assembly, as the C compiler can't understand that I'm modelling a circular buffer, and there are no C templates or callable library functions to do it (but I haven't gone through all the docs yet; the Instruction Set and CPU guide is 635 pages!).
> Whether a cache will be effective with the FIFO depends on your software
> and your hardware. If you're doing integer computations, do you have a
> cache at all? If it's vital to get that correlation computation done
> pronto it may be a good idea to lock that section of the cache.
Well, TI's C6713 has it embedded.
> 2. Sure -- but you're just looking for a way to sneak an FPGA in there,
> aren't you?
Well, the profiling I'm doing is to see if a DSP could be used instead of an ASIC/FPGA solution, but so far no luck: the DSP is slow compared to the FPGA. It is more precise because it's a floating-point DSP, but I'm guessing (as I always thought) that it can't match the FPGA. I'm doing baseband processing at about 100 MSPS (megasamples per second), and I don't see how I could fit all the processing I need into only 225 MHz * 8 = 1800 MIPS.
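
To put those figures in per-sample terms, the small helper below (a hypothetical function, shown only to spell out the arithmetic) divides the peak instruction rate by the sample rate. With 225 MHz, 8 instructions per cycle and 100 MSPS it gives a peak budget of 18 instructions per sample, and that is before any stalls or memory accesses are counted.

```c
#include <stdint.h>

/* Peak number of instructions available per input sample. */
uint32_t instr_per_sample(uint32_t clock_hz, uint32_t issue_width,
                          uint32_t sample_rate_hz)
{
    /* 64-bit intermediate: 225e6 * 8 does not fit in 32 bits signed. */
    uint64_t peak_ips = (uint64_t)clock_hz * issue_width;
    return (uint32_t)(peak_ips / sample_rate_hz);
}
```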
> 3. I don't know. I have always just used good C coding practice, with
> the recognition that a compiler will not write your innermost loop for
> you (others will disagree, but that's my experience).
seems like you're right :)
sebastian wrote:

> Tim Wescott <tim@wescottnospamdesign.com> wrote in message
>
>>1. The FIFO write is probably as good as you can get for a
>>single-element write.
>>
>>The correlation calculation is what DSP's are built for, should be done
>>using a MAC instruction, and may well need to be done in assembly to
>>take advantage of the MAC. Your DSP tools should come with library
>>functions to do this (but I haven't found any that I like, so I always
>>write that part by hand in assembly. One day...).
>
> yeah, that's what i thought too, but TI's documentation talks wonders
> about their optimizing compiler, and till now i havent been able to
> find as much assembly examples as in C.
> but to use the "circular addressing" mode i'd need to write in
> assembly, as C the compiler cant understand that im modelling a
> circular buffer and there are no C templates or callable library
> functions to do it (but i havent gone thru all the docs yet, the
> Instruction Set and CPU guide is 635pages!)
My experience with Code Composter for the 'F28xx chip is that the optimizer is very good with "standard" code, but it is simply incapable of recognizing opportunities to use circular buffers, the MAC, and all that other stuff that makes a DSP more than just a really good processor.
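
One portable-C way to model the delay line without explicit wrap tests is to make the buffer length a power of two and mask the index. The sketch below applies that to the structure of the original post: a sample delayed by a fixed lag is multiplied with the current input, and a running sum adds the new product and subtracts the one leaving the window. The names and sizes (`LAG`, `WINDOW`, `RING_LEN`, `corr_state`, `corr_step`) are assumptions for illustration, and nothing here guarantees that a compiler will turn the masking into hardware circular addressing.

```c
#include <stdint.h>

#define LAG       4u               /* delay in samples (hypothetical) */
#define WINDOW    8u               /* running-sum length (hypothetical) */
#define RING_LEN  8u               /* power of two, >= LAG and >= WINDOW */
#define RING_MASK (RING_LEN - 1u)

typedef struct {
    int16_t  samples[RING_LEN];    /* input delay line */
    int32_t  products[RING_LEN];   /* delay line of past products */
    uint32_t pos;                  /* free-running write counter */
    int32_t  sum;                  /* running sum of the last WINDOW products */
} corr_state;                      /* must start zero-initialized */

/* Feeds one sample and returns the updated running correlation sum. */
int32_t corr_step(corr_state *s, int16_t input)
{
    uint32_t i = s->pos & RING_MASK;

    /* Unsigned wraparound plus the mask gives the index modulo RING_LEN.
     * Old entries are read before slot i is overwritten below. */
    int16_t delayed = s->samples[(s->pos - LAG) & RING_MASK];
    int32_t prod    = (int32_t)input * delayed;
    int32_t oldest  = s->products[(s->pos - WINDOW) & RING_MASK];

    s->sum += prod - oldest;
    s->samples[i]  = input;
    s->products[i] = prod;
    s->pos++;
    return s->sum;
}
```

A caller would declare `corr_state s = {0};` once and then call `corr_step(&s, sample)` for every new ADC sample.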
>>Whether a cache will be effective with the FIFO depends on your software
>>and your hardware. If you're doing integer computations, do you have a
>>cache at all? If it's vital to get that correlation computation done
>>pronto it may be a good idea to lock that section of the cache.
>
> well TI's c6713 has it embedded
So there's a few hundred more pages to study! :)
>>2. Sure -- but you're just looking for a way to sneak an FPGA in there,
>>aren't you?
>
> well, the profiling im doing is to see if a DSP could be used instead
> of a ASIC/FPGA solution, but so far no luck, the DSP is slow compared
> to the FPGA. It is more precise cause it's a FP DSP, but im guessing
> (like i always thought) that it cant match the FPGA. Im doing baseband
> processing at about 100MSPS(mega samples per second) and i dont see
> how could i fit all the processing i need to, with only
> 225MHz*8=1800MIPS
You should have said so. At 100MSPS you need an FPGA, for sure (or enough DSPs in a fancy enough configuration that a Virtex-II Pro as large as your hand is looking inexpensive and small). If you know your data ranges beforehand you shouldn't need floating point.

You may want to consider using an FPGA front-end with a DSP behind it. I've done this where the FPGA does high-rate sampling and decimation, then the DSP took over at about 10KSPS (that was a few years ago, the product is being replaced by DSPs that sample directly at 50KSPS). Done right this will allow you to use a cheaper FPGA, and debugging DSP code is generally easier than debugging FPGAs.
>>3. I don't know. I have always just used good C coding practice, with
>>the recognition that a compiler will not write your innermost loop for
>>you (others will disagree, but that's my experience).
>
> seems like you're right :)
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Tim Wescott <tim@wescottnospamdesign.com> wrote in message news:<10i892sjh07g880@corp.supernews.com>...
 
> My experience with Code Composter for the 'F28xx chip is that the
> optimizer is very good with "standard" code, but it is simply incapable
> of recognizing opportunities to use circular buffers, the MAC, and all
> that other stuff that makes a DSP more than just a really good processor.
Exactly. Maybe TI should provide some macros or templates for them, so that the compiler could then recognize them (so far I haven't found such macros, if they exist).
> So there's a few hundred more pages to study! :)
Yeah :) Is it just me, or are TI's datasheets different from other manufacturers'? I was used to reading datasheets from several other semiconductor manufacturers, and they seemed to share the same layout, but I'm kind of lost with TI's. Maybe I'm just getting old :)
> You should have said so. At 100MSPS you need an FPGA, for sure (or
> enough DSPs in a fancy enough configuration that a Virtex-II Pro as
> large as your hand is looking inexpensive and small). If you know your
> data ranges beforehand you shouldn't need floating point.
Using FP will improve the precision because we do plenty of divisions; that's the whole idea of using a DSP. Don't ask me, I was happy in my ASIC world :) After all, if they use a DSP, where would I work? :)
> You may want to consider using an FPGA front-end with a DSP behind it.
> I've done this where the FPGA does high-rate sampling and decimation,
> then the DSP took over at about 10KSPS (that was a few years ago, the
> product is being replaced by DSP's that sample directly at 50KSPS).
> Done right this will allow you to use a cheaper FPGA, and debugging DSP
> code is generally easier than debugging FPGAs.
Yeah, especially when it comes to "hard" bugs in an ASIC :)
By the way, I can't do anything to the incoming data, so I need to process it at 100 MSPS. Anyway, I guess my report will say I don't recommend the use of a DSP.

Thanks, everybody, for your answers.
sebastian wrote:

   ...

> using FP will improve the precision 'cause we do plenty of divisions,
> that's the whole idea of using a DSP, dont ask me, i was happy in my
> ASIC world :) after all, if they use DSP where would i work? :)
32-bit fixed point has 31 bits of precision plus sign. 32-bit floating point has 24 bits of precision. 7 bits are traded away for extra range. If you know the range of your data and construct accordingly, you don't have to make that trade.

   ...

Jerry
--
Engineering is the art of making what you want from things you can get.
"Jerry Avins" <jya@ieee.org> wrote in message
news:4126b229$0$21759$61fed72c@news.rcn.com...
> sebastian wrote:
>
> 32-bit fixed point has 31 bits of precision plus sign. 32-bit floating
> point has 24 bits of precision.
plus sign also. (At least the standard IEEE floating-point.)
> 7 bits are traded away for extra range.
> If you know the range of your data and construct accordingly, you don't
> have to make that trade.
Jon Harris wrote:
> "Jerry Avins" <jya@ieee.org> wrote in message
> news:4126b229$0$21759$61fed72c@news.rcn.com...
>
>>sebastian wrote:
>>
>>32-bit fixed point has 31 bits of precision plus sign. 32-bit floating
>>point has 24 bits of precision.
>
> plus sign also. (At least the standard IEEE floating-point.)
>
>>7 bits are traded away for extra range.
>>If you know the range of your data and construct accordingly, you don't
>>have to make that trade.
The implicit 1 bit buys a place for the sign bit.

Jerry
--
Engineering is the art of making what you want from things you can get.
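
The 24-bit figure is easy to check in C. The sketch below assumes `float` is an IEEE 754 single-precision type (true on most current targets, but not guaranteed by the C standard); `survives_float` is a hypothetical helper. It reports whether a 32-bit integer comes back unchanged after a round trip through `float`: 2^24 does, 2^24 + 1 does not, while a 32-bit fixed-point word holds both exactly.

```c
#include <stdint.h>

/* Returns 1 if v is unchanged after conversion to float and back.
 * Only meaningful for values well inside the int32_t range, since
 * converting an out-of-range float back to an integer is undefined. */
int survives_float(int32_t v)
{
    /* volatile forces a real single-precision store, so extra
     * precision in registers cannot hide the rounding. */
    volatile float f = (float)v;
    return (int32_t)f == v;
}
```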
