EmbeddedRelated.com
Forums
Memfault Beyond the Launch

Xilinx vs Altera / Microblaze vs Nios???

Started by Mats Brorsson December 14, 2004
Yes, you can do this in Nios, but you have to create two different
versions: one for the custom instruction and one for the Avalon interface.
With the FSL interfaces, you can connect the module to the processor, to a
streaming interface from a memory controller, or even to your own HW logic.
The actual IDCT module does not change at all.
The key to acceleration is connecting it to the right place, and if you
want to move it around without changing the interface every time, the
FSL interface is a lifesaver.

Another key benefit of the FSL is that if I want to do a 2-D IDCT, I can
chain two 1-D FSL modules together, with one FSL module feeding the other.
This is hard to do with custom instructions, since they are tightly tied
to the processor.

Gan

James wrote:

>In Nios II, I'd implement the IDCT as a coprocessor. The coprocessor
>could connect to the Nios II CPU using custom instructions or using
>Avalon (the on-chip interconnect fabric).
>
>In the custom instruction version, you have Nios II load the source
>operands from memory, use a custom instruction to transfer two 32-bit
>operands per cycle to the coprocessor, use a custom instruction to
>start the operation, use a custom instruction to transfer one 32-bit
>result value per cycle, and then use Nios II store instructions to
>save the results to memory. You have to do the same thing with
>MicroBlaze, but the FSLs can only transfer one 32-bit value every
>other cycle, so Nios II would be significantly faster.
>
>However, if you really want the best performance, I'd make the IDCT
>its own SOPC Builder component with an Avalon master interface. Think
>of it as an intelligent DMA. Instead of having Nios II move around
>source and result values, Nios II just provides the memory addresses
>to the coprocessor (either through a custom instruction or an Avalon
>slave interface) and then waits for it to complete. The IDCT reads
>the source operands from memory, performs the computation, and writes
>back the result. This provides the best speedup.
>
>+james+
>
>--- In , Gan Bilski <goran.bilski@x> wrote:
>>Hi,
>>
>>The custom function attaches a full function to the FSL (FIFO)
>>channels that MicroBlaze has.
>>
>>Custom instructions have the drawback that you can only have 2
>>operands and 1 result, which limits their usefulness.
>>Most useful instructions are already covered by the ISA.
>>
>>One example that I use to demonstrate this is optimizing an IDCT
>>function, which is needed for a JPEG decoder.
>>If you run a JPEG decoder, the IDCT function will take most of the
>>CPU cycles.
>>Furthermore, the IDCT function is a loop within a loop, as many
>>DSP functions are.
>>The IDCT function has 8 inputs and 8 outputs and uses 64 constants
>>for the calculation.
>>By examining the assembler output from the compiler and using custom
>>instructions, you can create your own "super MAC" instruction, which
>>not only does a MAC but also contains the whole constant table and
>>auto-increments the pointer into the constant table.
>>If you get this instruction to execute in 1 clock cycle, you have
>>increased the performance of the IDCT function by 90%, which is not
>>a bad improvement.
>>
>>But if you take the FSL approach, you place the WHOLE IDCT function
>>into HW and just pass the parameters through the FIFO and receive the
>>results through the FIFO. This improves the performance by 1010%,
>>which is 7x faster than the custom instruction.
>>WHY?
>>Because the whole function is placed in HW, you can collapse both
>>loops, and that is where all the performance gain is.
>>But in order to collapse the loops, you now have 8 inputs and 8
>>outputs, which is hard to specify in a custom instruction.
>>
>>The custom instruction sounds nice but doesn't really give you that
>>much gain. The FSL has the potential to improve your application by
>>10x or 100x.
>>Gan
>>
>>Perez Ramas, Javier Basilio wrote:
>>
>>>Hello,
>>>
>>>>-----Original Message-----
>>>>From: Gan Bilski [mailto:goran.bilski@x...]
>>>>Sent: Tuesday, December 14, 2004 20:24
>>>>To:
>>>>Subject: Re: [fpga-cpu] Xilinx vs Altera / Microblaze vs Nios???
>>>>
>>>>>>* Nios II and Microblaze are roughly of the same architecture.
>>>>>
>>>>>However, Nios II is more customizable and Microblaze cannot
>>>>>handle custom instructions (is this true?)
>>>>
>>>>MicroBlaze can do much more by allowing you to do customer
>>>>functions, which is far more powerful than custom instructions.
>>>
>>>Could you please explain further what you mean by "customer functions"?
>>>As far as I know, the ways to "expand" MicroBlaze are either custom
>>>OPB-connected hardware or hardware attached to a superlink FIFO. Is
>>>this correct? Although that's valid for most applications, I think
>>>that "custom opcodes" are a really powerful feature.
>>>
>>>Best regards,
>>>
>>>Javier Basilio Pérez Ramas
>>>GECOM sensors
>>>INDRA Sistemas, S.A.
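James's speed argument for custom instructions over FSL rests on data-movement rates. The back-of-envelope C model below only counts CPU-to-coprocessor transfer cycles using the rates stated in his post (two operands per cycle in, one start instruction, one result per cycle out for Nios II; one word every other cycle for FSL). These rates are taken from the post, not measured, and the model ignores loads, stores and the computation itself; it also assumes the FSL version needs no separate start command. The function names are hypothetical.

```c
#include <assert.h>

/* Nios II custom-instruction transfer (rates as stated in the post):
   two 32-bit operands per cycle in, one cycle to start, one result per cycle out. */
static unsigned nios_ci_transfer_cycles(unsigned n_in, unsigned n_out)
{
    return (n_in + 1) / 2   /* ceil(n_in / 2) operand-transfer cycles */
         + 1                /* start instruction */
         + n_out;           /* one result per cycle */
}

/* MicroBlaze FSL transfer (rate as stated in the post):
   one 32-bit word every other cycle, in either direction. */
static unsigned fsl_transfer_cycles(unsigned n_in, unsigned n_out)
{
    return 2 * (n_in + n_out);
}
```

For the 8-input, 8-output IDCT discussed in this thread, the model gives 13 transfer cycles for the custom-instruction path versus 32 for FSL, which is the gap James is pointing at; whether it matters depends on how much computation the coprocessor does per transfer.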

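James's "intelligent DMA" arrangement reduces the CPU's job to handing over two addresses and waiting. A minimal sketch of the CPU side might look like the C below. The register layout (`src`, `dst`, `control`, `status`), the bit assignments and `idct_dma_run` are all hypothetical, since the post does not define an interface. On real Nios II hardware the pointer would refer to the component's Avalon slave base address, the accesses would need to bypass the data cache (for example via the HAL's `IORD`/`IOWR` macros), and the source buffer would have to be flushed from the cache before starting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control/status registers of an IDCT component with an
   Avalon slave port for control and an Avalon master port for data. */
typedef struct {
    volatile uint32_t src;     /* address of the source coefficients  */
    volatile uint32_t dst;     /* address where results are written   */
    volatile uint32_t control; /* bit 0: start (assumed)              */
    volatile uint32_t status;  /* bit 0: done (assumed)               */
} idct_regs_t;

#define IDCT_CTRL_START 0x1u
#define IDCT_STAT_DONE  0x1u

/* The CPU only passes addresses; the coprocessor reads the operands,
   computes, and writes the results back on its own. */
static void idct_dma_run(idct_regs_t *regs, uint32_t src_addr, uint32_t dst_addr)
{
    regs->src = src_addr;
    regs->dst = dst_addr;
    regs->control = IDCT_CTRL_START;
    while ((regs->status & IDCT_STAT_DONE) == 0u)
        ;   /* busy-wait; an interrupt would free the CPU for other work */
}
```

Polling keeps the sketch short; the interrupt-driven alternative lets Nios II do useful work while the coprocessor runs, which is where most of the speedup in this arrangement comes from.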


Martin,

> Martin Schoeberl wrote:
> JOP: a Java processor soft-core for FPGAs: http://www.jopdesign.com/
>
> With this processor you can postpone your decision about A vs. X
> because it runs on both (e.g. Cyclone, Spartan-3). However, the design
> runs faster on the Cyclone than on the Spartan-3 (100MHz vs. 82MHz).
Can you please give your opinion on why you think that the JOP runs
faster on the Cyclone than the Spartan-3?

Is there a way to close this performance gap?

What is your personal opinion on this issue?

Richard Newman


>> Martin Schoeberl wrote:
>> JOP: a Java processor soft-core for FPGAs: http://www.jopdesign.com/
>>
>> With this processor you can postpone your decision about A vs. X
>> because it runs on both (e.g. Cyclone, Spartan-3). However, the design
>> runs faster on the Cyclone than on the Spartan-3 (100MHz vs. 82MHz).
>>
>>
> Can you please give your opinion on why you think that the JOP runs
> faster on the Cyclone than the Spartan-3?

I think it is simply that the Cyclone is a little faster than the Spartan-3.
But I don't want to start a flame war between C and S; we had discussions
on this issue on comp.arch.fpga (where a guy from Altera wondered why
JOP wasn't much faster on the Cyclone ;-)

Cyclone and Spartan-3 are comparable and competitive FPGAs. Which one is
faster depends on the application. For DSP applications I expect the
Spartan-3 to be the winner, as it contains hardware multipliers.

However, I want to stress that there is no Altera- or Xilinx-optimized code in JOP.

>
> Is there a way to close this performance gap?
>
Not at the VHDL coding level. Perhaps Xilinx will add another speed grade
for Spartan-3 (Cyclone comes in three grades, whereas Spartan-3 comes in only
two), or you could squeeze out a few ps with floor-planning.

Martin
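Putting the two JOP clock rates quoted in this exchange side by side (plain arithmetic on the figures given, nothing more):

$$
\frac{100\ \text{MHz}}{82\ \text{MHz}} \approx 1.22,
\qquad
T_{\text{Cyclone}} = \frac{1}{100\ \text{MHz}} = 10\ \text{ns},
\qquad
T_{\text{Spartan-3}} = \frac{1}{82\ \text{MHz}} \approx 12.2\ \text{ns}
$$

So the Cyclone build is about 22% faster, and closing the gap entirely would mean taking roughly 2.2 ns (about 2200 ps) off the Spartan-3 critical path.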



> Random opinions:
> - I like Quartus II (Altera) a lot more than ISE, the
> latter feels a lot less polished.
> - The ML401 development kit (Xilinx) is hands-down the
> best ~$500 development kit out there, but is _not_
> supported by the free ISE WebPACK :-(
> - The new Stratix II has a really interesting
> architecture. I'd love an ML401 with a Stratix II
> instead of the Virtex4 (Altera? Xmas? Please? ;-)

I would like to add one to Tommy's list from a production board standpoint:

We continue to find that we cannot obtain Altera devices with the same density; i.e., given
limited board space, the Xilinx part has more logic capacity in a smaller footprint.
Xilinx also seems to offer a wider range of devices to choose from for each
footprint. For example, we put down an XC3S400-FG456, but we know we can
move up to the XC3S1000 or XC3S1500 if needed without changing the footprint.

Every time we ask our Altera FAEs and reps to give us the equivalent Altera part for
a given Xilinx part that will meet our requirements, they're silent. From
salespeople, no news is bad news.

In this situation, tools etc. become less relevant. You gotta be able to build the
board and support it over time with new features.

-Jeff



Heh, half of the replies could hardly be called
unbiased.

I use both X and A, and they are very comparable. For
educational use it's important to think about the
design software. Both X and A offer free packages of
their design tools (ISE WebPACK 6.3i and Quartus II Web
Edition), but each supports only a subset of the
devices. If you can't get a special educational
license, you should pick a development kit that is
supported by the gratis version of their tools.

Random opinions:
- I like Quartus II (Altera) a lot more than ISE, the
latter feels a lot less polished.
- The ML401 development kit (Xilinx) is hands-down the
best ~$500 development kit out there, but is _not_
supported by the free ISE WebPACK :-(
- The new Stratix II has a really interesting
architecture. I'd love an ML401 with a Stratix II
instead of the Virtex4 (Altera? Xmas? Please? ;-)

Tommy
