Yes, you can do this in Nios, but you have to create two different
versions: one for the custom instruction and one for the Avalon interface.
With the FSL interface, you can connect the module to the processor, to
a streaming interface from a memory controller, or to your own HW logic.
The actual IDCT module does not change at all.
The key to acceleration is connecting it in the right place, and if you
want to move it around without changing the interface every time, the
FSL interface is a lifesaver.
Another key benefit of FSL is that if I want to do a 2-D IDCT, I can
chain two 1-D FSL modules together, with one FSL module feeding the
other. This is hard to do with custom instructions, since they are
tightly tied to the processor.
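
To make the chaining idea concrete, here is a minimal software sketch in C.
It is only a model under stated assumptions, not the actual FSL hardware: the
names idct_1d, idct_stage and idct_2d are made up for illustration, and the
arithmetic is a plain floating-point direct-form IDCT. Each stage runs the
1-D transform over the rows and emits the block transposed, so two identical
stages back to back give the 2-D result.

```c
#include <assert.h>

#define N 8

/* cos(k*pi/16) for k = 0..8; other angles follow by symmetry. */
static const double COS_TAB[9] = {
    1.0, 0.98078528040323044, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960222, 0.38268343236508977,
    0.19509032201612827, 0.0
};

/* Returns cos(k*pi/16) for any k >= 0 using the table above. */
static double cos16(int k)
{
    assert(k >= 0);
    k %= 32;                        /* the period 2*pi is 32 steps */
    if (k > 16) k = 32 - k;         /* cos(2*pi - t) = cos(t)      */
    return (k > 8) ? -COS_TAB[16 - k] : COS_TAB[k];
}

/* Direct-form 8-point 1-D IDCT; stands in for one hardware module. */
static void idct_1d(const double in[N], double out[N])
{
    for (int x = 0; x < N; x++) {
        double sum = 0.0;
        for (int u = 0; u < N; u++) {
            double cu = (u == 0) ? COS_TAB[4] : 1.0;  /* 1/sqrt(2) for DC */
            sum += cu * in[u] * cos16((2 * x + 1) * u);
        }
        out[x] = 0.5 * sum;
    }
}

/* One stage: 1-D IDCT over every row, result written transposed. */
static void idct_stage(double in[N][N], double out[N][N])
{
    for (int r = 0; r < N; r++) {
        double row[N];
        idct_1d(in[r], row);
        for (int c = 0; c < N; c++)
            out[c][r] = row[c];
    }
}

/* Two identical stages chained: the first feeds the second. */
void idct_2d(double in[N][N], double out[N][N])
{
    double mid[N][N];
    idct_stage(in, mid);
    idct_stage(mid, out);
}
```

In hardware the intermediate block would stream from the first module's
output FIFO into the second module's input FIFO instead of going through
the array mid.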
Göran
James wrote:
>In Nios II, I'd implement the IDCT as a coprocessor. The coprocessor
>could connect to the Nios II CPU using custom instructions or using
>Avalon (the on-chip interconnect fabric).
>
>In the custom instruction version, you have Nios II load the source
>operands from memory, use a custom instruction to transfer two 32-bit
>operands per cycle to the coprocessor, use a custom instruction to
>start the operation, use a custom instruction to transfer one 32-bit
>result value per cycle, and then use Nios II store instructions to
>save the results to memory. You have to do the same thing with
>Microblaze but the FSLs can only transfer one 32-bit value every
>other cycle so Nios II would be significantly faster.
>
>However, if you really want the best performance, I'd make the IDCT
>its own SOPC Builder component with an Avalon master interface. Think
>of it as an intelligent DMA. Instead of having Nios II move around
>source and result values, Nios II just provides the memory addresses
>to the coprocessor (either through a custom instruction or an Avalon
>slave interface) and then waits for it to complete. The IDCT reads
>the source operands from memory, performs the computation, and writes
>back the result. This provides the best speedup.
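
The CPU side of the "intelligent DMA" arrangement described above can be
sketched in C as follows. This is a host-runnable model under assumptions,
not SOPC Builder output: the register layout (struct idct_regs), the function
names, and the 64-word block size are all hypothetical, and the "hardware" is
a stand-in function that just copies data rather than computing an IDCT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 64

/* Hypothetical register block exposed on the coprocessor's slave port. */
struct idct_regs {
    const int32_t *src;    /* address of the source operands          */
    int32_t       *dst;    /* address where the results are written   */
    uint32_t       start;  /* CPU writes 1 to kick off the operation  */
    uint32_t       done;   /* coprocessor sets 1 when it has finished */
};

/* Stand-in for the hardware. On a real system this work would happen in
   logic behind the master port. The copy is a placeholder, NOT an IDCT. */
static void idct_hw_model(struct idct_regs *r)
{
    if (!r->start)
        return;
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        r->dst[i] = r->src[i];
    r->start = 0;
    r->done = 1;
}

/* CPU side: hand over two addresses, start the unit, wait for completion. */
void idct_offload(struct idct_regs *r, const int32_t *src, int32_t *dst)
{
    assert(r != NULL && src != NULL && dst != NULL);
    r->src = src;
    r->dst = dst;
    r->done = 0;
    r->start = 1;
    while (!r->done)
        idct_hw_model(r);  /* real code would only poll the status register */
}
```

The point of the sketch is that the processor moves two addresses and no
data; all operand traffic is generated by the coprocessor itself.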
>
>+james+
>
>--- In , Göran Bilski <goran.bilski@x> wrote:
>>Hi,
>>
>>A custom function means attaching a complete function to the FSL (FIFO)
>>channels that MicroBlaze has.
>>
>>A custom instruction has the drawback that it can take only 2 operands
>>and return 1 result, which limits its usefulness.
>>Most useful instructions are already covered by the ISA.
>>
>>One example that I use to demonstrate this is optimizing an idct
>>function that is needed for a JPEG decoder.
>>If you run a JPEG decoder, the idct function will take most of the CPU
>>cycles.
>>Furthermore, the idct function is a loop-in-a-loop function, as many DSP
>>functions are.
>>The idct function has 8 inputs and 8 outputs and uses 64 constants for
>>the calculation.
>>By examining the assembler output from the compiler and using custom
>>instructions, you can create your own super MAC instruction, which not
>>only does a MAC but also contains the whole constant table and
>>autoincrements the pointer into the constant table.
>>If you get this instruction to execute in 1 clock cycle, you have
>>increased the performance of the idct function by 90%, which is not a
>>bad improvement.
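
As a rough behavioral model of such a "super MAC" instruction, here is a
short C sketch. Everything in it is an assumption made for illustration: the
struct and function names are invented, and the coefficient values are left
to the caller rather than being real IDCT constants. It only shows the shape
of the idea: two operands in, one result out, with the constant table and its
auto-incrementing pointer hidden inside the unit.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* State that would live inside the custom instruction logic. */
struct super_mac {
    int32_t  coeff[TABLE_SIZE];  /* the whole constant table           */
    unsigned index;              /* auto-incrementing pointer into it  */
};

/* One "instruction": two operands (acc, sample) in, one result out.
   Multiplies by the current constant, then advances the pointer. */
int32_t super_mac_step(struct super_mac *m, int32_t acc, int32_t sample)
{
    assert(m->index < TABLE_SIZE);
    int32_t c = m->coeff[m->index];
    m->index = (m->index + 1) % TABLE_SIZE;  /* wraps after 64 steps */
    return acc + sample * c;
}

/* The inner loop then becomes 8 back-to-back MAC steps per output value,
   with no separate loads of constants and no pointer arithmetic. */
int32_t dot8(struct super_mac *m, const int32_t in[8])
{
    int32_t acc = 0;
    for (int u = 0; u < 8; u++)
        acc = super_mac_step(m, acc, in[u]);
    return acc;
}
```

Note that the two loops themselves are still executed by the processor; only
the loop body has been shortened.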
>>
>>But if you take the FSL approach, you place the WHOLE idct function into
>>HW and just pass the parameters through the FIFO and receive the results
>>through the FIFO. This will improve the performance by 1010%, which is 7x
>>faster than the custom instruction.
>>WHY?
>>Because the whole function is placed in HW, you can collapse both loops,
>>and that is where all the performance gain is.
>>But in order to collapse the loops, you now have 8 inputs and 8 outputs,
>>which is hard to specify in a custom instruction.
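
Seen from software, the FIFO approach described above reduces the call to
8 writes followed by 8 reads. The C sketch below models that on a host
machine. It is an illustration under assumptions: struct fifo, fifo_put,
fifo_get, hw_module and call_via_fifo are hypothetical names, the FIFO depth
is arbitrary, and the hardware stand-in merely doubles each word instead of
computing an IDCT.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 16
#define BLOCK 8

/* Software model of one FIFO channel between CPU and hardware. */
struct fifo {
    int32_t  data[FIFO_DEPTH];
    unsigned head, tail, count;
};

/* Returns 1 on success, 0 if the FIFO is full. */
static int fifo_put(struct fifo *f, int32_t v)
{
    if (f->count == FIFO_DEPTH)
        return 0;
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty. */
static int fifo_get(struct fifo *f, int32_t *v)
{
    if (f->count == 0)
        return 0;
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Stand-in for the hardware block: once 8 words are waiting it consumes
   them and produces 8 words. The doubling is a placeholder, NOT an IDCT. */
static void hw_module(struct fifo *to_hw, struct fifo *from_hw)
{
    int32_t in[BLOCK];
    if (to_hw->count < BLOCK)
        return;
    for (int i = 0; i < BLOCK; i++)
        fifo_get(to_hw, &in[i]);
    for (int i = 0; i < BLOCK; i++)
        fifo_put(from_hw, 2 * in[i]);
}

/* CPU side: push all 8 inputs, then pull all 8 outputs. */
void call_via_fifo(struct fifo *to_hw, struct fifo *from_hw,
                   const int32_t in[BLOCK], int32_t out[BLOCK])
{
    for (int i = 0; i < BLOCK; i++) {
        int ok = fifo_put(to_hw, in[i]);
        assert(ok);
        (void)ok;
    }
    hw_module(to_hw, from_hw);  /* real hardware would run on its own */
    for (int i = 0; i < BLOCK; i++) {
        int ok = fifo_get(from_hw, &out[i]);
        assert(ok);
        (void)ok;
    }
}
```

Because the interface is just a stream of words, the 8-in/8-out shape that
does not fit a 2-operand instruction is no problem here.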
>>
>>The custom instruction sounds nice but doesn't really give you that much
>>gain. The FSL has the potential to improve your application by 10x or
>>100x.
>>Göran
>>
>>Perez Ramas, Javier Basilio wrote:
>>
>>> Hello,
>>>
>>>>-----Original Message-----
>>>>From: Göran Bilski [mailto:goran.bilski@x...]
>>>>Sent: Tuesday, December 14, 2004 20:24
>>>>To:
>>>>Subject: Re: [fpga-cpu] Xilinx vs Altera / Microblaze vs Nios???
>>>>
>>>>>>* Nios II and Microblaze are roughly of the same architecture.
>>>>>>
>>>>>However, Nios II is more customizable and Microblaze cannot handle
>>>>>custom instructions (is this true)
>>>>>
>>>>MicroBlaze can do much more by allowing you to do customer functions
>>>>which is far more powerful than custom instructions.
>>>>
>>> Please, can you explain further what you mean by "customer functions"?
>>> As far as I know, the ways to "expand" Microblaze are either custom
>>> OPB-connected hardware or hardware attached to a superlink FIFO. Is
>>> this correct? Although that's valid for most applications, I think that
>>> "custom opcodes" are a really powerful feature.
>>> Best regards,
>>>
>>> Javier Basilio Pérez Ramas
>>> GECOM sensors
>>> INDRA Sistemas, S.A.
>>>
>>>To post a message, send it to:
>>>To unsubscribe, send a blank message to: fpga-cpu-
>>>Yahoo! Groups Links