EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

Application processor with fast parallel I/O

Started by Theo Markettos September 9, 2016
On 10.9.2016 г. 00:26, Tim Wescott wrote:
> On Fri, 09 Sep 2016 23:48:22 +0300, Dimiter_Popoff wrote:
>
>> On 09.9.2016 г. 23:14, Tim Wescott wrote:
>>> On Fri, 09 Sep 2016 15:34:23 +0100, Theo Markettos wrote:
>>>
>>>> I'm looking for a Cortex-A class processor that has reasonably quick
>>>> parallel I/O that might be hooked up to an FPGA. I'm aware of the
>>>> existing Zynq and Altera SoC FPGAs, but looking for something
>>>> different.
>>>>
>>>> By 'parallel I/O' I mean ideally a memory interface - either
>>>> bidirectional eg 64-bits or separate 32-bit tx and 32-bit rx. GPIO
>>>> with data valid signals/strobes is another possibility. By 'quick' I
>>>> mean hundreds of MHz, ideally with low latency.
>>>>
>>>> I know there are things like the TI PRU in eg the OMAP family, but
>>>> they seem to have a limited number of pins (16 bit tx/rx).
>>>>
>>>> Can anyone suggest anything else in this space?
>>>
>>> Have you looked at data sheets? The last time I looked it seemed like
>>> there were ones out there that came with built-in SDRAM interfaces.
>>> That may not be ideal ('cuz you'd have to make your FPGA that pretend
>>> to be SDRAM), but it should support the bandwidth you'd want.
>>>
>>> I have to admit I didn't look hard -- I wanted a microcontroller with a
>>> Cortex A core; I basically stopped looking when I saw that I'd need to
>>> deal with external memory and whatnot if I wanted that core.
>>>
>>>
>> Hi Tim,
>> not having fpga experience (though I have designed plenty of logic, used
>> cpld-s, written a logic compiler for cpld-s and another for GAL-s back
>> in the day etc.) I wonder if it would not be easier to use PCIe than
>> pretend to be DDRAM. I have seen fpga-s (soon I may have to use one with
>> DDR etc. if I want to have a display controller which I seem to do...)
>> advertising PCI-e which looks somehow "readily available" to the
>> customer, perhaps this could be a way? I have also seen them having DDR
>> but it seems to be the wrong way around for this task (not for mine with
>> the display though).
>
> I did think that, and should have commented. One of the features of life
> in a world with big FPGAs is that you can just go out and buy IP to do
> things like talk to SDRAM or PCI. The last time that I was involved in
> such a project the PCI in question was "plain old", but there has to be
> PCI-e cores out there for sale.
>
> "Reverse" SDRAM would be more rare, but the actual core should be simpler
> (and answers the OP's question more closely).
>
> Were it me, I'd certainly leave PCI-e on the table until I'd done some
> design studies.
>
What I will probably do one of these days is a display controller doing up to and including 4K video: just the bitmap -> serial stream, HDMI and probably raw display-module LVDS. From what I have seen on the first page of the datasheets, if not even before that, there are FPGAs which have DDR, PCIe and even LVDS drivers (though I think the latter are not fast enough; I have yet to check on that). The idea is for the DDR to hold the display bitmap and the processor to write it via PCIe (it will also read it, but rarely if at all). No fancy stuff with maintaining windows etc. is involved; processors nowadays are fast enough to do all that (and I am not making a 3D gaming console). But let me see when (if...) I'll manage to get around to that....

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
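For a rough sense of what a 4K bitmap held in DDR implies, the arithmetic can be sketched as below. The resolution, pixel depth and refresh rate are assumptions picked for illustration (3840x2160, 32 bits per pixel, 60 Hz); the post does not state any of them.

```python
# Back-of-envelope sizing for a 4K framebuffer held in DDR.
# ASSUMPTIONS (not from the post): 3840x2160, 4 bytes per pixel, 60 Hz.
WIDTH = 3840
HEIGHT = 2160
BYTES_PER_PIXEL = 4
REFRESH_HZ = 60


def frame_bytes(width=WIDTH, height=HEIGHT, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one full frame in bytes."""
    return width * height * bytes_per_pixel


def scanout_bytes_per_second(refresh_hz=REFRESH_HZ):
    """Read bandwidth the display side needs just to scan the bitmap out."""
    return frame_bytes() * refresh_hz


if __name__ == "__main__":
    print(f"frame size: {frame_bytes() / 1e6:.1f} MB")
    print(f"scan-out:   {scanout_bytes_per_second() / 1e9:.2f} GB/s")
```

The scan-out figure is what the DDR has to sustain on the FPGA side regardless of how often the processor writes; the PCIe link only has to carry the updates.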
On 9/9/2016 12:37 PM, Theo Markettos wrote:
> rickman <gnuarm@gmail.com> wrote:
>> I think you are not going to find anything other than the memory
>> interface. What's wrong with that? I assume you are referring to a
>> processor that runs in the GHz range, but even then it would be hard for
>> it to push data out of parallel I/O at "hundreds of MHz". Since this
>> would be a very atypical use of parallel I/Os, I can't imagine a chip
>> maker who would put the I/O pins on the fast local bus. Rather they are
>> typically connected through a slower bus for the peripherals.
>>
>> Can I ask why you don't want to use high speed serial I/Os (which are
>> intended for this) or a combined FPGA/CPU chip? Is using the memory
>> interface too obvious or is there a reason to not use that?
>
> I don't have a problem with using a memory interface, just would rather
> avoid something like PCIe - I don't have a transceiver interface to receive
> it. I just wasn't aware of anything that exported a 'simple' high speed
> memory interface.
>
> The reason for not using a combined FPGA/CPU chip is that I'll be using a
> dev board rather than making my own board (buying serious FPGAs in small
> quantities isn't fun and I'd rather not have to do the DDR3/etc layout). On
> the other side of my bridge CPLD/FPGA is a 1.5V parallel interface: most ARM
> FPGA dev boards hardwire their pins to something other than 1.5v, so I can't
> simply use the FPGA on the combined chip. I'm also physically constrained
> which rules out a lot of dev boards.
I don't quite understand what you are looking for. You seem to be saying you want an off the shelf board? I don't know how you would interconnect an ARM board and an FPGA board at that high speed.

-- 
Rick C
Dimiter_Popoff <dp@tgi-sci.com> wrote:
> I have used and programmed PCI a few times and assuming that PCIe is
> very similar from a programmers point of view I'd say you can do it
> without too much pain.
> If your two sides will be aware of each other (i.e. not discover what
> is there on the bus, things allowed to do, set address range etc.) you
> can do it quite easily, you will only need the bus handshakes. And if
> you restrict them to a known size (say, 32 bit) it becomes even easier.
PCIe is completely different at the electrical level: you need 2.5G, 5G or 8Gbps high speed serial transceivers. The lower tiers of FPGAs don't support such transceivers, so you're into the hundreds of dollars territory already. Then these fancy FPGAs only come in 500 pin BGAs... because they're fancy. That means you can't hand solder them, you have to contract that out. Then the 500 pin BGAs need an 8-layer PCB stack to get the ball escapes. And so on - the costs and complexity keep rising. I was kinda hoping to do this with an off-the-shelf SOM and a $10 CPLD/bargain bucket FPGA on a 4 layer board...
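To put the three line rates in perspective, here is a small sketch of the usable per-lane payload rate after line encoding. It relies on the standard encodings (8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s) and deliberately ignores packet headers, flow control and retransmissions, so real throughput is lower. The function and table names are made up for this example.

```python
# Per-lane PCIe bandwidth after line encoding only.
# Gen1/Gen2 use 8b/10b encoding, Gen3 uses 128b/130b.
# Protocol overhead (TLP headers, ACKs, replays) is NOT modelled here.
PCIE_LANE = {
    1: (2.5e9, 8, 10),     # (line rate in bit/s, payload bits, coded bits)
    2: (5.0e9, 8, 10),
    3: (8.0e9, 128, 130),
}


def lane_bytes_per_second(gen):
    """Usable bytes per second on one lane of the given PCIe generation."""
    line_rate, payload_bits, coded_bits = PCIE_LANE[gen]
    return line_rate * payload_bits / coded_bits / 8


if __name__ == "__main__":
    for gen in sorted(PCIE_LANE):
        print(f"Gen{gen}: {lane_bytes_per_second(gen) / 1e6:.0f} MB/s per lane")
```

Even a single lane at the lowest rate is in the same range as a 16-bit parallel bus at around 100 MHz, which is why the cost argument here is about transceivers, packages and board layers rather than raw bandwidth.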
> > I hadn't thought about pretending to be SDRAM - will think about that.
> > Which CPU did you find, Tim? I suspect finding a dev board that
> > exposes SDRAM pins but wires DDR3 internally is going to be tricky.
>
> Should be doable, they do put 2 DIMMS per board after all (still so?).
It's not quite that simple. It turns out that NAND or NOR flash interfaces are a better bet than SDRAM because a lot more of the protocol stuff is left to software, while on SDRAM you have to dodge the controller helpfully reordering accesses and inserting refresh cycles. However a lot of SOMs don't export them, and when they do it's only 8 or 16 bit. 16 bits at ~100MHz is something, I suppose.
> But you are likely to get yourself into a nightmare of problems, trial
> and error etc. - whereas the PCIe link will just work unless you abuse
> it too much (not so long ago I switched from ATA to SATA - well, the
> SATA part were just 4 signal wires and connecting them in a decent
> way it just worked for me).
The retransmission masks a vast multitude of sins... (people don't realise how terrible their $1 SATA cables are until you start measuring their packet loss and crosstalk)
> You might want to look at the Freescale (now NXP) QorIQ parts, like the
> t1040 or the t1042, large and not really cheap things but I think
> they had smaller and cheaper there, too.
Wow, that's a family where marketing really need to get a grip - 32 bit, 64 bit, Power, ARM11, Cortex A, low power, baseband, server - throw them all into the same brand so the customer is thoroughly confused. I'm not sure I see anything distinctive there, but I'm mostly baffled by how they've organised them.

Theo
rickman <gnuarm@gmail.com> wrote:
> I don't quite understand what you are looking for. You seem to be
> saying you want an off the shelf board?
Off the shelf SOM (probably Cortex A but something non-ARM in that rough landscape is possible)
to
Custom carrier board with CPLD/small FPGA (space constrained)
to
1.5V parallel TX/RX interface
> I don't know how you would interconnect an ARM board and an FPGA board at
> that high speed.
The question is about picking a suitable SoC family to live on the SOM (lack of SOM availability being a secondary but relevant issue).

Theo
On 9/9/2016 8:08 PM, Theo Markettos wrote:
> rickman <gnuarm@gmail.com> wrote:
>> I don't quite understand what you are looking for. You seem to be
>> saying you want an off the shelf board?
>
> Off the shelf SOM (probably Cortex A but something non-ARM in that rough
> landscape is possible)
> to
> Custom carrier board with CPLD/small FPGA (space constrained)
> to
> 1.5V parallel TX/RX interface
>
>> I don't know how you would interconnect an ARM board and an FPGA board at
>> that high speed.
>
> The question is about picking a suitable SoC family to live on the SOM
> (lack of SOM availability being a secondary but relevant issue).
I don't know what to tell you. I would contact the makers of the Zedboard or one of the equivalent Altera based parts and ask about setting an I/O voltage of 1.5 volts. The Zedboard seems to have provision to set the voltage on at least two banks of I/Os, although 1.5 volts is not indicated. I expect a simple part change will get you 1.5 in place of 1.8 volts. Call the maker... I think this problem will be easier than getting the throughput you need between two boards.

-- 
Rick C
On 9/9/2016 7:58 PM, Theo Markettos wrote:
> Dimiter_Popoff <dp@tgi-sci.com> wrote:
>> I have used and programmed PCI a few times and assuming that PCIe is
>> very similar from a programmers point of view I'd say you can do it
>> without too much pain.
>> If your two sides will be aware of each other (i.e. not discover what
>> is there on the bus, things allowed to do, set address range etc.) you
>> can do it quite easily, you will only need the bus handshakes. And if
>> you restrict them to a known size (say, 32 bit) it becomes even easier.
>
> PCIe is completely different at the electrical level: you need 2.5G, 5G or
> 8Gbps high speed serial transceivers. The lower tiers of FPGAs don't
> support such transceivers, so you're into the hundreds of dollars territory
> already.
That shouldn't be true. I know some time back Lattice came out with low end FPGAs with SERDES and I thought X and A had to follow suit.
> Then these fancy FPGAs only come in 500 pin BGAs... because
> they're fancy. That means you can't hand solder them, you have to contract
> that out. Then the 500 pin BGAs need an 8-layer PCB stack to get the ball
> escapes. And so on - the costs and complexity keep rising.
It has been a long time since you could hand solder any FPGA other than possibly the 144 pin TQFP which is a pretty large package.
> I was kinda hoping to do this with an off-the-shelf SOM and a $10
> CPLD/bargain bucket FPGA on a 4 layer board...
Check out the Lattice FPGAs. If you are designing your own FPGA board, why do you care if the FPGA is $10 or $50? That cost will be swamped by the cost of making a board.
>>> I hadn't thought about pretending to be SDRAM - will think about that.
>>> Which CPU did you find, Tim? I suspect finding a dev board that
>>> exposes SDRAM pins but wires DDR3 internally is going to be tricky.
>>
>> Should be doable, they do put 2 DIMMS per board after all (still so?).
>
> It's not quite that simple.
>
> It turns out that NAND or NOR flash interfaces are a better bet than SDRAM
> because a lot more of the protocol stuff is left to software, while on SDRAM
> you have to dodge the controller helpfully reordering accesses and inserting
> refresh cycles. However a lot of SOMs don't export them, and when they do
> it's only 8 or 16 bit. 16 bits at ~100MHz is something, I suppose.
Flash interfaces don't run at 100's of MHz.
>> But you are likely to get yourself into a nightmare of problems, trial
>> and error etc. - whereas the PCIe link will just work unless you abuse
>> it too much (not so long ago I switched from ATA to SATA - well, the
>> SATA part were just 4 signal wires and connecting them in a decent
>> way it just worked for me).
>
> The retransmission masks a vast multitude of sins...
> (people don't realise how terrible their $1 SATA cables are until you start
> measuring their packet loss and crosstalk)
>
>> You might want to look at the Freescale (now NXP) QorIQ parts, like the
>> t1040 or the t1042, large and not really cheap things but I think
>> they had smaller and cheaper there, too.
>
> Wow, that's a family where marketing really need to get a grip - 32 bit, 64
> bit, Power, ARM11, Cortex A, low power, baseband, server - throw them all
> into the same brand so the customer is thoroughly confused.
> I'm not sure I see anything distinctive there, but I'm mostly baffled by how
> they've organised them.
Motorola (now Freescale... I mean NXP) has always been terrible at making the differences and similarities clear in their processor product lines. I think that partly comes from targeting the really large customers where they get *lots* of support to explain just what parts will suit their needs.

-- 
Rick C
On Fri, 09 Sep 2016 15:34:23 +0100, Theo Markettos wrote:

> I'm looking for a Cortex-A class processor that has reasonably quick
> parallel I/O that might be hooked up to an FPGA. I'm aware of the
> existing Zynq and Altera SoC FPGAs, but looking for something different.
>
> By 'parallel I/O' I mean ideally a memory interface - either
> bidirectional eg 64-bits or separate 32-bit tx and 32-bit rx. GPIO with
> data valid signals/strobes is another possibility. By 'quick' I mean
> hundreds of MHz, ideally with low latency.
>
> I know there are things like the TI PRU in eg the OMAP family, but they
> seem to have a limited number of pins (16 bit tx/rx).
>
> Can anyone suggest anything else in this space?
>
> Thanks Theo
I couldn't find where you asked, but here's the processor I found:

http://www.ti.com/lit/ds/symlink/am3358.pdf

I'm absolutely not saying "use this one" -- it's just the first one I found, and it had a conventional memory bus. I suspect that if you look around you'll find something better.

I looked at the general-purpose memory interface and it doesn't look fast enough for you -- it's calling out 100MHz or 50MHz clock on a 16-bit wide bus, and I'm not sure when you can drive it at 100MHz.

DDR, OTOH, will go up to a 200MHz clock (with 400MHz data rate) -- that's why I was suggesting it, if things are simple enough on the FPGA side.

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
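The clock figures above translate into peak transfer rates roughly as sketched below. The 16-bit width and the 50/100 MHz clocks for the general-purpose interface are the ones quoted in the post; the 16-bit width used for the DDR port is an assumption made for this example and should be checked against the datasheet. These are peak numbers only, with no allowance for setup, turnaround or refresh.

```python
def peak_bytes_per_second(width_bits, clock_hz, transfers_per_clock=1):
    """Peak (not sustained) throughput of a parallel bus in bytes per second."""
    return (width_bits // 8) * clock_hz * transfers_per_clock


# General-purpose memory interface: 16-bit bus at 50 MHz or 100 MHz (from the post).
GPMC_50 = peak_bytes_per_second(16, 50_000_000)
GPMC_100 = peak_bytes_per_second(16, 100_000_000)

# DDR at a 200 MHz clock with two transfers per clock (400 MT/s, from the post).
# ASSUMPTION: 16-bit data bus; the post does not give the DDR width.
DDR_200 = peak_bytes_per_second(16, 200_000_000, transfers_per_clock=2)

if __name__ == "__main__":
    for name, rate in [("GPMC @ 50 MHz", GPMC_50),
                       ("GPMC @ 100 MHz", GPMC_100),
                       ("DDR @ 200 MHz clock", DDR_200)]:
        print(f"{name:<22}{rate / 1e6:6.0f} MB/s")
```

For comparison, the original request (32 bits each way at "hundreds of MHz") works out to `peak_bytes_per_second(32, 200_000_000)` per direction if 200 MHz is taken as a representative figure.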
On Sat, 10 Sep 2016 09:51:35 -0500, Tim Wescott wrote:

> On Fri, 09 Sep 2016 15:34:23 +0100, Theo Markettos wrote:
>
>> I'm looking for a Cortex-A class processor that has reasonably quick
>> parallel I/O that might be hooked up to an FPGA. I'm aware of the
>> existing Zynq and Altera SoC FPGAs, but looking for something
>> different.
>>
>> By 'parallel I/O' I mean ideally a memory interface - either
>> bidirectional eg 64-bits or separate 32-bit tx and 32-bit rx. GPIO
>> with data valid signals/strobes is another possibility. By 'quick' I
>> mean hundreds of MHz, ideally with low latency.
>>
>> I know there are things like the TI PRU in eg the OMAP family, but they
>> seem to have a limited number of pins (16 bit tx/rx).
>>
>> Can anyone suggest anything else in this space?
>>
>> Thanks Theo
>
> I couldn't find where you asked, but here's the processor I found:
>
> http://www.ti.com/lit/ds/symlink/am3358.pdf
>
> I'm absolutely not saying "use this one" -- it's just the first one I
> found, and it had a conventional memory bus. I suspect that if you look
> around you'll find something better.
>
> I looked at the general-purpose memory interface and it doesn't look
> fast enough for you -- it's calling out 100MHz or 50MHz clock on a
> 16-bit wide bus, and I'm not sure when you can drive it at 100MHz.
>
> DDR, OTOH, will go up to a 200MHz clock (with 400MHz data rate) --
> that's why I was suggesting it, if things are simple enough on the FPGA
> side.
That "faster DDR clock means faster data" assertion assumes that you have big chunks to send -- it'll probably be slower to send single words, but you'll gain a lot if you can use burst mode.

-- 
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
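The single-word versus burst point can be illustrated with a toy model: every access pays a fixed number of overhead clocks before any data moves, then transfers two words per clock. The overhead of 10 clocks, the 16-bit width and the function name are all hypothetical values chosen for this sketch, not figures from any datasheet, and a real controller adds refresh and reordering effects that are not modelled.

```python
def effective_bytes_per_second(words_per_access, overhead_clocks=10,
                               width_bits=16, clock_hz=200_000_000):
    """Toy model of DDR throughput for a given access length.

    ASSUMPTIONS: fixed per-access overhead (hypothetical 10 clocks),
    two words transferred per clock, no refresh or bank conflicts.
    """
    data_clocks = words_per_access / 2          # double data rate
    total_clocks = overhead_clocks + data_clocks
    bytes_moved = words_per_access * width_bits // 8
    return bytes_moved * clock_hz / total_clocks


if __name__ == "__main__":
    for words in (1, 8, 64, 512):
        rate = effective_bytes_per_second(words)
        print(f"{words:4d} words per access: {rate / 1e6:7.1f} MB/s")
```

With these made-up numbers the fixed overhead dominates short accesses, and the rate only approaches the 800 MB/s peak of a 16-bit, 400 MT/s bus as the bursts get long.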
On Friday, September 9, 2016 at 18:37:54 UTC+2, Theo Markettos wrote:
> rickman <gnuarm@gmail.com> wrote:
> > I think you are not going to find anything other than the memory
> > interface. What's wrong with that? I assume you are referring to a
> > processor that runs in the GHz range, but even then it would be hard for
> > it to push data out of parallel I/O at "hundreds of MHz". Since this
> > would be a very atypical use of parallel I/Os, I can't imagine a chip
> > maker who would put the I/O pins on the fast local bus. Rather they are
> > typically connected through a slower bus for the peripherals.
> >
> > Can I ask why you don't want to use high speed serial I/Os (which are
> > intended for this) or a combined FPGA/CPU chip? Is using the memory
> > interface too obvious or is there a reason to not use that?
>
> I don't have a problem with using a memory interface, just would rather
> avoid something like PCIe - I don't have a transceiver interface to receive
> it. I just wasn't aware of anything that exported a 'simple' high speed
> memory interface.
>
> The reason for not using a combined FPGA/CPU chip is that I'll be using a
> dev board rather than making my own board (buying serious FPGAs in small
> quantities isn't fun and I'd rather not have to do the DDR3/etc layout). On
> the other side of my bridge CPLD/FPGA is a 1.5V parallel interface: most ARM
> FPGA dev boards hardwire their pins to something other than 1.5v, so I can't
> simply use the FPGA on the combined chip. I'm also physically constrained
> which rules out a lot of dev boards.
On the MicroZed the Vcco is on the connector, so you can set it to 1.5V if you need to.

-Lasse
On Saturday, September 10, 2016 at 9:51:46 AM UTC-5, Tim Wescott wrote:
> On Fri, 09 Sep 2016 15:34:23 +0100, Theo Markettos wrote:
>
> > I'm looking for a Cortex-A class processor that has reasonably quick
> > parallel I/O that might be hooked up to an FPGA. I'm aware of the
> > existing Zynq and Altera SoC FPGAs, but looking for something different.
> >
> > By 'parallel I/O' I mean ideally a memory interface - either
> > bidirectional eg 64-bits or separate 32-bit tx and 32-bit rx. GPIO with
> > data valid signals/strobes is another possibility. By 'quick' I mean
> > hundreds of MHz, ideally with low latency.
> >
> > I know there are things like the TI PRU in eg the OMAP family, but they
> > seem to have a limited number of pins (16 bit tx/rx).
> >
> > Can anyone suggest anything else in this space?
> >
> > Thanks Theo
>
> I couldn't find where you asked, but here's the processor I found:
>
> http://www.ti.com/lit/ds/symlink/am3358.pdf
>
> I'm absolutely not saying "use this one" -- it's just the first one I
> found, and it had a conventional memory bus. I suspect that if you look
> around you'll find something better.
>
> I looked at the general-purpose memory interface and it doesn't look fast
> enough for you -- it's calling out 100MHz or 50MHz clock on a 16-bit wide
> bus, and I'm not sure when you can drive it at 100MHz.
>
> DDR, OTOH, will go up to a 200MHz clock (with 400MHz data rate) -- that's
> why I was suggesting it, if things are simple enough on the FPGA side.
>
> --
> Tim Wescott
> Control systems, embedded software and circuit design
> I'm looking for work! See my website if you're interested
> http://www.wescottdesign.com
]> > I'm looking for a Cortex-A class processor that has reasonably quick
]> > parallel I/O that might be hooked up to an FPGA. I'm aware of the
]> > existing Zynq and Altera SoC FPGAs, but looking for something different.

] http://www.ti.com/lit/ds/symlink/am3358.pdf

The TI part has a DRAM port and a separate second memory port with 7 chip selects. That is a lot better than a single memory port: DDR timing is much different from what FPGA I/O ports can do. With 7 chip selects one can have distinct FPGA read and write pins, with tri-states on the read pins. Again, that helps with timing in my experience.

According to the literature, the Xilinx tools completely handle the internal interfaces within the Zynq part, and Vivado/ISE gives you your timing pass/fail. I'd go with a SoC FPGA in a minute, given the timing troubles we had with a distinct ARM chip.
