MCU mimicking a SPI flash slave

Reply by rickman ● June 15, 2017

Tom Gardner wrote on 6/15/2017 1:01 PM:
> On 15/06/17 17:13, rickman wrote:
>> Tom Gardner wrote on 6/15/2017 3:01 AM:
>>> On 15/06/17 02:54, rickman wrote:
>>>> John Speth wrote on 6/14/2017 1:44 PM:
>>>>> Hi folks-
>>>>>
>>>>> I'm hoping to tap into your various experiences to see if what I'm
>>>>> thinking
>>>>> is practical.
>>>>>
>>>>> Our customer has a device into which a Datakey is plugged to extract data
>>>>> stored in the device.  The Datakey is really just a trade name for a SPI
>>>>> flash with ergonomic features that resemble a real key (see datakey.com).
>>>>> Our customer would like to replace the Datakey with a new design that will
>>>>> transmit the data wirelessly instead of storing it to the Datakey SPI
>>>>> flash.
>>>>>
>>>>> We're proposing a design based on an MCU which will appear as a SPI slave
>>>>> flash device, store the data pumped to it from the customer's device, and
>>>>> forward it wirelessly.  I'm wondering if it's practical to accomplish it.
>>>>> The firmware would have to be totally responsive temporally and in data
>>>>> content.  That sounds like a tough hill to climb that requires getting the
>>>>> interface just right, nearly perfect.  I see a lot of pitfalls that could
>>>>> make the effort a dead end.
>>>>>
>>>>> Does anybody have any success or failure stories to relate that would help
>>>>> us gauge the feasibility of the proposed design?
>>>>
>>>> I have no experience in building such a product, but why not use an actual
>>>> SPI
>>>> data flash as the intermediate storage while the MCU monitors the action
>>>> and
>>>> forwards the data?  This would require periods when the MCU could read the
>>>> data
>>>> flash.  Is that possible or practical?
>>>>
>>>> I had a design that needed a 30 MHz SPI like interface (not entirely
>>>> compatible
>>>> though) which was designed to read/write registers.  I considered using a
>>>> fast
>>>> CPU to emulate the interface in software, but it just couldn't meet the
>>>> timing.
>>>> Even with a 700 MIPS processor I wasn't sure it could receive the
>>>> command to
>>>> read a register and then get the data onto the output fast enough.  I'm
>>>> not sure
>>>> of the timing of the SPI interface you have, but it will likely be hard to
>>>> emulate the SPI in software.
>>>
>>> That sounds easy for the XMOS processors:
>>>  - multicore (up to 32 100MIPS cores)
>>>  - FPGA-like i/o, e.g. SERDES, buffers, programmable
>>>    clocks to 250Mb/s, separate timer for each port
>>>  - i/o timing changes can be specified/measured in
>>>    software with 4ns resolution
>>>  - guaranteed latencies <100ns
>>>  - *excellent* programming model based on CSP
>>>    (summary: of "Occam in C")
>>>  - Eclipse-based dev environment
>>
>> And yet the XMOS can't run fast enough to return a result in the time
>> available.  What a pity, all dressed up and nowhere to go.
>
> You'll have to explain that, because I don't understand
> what you are referring to.

Maybe I don't understand the speed of the XMOS.  Nothing you write above 
says to me it is any faster than the 700 MIPS processor I considered. 
Nothing written above would allow it to perform the task any differently 
than the bit banging approach I initially considered.

So what are you confused about?

Maybe you don't understand the problem.  A bit serial data port (two bits 
wide actually, but that is not relevant) provides data, address and a two 
bit command word serially.  The last bit of the command word is indicated by 
a "command" signal going high during that bit.  Data is clocked in on the 
rising edge of the clock.  When the command word is a read, the read data is 
provided serially starting on the subsequent falling edge.  This provides 15 
ns from rising edge to falling edge.  It is possible to prefetch the read 
data based on the shifted address on each rising edge until the command 
signal is asserted.  This helps with timing of the memory fetch, but still, 
that data has to be presented to the serial port output in 15 ns and updated 
every 30 ns.

I'd like to see code that will make this work.  Or maybe you weren't 
addressing my previous application?  For the OP, can the XMOS emulate a 30 
Mbps SPI port?
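
For a rough feel of what that window means in software, here is a 
back-of-envelope sketch in Python.  It uses only figures quoted in this 
thread (15 ns to the first bit, 30 ns per bit, a 700 MIPS CPU, a 100 MIPS 
core).  The function name is made up for illustration, and it ignores pin 
setup/hold and any I/O latency, so a real budget would be tighter.

```python
# Rough instruction budget for bit-banging the read response.
# Timing figures are the ones quoted in the thread; the rest is illustrative.

def instructions_in_window(window_ns: float, mips: float) -> float:
    """Instructions a core can issue in window_ns at the given MIPS rating.

    1 MIPS is one instruction per microsecond, so divide ns by 1000.
    """
    return window_ns * mips / 1000.0

HALF_PERIOD_NS = 15.0   # rising edge (command latched) to falling edge (first bit out)
BIT_PERIOD_NS = 30.0    # one output bit per clock period afterwards

if __name__ == "__main__":
    for name, mips in [("700 MIPS CPU", 700.0), ("single 100 MIPS core", 100.0)]:
        first = instructions_in_window(HALF_PERIOD_NS, mips)
        per_bit = instructions_in_window(BIT_PERIOD_NS, mips)
        print(f"{name}: {first:.1f} instructions to first bit, {per_bit:.1f} per bit")
```

At 700 MIPS that is about ten instructions to get the first bit out, and at 
100 MIPS it is one or two, which is why a plain bit-banged loop looks 
doubtful to me.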


>>> Currently I'm managing to count transitions in
>>> software on two 50Mb/s inputs, plus do
>>> simultaneous USB comms to a host computer.
>>
>> Can you output a result before the next input transition?
>
> Yes and no.
>
> Yes: the output (to a host processor and/or LCD)
> proceeds in parallel with the next capture phase.

I'm referring to bit I/O using the same 50 MHz clock.


> No: in the current incarnation there is a short gap
> between capture phases as the results are passed
> from one core to another and the next capture phase
> is started. While it isn't important in my application,
> I may be able to remove that limitation in future
> incarnations.

That's a lot more than a "short" gap in the context of a fast clock.


>>> Processors plus £10 devkit available from Farnell
>>> and DigiKey.
>>
>> Not a very good processor to use in cost conscious applications.  The lowest
>> price at Digikey is $3 at qty 1000.  They can't make a part in the sub-dollar
>> range?
>
> Don't presume everybody has your constraints.

I don't, but not everyone has *your* constraints.  If a processor can't be 
sold in a low cost product, it cuts off the largest volume parts of the 
market including the app I am looking at presently.

-- 

Rick C

Reply by John Speth ● June 15, 2017

On 6/14/2017 10:44 AM, John Speth wrote:
> Does anybody have any success or failure stories to relate that would
> help us gauge the feasibility of the proposed design?

Thanks all for the suggestions and stories.  We decided to take the safe 
route and design in a SPI flash with an MCU-controlled switch that will 
switch between external and internal access.  We figured the expense and 
certainty of the HW design is less than the time and uncertainty of the 
SW design.

We figured that with enough time and effort plus a worthy DMA engine, we 
could make an MCU SPI slave look like SPI flash, but with some 
yet-to-be-learned challenge.  We didn't have the time.
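
As a sketch of how the switch control might behave, here is a small Python 
model.  Everything in it is an assumption for illustration: the class and 
field names are hypothetical, and the policy (the customer device always 
wins, the MCU takes the flash only after the external chip select has been 
idle for a few polling intervals) is just one plausible choice.

```python
from enum import Enum

class Owner(Enum):
    EXTERNAL = "external"   # customer device drives the flash
    INTERNAL = "internal"   # MCU reads the flash back for wireless forwarding

class FlashSwitch:
    """Behavioral model of an MCU-controlled bus switch (hypothetical names)."""

    def __init__(self, idle_ticks_required: int = 3):
        self.owner = Owner.EXTERNAL
        self.idle_ticks = 0
        self.idle_ticks_required = idle_ticks_required

    def tick(self, external_cs_active: bool) -> Owner:
        """Advance one polling interval and return the current bus owner."""
        if external_cs_active:
            # Assumed policy: hand the flash back to the customer device at once.
            self.owner = Owner.EXTERNAL
            self.idle_ticks = 0
        else:
            self.idle_ticks += 1
            if self.idle_ticks >= self.idle_ticks_required:
                self.owner = Owner.INTERNAL
        return self.owner
```

With `idle_ticks_required=2`, feeding chip-select samples of active, idle, 
idle, active gives external, external, internal, external.  A real design 
would also have to deal with the MCU being in the middle of its own read 
when the external side wakes up, which this model ignores.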

JJS

Reply by David Brown ● June 15, 2017

On 15/06/17 22:52, rickman wrote:
> Tom Gardner wrote on 6/15/2017 1:01 PM:
>> On 15/06/17 17:13, rickman wrote:
>>> Tom Gardner wrote on 6/15/2017 3:01 AM:
>>>> On 15/06/17 02:54, rickman wrote:
>>>>> John Speth wrote on 6/14/2017 1:44 PM:
>>>>>> Hi folks-
>>>>>>
>>>>>> I'm hoping to tap into your various experiences to see if what I'm
>>>>>> thinking
>>>>>> is practical.
>>>>>>
>>>>>> Our customer has a device into which a Datakey is plugged to 
>>>>>> extract data
>>>>>> stored in the device.  The Datakey is really just a trade name 
>>>>>> for a SPI
>>>>>> flash with ergonomic features that resemble a real key (see 
>>>>>> datakey.com).
>>>>>> Our customer would like to replace the Datakey with a new design 
>>>>>> that will
>>>>>> transmit the data wirelessly instead of storing it to the Datakey SPI
>>>>>> flash.
>>>>>>
>>>>>> We're proposing a design based on an MCU which will appear as a 
>>>>>> SPI slave
>>>>>> flash device, store the data pumped to it from the customer's 
>>>>>> device, and
>>>>>> forward it wirelessly.  I'm wondering if it's practical to 
>>>>>> accomplish it.
>>>>>> The firmware would have to be totally responsive temporally and in 
>>>>>> data
>>>>>> content.  That sounds like a tough hill to climb that requires 
>>>>>> getting the
>>>>>> interface just right, nearly perfect.  I see a lot of pitfalls 
>>>>>> that could
>>>>>> make the effort a dead end.
>>>>>>
>>>>>> Does anybody have any success or failure stories to relate that 
>>>>>> would help
>>>>>> us gauge the feasibility of the proposed design?
>>>>>
>>>>> I have no experience in building such a product, but why not use an 
>>>>> actual
>>>>> SPI
>>>>> data flash as the intermediate storage while the MCU monitors the 
>>>>> action
>>>>> and
>>>>> forwards the data?  This would require periods when the MCU could 
>>>>> read the
>>>>> data
>>>>> flash.  Is that possible or practical?
>>>>>
>>>>> I had a design that needed a 30 MHz SPI like interface (not entirely
>>>>> compatible
>>>>> though) which was designed to read/write registers.  I considered 
>>>>> using a
>>>>> fast
>>>>> CPU to emulate the interface in software, but it just couldn't meet 
>>>>> the
>>>>> timing.
>>>>> Even with a 700 MIPS processor I wasn't sure it could receive the
>>>>> command to
>>>>> read a register and then get the data onto the output fast enough.  
>>>>> I'm
>>>>> not sure
>>>>> of the timing of the SPI interface you have, but it will likely be 
>>>>> hard to
>>>>> emulate the SPI in software.
>>>>
>>>> That sounds easy for the XMOS processors:
>>>>  - multicore (up to 32 100MIPS cores)
>>>>  - FPGA-like i/o, e.g. SERDES, buffers, programmable
>>>>    clocks to 250Mb/s, separate timer for each port
>>>>  - i/o timing changes can be specified/measured in
>>>>    software with 4ns resolution
>>>>  - guaranteed latencies <100ns
>>>>  - *excellent* programming model based on CSP
>>>>    (summary: of "Occam in C")
>>>>  - Eclipse-based dev environment
>>>
>>> And yet the XMOS can't run fast enough to return a result in the time
>>> available.  What a pity, all dressed up and nowhere to go.
>>
>> You'll have to explain that, because I don't understand
>> what you are referring to.
> 
> Maybe I don't understand the speed of the XMOS.  Nothing you write above 
> says to me it is any faster than the 700 MIPS processor I considered. 
> Nothing written above would allow it to perform the task any differently 
> than the bit banging approach I initially considered.

The XMOS devices clock at 500 MHz, with a dual-issue CPU for 1000 MIPS. 
And it really is 1000 MIPS - all code and data are in single-cycle SRAM. 
There is no cache, no prefetches, no branch prediction - everything is 
as fully deterministic as on a simple 8-bit microcontroller.  But that 
does require at least 5 hardware threads - the maximum for a single 
thread is 100 MHz, 200 MIPS.  (Some devices run at 400 MHz, and 
therefore fewer MIPS.  And older devices have only single-issue CPUs.)
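
As a small worked example of that arithmetic (Python, illustrative only): 
the 500 MHz clock and the five-thread figure come from the numbers above. 
The assumption that more than five active threads simply share the clock 
evenly is mine, and the function name is made up, so check the datasheet 
for a specific part.

```python
def per_thread_mhz(core_mhz: float, active_threads: int, min_threads: int = 5) -> float:
    """Issue rate seen by one hardware thread.

    A single thread gets at most core_mhz / min_threads; with more active
    threads the clock is assumed to be shared evenly between them.
    """
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return core_mhz / max(active_threads, min_threads)

if __name__ == "__main__":
    for n in (1, 5, 8):
        mhz = per_thread_mhz(500.0, n)
        # Dual issue doubles the peak instruction rate per thread.
        print(f"{n} thread(s): {mhz:.1f} MHz each, up to {2 * mhz:.0f} MIPS each")
```

So one thread alone tops out at 100 MHz (200 MIPS with dual issue), and 
under the even-sharing assumption eight threads would see 62.5 MHz each.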

The hardware timers, input/outputs, and serial-to-parallel and 
parallel-to-serial converters attached to the IO pins can handle 10 ns 
timing.

Of course, no one had mentioned that so far in this thread - so unless 
you were familiar with the devices, you would not know that.

> 
> So what are you confused about?
> 
> Maybe you don't understand the problem.  A bit serial data port (two 
> bits wide actually, but that is not relevant) provides data, address and 
> a two bit command word serially.  The last bit of the command word is 
> indicated by a "command" signal going high during that bit.  Data is 
> clocked in on the rising edge of the clock.  When the command word is a 
> read, the read data is provided serially starting on the subsequent 
> falling edge.  This provides 15 ns from rising edge to falling edge.  It 
> is possible to prefetch the read data based on the shifted address on 
> each rising edge until the command signal is asserted.  This helps with 
> timing of the memory fetch, but still, that data has to be presented to 
> the serial port output in 15 ns and updated every 30 ns.
> 
> I'd like to see code that will make this work.  Or maybe you weren't 
> addressing my previous application?  For the OP, can the XMOS emulate a 
> 30 Mbps SPI port?

The XMOS has parallel/serial converters for every GPIO pin.  For an 
application like this, you would use an 8-bit SERDES on the MISO and 
MOSI pins, and use the clock pin to trigger the transfers.
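
A behavioral sketch of what the serial-to-parallel converter buys you, 
written as a plain Python model rather than XMOS code.  The class and 
method names are hypothetical, and MSB-first bit order is assumed.

```python
class Deserializer:
    """Model of a serial-to-parallel port buffer clocked by the SPI clock."""

    def __init__(self, width: int = 8):
        self.width = width
        self.shift = 0      # bits collected so far
        self.count = 0      # number of bits in the current word
        self.words = []     # completed words handed to software

    def clock(self, bit: int) -> None:
        """Shift in one bit on a clock edge; emit a word every `width` edges."""
        self.shift = ((self.shift << 1) | (bit & 1)) & ((1 << self.width) - 1)
        self.count += 1
        if self.count == self.width:
            self.words.append(self.shift)
            self.shift = 0
            self.count = 0

def bits_msb_first(value: int, width: int = 8) -> list:
    """Helper for the demo: split a word into bits, most significant first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]
```

The point is that software sees one word per eight clock edges, so at 30 
Mbps it has roughly 267 ns per byte to react rather than about 33 ns per 
bit.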

> 
> 
>>>> Currently I'm managing to count transitions in
>>>> software on two 50Mb/s inputs, plus do
>>>> simultaneous USB comms to a host computer.
>>>
>>> Can you output a result before the next input transition?
>>
>> Yes and no.
>>
>> Yes: the output (to a host processor and/or LCD)
>> proceeds in parallel with the next capture phase.
> 
> I'm referring to bit I/O using the same 50 MHz clock.
> 
> 
>> No: in the current incarnation there is a short gap
>> between capture phases as the results are passed
>> from one core to another and the next capture phase
>> is started. While it isn't important in my application,
>> I may be able to remove that limitation in future
>> incarnations.
> 
> That's a lot more than a "short" gap in the context of a fast clock.
> 
> 
>>>> Processors plus £10 devkit available from Farnell
>>>> and DigiKey.
>>>
>>> Not a very good processor to use in cost conscious applications.  The 
>>> lowest
>>> price at Digikey is $3 at qty 1000.  They can't make a part in the 
>>> sub-dollar
>>> range?
>>
>> Don't presume everybody has your constraints.
> 
> I don't, but not everyone has *your* constraints.  If a processor can't 
> be sold in a low cost product, it cuts off the largest volume parts of 
> the market including the app I am looking at presently.
> 

You get a lot for your money with an XMOS - but not every application 
needs that power, and then your money is wasted.  They are certainly not 
as cheap as a small microcontroller, but if the alternative is an FPGA, 
they suddenly look much better value.

Reply by Don Y ● June 15, 2017

On 6/15/2017 1:56 PM, John Speth wrote:
> On 6/14/2017 10:44 AM, John Speth wrote:
>> Does anybody have any success or failure stories to relate that would
>> help us gauge the feasibility of the proposed design?
>
> Thanks all for the suggestions and stories.  We decided to take the safe route
> and design in a SPI flash with an MCU-controlled switch that will switch
> between external and internal access.  We figured the expense and certainty of
> the HW design is less than the time and uncertainty of the SW design.
>
> We figured that with enough time and effort plus a worthy DMA engine, we could
> make an MCU SPI slave look like SPI flash, but with some yet-to-be-learned
> challenge.  We didn't have the time.

Dealing with the timing of the hardware interface (which, IMO, is all you've
*tried* to address with this approach) misses most of the bigger issues that
can arise.

[Disclaimer:  you've not told us anything at all about the Datakey's size,
how it is actually *used* -- as an "online" store (both written AND read)
or as a simple "batch" storage device (insert datakey, press button on
customer device, wait for idiot light to go out, remove datakey), etc.]

Presumably, it is used to store configuration data and/or "reports"/logs.

The wireless link represents another opportunity for the key to be
"withdrawn" asynchronously (wrt whatever "normal usage procedure" is
imposed).

If the customer device treats it as a bit-bucket, there's no guarantee
that the data written is actually being delivered to the remote (wireless)
device (i.e., interference can corrupt the link at any time, the remote
device could be off-line, etc.).

If the customer device reads data from the datakey (e.g., to verify its
contents), you can only report a locally cached copy of those contents
(cuz you can't query the remote node in the time between address
specification and data read out on the SPI bus).  Can you guarantee that
the customer device will always write any datakey contents BEFORE it
tries to read them IN THIS SESSION?  (i.e., it writes to datakey #6
on tuesday then expects to read from it again on wednesday)
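
To illustrate the read-back problem, here is a minimal Python sketch of a 
local shadow copy.  The names are hypothetical, and answering unknown 
bytes with 0xFF (the erased value) is an assumption; the point is only 
that the device can distinguish "written this session" from "a guess".

```python
class LocalKeyCache:
    """Local shadow of the key contents (hypothetical names and policy)."""

    def __init__(self, size: int, erased: int = 0xFF):
        self.data = bytearray([erased]) * size   # what gets returned on the SPI bus
        self.known = [False] * size              # True once written in this session

    def write(self, addr: int, value: int) -> None:
        """Record a byte written by the customer device."""
        self.data[addr] = value & 0xFF
        self.known[addr] = True

    def read(self, addr: int):
        """Return (value, known); known=False means the value is only a guess."""
        return self.data[addr], self.known[addr]
```

A read of an address written earlier in the session comes back as 
`(value, True)`; anything written on tuesday and read on wednesday comes 
back flagged `False`, and you have to decide what to tell the customer 
device in that case.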

What do you do if you can't access that (previously written) data AT ALL?
Report "No Key Present" (i.e., LOFO released)?

If the customer device writes data, do you fake "WIP" until you know
that the data has traversed your wireless link?  What if the wireless
link takes longer (e.g., noise, busy, etc.) than any "WIP timeouts"
in the customer device will tolerate?

The Datakey has its own notion of atomic operations (per the SPI
interface).  Do you mimic these in your wireless protocol?
E.g., if the customer device writes a single byte, do you forward
that command to the remote device (so it can know that only the
addressed byte is altered and not the entire FLASH page)?
If the device writes X to location N and then writes Y to that same
location (before you've had a chance to inform the remote node
of the X write and receive acknowledgement), do you discard the
first (X) write on the basis that the second (Y) write will
overwrite it?
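
One way to picture the X-then-Y case is a pending-write table keyed by 
address.  This is a Python sketch under stated assumptions, not a 
recommendation: the names are hypothetical, the remote node is assumed to 
acknowledge each (address, value) pair, and a later write is assumed to 
replace an earlier one outright.

```python
from collections import OrderedDict

class PendingWrites:
    """Writes not yet acknowledged by the remote node, keyed by address."""

    def __init__(self):
        self._pending = OrderedDict()

    def write(self, addr: int, value: int) -> None:
        """Queue a write; a newer write to the same address replaces the older."""
        self._pending.pop(addr, None)       # drop the superseded X, if any
        self._pending[addr] = value & 0xFF  # Y goes to the back of the queue

    def next_to_send(self):
        """Oldest unacknowledged (addr, value), or None if nothing is pending."""
        return next(iter(self._pending.items()), None)

    def acknowledge(self, addr: int, value: int) -> None:
        """Remote confirmed (addr, value); ignore acks for superseded values."""
        if self._pending.get(addr) == value:
            del self._pending[addr]

    @property
    def wip(self) -> bool:
        """What a faked write-in-progress status bit would report."""
        return bool(self._pending)
```

Note that plain replacement only matches EEPROM-like semantics.  On a true 
flash part, programming can only clear bits until the next erase, so two 
programs to the same location leave the AND of X and Y, and discarding X 
would give the remote node the wrong contents.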

(The what-ifs are innumerable)

Retrofitting an interposed communication channel to an interface that
wasn't designed with "unreliable" communications in mind is fraught
with opportunities for things to misbehave -- in dubious ways
("Why isn't it working properly?")

And, we won't even address the issues where the customer gets a "brainstorm"
and thinks "Hey, we can put an A/DC on the remote node and use this
hardware interface in ways that we hadn't previously imagined!!"  :>

[My point in all this is to make sure you don't focus on the small
problem of interfacing a few electrical signals to your new device at
the risk of missing the other issues that can/will creep in after
you've cobbled the hardware together]

Reply by rickman ● June 15, 2017

David Brown wrote on 6/15/2017 7:04 AM:
> On 15/06/17 10:41, Tom Gardner wrote:
>> On 15/06/17 09:02, David Brown wrote:
>>> On 15/06/17 09:25, Tom Gardner wrote:
>>>> On 15/06/17 08:14, David Brown wrote:
>>>>> On 14/06/17 19:44, John Speth wrote:
>>>>>> Hi folks-
>>>>>>
>>>>>> I'm hoping to tap into your various experiences to see if what I'm
>>>>>> thinking is practical.
>>>>>>
>>>>>> Our customer has a device into which a Datakey is plugged to extract
>>>>>> data stored in the device.  The Datakey is really just a trade name
>>>>>> for a SPI flash with ergonomic features that resemble a real key (see
>>>>>> datakey.com).  Our customer would like to replace the Datakey with
>>>>>> a new
>>>>>> design that will transmit the data wirelessly instead of storing it to
>>>>>> the Datakey SPI flash.
>>>>>>
>>>>>> We're proposing a design based on an MCU which will appear as a SPI
>>>>>> slave flash device, store the data pumped to it from the customer's
>>>>>> device, and forward it wirelessly.  I'm wondering if it's practical to
>>>>>> accomplish it.  The firmware would have to be totally responsive
>>>>>> temporally and in data content.  That sounds like a tough hill to
>>>>>> climb
>>>>>> that requires getting the interface just right, nearly perfect.  I
>>>>>> see a
>>>>>> lot of pitfalls that could make the effort a dead end.
>>>>>>
>>>>>> Does anybody have any success or failure stories to relate that would
>>>>>> help us gauge the feasibility of the proposed design?
>>>>>>
>>>>>> Thanks - John Speth
>>>>>
>>>>> If you make sure you have a MCU with good slave SPI support and DMA,
>>>>> then the actual SPI transfers should not be an issue (assuming a sane
>>>>> SPI clock speed).  The cpu would then not be involved in the
>>>>> back-to-back SPI moves at all.  Pretty much any Cortex M3/4 device
>>>>> should cope fine.
>>>>>
>>>>> The key challenge is how to deal with the commands - the bits that need
>>>>> some processing.  That will probably mean fast response to the first
>>>>> few
>>>>> incoming SPI bytes via an interrupt function.  Your requirements here
>>>>> will depend on the protocol, the clock speed, the processing time
>>>>> required, etc.
>>>>>
>>>>> Write-style SPI commands are no problem - just buffer up the SPI
>>>>> command
>>>>> and data with DMA as it comes in.  It's the read-style ones that are
>>>>> the
>>>>> killer.  For very fast SPI interfaces, it is not uncommon that with a
>>>>> read command, the first byte returned is a dummy byte - that gives you
>>>>> the time to fill your DMA buffer with the real data for the reply.  But
>>>>> not all SPI protocols use that, and they might have some reads that
>>>>> need
>>>>> immediate response.  If you can get that under control, the rest should
>>>>> be (relatively) easy.
>>>>>
>>>>> If you can't build this from a normal microcontroller because you need
>>>>> too fast response, your options are FPGA (either as an assist to a
>>>>> microcontroller, or with a processor in the FPGA) or perhaps an XMOS
>>>>> microcontroller.  XMOS programming is a bit different from normal
>>>>> microcontroller programming, but it will handle tasks like this easily.
>>>>
>>>> I'll just say that I find XMOS programming to be *very*
>>>> easy compared to C+RTOS -- doubly so for a hardware engineer.
>>>>
>>>> The "RTOS" is in hardware => very low API learning curve :)
>>>>
>>>
>>> XMOS programming is very easy for some things, but hard for other
>>> things.  In particular, there is a limit to the scalability.  It starts
>>> off easy splitting your work into lots of parallel tasks, putting them
>>> each in threads, and everything is fine - until you run out of hardware
>>> threads.  Then you end up destroying your structure to get several
>>> logically parallel tasks running in the same hardware thread, or running
>>> a software RTOS in one hardware thread.  And when you have your multiple
>>> high-speed interfaces working, you find you don't have enough ram left
>>> for buffers.
>>
>> Agreed. But then there are cliffs in /everything/
>> so it is unsurprising if this toolset also has them :)
>>
>> A more interesting question is whether the limitations
>> are significant in a given context - and that's less
>> easy to quantify.
>
> Absolutely.
>
> One point to note is that when I worked with XMOS, one of their big
> features was that their architecture was so fast it could make a USB
> interface and an Ethernet interface in software.  And yes, it /could/
> work - but the cost in hardware threads and memory space (code and data
> go in the same ram) meant that there was very little left to do anything
> useful with the device.  Since then, XMOS have realised that the big
> blocks are much more efficient in hardware - they now have devices with
> hardware Ethernet and USB rather than making them in software.  The
> devices also have much more ram.
>
> I think XMOS should go further, and include hardware peripherals for
> common features - in particular, UARTs, I2C, SPI, timers.  A simple
> hardware UART peripheral is a small and easy part to make in a chip
> design.  With the XMOS, you can do it in software - you can write a
> nice, clear UART software block, without much coding space.  But it
> takes two hardware threads - a quarter of your chip's power if you have
> a small device.  Turned into cash, that's about $2 for a UART.  On a
> chip microcontroller, you have lots of UARTs and they don't take
> significant resources - it's a "cost" of perhaps $0.02.  For peripherals
> that you often need, it is /much/ cheaper to have dedicated hardware
> than to have them in software.

I think that depends on the design.  Perhaps you are familiar with the GA144 
with 144 processors at a cost of around $0.10 per CPU.  They took a similar 
route with NO dedicated I/O hardware other than a SERDES receiver and 
transmitter pair.

It is expected that all I/O would be through software.  Along that line, the 
chip boots through one of three I/O ports: async serial, SPI serial, or a 2 
wire interface.  I believe there is also a 1 wire interface, but I'm not 
certain.  Three of the CPUs can be ganged to form a parallel port/memory 
interface, two CPUs with 18 bits of I/O and the third 4 bits.  All of this 
is controlled by software.

I find the device has significant limitations overall, but certainly with a 
peak execution rate of 700 MIPS there is a lot of potential.  Much like no 
one focuses on the idea that using the 4 input LUT of an FPGA as an inverter 
is excessively wasteful, the GA144 with its 10 cent CPUs gets us out of the 
thinking that using a CPU as a UART is wasteful.
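
For anyone who hasn't done a UART in software, the transmit side is little 
more than framing.  Here is a minimal Python sketch of standard 8N1 
framing (start bit, eight data bits LSB first, one stop bit).  The function 
name is made up, and a real implementation would also have to pace the 
bits out at the baud rate, which is what actually ties up the CPU.

```python
def uart_frame_8n1(byte: int) -> list:
    """Return the line levels for one 8N1 character, in transmission order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]   # least significant bit first
    return [0] + data_bits + [1]                      # start bit low, stop bit high

if __name__ == "__main__":
    print(uart_frame_8n1(0x41))
```

Ten bit times per character, so at 115200 baud each character occupies the 
line for about 87 us.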

Not trying to be negative, but I see the $2 cost of an XMOS CPU as being 
excessive and wasteful.  Heck, you can buy complete MCU devices for a 
fraction of that price.


>> XMOS does provide some techniques (e.g. composable
>> and interfaces) to reduce the sharpness of the cliffs,
>> but not to eliminate them. But then the same is true
>> of ARM+RTOS etc etc.
>
> Yes, there are learning curves everywhere.  And scope for getting things
> wrong :-)  XMOS gives a different balance amongst many of the challenges
> facing designers - it is better in some ways, worse in other ways.
>
>>
>> I wouldn't regard XMOS as being a replacement for
>> a general-purpose processor, but there is a large
>> overlap in many hard-realtime applications.
>>
>> Similarly I wouldn't regard an ARM as being a
>> replacement for a CPLD/FPGA, but there can be
>> an overlap in many soft-realtime applications
>>
>> The XMOS devices inhabit an interesting and important
>> niche between those two.
>
> Agreed.

The question is whether it is worth investing the time and energy into 
learning the chip if you don't focus your work in this realm.  Personally I 
find the FPGA approach covers a *lot* of ground that others don't see and 
the region left that is not so easy to address with either FPGAs or more 
conventional CPUs is very limited.  If the XMOS isn't a good price fit, I 
most likely would just go with a small FPGA.  I saw the XMOS has a 12 bit, 1 
MSPS ADC which is nice.  But again, this only makes its range of good fit 
slightly larger.


>>> XMOS devices can be a lot of fun, and I would enjoy working with them
>>> again.  For some tasks, they are an ideal solution (like for this one),
>>> once you have learned to use them.  But not everything is easy with
>>> them, and you have to think in a somewhat different way.
>>
>> Yes, but the learning curve is very short and
>> there are no unpleasant surprises.
>>
>> IMNSHO "thinking in CSP" will help structure
>> thoughts for any realtime application, whether
>> or not it is formally part of the implementation.
>>
>
> Agreed.

I prefer Forth for embedded work.  I believe there is a Forth available.  I 
don't know if it captures any of the flavor of CSP, mostly because I know 
little about CSP.  I did use Occam some time ago.  Mostly I recall it had a 
lot of constraints on what the programmer was allowed to do.  The project we 
were using the Transputer on programmed it all in C.

-- 

Rick C

Reply by John Speth ● June 15, 2017

Sorry about not being clear describing my endeavor.  My only concern was 
the difficulty of making the MCU look like SPI flash.  All other 
functionality is not a problem.

JJS

Reply by rickman ● June 15, 2017

David Brown wrote on 6/15/2017 5:37 PM:
> On 15/06/17 22:52, rickman wrote:
>> Tom Gardner wrote on 6/15/2017 1:01 PM:
>>> On 15/06/17 17:13, rickman wrote:
>>>> Tom Gardner wrote on 6/15/2017 3:01 AM:
>>>>> On 15/06/17 02:54, rickman wrote:
>>>>>> John Speth wrote on 6/14/2017 1:44 PM:
>>>>>>> Hi folks-
>>>>>>>
>>>>>>> I'm hoping to tap into your various experiences to see if what I'm
>>>>>>> thinking
>>>>>>> is practical.
>>>>>>>
>>>>>>> Our customer has a device into which a Datakey is plugged to extract
>>>>>>> data
>>>>>>> stored in the device.  The Datakey is really just a trade name for
>>>>>>> a SPI
>>>>>>> flash with ergonomic features that resemble a real key (see
>>>>>>> datakey.com).
>>>>>>> Our customer would like to replace the Datakey with a new design that
>>>>>>> will
>>>>>>> transmit the data wirelessly instead of storing it to the Datakey SPI
>>>>>>> flash.
>>>>>>>
>>>>>>> We're proposing a design based on an MCU which will appear as a SPI
>>>>>>> slave
>>>>>>> flash device, store the data pumped to it from the customer's device,
>>>>>>> and
>>>>>>> forward it wirelessly.  I'm wondering if it's practical to accomplish
>>>>>>> it.
>>>>>>> The firmware would have to be totally responsive temporally and in data
>>>>>>> content.  That sounds like a tough hill to climb that requires
>>>>>>> getting the
>>>>>>> interface just right, nearly perfect.  I see a lot of pitfalls that
>>>>>>> could
>>>>>>> make the effort a dead end.
>>>>>>>
>>>>>>> Does anybody have any success or failure stories to relate that would
>>>>>>> help
>>>>>>> us gauge the feasibility of the proposed design?
>>>>>>
>>>>>> I have no experience in building such a product, but why not use an
>>>>>> actual
>>>>>> SPI
>>>>>> data flash as the intermediate storage while the MCU monitors the action
>>>>>> and
>>>>>> forwards the data?  This would require periods when the MCU could read
>>>>>> the
>>>>>> data
>>>>>> flash.  Is that possible or practical?
>>>>>>
>>>>>> I had a design that needed a 30 MHz SPI like interface (not entirely
>>>>>> compatible
>>>>>> though) which was designed to read/write registers.  I considered using a
>>>>>> fast
>>>>>> CPU to emulate the interface in software, but it just couldn't meet the
>>>>>> timing.
>>>>>> Even with a 700 MIPS processor I wasn't sure it could receive the
>>>>>> command to
>>>>>> read a register and then get the data onto the output fast enough.  I'm
>>>>>> not sure
>>>>>> of the timing of the SPI interface you have, but it will likely be
>>>>>> hard to
>>>>>> emulate the SPI in software.
>>>>>
>>>>> That sounds easy for the XMOS processors:
>>>>>  - multicore (up to 32 100MIPS cores)
>>>>>  - FPGA-like i/o, e.g. SERDES, buffers, programmable
>>>>>    clocks to 250Mb/s, separate timer for each port
>>>>>  - i/o timing changes can be specified/measured in
>>>>>    software with 4ns resolution
>>>>>  - guaranteed latencies <100ns
>>>>>  - *excellent* programming model based on CSP
>>>>>    (summary: of "Occam in C")
>>>>>  - Eclipse-based dev environment
>>>>
>>>> And yet the XMOS can't run fast enough to return a result in the time
>>>> available.  What a pity, all dressed up and nowhere to go.
>>>
>>> You'll have to explain that, because I don't understand
>>> what you are referring to.
>>
>> Maybe I don't understand the speed of the XMOS.  Nothing you write above
>> says to me it is any faster than the 700 MIPS processor I considered.
>> Nothing written above would allow it to perform the task any differently
>> than the bit banging approach I initially considered.
>
> The XMOS devices clock at 500 MHz, with a dual-issue CPU for 1000 MIPS. And
> it really is 1000 MIPS - all code and data are in single-cycle SRAM. There is
> no cache, no prefetches, no branch prediction - everything is as fully
> deterministic as on a simple 8-bit microcontroller.  But that does require
> at least 5 hardware threads - the maximum for a single thread is 100 MHz,
> 200 MIPS.  (Some devices run at 400 MHz, and therefore fewer MIPS.  And
> older devices have only single-issue CPUs.)
>
> The hardware timers, input/outputs, and serial-to-parallel and
> parallel-to-serial converters attached to the IO pins can handle 10 ns timing.
>
> Of course, no one had mentioned that so far in this thread - so unless you
> were familiar with the devices, you would not know that.

I think you are saying the CPU can do things with a 10 ns time resolution, 
no?  That is the relevant number for this if bit banging the I/O port.  I 
assume the "dual issue" can't simultaneously execute instructions where one 
depends on the result from the other?


>> So what are you confused about?
>>
>> Maybe you don't understand the problem.  A bit serial data port (two bits
>> wide actually, but that is not relevant) provides data, address and a two
>> bit command word serially.  The last bit of the command word is indicated
>> by a "command" signal going high during that bit.  Data is clocked in on
>> the rising edge of the clock.  When the command word is a read, the read
>> data is provided serially starting on the subsequent falling edge.  This
>> provides 15 ns from rising edge to falling edge.  It is possible to
>> prefetch the read data based on the shifted address on each rising edge
>> until the command signal is asserted.  This helps with timing of the
>> memory fetch, but still, that data has to be presented to the serial port
>> output in 15 ns and updated every 30 ns.
>>
>> I'd like to see code that will make this work.  Or maybe you weren't
>> addressing my previous application?  For the OP, can the XMOS emulate a 30
>> Mbps SPI port?
>
> The XMOS has parallel/serial converters for every GPIO pin.  For an
> application like this, you would use an 8-bit SERDES on the MISO and MOSI
> pins, and use the clock pin to trigger the transfers.

Serial to parallel converters may not help with this design.  Data is 8 
bits, address is 8 bits, command is 2 bits, the incoming data path is 2 bits 
wide, outgoing data path is 1 bit wide.  The design was made to be tolerant 
of extraneous clock edges between words.  The end of the serial transfer was 
flagged by a CMD signal going high on the 2 bit command word transfer.  I 
don't see how an 8 bit serial shift register would help receiving the input 
data even if it were only 1 bit wide (or you use two CPUs) since you can't 
rely on the clock count to always be right, *plus* you get a single clock 
with 2 input bits and a flag to indicate the end of the input transfer. 
Then you have 15 ns (minus setup and hold time on the output pin) to fetch 
the data and start shifting it out.
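
To make the protocol concrete, here is a behavioural sketch of the interface as described: 8 data bits, 8 address bits and a 2-bit command, arriving two bits per clock, with CMD high on the command transfer. It models only the logic, not the 15 ns timing. The command encodings, the MSB-first bit order and all names (`RegisterPort`, `transfer`) are assumptions made for illustration.

```python
READ, WRITE = 0b01, 0b10  # hypothetical encodings; the real ones are not given


class RegisterPort:
    """Behavioural model of the 2-bit-wide serial register interface."""

    def __init__(self):
        self.regs = [0] * 256   # 8-bit address space of 8-bit registers
        self.shift = 0          # newest 16 bits: data (high byte), address (low byte)
        self.prefetch = 0       # register value fetched speculatively
        self.out_bits = []      # read data waiting to be shifted out, MSB first

    def rising_edge(self, pair, cmd):
        """Clock in one 2-bit pair; cmd is the CMD flag sampled on this edge."""
        if not cmd:
            # Keep only the newest 16 bits, so extra clocks before the word are harmless.
            self.shift = ((self.shift << 2) | pair) & 0xFFFF
            # Prefetch on every edge in case the next edge carries the command.
            self.prefetch = self.regs[self.shift & 0xFF]
            return
        address, data = self.shift & 0xFF, self.shift >> 8
        if pair == READ:
            self.out_bits = [(self.prefetch >> i) & 1 for i in range(7, -1, -1)]
        elif pair == WRITE:
            self.regs[address] = data

    def falling_edge(self):
        """Drive the next read-data bit (0 when idle)."""
        return self.out_bits.pop(0) if self.out_bits else 0


def transfer(port, data, address, command, junk_clocks=0):
    """Play the master: optional extraneous clocks, then data, address, command."""
    for _ in range(junk_clocks):
        port.rising_edge(0b11, cmd=0)
    word = (data << 8) | address
    for shift in range(14, -2, -2):
        port.rising_edge((word >> shift) & 0b11, cmd=0)
    port.rising_edge(command, cmd=1)


port = RegisterPort()
transfer(port, 0xA5, 0x12, WRITE)
transfer(port, 0x00, 0x12, READ, junk_clocks=3)
value = 0
for _ in range(8):
    value = (value << 1) | port.falling_edge()
print(hex(value))  # prints "0xa5"
```

Because only the CMD flag marks the end of a transfer, the model never counts clocks, which is why a fixed-length 8-bit shift register triggered by clock count does not map onto it directly.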

I recall even in the FPGA I was reading the output of a register mux which 
then had to feed a shift register.  I used another 1 bit mux to select the 
output of the mux for the first bit and the output of the shift register for 
the remaining bits *and* the timing was tight.

The data ports of the design had serial interfaces up to 50 MHz, time 
correlated to a CODEC.  The CODEC received a time code which was transmitted 
along with the digital data in packets over IP.  The same board on the other 
end received the packets and reconstructed the data and time stamp.

Once an FPGA was on the board there was no reason to use a CPU, although I 
would have liked to have a hybrid chip with about 1000 4-input LUTs and a 
moderate CPU or DSP even.  Add a 16 bit stereo CODEC and it would be perfect!


>>>>> Currently I'm managing to count transitions in
>>>>> software on two 50Mb/s inputs, plus do
>>>>> simultaneous USB comms to a host computer.
>>>>
>>>> Can you output a result before the next input transition?
>>>
>>> Yes and no.
>>>
>>> Yes: the output (to a host processor and/or LCD)
>>> proceeds in parallel with the next capture phase.
>>
>> I'm referring to bit I/O using the same 50 MHz clock.
>>
>>
>>> No: in the current incarnation there is a short gap
>>> between capture phases as the results are passed
>>> from one core to another and the next capture phase
>>> is started. While it isn't important in my application,
>>> I may be able to remove that limitation in future
>>> incarnations.
>>
>> That's a lot more than a "short" gap in the context of a fast clock.
>>
>>
>>>>> Processors plus £10 devkit available from Farnell
>>>>> and DigiKey.
>>>>
>>>> Not a very good processor to use in cost conscious applications.  The
>>>> lowest
>>>> price at Digikey is $3 at qty 1000.  They can't make a part in the
>>>> sub-dollar
>>>> range?
>>>
>>> Don't presume everybody has your constraints.
>>
>> I don't, but not everyone has *your* constraints.  If a processor can't be
>> sold in a low cost product, it cuts off the largest volume parts of the
>> market including the app I am looking at presently.
>>
>
> You get a lot for your money with an XMOS - but not every application needs
> that power, and then your money is wasted.  They are certainly not as cheap
> as a small microcontroller, but if the alternative is an FPGA, they suddenly
> look much better value.

I wonder why they can't make lower cost versions?  The GA144 has 144 
processors, $15 @ qty 1.  It's not even a modern process node, 150 or 180 nm 
I think, 100 times more area than what they are using today.

-- 

Rick C

Reply by rickman ● June 15, 2017

John Speth wrote on 6/15/2017 4:56 PM:
> On 6/14/2017 10:44 AM, John Speth wrote:
>> Does anybody have any success or failure stories to relate that would
>> help us gauge the feasibility of the proposed design?
>
> Thanks all for the suggestions and stories.  We decided to take the safe
> route and design in a SPI flash with an MCU-controlled switch that will
> switch between external and internal access.  We figured the expense and
> certainty of the HW design is less than the time and uncertainty of the SW
> design.
>
> We figured that with enough time and effort plus a worthy DMA engine, we
> could make an MCU SPI slave look like SPI flash, but with some
> yet-to-be-learned challenge.  We didn't have the time.

I will say that while you will have to meet the specs of the data flash as 
presented, I am pretty sure the data flash is meeting the same specs with a 
processor.  But they tailor the CPU to work optimally and tailor the specs 
to match the limitations of the CPU while you don't have any of that 
flexibility.

-- 

Rick C

Reply by Tom Gardner ● June 15, 2017

On 15/06/17 21:52, rickman wrote:
> Tom Gardner wrote on 6/15/2017 1:01 PM:
>> On 15/06/17 17:13, rickman wrote:
>>> Tom Gardner wrote on 6/15/2017 3:01 AM:
>>>> On 15/06/17 02:54, rickman wrote:
>>>>> John Speth wrote on 6/14/2017 1:44 PM:
>>>>>> Hi folks-
>>>>>>
>>>>>> I'm hoping to tap into your various experiences to see if what I'm
>>>>>> thinking
>>>>>> is practical.
>>>>>>
>>>>>> Our customer has a device into which a Datakey is plugged to extract data
>>>>>> stored in the device.  The Datakey is really just a trade name for a SPI
>>>>>> flash with ergonomic features that resemble a real key (see datakey.com).
>>>>>> Our customer would like to replace the Datakey with a new design that will
>>>>>> transmit the data wirelessly instead of storing it to the Datakey SPI
>>>>>> flash.
>>>>>>
>>>>>> We're proposing a design based on an MCU which will appear as a SPI slave
>>>>>> flash device, store the data pumped to it from the customer's device, and
>>>>>> forward it wirelessly.  I'm wondering if it's practical to accomplish it.
>>>>>> The firmware would have to be totally responsive temporally and in data
>>>>>> content.  That sounds like a tough hill to climb that requires getting the
>>>>>> interface just right, nearly perfect.  I see a lot of pitfalls that could
>>>>>> make the effort a dead end.
>>>>>>
>>>>>> Does anybody have any success or failure stories to relate that would help
>>>>>> us gauge the feasibility of the proposed design?
>>>>>
>>>>> I have no experience in building such a product, but why not use an actual
>>>>> SPI
>>>>> data flash as the intermediate storage while the MCU monitors the action
>>>>> and
>>>>> forwards the data?  This would require periods when the MCU could read the
>>>>> data
>>>>> flash.  Is that possible or practical?
>>>>>
>>>>> I had a design that needed a 30 MHz SPI like interface (not entirely
>>>>> compatible
>>>>> though) which was designed to read/write registers.  I considered using a
>>>>> fast
>>>>> CPU to emulate the interface in software, but it just couldn't meet the
>>>>> timing.
>>>>> Even with a 700 MIPS processor I wasn't sure it could receive the
>>>>> command to
>>>>> read a register and then get the data onto the output fast enough.  I'm
>>>>> not sure
>>>>> of the timing of the SPI interface you have, but it will likely be hard to
>>>>> emulate the SPI in software.
>>>>
>>>> That sounds easy for the XMOS processors:
>>>>  - multicore (up to 32 100MIPS cores)
>>>>  - FPGA-like i/o, e.g. SERDES, buffers, programmable
>>>>    clocks to 250Mb/s, separate timer for each port
>>>>  - i/o timing changes can be specified/measured in
>>>>    software with 4ns resolution
>>>>  - guaranteed latencies <100ns
>>>>  - *excellent* programming model based on CSP
>>>>    (summary: of "Occam in C")
>>>>  - Eclipse-based dev environment
>>>
>>> And yet the XMOS can't run fast enough to return a result in the time
>>> available.  What a pity, all dressed up and nowhere to go.
>>
>> You'll have to explain that, because I don't understand
>> what you are referring to.
>
> Maybe I don't understand the speed of the XMOS.

A 30,000 ft overview:
http://www.xmos.com/published/xcore-architecture-flyer?version=latest

> Nothing you write above says to
> me it is any faster than the 700 MIPS processor I considered. Nothing written
> above would allow it to perform the task any differently than the bit banging
> approach I initially considered.

What are the *guaranteed* timings and latencies?
Include all possible disturbances due to cache/TLB
misses, branch predictions and interrupts.
Include simultaneous USB comms to a host PC.

N.B. "guaranteed" /precludes/ measurements to see
what is happening. "Guaranteed" requires accurate
/prediction/ *before* the code executes.


> So what are you confused about?
>
> Maybe you don't understand the problem.  A bit serial data port (two bits wide
> actually, but that is not relevant) provides data, address and a two bit command
> word serially.  The last bit of the command word is indicated by a "command"
> signal going high during that bit.  Data is clocked in on the rising edge of the
> clock.  When the command word is a read, the read data is provided serially
> starting on the subsequent falling edge.  This provides 15 ns from rising edge
> to falling edge.  It is possible to prefetch the read data based on the shifted
> address on each rising edge until the command signal is asserted.  This helps
> with timing of the memory fetch, but still, that data has to be presented to the
> serial port output in 15 ns and updated every 30 ns.
>
> I'd like to see code that will make this work.  Or maybe you weren't addressing
> my previous application?  For the OP, can the XMOS emulate a 30 Mbps SPI port?

Each i/o pin can be clocked at up to 250Mb/s.
That clock can come from an internal clock
(up to 500MHz) or an external pin. The latter
sounds relevant for your case. (Strobed and
master/slave interfaces are also directly
supported in hardware and software).

Each I/O pin has SERDES registers, so the data
rate processed by a core can be reduced by a factor
of 32.

So yes, it does look like a /small/ fraction of an xCORE
device could very comfortably support that speed.
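
A back-of-envelope sketch of that claim: if the port hardware hands the core one word per N bits, the core only has to keep up with the word rate. The 100 MIPS per-core figure is the one quoted earlier in the thread; the function name is made up for illustration. This covers sustained throughput only and says nothing about how quickly a response can follow a stimulus.

```python
def per_word_budget(bit_rate_mbps, serdes_width_bits, thread_mips=100.0):
    """Return (word period in ns, instructions available per word)."""
    word_period_ns = serdes_width_bits * 1000.0 / bit_rate_mbps
    instructions = word_period_ns * thread_mips / 1000.0
    return word_period_ns, instructions


for width in (8, 32):
    period_ns, instructions = per_word_budget(30.0, width)
    print(f"{width:2d}-bit words: {period_ns:6.0f} ns per word, "
          f"about {instructions:.0f} instructions")
```

At 30 Mb/s a 32-bit word arrives roughly every 1067 ns, leaving on the order of 100 instructions per word for one 100 MIPS core.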


>>>> Currently I'm managing to count transitions in
>>>> software on two 50Mb/s inputs, plus do
>>>> simultaneous USB comms to a host computer.
>>>
>>> Can you output a result before the next input transition?
>>
>> Yes and no.
>>
>> Yes: the output (to a host processor and/or LCD)
>> proceeds in parallel with the next capture phase.
>
> I'm referring to bit I/O using the same 50 MHz clock.
>
>
>> No: in the current incarnation there is a short gap
>> between capture phases as the results are passed
>> from one core to another and the next capture phase
>> is started. While it isn't important in my application,
>> I may be able to remove that limitation in future
>> incarnations.
>
> That's a lot more than a "short" gap in the context of a fast clock.

Sigh.

My application only stops for a few microseconds when
it is convenient for my application, i.e. once every
1 or 10 seconds at the end of a measurement cycle.

At other times it chunters away continuously at full
rate without interruption - as guaranteed by design.



>>>> Processors plus £10 devkit available from Farnell
>>>> and DigiKey.
>>>
>>> Not a very good processor to use in cost conscious applications.  The lowest
>>> price at Digikey is $3 at qty 1000.  They can't make a part in the sub-dollar
>>> range?
>>
>> Don't presume everybody has your constraints.
>
> I don't, but not everyone has *your* constraints.  If a processor can't be sold
> in a low cost product, it cuts off the largest volume parts of the market
> including the app I am looking at presently.

I don't think there are any surprises there.

But, again, your (current) requirement is only one
perspective.

Reply by rickman ● June 16, 2017

Tom Gardner wrote on 6/15/2017 6:33 PM:
> On 15/06/17 21:52, rickman wrote:
>> Tom Gardner wrote on 6/15/2017 1:01 PM:
>>> On 15/06/17 17:13, rickman wrote:
>>>> Tom Gardner wrote on 6/15/2017 3:01 AM:
>>>>>
>>>>> That sounds easy for the XMOS processors:
>>>>>  - multicore (up to 32 100MIPS cores)
>>>>>  - FPGA-like i/o, e.g. SERDES, buffers, programmable
>>>>>    clocks to 250Mb/s, separate timer for each port
>>>>>  - i/o timing changes can be specified/measured in
>>>>>    software with 4ns resolution
>>>>>  - guaranteed latencies <100ns
>>>>>  - *excellent* programming model based on CSP
>>>>>    (summary: of "Occam in C")
>>>>>  - Eclipse-based dev environment
>>>>
>>>> And yet the XMOS can't run fast enough to return a result in the time
>>>> available.  What a pity, all dressed up and nowhere to go.
>>>
>>> You'll have to explain that, because I don't understand
>>> what you are referring to.
>>
>> Maybe I don't understand the speed of the XMOS.
>
> A 30,000 ft overview:
> http://www.xmos.com/published/xcore-architecture-flyer?version=latest
>
>> Nothing you write above says to
>> me it is any faster than the 700 MIPS processor I considered. Nothing written
>> above would allow it to perform the task any differently than the bit banging
>> approach I initially considered.
>
> What are the *guaranteed* timings and latencies?
> Include all possible disturbances due to cache/TLB
> misses, branch predictions and interrupts.
> Include simultaneous USB comms to a host PC.
>
> N.B. "guaranteed" /precludes/ measurements to see
> what is happening. "Guaranteed" requires accurate
> /prediction/ *before* the code executes.

I know this is clear to you, but I'm not sure what processor you are talking 
about.


>> So what are you confused about?
>>
>> Maybe you don't understand the problem.  A bit serial data port (two bits
>> wide
>> actually, but that is not relevant) provides data, address and a two bit
>> command
>> word serially.  The last bit of the command word is indicated by a "command"
>> signal going high during that bit.  Data is clocked in on the rising edge
>> of the
>> clock.  When the command word is a read, the read data is provided serially
>> starting on the subsequent falling edge.  This provides 15 ns from rising
>> edge
>> to falling edge.  It is possible to prefetch the read data based on the
>> shifted
>> address on each rising edge until the command signal is asserted.  This helps
>> with timing of the memory fetch, but still, that data has to be presented
>> to the
>> serial port output in 15 ns and updated every 30 ns.
>>
>> I'd like to see code that will make this work.  Or maybe you weren't
>> addressing
>> my previous application?  For the OP, can the XMOS emulate a 30 Mbps SPI
>> port?
>
> Each i/o pin can be clocked at up to 250Mb/s.
> That clock can come from an internal clock
> (up to 500MHz) or an external pin. The latter
> sounds relevant for your case. (Strobed and
> master/slave interfaces are also directly
> supported in hardware and software).
>
> Each I/O pin has SERDES registers, so the data
> rate processed by a core can be reduced by a factor
> of 32.
>
> So yes, it does look like a /small/ fraction of an xCORE
> device could very comfortably support that speed.

You still are not addressing the issue at hand.  You are talking about raw 
I/O speeds and the problem is not about raw I/O speeds.  The issue is 
interactivity.  Stimulus followed by response in very short order.  I have 
seen nothing to indicate the XMOS will work for this problem.  That's why I 
asked for a code snippet.


>>>>> Currently I'm managing to count transitions in
>>>>> software on two 50Mb/s inputs, plus do
>>>>> simultaneous USB comms to a host computer.
>>>>
>>>> Can you output a result before the next input transition?
>>>
>>> Yes and no.
>>>
>>> Yes: the output (to a host processor and/or LCD)
>>> proceeds in parallel with the next capture phase.
>>
>> I'm referring to bit I/O using the same 50 MHz clock.
>>
>>
>>> No: in the current incarnation there is a short gap
>>> between capture phases as the results are passed
>>> from one core to another and the next capture phase
>>> is started. While it isn't important in my application,
>>> I may be able to remove that limitation in future
>>> incarnations.
>>
>> That's a lot more than a "short" gap in the context of a fast clock.
>
> Sigh.
>
> My application only stops for a few microseconds when
> it is convenient for my application, i.e. once every
> 1 or 10 seconds at the end of a measurement cycle.
>
> At other times it chunters away continuously at full
> rate without interruption - as guaranteed by design.

Fine, but the fact that it will work for your needs does not mean it will 
work for mine.  Again, the requirement is to read the inputs looking for a 
command strobe and, on finding it, retrieve the appropriate word from memory 
and output it.  The clock cycle is 30 ns, and the total I/O time from the 
command strobe being read on the positive edge of the clock to the input of 
the device monitoring the output pin (with a 5 ns setup time) is 15 ns.  You 
haven't even addressed the output delays in the I/O pins.
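
Put as arithmetic, the budget looks like this. It is a sketch under stated assumptions: the 15 ns is taken as half of the 30 ns clock period with the 5 ns setup time counted inside it, and the 10 ns instruction slot is the per-thread figure discussed earlier in the thread. Pin input and output delays are not modelled and would shrink the window further. The function names are made up for illustration.

```python
def response_window_ns(clock_period_ns=30.0, setup_ns=5.0):
    """Time from the rising edge until the output must be stable at the pin."""
    half_period_ns = clock_period_ns / 2   # rising edge to falling edge
    return half_period_ns - setup_ns


def whole_slots(window_ns, slot_ns=10.0):
    """Number of complete instruction slots that fit in the window."""
    return int(window_ns // slot_ns)


window = response_window_ns()
print(f"{window:.0f} ns window, {whole_slots(window)} instruction slot(s)")
# prints "10 ns window, 1 instruction slot(s)"
```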


>>>>> Processors plus £10 devkit available from Farnell
>>>>> and DigiKey.
>>>>
>>>> Not a very good processor to use in cost conscious applications.  The
>>>> lowest
>>>> price at Digikey is $3 at qty 1000.  They can't make a part in the
>>>> sub-dollar
>>>> range?
>>>
>>> Don't presume everybody has your constraints.
>>
>> I don't, but not everyone has *your* constraints.  If a processor can't be
>> sold
>> in a low cost product, it cuts off the largest volume parts of the market
>> including the app I am looking at presently.
>
> I don't think there are any surprises there.
>
> But, again, your (current) requirement is only one
> perspective.

I can't argue with that.  But that is the need I have, and low cost is no 
minor requirement.  Processors are sold at much higher volumes for the 
low cost products.  The higher cost, lower volume products often can be 
built with a wide range of solutions, and even there the selected device is 
often the one that meets the requirements at the lowest price.  So 
shrugging off the cost issue of *my* current need is rather disingenuous.

-- 

Rick C
