SD/MMC Interface (LPC2378) - reaction time in FIFO mode
Today I worked for the first time with the SD/MMC interface of the LPC23XX to read an SD card. Initially I did this using the FIFO but am not sure whether this is a practical approach for real work.
The reason is that the controller reads a complete block of data and the FIFO has space to hold a maximum of 16 x 4 = 64 bytes (I believe this is the case because the actual size doesn't seem to be specified explicitly, but this is about when rx overflows start). A block of data is generally 512 bytes in size but could be more, so this involves filling (and clearing) the FIFO at least 8 times.
Setting the interface to 20MHz, which is seemingly the maximum possible when running at 72MHz (SDHC cards should run at at least 20MHz), I realised that this was reading 10MByte/s from the card and the FIFO was filling in about 6us. The processor needed to read the FIFO to make more space as fast as the data arrived, which didn't seem possible using a standard C code loop since this was not fast enough to keep up. If the CPU were interrupted by another interrupt it would also immediately cause an Rx overrun, which obviously resulted in a read error.
By clocking at a more relaxed rate of 4MHz (31us per FIFO fill) things worked well, but still I had to block interrupts while reading a block to ensure that nothing could cause an overrun to take place - blocking interrupts for a block read time of about 250us is however also not a practical method when other high speed peripherals are in operation.
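For anyone wanting to check these figures, here is a minimal sketch of the arithmetic. It assumes a 4-bit bus (which is what 10MByte/s at 20MHz implies) and ignores start/stop bits and CRC, so real times are slightly longer. The function name is made up for illustration and is not part of any driver.

```c
#include <stdint.h>

/* Time in nanoseconds for `bytes` to cross an SD bus that is `bus_bits`
 * wide (1 or 4) and clocked at `clock_hz`. Framing overhead (start/stop
 * bits, CRC) is deliberately ignored in this sketch. */
uint64_t sd_transfer_time_ns(uint32_t clock_hz, uint32_t bus_bits,
                             uint32_t bytes)
{
    uint64_t clocks = ((uint64_t)bytes * 8u) / bus_bits;
    return (clocks * 1000000000ull) / clock_hz;
}
```

With these assumptions a 64-byte FIFO fills in 6400ns at 20MHz and 32000ns at 4MHz, and a 512-byte block takes 256000ns at 4MHz, which is in line with the roughly 6us, 31us and 250us quoted above.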
I took a peek at some other driver code which said that it was reading at 12MHz and it was relying on a half-buffer interrupt to read 32 bytes per interrupt. This would give an interrupt about every 5us if my calculations are correct, and any interrupt latency plus read code time of more than about 12us would (according to my tests and calculations) result in read errors. [Note that the reference code was not doing anything else - no other interrupts, and the code was just waiting in a loop for the interrupt handling to complete, so maybe it could just keep up(?)].
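The half-buffer approach amounts to something like the sketch below: on each "half full" interrupt the handler pulls 8 words (32 bytes) out of the FIFO. This is only an illustration, not the code from that driver. The function name and the `fifo` parameter are hypothetical, and it assumes that each read of the FIFO data register pops one word.

```c
#include <stdint.h>

#define HALF_FIFO_WORDS 8u  /* half of the assumed 16-word FIFO */

/* Copy half a FIFO's worth of words from the receive FIFO register to
 * `dst` and return the advanced destination pointer. `fifo` points at
 * the peripheral's FIFO data register (hypothetical parameter; on real
 * hardware use the address from the user manual). */
uint32_t *mci_drain_half_fifo(volatile const uint32_t *fifo, uint32_t *dst)
{
    for (uint32_t i = 0; i < HALF_FIFO_WORDS; i++) {
        *dst++ = *fifo;  /* assumed: each read pops one word */
    }
    return dst;
}
```

The interrupt handler would call this once per half-full interrupt and keep the returned pointer for the next call, so everything between the interrupt firing and the loop finishing counts against the latency budget.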
My question is whether more experienced users can comment on these figures. Am I correct that FIFO mode is not a practical mode for real use (apart from slow clock rates) and DMA operation is therefore a must with this peripheral?
P.S. A couple of videos of SD card use on other targets (using SPI SD card mode) - an LPC23xx or LPC24xx one should follow... ;-)
> My question is whether more experienced users can comment on these figures. Am I correct that FIFO mode is not a practical mode for real use (apart from slow clock rates) and DMA operation is therefore a must with this peripheral?
You are right with your suggestion. Have a look at
"FatFs Generic File System Module" on
http://elm-chan.org/fsw/ff/00index_e.html, they provide low-level driver
stuff for the LPC23xx/LPC24xx working with DMA mode as well.
With this library the SD card works with 18 MHz via MCI without problems
- as long as you do not try to run this stuff on an LPC2478 having DMA
for display refresh enabled at the same time :-(
What happens when you try to run at 18MHz with LCD DMA refresh operation too?
Can you solve the problem by reducing speed? (Perhaps DMA priority scheme can be optimised? What about when using Ethernet and USB too - these will also be using dedicated DMA)
please see my answers in-line below.
On 04.08.2010 at 13:25, Mark wrote:
> Hi Herbert
> What happens when you try to run at 18MHz with LCD DMA refresh operation too?
We get timeouts causing the blocks not to be read in time. FatFs uses
two buffers where the DMA is set up on the fly to continue in the other
buffer. If the first buffer is not read before it is to be re-used
again, you get an error captured by the FatFs library.
> Can you solve the problem by reducing speed?
Yes, we had to go down at least to 4.5MHz.
> (Perhaps DMA priority scheme can be optimised? What about when using Ethernet and USB too - these will also be using dedicated DMA)
We are currently trying to find such a solution for reading NAND flash
via DMA while the LCD is on and SPI data comes in. What we get here is an
interrupt response time of up to 50 (!!!) us (microseconds!), and
sometimes SPI interrupts even get lost. Anything more than about 5 us
is too late for our design, as the SPI bytes get overwritten if they are
not read in time.
We have not found a solution with priorities yet. What we do now is
read the NAND flash not with a 512-byte block size but with partial reads of 32
bytes until the NAND flash page is fully read. This keeps the interrupt
latency below about 2 to 3 us, but it increases the time for displaying
a full screen image out of NAND flash by about 10% :-(
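The partial-read workaround could look roughly like the sketch below. It is not the actual project code: `nand_read_fn` is a hypothetical hook for "DMA this many bytes from this offset in the current page and wait", and `fake_nand_read` is only a host-side stand-in so the loop can be tried without hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level hook: read `len` bytes starting at `offset`
 * within the current NAND page. Returns 0 on success. */
typedef int (*nand_read_fn)(uint32_t offset, uint8_t *dst, size_t len);

/* Read a whole page as a series of short bursts so that no single DMA
 * transfer holds the bus for long. Returns 0 on success. */
int nand_read_page_chunked(nand_read_fn read, uint8_t *dst,
                           size_t page_size, size_t chunk)
{
    if (chunk == 0) {
        return -1;
    }
    for (size_t off = 0; off < page_size; off += chunk) {
        size_t len = page_size - off;
        if (len > chunk) {
            len = chunk;
        }
        int rc = read((uint32_t)off, dst + off, len);
        if (rc != 0) {
            return rc;  /* give up on the first failed burst */
        }
    }
    return 0;
}

/* Host-side stand-in for the hardware read, for experiments only. */
size_t fake_read_calls = 0;

int fake_nand_read(uint32_t offset, uint8_t *dst, size_t len)
{
    fake_read_calls++;
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)(offset + i);
    }
    return 0;
}
```

The cost is one transfer setup per burst instead of one per block, which is presumably where the extra 10% or so comes from.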
Many thanks for the details.
Today I took the DMA into operation, using DMA channel 1 for both reading and writing. I am presently still working on an LPC2378 and will move onto the LPC2478 shortly (to test with LCD controller in parallel).
Like this I could read the card at 18MHz. It all seemed to work reliably.
I could also write cards at this speed but the reliability was not good. As a test I would reformat a 1GByte card several times and typically it would result in a write error once every few reformat tests (also sometimes when generally writing data) and the error was always due to a tx under-run while writing a sector.
By dropping the speed down to 9MHz it then became reliable (whether this is completely solved at this speed remains to be seen in long term tests).
Obviously the DMA operation is better than FIFO since it removes SW/interrupt latency issues, but I am surprised that the DMA controller sometimes seems to have problems keeping up.
Also I am wondering whether the SD controller design would not have been better to stop generating clocks when there is no data ready to be sent or when the input buffer is full - this would presumably simply add small wait periods and remove any such reaction time constraints from the processor (whether SW or DMA controlled).
As I have understood the operation, once the MCI interface is activated to perform a block transfer it will generate the full number of clocks and if there happens to be any delay in getting data in or out during that time then either an under-run or overrun results. This results in the block being corrupted and is thus quite a serious error.
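To make the difference concrete, here is a toy model (not a model of the real silicon) of a receive FIFO where the card delivers one word per tick and the reader is held off for a while. Without flow control a full FIFO means an overrun; with flow control the clock is simply withheld until there is space. All names are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     overrun;        /* a word arrived while the FIFO was full */
    uint32_t stalled_ticks;  /* clocks withheld by flow control */
} rx_result;

/* Receive `words` words through a FIFO of `fifo_depth` words. The reader
 * removes one word per tick except during the window of `pause_len`
 * ticks starting at `pause_start`. */
rx_result simulate_rx(uint32_t words, uint32_t fifo_depth,
                      uint32_t pause_start, uint32_t pause_len,
                      bool flow_control)
{
    rx_result r = { false, 0 };
    uint32_t level = 0, received = 0;

    if (fifo_depth == 0) {       /* degenerate case: nothing can be stored */
        r.overrun = true;
        return r;
    }
    for (uint32_t t = 0; received < words; t++) {
        bool paused = (t >= pause_start) && (t - pause_start < pause_len);

        /* Reader side: take one word unless it is being held off. */
        if (!paused && level > 0) {
            level--;
        }
        /* Card side: one word per clock, if the clock runs. */
        if (level == fifo_depth) {
            if (!flow_control) {
                r.overrun = true;  /* block is corrupted */
                break;
            }
            r.stalled_ticks++;     /* clock stopped, card waits */
        } else {
            level++;
            received++;
        }
    }
    return r;
}
```

In this model a 16-word FIFO survives a 15-tick hold-off but overruns on a 16-tick one, whereas with flow control a 20-tick hold-off just costs 5 stalled clocks.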
I didn't try handling the under-runs or overruns since, in DMA mode - assuming the speed is not set higher than the one that operates reliably (worst case) - this should never occur. Maybe it would be possible to run faster and handle such block corruption with repetitions but this seems over-complicated and the repetitions also result in reduced throughput anyway....
At the moment I am testing by using the SD card as a mass storage device via USB (USB is however in FIFO mode so not yet using DMA on the same RAM area as the MCI DMA operations). At the same time I am serving web content from the SD card via web server - since the Ethernet DMA is on a different bus to the USB RAM I don't think that this has any impact.
A quick comparison between older tests in SPI SD card mode shows that the mass storage write via USB (using SD controller) seems to have increased from about 180kbyte/s to around 350kbyte/s (tentative first tests).
Maybe I can repeat the tests tomorrow on the LPC2478 with LCD controller in operation to see how things compare there.
you seem to be a pretty fast guy ;-)
I have not done the verification and tests for the SD card source
myself (nor have I written the code); this was done by a colleague.
So I have not been involved that deeply in this stuff. Take care when
switching to the LPC2478, you may need to play around with the AHBCFG
register (see other threads in this forum) as well to get best results -
this seriously depends on the pixel clock speed (we run up to 24 MHz,
which is the maximum the LPC2478 can handle when the SDRAM access is 16
bits wide and the TFT operates with 16 bits). As already mentioned we
had to go down to 4.5 MHz for the SD card with our design.
Please note that I'm leaving the country to go on holiday (thank
god!) within the next 3 hours, so I will not be able to send any replies for
the next two weeks.
I am using a 2368 with MCI and DMA controller. Can't say I have noticed any buffer under-runs. I am running the MCI clock at 15MHz with the CPU core at 60MHz.
I am only doing small writes of 2k (4 bit bus mode) every few milliseconds. I am DMAing from USB ram to the MCI controller. I really wish you could use the ethernet RAM for that though.
By the way, there is a PwrSave bit (bit 9) in the MCIClock register to turn off the clock when idle.
I have the power save active - this disables clocks between transfers but not in the middle of transfer when there is no data ready.
I also disable the DMA controller between transfers (which supposedly also saves a bit of power).
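As a sketch of how the MCIClock value might be put together: only PwrSave (bit 9) is confirmed in this thread. The other bit positions (ClkDiv in bits 7:0, Enable in bit 8, WideBus in bit 11) are assumptions and should be checked against the user manual. The macro and function names are made up.

```c
#include <stdint.h>

/* Assumed MCIClock layout; only PwrSave (bit 9) is confirmed above. */
#define MCICLOCK_CLKDIV_MASK 0xFFu        /* bits 7:0, assumed */
#define MCICLOCK_ENABLE      (1u << 8)    /* assumed */
#define MCICLOCK_PWRSAVE     (1u << 9)    /* stop the clock when idle */
#define MCICLOCK_WIDEBUS     (1u << 11)   /* 4-bit bus, assumed */

/* Build a value for the clock register with the bus clock enabled. */
uint32_t mci_clock_value(uint32_t clkdiv, int power_save, int wide_bus)
{
    uint32_t v = (clkdiv & MCICLOCK_CLKDIV_MASK) | MCICLOCK_ENABLE;
    if (power_save) {
        v |= MCICLOCK_PWRSAVE;
    }
    if (wide_bus) {
        v |= MCICLOCK_WIDEBUS;
    }
    return v;
}
```

Note that PwrSave only gates the clock between transfers. It does nothing for the case where the FIFO cannot keep up in the middle of a block.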
All tests that I did were with active Ethernet (commands were via TELNET). Also there was some LCD activity on my MB2300 board (probably irrelevant). I also have an older chip so MAM is not set to maximum level.
The debugger was set to break on tx under-runs and it did break fairly regularly at 18MHz (probably equivalent to 15MHz with your PLL). At 9MHz I never detected any more tx under-runs (i.e. the debugger never hit the break point any more during tests). However I don't have a lot of experience with this to know whether there is some additional optimisation (I am using GPDMA channel 1, whereas I think that channel 0 has higher priority - although channel 0 is not used) or whether long-term it will always be reliable.
I will switch to the LPC2478 now and repeat tests with USB, Ethernet and LCD controller in parallel and play around with a few settings. Basically I am a bit surprised that the speed with DMA is still limited and am especially curious about the additional LCD controller impact. But, at the end of the day, the maximum safe speed of operation that can be set is the important point. When writing data to the SD card the SD clock speed is not such a big issue since the card takes some time to save sectors anyway (a lot of time is spent waiting for the SD card to signal that it is no longer busy before further blocks can be written) but it is probably more important for read speed. Assuming that the tx under-run case is the limiting one (I didn't have any rx overruns at 18MHz with DMA), it may be interesting to also switch between 9MHz and 18MHz depending on direction (i.e. depending on whether reading or writing...).
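The per-direction idea could be sketched as below. It assumes the usual PL180-style divider, card clock = MCLK / (2 x (ClkDiv + 1)), which should be checked against the user manual. With a 72MHz source that formula gives the 18MHz, 9MHz and 4.5MHz steps mentioned in this thread. The names are invented for the example.

```c
#include <stdint.h>

typedef enum { MCI_DIR_READ, MCI_DIR_WRITE } mci_dir;

/* Smallest ClkDiv (0..255) whose output does not exceed max_hz,
 * assuming f_card = mclk_hz / (2 * (ClkDiv + 1)). */
uint32_t mci_clkdiv_for(uint32_t mclk_hz, uint32_t max_hz)
{
    uint32_t div = 0;
    /* Compare without dividing so that rounding cannot let a slightly
     * too fast clock through. */
    while (div < 255u &&
           (uint64_t)max_hz * 2u * (div + 1u) < (uint64_t)mclk_hz) {
        div++;
    }
    return div;
}

/* Pick the divider for the next transfer from its direction. */
uint32_t mci_clkdiv_for_direction(uint32_t mclk_hz, mci_dir dir,
                                  uint32_t read_max_hz,
                                  uint32_t write_max_hz)
{
    uint32_t limit = (dir == MCI_DIR_WRITE) ? write_max_hz : read_max_hz;
    return mci_clkdiv_for(mclk_hz, limit);
}
```

With 72MHz in, an 18MHz read limit and a 9MHz write limit this yields ClkDiv 1 for reads and 3 for writes. Whether the card is happy with the clock being changed between commands would need testing.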
I hope to know more later though.
I managed to get the LPC2478-STK operating.
I also did some more accurate USB MSD transfer measurements on the LPC2378:
I was getting file writes (tested with a few hundred MBytes files) of about 270 kBytes/second and reads of about 510kBytes/second. This was with 9MHz SDIO clock speed.
On the LPC2478 I didn't manage to do the speed test via USB MSD since the TFT pin initialisation was giving some conflict (didn't study why but just disabled the USB for the moment). What I found was that 9MHz operation didn't give any read problems with the TFT in operation, but I definitely couldn't read at 18MHz - this resulted in immediate rx over-runs (as a comparison, on the LPC2378 I could read at 18MHz but writes could fail with tx under-runs).
I was getting TFT display update times of about 0.2s when displaying photos in 16-bit color mode from 24-bit color bitmap files.
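For reference, the 24-bit to 16-bit step is typically something like the sketch below. It assumes a 5:6:5 frame buffer format with red in the top bits, which may not match the LCD controller setup actually used here, and relies on 24-bit BMP pixels being stored in blue, green, red order. The function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into one 16-bit 5:6:5 pixel (assumed layout:
 * red in bits 15:11, green in bits 10:5, blue in bits 4:0). */
uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Convert one row of 24-bit BMP pixel data (B, G, R byte order). */
void bmp_row_to_rgb565(const uint8_t *bgr, uint16_t *out, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        out[i] = rgb888_to_rgb565(bgr[3 * i + 2],  /* red   */
                                  bgr[3 * i + 1],  /* green */
                                  bgr[3 * i]);     /* blue  */
    }
}
```

The conversion itself is cheap compared with reading the file, so the update time is probably dominated by the card and file system rather than by this loop.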
So that anyone interested can get an idea I made a quick and provisional video of the first operation (i.e. I may delete it and upgrade it with more details later [eg. web server, USB MSD, etc.]). It can be viewed at http://www.youtube.com/watch?v]D2cZ8FEqo
(search there for uTasker for other related videos with more details)
As a follow-up, it is interesting that the ST Micro SDIO, which is very
similar to the SD/MMC controller in the LPC23xx/LPC24xx, has an
additional control bit in its clock register (equivalent to the MCIClock
register in the LPCs).
The extra bit is called HWFC_EN (Hardware Flow Control Enable).
The description is
"The HW flow control functionality is used to avoid FIFO underrun (TX
mode) and overrun (RX mode) errors.
The behavior is to stop SDIO_CK and freeze SDIO state machines. The data
transfer is stalled while the FIFO is unable to transmit or receive
data. Only state machines clocked by SDIOCLK are frozen, the AHB
interface is still alive. The FIFO can thus be filled or emptied even if
flow control is activated."
This is exactly what is missing in the LPC's implementation...;-)