As a follow-up, it is interesting that the ST Micro SDIO, which is very
similar to the SD/MMC controller in the LPC23xx/LPC24xx, has an
additional control bit in its clock register (equivalent to the MCIClock
register in the LPCs).
The extra bit is called HWFC_EN (Hardware Flow Control Enable).
The description is:
"The HW flow control functionality is used to avoid FIFO underrun (TX
mode) and overrun (RX mode) errors.
The behavior is to stop SDIO_CK and freeze SDIO state machines. The data
transfer is stalled while the FIFO is unable to transmit or receive
data. Only state machines clocked by SDIOCLK are frozen, the AHB
interface is still alive. The FIFO can thus be filled or emptied even if
flow control is activated."
This is exactly what is missing in the LPC's implementation...;-)
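For comparison, switching this on in the ST part amounts to setting a single bit. The following is only a minimal sketch, assuming the bit layout given in the STM32 reference manuals (HWFC_EN at bit 14 of SDIO_CLKCR, bits 31:15 reserved). The helper name sdio_clkcr_set_hwfc is made up for illustration, and it works on a plain value rather than on the real register so that it can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the STM32 SDIO_CLKCR register; check the
 * reference manual of the actual part before relying on them. */
#define SDIO_CLKCR_CLKEN     (1u << 8)   /* SDIO_CK output enable */
#define SDIO_CLKCR_HWFC_EN   (1u << 14)  /* hardware flow control enable */
#define SDIO_CLKCR_RESERVED  0xFFFF8000u /* bits 31:15, keep at zero */

/* Hypothetical helper: return a CLKCR value with hardware flow control
 * switched on or off. On target the result would be written back to the
 * real register. */
static uint32_t sdio_clkcr_set_hwfc(uint32_t clkcr, int enable)
{
    assert((clkcr & SDIO_CLKCR_RESERVED) == 0u);
    if (enable) {
        return clkcr | SDIO_CLKCR_HWFC_EN;
    }
    return clkcr & ~SDIO_CLKCR_HWFC_EN;
}
```

The MCIClock register in the LPC23xx/LPC24xx has no equivalent bit, so there is nothing comparable to set there.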
I also did some more accurate USB MSD transfer measurements on the LPC2378:
I was getting file writes (tested with files of a few hundred MBytes) of about 270
kBytes/second and reads of about 510 kBytes/second. This was with a 9MHz SDIO clock
speed.
On the LPC2478 I didn't manage to do the speed test via USB MSD since the
TFT pin initialisation was giving some conflict (didn't study why but just
disabled the USB for the moment). What I found was that 9MHz operation
didn't give any read problems with the TFT in operation, but I definitely
couldn't read at 18MHz - this resulted in immediate rx over-runs (by
comparison, on the LPC2378 I could read at 18MHz but writes could fail with tx
under-runs).
I was getting TFT display update times of about 0.2s when displaying photos in
16-bit color mode from 24-bit color bitmap files.
So that anyone interested can get an idea I made a quick and provisional video
of the first operation (i.e. I may delete it and upgrade it with more details
later [eg. web server, USB MSD, etc.]). It can be viewed at
http://www.youtube.com/watch?v]D2cZ8FEqo
(search there for uTasker for other related videos with more details)
I have the power save active - this disables clocks between transfers but not in
the middle of transfer when there is no data ready.
I also disable the DMA controller between transfers (which supposedly also saves a
bit of power).
All tests that I did were with active Ethernet (commands were via TELNET). Also
there was some LCD activity on my MB2300 board (probably irrelevant). I also
have an older chip so MAM is not set to maximum level.
The debugger was set to break on tx under-runs and it did break fairly regularly
at 18MHz (probably equivalent to 15MHz with your PLL). At 9MHz I never detected
any more tx under-runs (i.e. the debugger never hit the break point any more
during tests). However I don't have a lot of experience with this to know
whether there is some additional optimisation (I am using GPDMA channel 1,
whereas I think that channel 0 has higher priority - although channel 0 is not
used) or whether long-term it will always be reliable.
I will switch to the LPC2478 now and repeat tests with USB, Ethernet and LCD
controller in parallel and play around with a few settings. Basically I am a bit
surprised that the speed with DMA is still limited and am especially curious about
the additional LCD controller impact. But, at the end of the day, the maximum
safe speed of operation that can be set is the important point. When writing
data to the SD card the SD clock speed is not such a big issue since the card
takes some time to save sectors anyway (a lot of time is spent waiting for the
SD card to signal that it is no longer busy before further blocks can be
written) but it is probably more important for read speed. Maybe, assuming that
the tx under-run case is limiting (I didn't have any rx overruns at 18MHz
with DMA), it may be interesting to also switch between 9MHz and 18MHz depending
on direction (i.e. depending on whether reading or writing...).
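The idea of switching speed with direction could be sketched as follows. This assumes the PL180-style relation SD clock = MCLK / (2 x (ClkDiv + 1)) for the MCIClock divider and uses the 18MHz read and 9MHz write limits from the tests above, which apply to one board only. The names sd_dir_t, mci_clkdiv_for and sd_clkdiv_for_direction are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SD_DIR_READ, SD_DIR_WRITE } sd_dir_t;

/* Smallest ClkDiv (0..255) that gives an SD clock <= max_hz, assuming
 * SD clock = MCLK / (2 * (ClkDiv + 1)). */
static uint32_t mci_clkdiv_for(uint32_t mclk_hz, uint32_t max_hz)
{
    assert(mclk_hz > 0u && max_hz > 0u);
    uint32_t div = 0u;
    /* Compare without division so that no rounding can hide an overshoot. */
    while (div < 255u && (uint64_t)max_hz * 2u * (div + 1u) < mclk_hz) {
        div++;
    }
    return div;
}

/* Pick the divider for the next transfer from its direction. The limits
 * are the values found in the tests described above, not general ones. */
static uint32_t sd_clkdiv_for_direction(uint32_t mclk_hz, sd_dir_t dir)
{
    uint32_t max_hz = (dir == SD_DIR_READ) ? 18000000u : 9000000u;
    return mci_clkdiv_for(mclk_hz, max_hz);
}
```

With a 72MHz MCLK this gives ClkDiv = 1 (18MHz) for reads and ClkDiv = 3 (9MHz) for writes. The divider would presumably only be changed between transfers, while the interface is idle.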
I am using a 2368 with the MCI and DMA controller. I can't say I have noticed any
buffer under-runs. I am running the MCI clock at 15MHz with the CPU core at 60MHz.
I am only doing small writes of 2k (4-bit bus mode) every few milliseconds. I am
DMAing from USB RAM to the MCI controller. I really wish you could use the
Ethernet RAM for that though.
By the way there is a PwrSave bit (bit 9) in the MCIClock register to turn off the
clock when idle.
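As a small sketch of how such a register value might be put together: PwrSave at bit 9 is as stated above, while the Enable bit at position 8 and the 8-bit ClkDiv field are assumptions taken from the PL180-style layout and should be checked against the user manual. The helper name mciclock_value is made up.

```c
#include <assert.h>
#include <stdint.h>

#define MCICLOCK_CLKDIV_MAX  0xFFu       /* assumed: ClkDiv in bits 7:0 */
#define MCICLOCK_ENABLE      (1u << 8)   /* assumed: bus clock enable */
#define MCICLOCK_PWRSAVE     (1u << 9)   /* stop the clock when the bus is idle */

/* Hypothetical helper: build an MCIClock value from a divider and the
 * power-save choice. On target the result would be written to MCIClock. */
static uint32_t mciclock_value(uint32_t clkdiv, int pwr_save)
{
    assert(clkdiv <= MCICLOCK_CLKDIV_MAX);
    uint32_t value = clkdiv | MCICLOCK_ENABLE;
    if (pwr_save) {
        value |= MCICLOCK_PWRSAVE;
    }
    return value;
}
```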
Ben
Reply by Herbert Demmel●August 6, 2010
Dear Mark,
you seem to be a pretty fast guy ;-)
I've not done the verification and tests for the SD card source
myself (nor have I written the code); this has been done by a colleague.
So I've not been involved that deeply in this stuff. Take care when
switching to the LPC2478, you may need to play around with the AHBCFG
register (see other threads in this forum) as well to get best results -
this seriously depends on the pixel clock speed (we run up to 24 MHz,
which is the maximum the LPC2478 can handle when the SDRAM access is 16
bits wide and the TFT operates with 16 bits). As already mentioned we
had to go down to 4.5 MHz for the SD card with our design.
Please note that I'm leaving on holiday (thank god!) within the next
3 hours, so I will not be able to send any replies for
the next two weeks.
Have fun
Herbert
Reply by Mark●August 5, 2010
Hi Herbert
Many thanks for the details.
Today I took the DMA into operation, using DMA channel 1 for both reading and
writing. I am presently still working on an LPC2378 and will move onto the
LPC2478 shortly (to test with LCD controller in parallel).
Like this I could read the card at 18MHz. It all seemed to work reliably.
I could also write cards at this speed but the reliability was not good. As a
test I would reformat a 1GByte card several times and typically it would result
in a write error once every few reformat tests (also sometimes when generally
writing data) and the error was always due to a tx under-run while writing a
sector.
By dropping the speed down to 9MHz it then became reliable (whether this is
completely solved at this speed remains to be seen in long term tests).
Obviously the DMA operation is better than FIFO since it removes SW/interrupt
latency issues, but I am surprised that the DMA controller sometimes seems to
have problems keeping up.
Also I am wondering whether the SD controller design would not have been better
to stop generating clocks when there is no data ready to be sent or when the
input buffer is full - this would presumably simply add small wait periods and
remove any such reaction time constraints from the processor (whether SW or DMA
controlled).
As I have understood the operation, once the MCI interface is activated to
perform a block transfer it will generate the full number of clocks and if there
happens to be any delay in getting data in or out during that time then either
an under-run or overrun results. This results in the block being corrupted and
is thus quite a serious error.
I didn't try handling the under-runs or overruns since, in DMA mode -
assuming the speed is not set higher than the one that operates reliably (worst
case) - this should never occur. Maybe it would be possible to run faster and
handle such block corruption with repetitions but this seems over-complicated
and the repetitions also result in reduced throughput anyway....
At the moment I am testing by using the SD card as a mass storage device via USB
(USB is however in FIFO mode so not yet using DMA on the same RAM area as the
MCI DMA operations). At the same time I am serving web content from the SD card
via web server - since the Ethernet DMA is on a different bus to the USB RAM I
don't think that this has any impact.
A quick comparison between older tests in SPI SD card mode shows that the mass
storage write via USB (using SD controller) seems to have increased from about
180kbyte/s to around 350kbyte/s (tentative first tests).
Maybe I can repeat the tests tomorrow on the LPC2478 with LCD controller in
operation to see how things compare there.
Regards
Mark
Reply by grou...@demmel.com●August 4, 2010
Dear Mark,
please see my answers in-line below.
On 04.08.2010 13:25, Mark wrote:
> Hi Herbert
>
> What happens when you try to run at 18MHz with LCD DMA refresh operation
> too?
>
We get timeouts causing the blocks not to be read in time. FatFs uses
two buffers where the DMA is setup on the fly to continue in the other
buffer. If the first buffer is not read before it is to be re-used
again, you get an error captured by the FatFs library.
> Can you solve the problem by reducing speed?
Yes, we had to go down to 4.5MHz at least.
> (Perhaps DMA priority scheme can be optimised?
> What about when using Ethernet and USB too - these will also be using dedicated
> DMA)
We are currently trying to find such a solution for reading NAND flash
via DMA while the LCD is on and SPI data comes in. What we get here is an
interrupt response time of up to 50 (!!!) us (microseconds!);
sometimes SPI interrupts even get lost. Anything more than about 5 us
is too late for our design, as the SPI bytes get overwritten if they are
not read in time.
We could not find a solution with priorities yet. What we do now is to
read the NAND flash not with a 512-byte block size but with partial reads of 32
bytes until the NAND flash page is fully read. This keeps the interrupt
latency below about 2 ~ 3 us, but it increases the time for displaying
a full screen image out of NAND flash by about 10% :-(
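The partial-read approach can be sketched roughly like this. It is not the actual driver code: chunk_read_fn stands for whatever low-level (DMA) read is really used, and demo_page / demo_read_chunk are made-up stand-ins so that the loop can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical low-level transfer: reads len bytes starting at offset
 * within the current page into dst. */
typedef void (*chunk_read_fn)(uint8_t *dst, size_t offset, size_t len);

/* Read total bytes as a series of transfers of at most chunk bytes each,
 * so that no single transfer occupies the bus for long. Returns the
 * number of transfers issued. */
static size_t read_in_chunks(chunk_read_fn read_chunk, uint8_t *dst,
                             size_t total, size_t chunk)
{
    assert(read_chunk != NULL && dst != NULL && chunk > 0u);
    size_t transfers = 0u;
    for (size_t off = 0u; off < total; off += chunk) {
        size_t len = (total - off < chunk) ? (total - off) : chunk;
        read_chunk(dst + off, off, len);
        transfers++;
    }
    return transfers;
}

/* Stand-in data source for trying the loop without hardware. */
static uint8_t demo_page[2048];

static void demo_read_chunk(uint8_t *dst, size_t offset, size_t len)
{
    memcpy(dst, &demo_page[offset], len);
}
```

Reading a 2048-byte page in 32-byte pieces takes 64 short transfers instead of 4 of 512 bytes each; the extra set-up per transfer is where a slowdown like the one mentioned above would come from.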
Regards
Herbert
Reply by Mark●August 4, 2010
Hi Herbert
What happens when you try to run at 18MHz with LCD DMA refresh operation too?
Can you solve the problem by reducing speed? (Perhaps DMA priority scheme can be
optimised? What about when using Ethernet and USB too - these will also be using
dedicated DMA)
Regards
Mark
Reply by Herbert Demmel●August 4, 2010
On 04.08.2010 00:39, Mark wrote:
> My question is whether more experienced users can
> comment on these figures. Am I correct that FIFO mode is not a practical mode
> for real use (apart from slow clock rates) and DMA operation is therefore a must
> with this peripheral?
Mark,
you are right with the suggestion quoted above. Have a look at
"FatFs Generic File System Module" on http://elm-chan.org/fsw/ff/00index_e.html; they provide low-level driver
stuff for the LPC23xx/LPC24xx working with DMA mode as well.
With this library the SD card works with 18 MHz via MCI without problems
- as long as you do not try to run this stuff on an LPC2478 having DMA
for display refresh enabled at the same time :-(
Have fun,
Herbert
Reply by Mark●August 3, 2010
Hi All
Today I worked for the first time with the SD/MMC interface of the LPC23XX to
read an SD card. Initially I did this using the FIFO but am not sure whether
this is a practical approach for real work.
The reason is that the controller reads a complete block of data and the FIFO
has space to hold a maximum of 16 x 4 = 64 bytes (I believe this is the case: the
actual size doesn't seem to be specified explicitly, but this is about
when rx overruns start). A block of data is generally 512 bytes in size but
could be more, so this involves filling (and clearing) the FIFO at least 8
times.
Setting the interface to 20MHz, which is seemingly the maximum possible when
running at 72MHz (SDHC cards should run at 20MHz at least), I realised that this was
reading 10MByte/s from the card and the FIFO was filling in about 6us. The
processor needed to read the FIFO to make more space as fast as the data
arrived, which didn't seem possible using a standard C code loop since
this was not fast enough to keep up. If the CPU were interrupted by another
interrupt it would also immediately cause an rx overrun, which obviously resulted
in a read error.
By clocking at a more relaxed rate of 4MHz (31us per FIFO fill) things worked
well, but still I had to block interrupts while reading a block to ensure that
nothing could cause an overrun to take place - blocking interrupts for a block
read time of about 250us is however also not a practical method when other high
speed peripherals are in operation.
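The figures above can be reproduced with a small calculation. This is a sketch only: it assumes a 4-bit bus (which is what 10MByte/s at 20MHz implies), the 64-byte FIFO size estimated above, and it ignores start, CRC and stop bits. The function name sd_transfer_time_ns is made up.

```c
#include <assert.h>
#include <stdint.h>

#define MCI_FIFO_BYTES  64u   /* 16 words x 4 bytes, as estimated above */
#define SD_BLOCK_BYTES  512u  /* usual block size */

/* Time in nanoseconds for nbytes of payload to pass over the bus at
 * sd_clock_hz with bus_width data lines (1 or 4). Protocol overhead
 * (start bit, CRC, stop bit) is not included. */
static uint64_t sd_transfer_time_ns(uint32_t sd_clock_hz, uint32_t bus_width,
                                    uint32_t nbytes)
{
    assert(sd_clock_hz > 0u);
    assert(bus_width == 1u || bus_width == 4u);
    uint64_t bits = (uint64_t)nbytes * 8u;
    uint64_t clocks = (bits + bus_width - 1u) / bus_width; /* round up */
    return clocks * 1000000000ull / sd_clock_hz;
}
```

sd_transfer_time_ns(20000000, 4, MCI_FIFO_BYTES) gives 6400 (about 6us per FIFO fill), sd_transfer_time_ns(4000000, 4, MCI_FIFO_BYTES) gives 32000 (about 31us) and sd_transfer_time_ns(4000000, 4, SD_BLOCK_BYTES) gives 256000 (about 250us per block).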
I took a peek at some other driver code which said that it was reading at 12MHz
and it was relying on a half-buffer interrupt to read 32 bytes per interrupt.
This would give an interrupt about every 5us if my calculations are correct
and any interrupt latency plus read code time of more than about 12us would
(according to my tests and calculations) result in read errors. [note that the
reference code was not doing anything else - no other interrupts and the code
was just waiting in a loop for the interrupt handling to complete so maybe it
could just keep up(?)].
My question is whether more experienced users can comment on these figures. Am I
correct that FIFO mode is not a practical mode for real use (apart from slow
clock rates) and DMA operation is therefore a must with this peripheral?