EmbeddedRelated.com

SD/MMC card communication speed using SPI

Started by Jan Thogersen September 28, 2006
Hi all,

I have a project where I communicate between an LPC2148 and an SD card
via SPI. It works as it should, but it's just too slow, even though I
have the SPI clock running at 15 MHz.

I started digging into the details of the communication and found that
every time I send SD CMD17 (READ_SINGLE_BLOCK), the SD card stalls for
about 1-15 ms before it responds "ready" (0xFE, the data start token).

The datasheet also states a long delay between a bank shift and the
ready signal.

Does anyone know the reason for this long delay? And how is it then
possible to reach the 20 MB/s that the card manufacturer states is
possible? Is the situation completely different if the MMC protocol is
used instead of the SPI protocol?

Regards
Jan Thogersen


I'll first state that I know nothing of the system you're talking about.
However, does the manufacturer state that the 20 MB/s is for a
single-block read? Could it be a maximum for reading multiple sequential
blocks?

Andy


I have found some cards that have a very slow response time, although I
have not documented the figures. This is latency rather than the actual
transfer rate, and it could be due to the SD controller implementation
that the manufacturer uses. Try another brand of card; I use SanDisk as
my reference card.

*Peter*
Peter Jakacki wrote:
>
> I have found some cards that have a very slow response time although I
> have not documented the figures. This has nothing to do with the actual
> transfer rate but is referred to as latency and could be due to the SD
> controller implementation that the manufacturer uses. Try another brand
> card, I use SanDisk as my reference card.
>
Are you using the SPI or the MMC protocol?
You say that the latency has nothing to do with the transfer speed, but
it certainly affects the overall performance.
I don't think it's possible to transfer 20 MB/s if there is a 15 ms
pause after each 512 bytes transferred.
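
To put numbers on this, here is a minimal C sketch of the effective single-block read rate when each 512-byte block costs a fixed card latency plus its time on the wire. The function name and the simplified model (command bytes, CRC and response overhead are ignored) are assumptions for illustration, not measurements from this thread.

```c
#include <stdint.h>

/* Effective read rate in bytes per second for one block.
 * spi_hz      : SPI clock in Hz (one bit per clock in SPI mode)
 * latency_s   : time the card takes before the data token, in seconds
 * block_bytes : payload size, normally 512
 * Simplified model: ignores command, response and CRC bytes. */
double effective_bytes_per_sec(double spi_hz, double latency_s,
                               unsigned block_bytes)
{
    double wire_s = (block_bytes * 8.0) / spi_hz; /* time to clock the data */
    return block_bytes / (latency_s + wire_s);
}
```

With a 15 MHz clock, this model gives roughly 33 KB/s at 15 ms latency and roughly 400 KB/s at 1 ms, so the latency dominates long before the clock does.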

I have tried the read multiple block command, and it behaves the same
as the read single block command! After each 512 bytes there is a pause
before the next block is transferred.

So I'm pretty interested in finding out how it's possible to reach the
high transfer speeds on SD cards.

Regards
Jan

I run my SD cards in SPI mode using the SD protocol. As I have stated, I
have found some (very few) cards that have a great deal of latency when
accessing blocks. The only real way to find out more is to try a
reliable card. BTW, I run my SPI at 16.6 MHz, and from go to whoa, from
issuing a read command to having read in the full 512 bytes, I measure
around 1.2 ms. The actual data transfer takes around 800 us, which means
I am being rather slack in optimizing my block transfer. I just took
some measurements and found that a byte transfers in 500 ns, which is
correct, but my routines are taking around 1 us between each byte. What
is going on? Here is a snippet of my code:

SDRD:   ldr  r0,=SSP                ;SSP base
sdrdlp: mov  r1,#0FFh               ;dummy data to read SD
        bl   SDRW
        strb r1,[dst],#1            ;save SPI byte into destination
        djnz cnt,sdrdlp             ;loop for block size
SDRW:
sdlp1:  ldr  r2,[r0,#SSPSR-SSP]     ;read SPI status
        tst  r2,#2                  ;tx FIFO ready?
        beq  sdlp1
        str  r1,[r0,#SSPDR-SSP]     ;write R1 to SPI
sdlp2:  ldr  r2,[r0,#SSPSR-SSP]     ;rx FIFO ready?
        tst  r2,#4
        beq  sdlp2
        ldr  r1,[r0,#SSPDR-SSP]     ;read SPI rx
        ret

Obviously polling the FIFO status is not the way to go, as I guess it
does not update immediately. It would perhaps be better to check the BSY
flag instead, but I will investigate this further anyway.

However, if I optimize the code for back-to-back transfers, I should be
able to issue a read command and read a block every 650 us. This is
roughly 800 KB per second, but nothing like the 20 MB/s, which could
only be achieved in high-speed SD mode anyway. I can't see why a little
2148 with little memory and no DMA would ever need that kind of speed.

A quick interrogation of my registers reveals..
SSPCR0 ? 107 ok
SSPCR1 ? 2 ok
SSPCPSR ? 2 ok
SSPSR ? 3 ok
P.S. I think I could set the SSP to 16-bit mode for the data transfers
but that still won't make it much faster.
*Peter*
Hi Jan,
> So I'm pretty interested in finding out how it's possible to reach the
> high transfer speeds on SD cards.
>

I have SD operating at 24 MHz on SPI1 of an LPC2148.

The maximum clock frequency in data transfer mode is 25 MHz, as given in
ProdManualSDCardv1.9.pdf. This is only a clock limit, not a data
throughput figure.

Clock frequency identification mode is the low-speed period during
initialization. It should be used to determine the device's maximum
supported clock speed. For best compatibility, do not treat the clock
speed as a constant, but determine the maximum clock speed from the card
inserted.
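
One way to determine the limit per card, sketched here in C, is to decode the TRAN_SPEED byte of the card's CSD register (read with CMD9; it is byte 3 of the 16-byte CSD). This is an assumption about how the check could be done, not something the post describes, and the function name is hypothetical. The tables are the SD ones; MMC cards use slightly different multiplier values.

```c
#include <stdint.h>

/* Decode the SD CSD TRAN_SPEED byte into a maximum clock in Hz.
 * Bits 2:0 select the rate unit, bits 6:3 select the multiplier.
 * Returns 0 for reserved encodings. */
uint32_t sd_tran_speed_hz(uint8_t tran_speed)
{
    /* Rate units: 100 kbit/s, 1 Mbit/s, 10 Mbit/s, 100 Mbit/s */
    static const uint32_t unit_hz[4] = {
        100000u, 1000000u, 10000000u, 100000000u
    };
    /* Multipliers times ten (1.0, 1.2, 1.3, ... 8.0); index 0 is reserved */
    static const uint8_t mult_x10[16] = {
        0, 10, 12, 13, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80
    };
    uint32_t unit = tran_speed & 0x07u;
    uint32_t mult = (tran_speed >> 3) & 0x0Fu;

    if (unit > 3u) {
        return 0; /* reserved rate unit */
    }
    return unit_hz[unit] / 10u * mult_x10[mult];
}
```

A standard-speed SD card typically reports 0x32 here, which decodes to 25 MHz; 0x5A decodes to 50 MHz.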
Joel

I made some modifications to the EFSL library to try to maximize
transfer rates. You can find my modifications on the EFSL SourceForge
site.

To get the maximum speed I could, I ran the SSP at 15 MHz, which is the
best I could do with a 60 MHz clock. You need to use the FIFO on the
SSP, or it is very difficult. Even with the FIFO, it is difficult to
keep the transfers back to back (at least in C). Using 16-bit mode
helps quite a bit. I could reach 1 MB/s read and write, which includes
the card latency. The type of card used does make a difference, and
I found my 1 GB SanDisk cards to be among the slowest. Transcend 1 GB
mini SD cards were the fastest that I found.

The only way to get anywhere near the maximum transfer rate of 20 MB/s
that SD specifies is to use the 4-bit mode, and even then you would need
a 40 MHz clock (though I believe that some newer cards support up to
50 MHz).
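
The arithmetic behind that last paragraph can be sketched in C as follows. The function name is made up for illustration, and the result is the raw bus ceiling only; it ignores command overhead, CRC and card latency.

```c
#include <stdint.h>

/* Raw bus ceiling in bytes per second:
 * each clock moves bus_width_bits bits (1 in SPI mode, 4 in SD 4-bit mode). */
uint64_t bus_bytes_per_sec(uint32_t bus_width_bits, uint32_t clock_hz)
{
    return ((uint64_t)bus_width_bits * clock_hz) / 8u;
}
```

Four data lines at 40 MHz give exactly 20 MB/s, whereas the single data line of SPI mode at the 25 MHz limit tops out at about 3.1 MB/s.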

I hope you find this useful,

Mike
> To get the maximum speed I could, I ran the SSP at 15MHz, which is the
> best I could do with a 60MHz clock. You need to use the FIFO on the
> SSP, or it is very difficult. Even with the FIFO, it is difficult to
> keep the transfers back to back (or it is in 'C'). Using 16bit mode
> helps quite a bit. I could reach 1MB/s read and write, which includes
> the card latency. The type of card used does make a difference, and
> I found my 1GB Sandisk cards to be one of the slowest. Transcend 1GB
> mini SD cards were the fastest that I found.
>
> The only way that one could reach anywhere near the maximum transfer
> rate of 20MB/s that SD specifies, is to use the 4 bit mode, and even
> then you would need a 40MHz clock (though I believe that some newer
> cards support up to 50MHz).
>

On LPC214x the minimum divider for SPI1 is 2. I'm running PCLK on
LPC2148 at 48MHz, which gives me 24MHz on the SD card.
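
A small C sketch of the divider choice, assuming the SSP bit rate formula from the LPC214x user manual, PCLK / (CPSDVSR * (SCR + 1)), with CPSDVSR even in 2..254 and SCR in 0..255. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t cpsdvsr; /* value for SSPCPSR: even, 2..254 */
    uint32_t scr;     /* SCR field of SSPCR0: 0..255 */
    uint32_t hz;      /* resulting bit clock, rounded down to whole Hz */
} SspClock;

/* Pick the fastest SSP bit clock that does not exceed max_hz.
 * Returns hz == 0 if no divider combination fits. */
SspClock ssp_pick_clock(uint32_t pclk_hz, uint32_t max_hz)
{
    SspClock best = {0u, 0u, 0u};

    for (uint32_t cps = 2u; cps <= 254u; cps += 2u) {
        for (uint32_t scr = 0u; scr <= 255u; scr++) {
            uint32_t hz = pclk_hz / (cps * (scr + 1u));
            if (hz <= max_hz && hz > best.hz) {
                best.cpsdvsr = cps;
                best.scr = scr;
                best.hz = hz;
            }
        }
    }
    return best;
}
```

With a 25 MHz card limit this yields 15 MHz from a 60 MHz PCLK (the total divider must be even, so 30 MHz is too fast and 20 MHz is unreachable) and 24 MHz from a 48 MHz PCLK, which matches the figures quoted in this thread.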
Joel
Jan Thogersen wrote:
> So I'm pretty interested in finding out how it's possible to reach the
> high transfer speeds on SD cards.

Jan, after I looked at my code and saw there were rather large gaps of
around 1 us between each byte, I started to investigate. I found that
the SSP seems to update its status register based on timeouts, as no
matter how tight a loop I run, the gap is still around 1 us or more.

So I tried a method that fills the FIFO and achieved a "transfer" rate
of over 2 MB/s with a clock rate of 22.1184 MHz. There are gaps of
around 1 us between each fill of the FIFO, which seems to suffer from
the same status register update problem.

However, the command/result sequence still takes 344 us, because no
matter how fast you clock the interface, the SD card will still take
some time before it is ready. A random sector read takes just under
600 us altogether.

If you are after sustained transfer rates, you should find a decent card
and use read-multiple commands. How do you know it's a decent card? Do
what I do and test them with a scope attached to the clock line; you
will be surprised at the differences between brands.
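
As a rough illustration of why read-multiple helps, this C sketch spreads one command/result overhead over several blocks. The function name is hypothetical, and the model assumes the card adds no extra delay between blocks, which depends on the card.

```c
#include <stdint.h>

/* Sustained rate in bytes per second when one command/result sequence
 * (cmd_s seconds) is followed by nblocks blocks of block_s seconds each.
 * Assumes no additional inter-block delay from the card. */
double multi_block_bytes_per_sec(double cmd_s, double block_s,
                                 unsigned nblocks, unsigned block_bytes)
{
    double total_s = cmd_s + (double)nblocks * block_s;
    return ((double)nblocks * block_bytes) / total_s;
}
```

Using the figures above (344 us per command, 238 us per 512-byte block), a single-block read works out to about 880 KB/s, while a 64-block run comes to about 2.1 MB/s, close to the raw block rate.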

I've included a section of my code below.

*Peter*
My SD assembler primitives (with Forth headers).
_______________________________________________________________________

H$ "RES@" ( -- res )
ASMCODE _GETRES
        mov  r3,#2000h              ;max retries
reslp:  mov  r1,#0FFh               ;read a byte
        bl   SDRW
        cmp  r1,#0FFh               ;blank?
        bne  _RESOK
        djnz r3,reslp               ;timeout?
_RESOK: mov  tos,r1                 ;result or timeout
        str  tos,[PSP],#CELLL       ;result -> datastack
        NEXT

H$ "SCMD" ( data cmd -- res )
ASMCODE _SCMD
        ldr  r0,=IO                 ;CS active
        mov  r1,#bit(20)
        str  r1,[r0,#IOCLR-IO]
        ldr  r0,=SSP                ;use SSP
        mvn  r1,#0                  ;pad FF
        str  r1,[r0,#SSPDR-SSP]
        and  r1,tos,#3Fh            ;mask cmd
        orr  r1,r1,#bit(6)          ;force cmd
        str  r1,[r0,#SSPDR-SSP]     ;cmd
        ldr  sec,[PSP,#SEC]
        mov  r1,sec,lsr #18h
        str  r1,[r0,#SSPDR-SSP]     ;dat1
        mov  r1,sec,lsr #10h
        str  r1,[r0,#SSPDR-SSP]     ;dat2
        mov  r1,sec,lsr #8
        str  r1,[r0,#SSPDR-SSP]     ;dat3
        mov  r1,sec
        str  r1,[r0,#SSPDR-SSP]     ;dat4
        mov  r1,#95h
        str  r1,[r0,#SSPDR-SSP]     ;dummy crc
        sub  PSP,PSP,#CELLL*2       ;pop data stack
        mov  r2,#7                  ;read back 7 bytes from SPI
scmdw:  ldr  r1,[r0,#SSPSR-SSP]
        tst  r1,#bit(2)             ;RXFIFO not empty?
        beq  scmdw
        ldr  r1,[r0,#SSPDR-SSP]     ;read & discard
        djnz r2,scmdw
        b    _GETRES                ;command complete, get result

; Block transfer in 8 x 16-bit units
; Seems to be a 1us gap between each block
; - status does not update normally, seems to use timeouts
; At 22.1184MHz a block transfers in 238us
;

H$ "(SDRD)" ,( dst cnt -- ) read block from SD into memory
ASMCODE _SDRD
        ldr  r0,=SSP                ;SSP base
        ldr  r1,[r0,#SSPCR0-SSP]    ;force 16-bit mode
        orr  r1,r1,#08
        str  r1,[r0,#SSPCR0-SSP]
        ldr  sec,[PSP,#SEC]         ;read dst into sec
sdrdlp:
        ldr  r2,[r0,#SSPSR-SSP]     ;wait for tx fifo empty
        tst  r2,#bit(0)
        beq  sdrdlp
sdrdlp1:
        mvn  r1,#0                  ;write 128 clocks (16x8)
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        str  r1,[r0,#SSPDR-SSP]
        mov  r3,#8                  ;read back 16-bit data x8
sdrdrdy:
        ldr  r2,[r0,#SSPSR-SSP]     ;wait for next rx data
        tst  r2,#bit(2)             ;rne - receive fifo not empty?
        beq  sdrdrdy

        ldr  r1,[r0,#SSPDR-SSP]     ;read 16-bits from rx fifo
        mov  r1,r1,ror #8           ;store bytes
        strb r1,[sec],#1            ;lsb
        mov  r1,r1,ror #24
        strb r1,[sec],#1            ;msb
        djnz r3,sdrdrdy

        subs tos,tos,#16            ;decrement count by 16 bytes
        bne  sdrdlp1                ;until cnt = 0
sdrdxt:
        sub  PSP,PSP,#8             ;pop data stack by 2 items
        ldr  tos,[PSP,#TOS]         ;update tos register from stack
        ldr  r1,[r0,#SSPCR0-SSP]    ;return back to 8-bit mode
        bic  r1,r1,#08
        str  r1,[r0,#SSPCR0-SSP]
        NEXT
_______________________________________________________________________

You can probably still do a bit better by not completely emptying the
FIFO before you refill it. Once you receive a word, send another word.
You just need to keep track of whether or not you need to send another
word. This could shave another 15 us or so off your code. This is what
I did for my EFSL patch in C; it works well and keeps the FIFO full, as
long as an interrupt doesn't come along.
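
The idea can be sketched in C like this. It is not the EFSL patch itself: the `FakeSsp` type and the `ssp_*` helpers are a hypothetical stand-in for the SSP registers so that the loop logic can be run on a PC, and the FIFO depth of 8 frames is taken from the LPC214x SSP.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8u /* SSP FIFOs hold 8 frames */

/* Hypothetical stand-in for the SSP: every frame written to TX
 * later appears as one received frame carrying the card's data. */
typedef struct {
    const uint16_t *card_data; /* words the "card" will return */
    size_t next_out;           /* index of the next word to return */
    size_t in_flight;          /* frames written but not yet read */
    size_t max_in_flight;      /* high-water mark of in_flight */
} FakeSsp;

static int ssp_rx_ready(const FakeSsp *s)
{
    return s->in_flight > 0u;
}

static void ssp_write(FakeSsp *s, uint16_t frame)
{
    (void)frame; /* dummy 0xFFFF, only there to generate clocks */
    s->in_flight++;
    if (s->in_flight > s->max_in_flight) {
        s->max_in_flight = s->in_flight;
    }
}

static uint16_t ssp_read(FakeSsp *s)
{
    s->in_flight--;
    return s->card_data[s->next_out++];
}

/* Read nwords 16-bit words while keeping the TX FIFO topped up:
 * prime it once, then send one new word for each word received. */
void read_words_interleaved(FakeSsp *s, uint16_t *dst, size_t nwords)
{
    size_t sent = 0, received = 0;

    while (sent < nwords && sent < FIFO_DEPTH) { /* prime the FIFO */
        ssp_write(s, 0xFFFFu);
        sent++;
    }
    while (received < nwords) {
        while (!ssp_rx_ready(s)) {
            /* wait for the next received word */
        }
        dst[received++] = ssp_read(s);
        if (sent < nwords) { /* still owe the card some clocks */
            ssp_write(s, 0xFFFFu);
            sent++;
        }
    }
}
```

On real hardware the three helpers would become reads and writes of SSPSR and SSPDR. The `sent` and `received` counters are the bookkeeping mentioned above: they ensure that no more than 8 frames are ever outstanding and that exactly `nwords` frames are clocked.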

Mike
