EmbeddedRelated.com

USBMem for LPC288x, Crossworks

Started by Jan Vanek May 18, 2008
Hi Forum,

I tried to get the USB mass storage functionality for the LPC288x
running; the goal was to see the device as a USB disk under Windows and
OSX. I am using Crossworks, and I used Keil's USBMem example V1.30 as a
starting point. The following is not an answer to a question, but maybe
somebody will find it helpful. Ignore otherwise, please.

1. Board init. I didn't use the TargetResetInit() function; instead I
use my own board init, which doesn't use fractional dividers for most of
the spreading stages, so I set most of the ESR registers to 0 in the SYS
selection stage. This could potentially improve the USB performance, but
I didn't measure it. The other thing that should influence the USB
performance is running the app from cache.

2. Packed structures. GCC doesn't have a __packed keyword, so
declarations like
    typedef __packed struct _MSC_CBW {
must be changed to:
    typedef struct __packed _MSC_CBW {
where
    #define __packed __attribute__ ((packed))
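
To see what the attribute buys you, here is a minimal sketch that can be compiled on a PC with GCC. The field layout is my assumption, taken from the 31-byte Command Block Wrapper of the USB Mass Storage Bulk-Only Transport; the Keil header may name things differently. MSC_CBW_UNPACKED is a hypothetical twin added only for comparison.

```c
#include <stdint.h>

typedef uint8_t  BYTE;
typedef uint32_t DWORD;

/* Some C libraries already define __packed, so only define it if missing. */
#ifndef __packed
#define __packed __attribute__ ((packed))
#endif

/* Assumed layout of the Bulk-Only Command Block Wrapper (31 bytes). */
typedef struct __packed _MSC_CBW {
  DWORD dSignature;
  DWORD dTag;
  DWORD dDataLength;
  BYTE  bmFlags;
  BYTE  bLUN;
  BYTE  bCBLength;
  BYTE  CB[16];
} MSC_CBW;

/* Same fields without the attribute: the compiler may pad the tail so
   that arrays of the struct keep their DWORD members aligned. */
typedef struct _MSC_CBW_UNPACKED {
  DWORD dSignature;
  DWORD dTag;
  DWORD dDataLength;
  BYTE  bmFlags;
  BYTE  bLUN;
  BYTE  bCBLength;
  BYTE  CB[16];
} MSC_CBW_UNPACKED;
```

With the attribute, sizeof(MSC_CBW) matches the 31 bytes that arrive on the wire; without it the struct is normally padded to a multiple of 4.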

3. Different LPC288x.h file. The one from Rowley provides more defines,
and some defines are different:
    USBLock is USBUnlock
    USBECtrl is USBCtrl

4. A couple of parts won't compile with GCC, because it does not accept
a cast as the target of an assignment:
    (BYTE *)pD += ((USB_CONFIGURATION_DESCRIPTOR *)pD)->wTotalLength;
changed to:
    pD = (USB_COMMON_DESCRIPTOR*)((BYTE*)pD + ((USB_CONFIGURATION_DESCRIPTOR*)pD)->wTotalLength);
and:
    (BYTE *)pD += pD->bLength;
changed to:
    pD = (USB_COMMON_DESCRIPTOR*)((BYTE*)pD + pD->bLength);
In usbuser.c the P_EP(n) and USB_P_EP changed to:
    typedef void (*USB_EndPointType)(DWORD event);
    #define P_EP(n) ((USB_EP_EVENT & (1 << (n))) ? &USB_EndPoint##n : (USB_EndPointType)NULL)
    /* USB Endpoint Events Callback Pointers */
    const USB_EndPointType USB_P_EP[USB_LOGIC_EP_NUM] = {

5. USB_WriteEP function. The original implementation crashes into the
DABORT handler when compiled under GCC, because of the misaligned access
in *((__packed DWORD*)pData). The workaround tests the two lowest
address bits, ((DWORD)pData & 0x03); if they are 0 I run the original
loop, otherwise I run another loop where:
    USBData = (DWORD)pData[0] | ((DWORD)pData[1] << 8) | ((DWORD)pData[2] << 16) | ((DWORD)pData[3] << 24);
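
A minimal sketch of that two-path loop, so the idea can be tried on a PC. The names load_le32 and copy_words are hypothetical, and the out[] array stands in for the USB data register, which I am not reproducing here. It assumes a little-endian CPU like the LPC288x, so both paths produce the same words.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  BYTE;
typedef uint32_t DWORD;

/* Build one little-endian DWORD from four bytes. Uses only byte reads,
   so it is safe at any address. */
static DWORD load_le32(const BYTE *p)
{
  return (DWORD)p[0] | ((DWORD)p[1] << 8) |
         ((DWORD)p[2] << 16) | ((DWORD)p[3] << 24);
}

/* Hypothetical stand-in for the copy loop in USB_WriteEP: moves nwords
   32-bit words from pData into out[]. */
static void copy_words(const BYTE *pData, size_t nwords, DWORD *out)
{
  size_t i;

  if (((uintptr_t)pData & 0x03u) == 0) {
    /* Aligned source: plain word reads, as in the original loop.
       The data is expected to live in word-aligned storage. */
    const DWORD *pw = (const DWORD *)(const void *)pData;
    for (i = 0; i < nwords; i++)
      out[i] = pw[i];
  } else {
    /* Misaligned source: assemble each word byte by byte. */
    for (i = 0; i < nwords; i++)
      out[i] = load_le32(pData + 4 * i);
  }
}
```

The byte-wise path is slower, but it is only taken when the caller hands over a buffer that does not start on a 4-byte boundary.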

6. MSC_BulkIn function. The BulkStage is incorrectly handled; this
causes a "PHASE ERROR" later on in MSC_BulkOut. Fix in MSC_BulkIn:
    case MSC_BS_DATA_IN_LAST:
      MSC_SetCSW();
      BulkStage = MSC_BS_CBW; // this line is added
      break;

7. DMA in MSC_BulkOut. If DMA is used, BulkBuf must start on a 4-byte
aligned address. This was not the case, and the DMA operation (which
ignores the 2 lowest bits of the address) overwrote another variable, in
this case BulkStage. (Curiously, this accidental overwrite caused the
device to be visible, only when DMA was used, even before I found the
bug in point 6.) The workaround is to mark the BulkBuf declaration:
    BYTE BulkBuf[MSC_HS_MAX_PACKET] __aligned4; /* Bulk In/Out Buffer */
where
    #define __aligned4 __attribute__ ((aligned (4)))
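
To illustrate why the neighbouring variable gets hit, here is a toy model that runs on a PC. It is only an assumption of how the address truncation behaves, not the real controller: dma_write_sim is a hypothetical helper, mem is a simulated piece of RAM, and addr is an offset into it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSC_HS_MAX_PACKET 512
#define __aligned4 __attribute__ ((aligned (4)))

/* Declared as in point 7, so it is placed on a 4-byte boundary. */
static uint8_t BulkBuf[MSC_HS_MAX_PACKET] __aligned4;

/* Toy model (an assumption, not the real hardware): a DMA engine that
   ignores the two lowest bits of the destination address.
   Returns the offset at which the write really started. */
static size_t dma_write_sim(uint8_t *mem, size_t addr,
                            const uint8_t *src, size_t len)
{
  size_t start = addr & ~(size_t)0x03;  /* low 2 bits dropped */
  memcpy(mem + start, src, len);
  return start;
}
```

If a one-byte variable sits at offset 5 and the buffer starts at offset 6, a write aimed at 6 really starts at 4 and tramples offset 5. With the buffer at an offset that is a multiple of 4, the write starts where it was asked to.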

8. DMA in MSC_BulkOut, 2nd. If high-speed USB is used, the data packet
is 512 bytes, and DMA writes the packet length into the first 4 bytes of
the buffer. The buffer was only 512 bytes big, so it is necessary to
declare:
    BYTE BulkBuf[MSC_HS_MAX_PACKET+4] __aligned4; /* Bulk In/Out Buffer */
and in MSC_BulkOut add the +4:
    USB_DMA_Setup (MSC_EP_OUT, (DWORD)BulkBuf, MSC_HS_MAX_PACKET+4);
If this is not done, it is possible to read from the device, but a write
crashes.
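
The size arithmetic can be checked with another toy model. The exact layout (length first, payload right after it) is my assumption based on the description in point 8; dma_receive_sim and DMA_LEN_PREFIX are hypothetical names. Unlike the real DMA, the model refuses to write when the buffer is too small instead of running past its end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSC_HS_MAX_PACKET 512
#define DMA_LEN_PREFIX    4   /* hypothetical name for the 4 length bytes */

/* Toy model: the packet length lands in the first 4 bytes of buf and
   the payload follows. Returns the number of bytes written, or 0 if
   buf_size is too small for prefix plus payload. */
static size_t dma_receive_sim(uint8_t *buf, size_t buf_size,
                              const uint8_t *pkt, uint32_t pkt_len)
{
  size_t needed = (size_t)pkt_len + DMA_LEN_PREFIX;

  if (needed > buf_size)
    return 0;                              /* real DMA would overflow here */
  memcpy(buf, &pkt_len, DMA_LEN_PREFIX);   /* length prefix */
  memcpy(buf + DMA_LEN_PREFIX, pkt, pkt_len);
  return needed;
}
```

A full 512-byte packet needs 516 bytes, which is why both the declaration and the USB_DMA_Setup call get the +4.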

9. Other data aligned. This might not be strictly necessary.
In usbcore.c:
    BYTE EP0Buf[USB_MAX_PACKET0] __aligned4;
    USB_SETUP_PACKET SetupPacket __aligned4;
And all the descriptors in usbdesc.c, like:
    const BYTE __aligned4 USB_DeviceDescriptor[] = {

10. DMA vs. RAM caching. If DMA is used, RAM caching is better
disabled, as mentioned in the previous mail.

With those changes, it is possible to read and write files on the device
in both DMA and non-DMA mode under Windows. I didn't test in OSX yet.

With regards,
Jan


Hi group

Further to this, I am fighting alignment problems arising from
conversions between pointers to integers and pointers to chars...is
there a setting to turn the compiler to fully blocked?

Apologies in advance, I realise this is a newbie question...but I have
looked for it and cannot find it...I just need to confirm...or
suggestions as to what would be the 'normal' approach to this are
welcome

Best regards

Jim ( Active-pcb Reading UK )

At 04:07 PM 5/19/2008 +0000, Jim wrote:
>Hi group
>
>Further to this, I am fighting alignment problems arising from
>conversions between pointers to integers, and pointers to chars...is
>there a setting to turn the compiler to fully blocked ?

Fully blocked?

Why are you converting between pointers to char and pointers to int? That
is generally a very bad thing to do.

>Appologies in advance I realise this is a newbie question...but I have
>looked for it and cannot find it...I just need to confirm...or
>suggestions as to what would be the 'normal' approach to this are
>welcome

The proper approach would be not to do that. What are you trying to do
that makes you want to try this approach?

Robert

"C is known as a language that gives you enough rope to shoot yourself in
the foot." -- David Brown in comp.arch.embedded
http://www.aeolusdevelopment.com/

Sorry Robert...and thank you for your response

Yes...I am re-inventing nomenclature...sorry that came from the
distant past somewhere :-)

What I meant was how do I set the CrossWorks compiler to byte
alignment

What is happening is that the examples we are working with use
pointers to integers ( eg Flash write, or USB write etc ) writing
four bytes at a time to destination...presumably that is a sensible
thing to do...even though it seemed a little strange to me as well
( I come from an 8-bit background )

The source is likely to be buffers of say unsigned chars sitting in
RAM...or other

What is happening is that the writes are scrambling the data...which
is curious :-)

If we re-align things on four byte boundaries...it comes alive
again...but what a pain !

Regards

Jim

Hi,

I think you cannot overcome this. Some ARM cores simply cannot make
32-bit accesses to unaligned addresses. So either you get rid of the
32-bit wide access, or you align your buffers. The second one is the
better idea since it makes the code faster.
CrossWorks uses GCC, so you need something like
"__attribute__((aligned(4)))" after your variables. For example:

    char buffer[255] __attribute__((aligned(4)));

Or if you want to have a portable version you can do this:

    union buffer {
        unsigned long dummy;
        char data[255];
    };

This will make most compilers align the whole union to a 32-bit
boundary. I think the first one is less pain.
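
Both variants can be checked quickly on a PC. This is only a sketch; the names buffer_attr, aligned_buffer, buffer_union and is_aligned4 are made up for the example.

```c
#include <stdint.h>

/* Variant 1: GCC attribute on the variable. */
static char buffer_attr[255] __attribute__((aligned(4)));

/* Variant 2: portable union. The unsigned long member drags the whole
   union up to the alignment of unsigned long (at least 4 on typical
   32-bit and 64-bit targets). */
union aligned_buffer {
  unsigned long dummy;
  char data[255];
};
static union aligned_buffer buffer_union;

/* Returns 1 if p sits on a 4-byte boundary, 0 otherwise. */
static int is_aligned4(const void *p)
{
  return ((uintptr_t)p & 0x03u) == 0;
}
```

Calling is_aligned4(buffer_attr) and is_aligned4(buffer_union.data) should both give 1. Note that the union may be a little larger than 255 bytes because its size is rounded up to its alignment.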

Foltos

Good idea foltos

Thanks for the suggestions...appreciated

Jim

Jim wrote:
>Sorry Robert...and thank you for your response
>
>Yes...I am re-inventing nomenclature...sorry that came from the
>distant past somewhere :-)

I'm sure I've done that more than a few times myself.

>What I meant was how do I set the crossworks compiler to byte
>alignment

You really don't want to do that. It will give you larger slower code.

>What is happening is that the examples we are working with use
>pointers to integers ( eg Flash write, or USB write etc ) writing
>four bytes at a time to destination...presumably that is a sensible
>thing to do...even though it seemed a little strange to me as well
>( I come from an 8-bit background )

It can make sense, since writing a single aligned integer is likely to
take the same time as a single write of a char. The key word is aligned.
If it's not aligned it will take at least as long as writing 4 chars (on
this architecture).

An optimized memcpy is likely to have all this taken care of (including
an alignment check).

>The source is likely to be buffers of say unsigned chars sitting in
>RAM...or other
>
>What is happening is that the writes are scrambling the data...which
>is curious :-)

Either that or a segment fault is to be expected on an aligned
architecture. What happens when you read a word on a byte boundary is
that the chip simply drops the lowest bits from the address (it's
allowed to raise an exception instead), so you get something other than
what you expect unless it happens to line up.

>If we re-align things on four byte boundaries...it comes alive
>again...but what a pain !

Use memcpy (if you were worried about performance you wouldn't be trying
to work with non-aligned values anyway ;). Or use a cracking function
written like

uval = (unsigned int)buf[0] | ((unsigned int)buf[1]) << 8;

This also allows you to deal with differing integer sizes and endianness
cleanly (making it superior to memcpy). If the compiler knows the values
are aligned it's allowed to optimize that expression. And it should take
no more time/space in the resulting code than an alignment-correcting
keyword would. It is also largely portable between platforms (as you
just discovered, plain casting of pointers is not).

And yes, either approach will require some rewrite work but you will end up
with better, more robust code at the end.
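
A minimal sketch of such cracking functions for a couple of sizes and both byte orders. The names get_le16, get_le32, get_be32 and put_le32 are made up for illustration; they only use byte accesses, so the buffer may start at any address.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from buf. */
static uint16_t get_le16(const uint8_t *buf)
{
  return (uint16_t)((unsigned int)buf[0] | ((unsigned int)buf[1] << 8));
}

/* Read a 32-bit little-endian value from buf. */
static uint32_t get_le32(const uint8_t *buf)
{
  return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
         ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

/* Read a 32-bit big-endian value from buf. */
static uint32_t get_be32(const uint8_t *buf)
{
  return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
         ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* Store a 32-bit value into buf in little-endian order. */
static void put_le32(uint8_t *buf, uint32_t v)
{
  buf[0] = (uint8_t)(v & 0xFFu);
  buf[1] = (uint8_t)((v >> 8) & 0xFFu);
  buf[2] = (uint8_t)((v >> 16) & 0xFFu);
  buf[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```

Because the byte order is spelled out in the shifts, the result is the same whatever the endianness of the CPU running the code, which is what a plain pointer cast or memcpy cannot give you.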

Robert

Hi Robert
I really appreciate your explanation, that is definitely what I have
been seeing, and it clears things up nicely.

I will make a note of your 'cracking' function...I like it

Best regards
Jim
