EmbeddedRelated.com
Forums

NAND flash read issue

Started by oddparity 6 years ago · 8 replies · latest reply 6 years ago · 288 views

Hi,

I have a custom-made board based on the Atmel SAME70Q20 microcontroller, and I am having trouble tracking down a strange bug related to the external flash. I have integrated ELM Chan's FatFs into the code, and sometimes, when I add or modify code, even code that has nothing to do with the file system, it gives me an FR_NO_FILESYSTEM error. I traced the problem to the flash read. During initialization, when FatFs looks for the boot record, the read of that sector returns all bytes as 0xFF and does not report any error. The call to f_mkfs also completes successfully, and I know the data was written to the flash because the boot record is there if I read the boot sector right after formatting. Later, during further initialization, a read of the same sector returns 0xFF for the whole sector.

This issue is driving me nuts; I seem to have exhausted all my knowledge and hit a roadblock. If anyone could point out where the problem might lie, or where to investigate, it would be very helpful.


Edit: adding the code that's inside my main

sysclk_init();
NVIC_SetPriorityGrouping(0);
board_init();
configure_console();

// Check whether new firmware is available.
if ( is_newFirmware() == SUCCESS ) {
    debug_puts("Jumping to boot loader\r");
    // Jump to bootloader code.
    jump_to_app(BOOTLOADER_ADDRESS);
}

// Display welcome message.
gstrFcb_utility.welcome_msg();

// Initialize NAND flash on EBI and the mutex for locking bus access.
strCbNand.Init();

debug_puts("initializing file system for iSenseMicro\r");

if( g_strCbFS.init() != FR_OK ) {
#ifdef ismConfigUSE_WDT
    // Disable watchdog
    wdt_disable(WDT);
#endif
    debug_puts("Creating file system on disk 0\r\n");
    delay_s(5);

    // Format file system & create default files and directories.
    if ( ( g_strCbFS.format_disk() == FR_OK ) && ( ISM_FactoryReset() == SUCCESS ) )
    {
        debug_puts("Created file system & loaded default settings\r");
    }
    else
    {
        debug_puts(" failed to reset system settings\r\n");
        FS_FormatFullFileSystem();
    }

    delay_s(3);
    system_reset();
}


g_strCbFS.init() is the function where the problem occurs. Inside it I am just mounting the volume and opening the root directory. Opening the root directory sometimes gives the error FR_NO_FILESYSTEM.


Update: I fixed my issue, but I still don't understand what exactly the problem was. If anyone could help me understand it (just to satisfy my curiosity), that would be great. It was the gcc optimization that was causing the problem. We were using -O1, and I had a hunch that the problem might have something to do with optimization. When I disable optimization it works fine; furthermore, if I disable optimization only for the files containing the raw NAND access API, the code works fine.

I think I am wrong, but what led me to look in this direction is that I had read that compilers align, or do not align, data and code at word boundaries depending on some flag, because it is faster for the CPU to access a whole word at a time than something that starts partway through a word, which costs more CPU cycles. Since my issue appeared or disappeared randomly after a code change, I thought maybe some code or data was becoming aligned or misaligned, which somehow affected the execution, but that doesn't seem to be the case.

So, can anyone explain which flags in -O1 could affect the code in such a way?

The following are the flags enabled by -O1 in gcc, for reference:

-fauto-inc-dec
-fbranch-count-reg
-fcombine-stack-adjustments
-fcompare-elim
-fcprop-registers
-fdce
-fdefer-pop
-fdelayed-branch
-fdse
-fforward-propagate
-fguess-branch-probability
-fif-conversion2
-fif-conversion
-finline-functions-called-once
-fipa-pure-const
-fipa-profile
-fipa-reference
-fmerge-constants
-fmove-loop-invariants
-fomit-frame-pointer
-freorder-blocks
-fshrink-wrap
-fshrink-wrap-separate
-fsplit-wide-types
-fssa-backprop
-fssa-phiopt
-ftree-bit-ccp
-ftree-ccp
-ftree-ch
-ftree-coalesce-vars
-ftree-copy-prop
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-forwprop
-ftree-fre
-ftree-phiprop
-ftree-scev-cprop
-ftree-sink
-ftree-slsr
-ftree-sra
-ftree-pta
-ftree-ter
-funit-at-a-time
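
A note on the question above: nothing in this thread confirms the root cause, but a common reason for hardware-access code to break only when optimization is on is a memory-mapped latch accessed through a pointer that is not `volatile`. Passes in the list such as -ftree-fre, -fmove-loop-invariants, -fdse and -ftree-dse may then merge repeated reads of the same address or drop all but the last of several writes to it. The sketch below shows the usual remedy. The `nand_port_t` type and both function names are hypothetical, not taken from the poster's code or from any vendor library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a parallel NAND port. On real hardware the
 * three pointers would be the bus-mapped command, address and data latches. */
typedef struct {
    volatile uint8_t *cmd;   /* write: command latch (CLE) */
    volatile uint8_t *addr;  /* write: address latch (ALE) */
    volatile uint8_t *data;  /* read/write: data port      */
} nand_port_t;

/* Each access through a volatile lvalue must be performed, so n reads of
 * the same data-port address remain n separate loads when optimized. */
void nand_read_bytes(const nand_port_t *port, uint8_t *dst, size_t n)
{
    assert(port != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = *port->data;
    }
}

/* Address cycles are consecutive writes to one latch address. Without
 * volatile, dead-store elimination may keep only the final write. */
void nand_write_address(const nand_port_t *port,
                        const uint8_t *cycles, size_t n)
{
    assert(port != NULL && cycles != NULL);
    for (size_t i = 0; i < n; i++) {
        *port->addr = cycles[i];
    }
}
```

If something like this is the cause, compiling the NAND files without optimization only hides it; checking the disassembly of the read loop at -O1 would show whether the loads are still there.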

Thanks

Reply by SpiderKenny, October 17, 2018

There are probably a few ways to debug this. I don't know much about what else is on your board, or even which make of external flash you are using (e.g. is it an SPI flash?).

So anyway, I'd start by adding some debug code to the flash write functions (do you have a spare UART you can send debug strings out of?) and check everything that is being written to the flash.

Or you could attach a logic analyzer to the actual flash pins, and see if somehow something is sending erroneous writes and/or sector erase commands.

Also check that CS is being asserted only at the correct times.

**EDIT** Spelling and grammar.

Reply by oddparity, October 17, 2018

Hi spiderkenny,

Pardon my ignorance if I did not understand something as I have started pretty recently. The external flash is connected to the controller via SMC interface.

The main problem is that I cannot debug the actual write functions, as the Flash Translation Layer is third-party code and I only have a static library with no source for it. Still, I have asked my project lead, and I'll see if I can get my hands on it.

I think the last resort is the logic analyzer. I'll try to get one and see what I find.

Reply by ivanovp, October 17, 2018

Hello,

Do you have an FTL (flash translation layer)? It could be a bug in the FTL implementation. If you don't have an FTL: using a FAT filesystem on raw flash media is a very bad idea. Flash blocks can only be erased ~10,000 to 100,000 times (check the data sheet). Writing the FAT chains at the beginning of the flash will wear out the first blocks.

Edit: in my opinion, data bytes in worn-out blocks will not turn into 0xFF. However, you do need an FTL for FAT.

Another source of this kind of error is timing, e.g. not delaying ~100 ns before reading the ready/busy signal (that's only true for parallel NANDs). If you change something in the code, you add or remove small delays, which can make the code work or fail.

Best regards,

Peter

Reply by Jim_255, October 17, 2018

I agree completely with ivanovp.  A project I designed some 20 years ago was based on a compact flash device.  The powers that be wanted the device to update a file, the current machine status, every second while the device was in operation.  I told them the life of the device would be no more than 6 - 10 months out in the field.  They didn't listen.  4 months after I was "Let go" I got the phone call, "You were right"!!!  Flash is good for long term storage, you just have to "Manage" the usage.  It's not magnetic storage and can't be treated the same.


And that's my 10 cents! (2 cents adjusted for inflation)

-255

Reply by mr_bandit, October 17, 2018

Terrible when management cannot do 3rd grade math... I suspect you were "let go" because you had the temerity to argue with management...

One technique I use is a "write-through cache". Create a buffer with the same data structure as on the flash device (which might be the EEPROM on the chip, à la Atmel). Do all of your reads and writes with that buffer. Periodically, run down the buffer and write only the changed bytes to the flash. (Either do a read of the flash and compare, or use a dirty bit, or ...)

(NOTE: All of these comments assume the ability to erase a single byte, not just a page. Or you have two pages and can afford to erase the oldest page when you need it.)

This avoids writing to non-changed bytes, and makes read of the data for normal operations a lot easier and faster. You do need to supply an API, but it is easy to write and call. The read of the data can be directly out of the buffer. If you do a read/compare for the write, you can do the write directly into the buffer. In this case, you only need two API functions: one for the read from flash and one for the write to flash.
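
A minimal sketch of the buffer described above, assuming byte-writable memory as the note says. The storage is simulated with a plain array so the write savings can be counted; `nvm_t`, `cfg_cache_t`, the `cfg_*` functions and the 32-byte size are illustrative names and values, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 32u  /* illustrative size of the stored structure */

/* Simulated byte-writable non-volatile memory (e.g. on-chip EEPROM). */
typedef struct {
    uint8_t  cells[CFG_SIZE];
    uint32_t write_count;  /* physical byte writes performed so far */
} nvm_t;

typedef struct {
    uint8_t shadow[CFG_SIZE];  /* RAM copy used for all normal access */
    bool    dirty[CFG_SIZE];   /* set when a byte was modified in RAM */
    nvm_t  *nvm;
} cfg_cache_t;

/* Fill the RAM copy from the device once at start-up. */
void cfg_load(cfg_cache_t *c, nvm_t *nvm)
{
    c->nvm = nvm;
    memcpy(c->shadow, nvm->cells, CFG_SIZE);
    memset(c->dirty, 0, sizeof c->dirty);
}

uint8_t cfg_read(const cfg_cache_t *c, size_t off)
{
    assert(off < CFG_SIZE);
    return c->shadow[off];
}

/* Writes touch only RAM; a byte is marked dirty only if its value changes. */
void cfg_write(cfg_cache_t *c, size_t off, uint8_t value)
{
    assert(off < CFG_SIZE);
    if (c->shadow[off] != value) {
        c->shadow[off] = value;
        c->dirty[off] = true;
    }
}

/* Periodic flush: write back bytes that are dirty and really differ from
 * the device contents. Returns the number of bytes physically written. */
size_t cfg_flush(cfg_cache_t *c)
{
    size_t written = 0;
    for (size_t i = 0; i < CFG_SIZE; i++) {
        if (c->dirty[i] && c->nvm->cells[i] != c->shadow[i]) {
            c->nvm->cells[i] = c->shadow[i];
            c->nvm->write_count++;
            written++;
        }
        c->dirty[i] = false;
    }
    return written;
}
```

Calling `cfg_flush` from a slow timer rather than on every change is what caps the write rate; how often to flush is a trade-off against how much data can be lost on power failure.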

In general, for an embedded system, you only care about (maybe) 200 bytes, usually less. If you have the space on the EEPROM, you can also do some wear-leveling by having a ring buffer on the EEPROM and writing around the ring. If your ring is 8 long (for example), you have divided the writes to a particular set of cells by a factor of 8.

Say you have a 16-byte structure that needs to be updated on flash every second. If you have a 1024-byte page, you can fit 64 copies in a ring buffer. Worst case, you have reduced your writes by a factor of 64, leading to a much longer lifetime.
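
The 16-byte record in a 1024-byte page can be sketched like this. The page is simulated in RAM, and the record layout (a 4-byte sequence number plus 12 payload bytes) is an assumption made for the example. This single-page version loses the old record for a moment when it erases, so a real design would want the second page mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   1024u
#define RECORD_SIZE 16u
#define SLOT_COUNT  (PAGE_SIZE / RECORD_SIZE)  /* 64 slots per page */
#define SEQ_EMPTY   0xFFFFFFFFu                /* erased flash reads 0xFF */

typedef struct {
    uint32_t seq;                                     /* running write counter */
    uint8_t  payload[RECORD_SIZE - sizeof(uint32_t)]; /* 12 data bytes         */
} record_t;

_Static_assert(sizeof(record_t) == RECORD_SIZE, "record must be 16 bytes");

/* Simulated flash page plus an erase counter to observe wear. */
typedef struct {
    record_t slots[SLOT_COUNT];
    uint32_t erase_count;
} page_t;

void page_erase(page_t *p)
{
    memset(p->slots, 0xFF, sizeof p->slots);
    p->erase_count++;
}

/* Slots fill in order, so the newest record is the last non-empty slot.
 * Returns its index, or -1 if the page is empty. */
int ring_find_latest(const page_t *p)
{
    int latest = -1;
    for (unsigned i = 0; i < SLOT_COUNT; i++) {
        if (p->slots[i].seq == SEQ_EMPTY) {
            break;
        }
        latest = (int)i;
    }
    return latest;
}

/* Append a record in the next free slot; erase only when the page is full. */
void ring_append(page_t *p, const uint8_t *payload)
{
    int latest = ring_find_latest(p);
    uint32_t seq = (latest < 0) ? 0u : p->slots[latest].seq + 1u;
    unsigned next = (unsigned)(latest + 1);

    if (next == SLOT_COUNT) {   /* page full: one erase per 64 writes */
        page_erase(p);
        next = 0;
    }
    if (seq == SEQ_EMPTY) {     /* never store the "empty" marker */
        seq = 0u;
    }
    p->slots[next].seq = seq;
    memcpy(p->slots[next].payload, payload, sizeof p->slots[next].payload);
}
```

With one update per second, the page is erased once every 64 seconds instead of every second, which is the factor of 64 from the paragraph above.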

Jim_255 is spot on - you need to carefully manage the EEPROM writes. Spend time on the design. Determine exactly what you need.

Reply by oddparity, October 17, 2018

Hi ivanovp,

I have the flash translation layer. I have been thinking of moving to a flash-friendly filesystem, but that is not a priority right now.

I'll investigate more regarding the delay and see what I find.

Reply by Bob11, October 17, 2018

Have to agree with SpiderKenny. It sounds like you've exhausted many of the software options. As this is a custom board it's time to break out the oscilloscope and start verifying proper accesses, setup and hold times, and all that. Hopefully you've got some spare GPIO; if so, trigger the 'scope using a GPIO toggle when the error occurs.
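
One way to produce that trigger, sketched under assumptions: check the sector buffer right after the read and fire a hook when it comes back entirely 0xFF, which is the symptom described in the original post. The function names and the `trigger_fn` hook are hypothetical; on the board the hook would toggle a spare GPIO, while here a counter stands in for it. The 0x55 0xAA signature at offsets 510 and 511 is the standard FAT boot-sector signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every byte in the buffer is 0xFF (erased-looking data). */
bool sector_is_blank(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            return false;
        }
    }
    return true;
}

/* Hypothetical hook: on hardware this would toggle a spare GPIO that the
 * oscilloscope or logic analyzer is set to trigger on. */
typedef void (*trigger_fn)(void);

/* Returns true if the buffer looks like a FAT boot sector. Fires the
 * trigger when the sector reads back as all 0xFF. */
bool check_boot_sector(const uint8_t *buf, size_t len, trigger_fn trigger)
{
    if (sector_is_blank(buf, len)) {
        if (trigger != NULL) {
            trigger();
        }
        return false;
    }
    /* FAT boot sectors carry 0x55 0xAA at byte offsets 510 and 511. */
    return len >= 512 && buf[510] == 0x55 && buf[511] == 0xAA;
}

/* Host-side stand-in for the GPIO toggle, for illustration only. */
unsigned sim_trigger_count = 0;

void sim_trigger(void)
{
    sim_trigger_count++;
}
```

Calling the check immediately after the low-level read places the trigger edge right next to the failing bus cycles in the capture.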

Reply by bamos, October 17, 2018

I agree with SpiderKenny: it sounds like your flash is getting erased somehow, or you're not actually reading it correctly. Double-check your low-level commands to ensure they are working properly, as well as the hardware configuration: make sure pins are properly set up as outputs, signals look clean, etc.