Amtel SAM9 "boot from NAND" is a myth?

Started by Grant Edwards September 15, 2010
Vladimir Vassilevsky wrote:
> Marc Jet wrote:
>> Some blocks come already marked as bad from
>> factory. It is recommended to preserve this information, as factory
>> testing is usually more exhaustive than what you can implement in a
>> typical embedded system.
>
> You are making unfounded assumptions here.

No, he is citing usual data sheets. So while skeptics may still doubt that factory testing happens, Marc's claim certainly is not unfounded, because you can read it in every data sheet :-]

Stefan
On 2010-09-20, Ulf Samuelsson <nospam.ulf@atmel.com> wrote:
> 2010-09-20 16:18, Grant Edwards skrev:
> If you add an SPI flash (or a dataflash) and plan to boot Linux, you
> are probably better off by putting also u-boot and u-boot environment
> in the dataflash. You might also want to consider the kernel.
While the ROM bootloader supports the 25xx series "dataflash" parts we got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux -- at least not that I could ever find. I asked about it on the AT91 forum a few months back and got the usual response (IOW, none at all).
> Reason is the SAM-BA S/W which only knows how to erase the complete
> NAND flash.
Oh, I fixed that ages ago. I added a few lines of code to the nand-flash, data-flash, and serial-flash applets so that they can all erase a region of flash. Then I wrote my own ROM-boot-protocol client in Python. [Besides the lack of an "erase region" command, SAM-BA won't work at all using a serial connection on a Linux host, it's not very usable from the command-line, and it isn't very easy to use as a module from other programs.]
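As a rough illustration of what an "erase region" command has to work out for NAND, the sketch below maps a byte range onto whole erase blocks. This is not the applet code described above; the function name, the 128 KiB default block size, and the bad-block set are all assumptions made for the example.

```python
def blocks_for_region(offset, length, block_size=128 * 1024, bad_blocks=frozenset()):
    """Return the erase-block numbers covering [offset, offset + length).

    NAND is erased in whole blocks, so the region is widened to block
    boundaries. Blocks listed in bad_blocks (a hypothetical, caller-supplied
    set) are skipped rather than erased, so existing bad-block marks survive.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return [b for b in range(first, last + 1) if b not in bad_blocks]
```

A caller would then issue one block-erase per returned number instead of erasing the complete device.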
> If you plan to program the NAND flash using another method, then of
> course, use the NAND flash for everything except bootstrap.

We'll initially use a "SAM-BA" replacement program to program prototypes. Then for production, the plan is to have the distributor ship them with U-Boot preprogrammed so that we can use the TFTP server in U-Boot to do the rest.

--
Grant Edwards, grant.b.edwards at gmail.com
Yow! I Know A Joke!!
On 20/09/2010 19:19, Stefan Reuther wrote:

> Remember, we
> don't need 100% reliability. After all, all components have a finite
> life, and the flash just needs to live a little longer than the plug
> connectors or capacitors in the device :-) And by using many bits, I
> believe I have got the chance that they all refuse to flip low enough.
This is what some people here apparently have trouble understanding. /Nothing/ is 100% reliable - it's just a matter of taking the reliability of your parts into account when designing a complete system.
> It's a flash. It's electrons that tunnel out gradually. It's not an evil
> gnome sitting within the package, deciding "today, I'll annoy the
> engineer in an especially evil twisted way", so while the data sheet
> allows a NAND flash to keep its old contents unmodifiably in a bad
> sector, I assume this doesn't happen in practice. Or, at least, not
> often enough to be observable.
>
> Stefan
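The "many bits" argument quoted above can be put into numbers. The sketch below assumes that each marker bit independently refuses to flip with the same probability, which is a simplification: bits in a failing block may well fail together. The function name and the sample figures are invented for illustration.

```python
def chance_all_bits_stuck(p_stuck, n_bits):
    """Probability that none of n_bits marker bits can be flipped,
    assuming each bit is stuck independently with probability p_stuck."""
    return p_stuck ** n_bits


# With a made-up 50% chance per bit, 8 marker bits all stay stuck
# in 1 case out of 256.
print(chance_all_bits_stuck(0.5, 8))
```

The result shrinks geometrically with the number of bits, which is the point of spreading the mark over many bits rather than one.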
2010-09-20 20:10, Grant Edwards skrev:
> On 2010-09-20, Ulf Samuelsson <nospam.ulf@atmel.com> wrote:
>> 2010-09-20 16:18, Grant Edwards skrev:
>
>> If you add an SPI flash (or a dataflash) and plan to boot Linux, you
>> are probably better off by putting also u-boot and u-boot environment
>> in the dataflash. You might also want to consider the kernel.
>
> While the ROM bootloader supports the 25xx series "dataflash" parts we
> got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
> -- at least not that I could ever find. I asked about it on the AT91
> forum a few months back and got the usual response (IOW, none at all).

It is hidden in a disused lavatory in the cellar marked: "Beware of the Leopard".

There are three different AT91bootstraps around.

1) The obvious AT91bootstrap you can download from www.atmel.com
2) My derivative of AT91bootstrap which adds Kconfig etc. and is used by open source projects like Buildroot and OpenEmbedded.
3) There is normally an AT91bootstrap in the "Softpack's". This is different from (1) and (2). It supports the 25xx series SPI flash but relies on libraries not normally available in arm-linux compilers so you may have to compile it using arm-newlib, IAR or Keil.

>> Reason is the SAM-BA S/W which only knows how to erase the complete
>> NAND flash.
>
> Oh, I fixed that ages ago.
>
> I added a few lines of code to the nand-flash, data-flash, and
> serial-flash applets so that they can all erase a region of flash.

Nice, how about sharing!

> Then I wrote my own ROM-boot-protocol client in Python.
>
> [Besides the lack of an "erase region" command, SAM-BA won't work at
> all using a serial connection on a Linux host, it's not very usable
> from the command-line, and it isn't very easy to use as a module from
> other programs.]

>> If you plan to program the NAND flash using another method, then of
>> course, use the NAND flash for everything except bootstrap.
>
> We'll initially use "SAM-BA" replacement program to program
> prototypes. Then for production, the plan is to have the distributor
> ship them with U-Boot preprogrammed so that we can use the TFTP server
> in U-Boot to do the rest.

Or, if you have an SD-Connector, you can boot from an SD-Card/external SPI flash which programs the internal flash.

--
Best Regards
Ulf Samuelsson
These are my own personal opinions, which may (or may not) be shared by my employer Atmel Nordic AB
On 2010-09-20, Ulf Samuelsson <nospam.ulf@atmel.com> wrote:
> 2010-09-20 20:10, Grant Edwards skrev:
>> While the ROM bootloader supports the 25xx series "dataflash" parts we
>> got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
>> -- at least not that I could ever find. I asked about it on the AT91
>> forum a few months back and got the usual response (IOW, none at all).
>
> It is hidden in a disused lavatory in the cellar marked:
> "Beware of the Leopard".

Ah! I just listened to Stephen Fry's audiobook of THHGTTG last weekend while driving home from Chicago.

> There are three different AT91bootstraps around.
>
> 1) The obvious AT91bootstrap you can download from www.atmel.com

That's the one I looked at.

> 2) My derivative of AT91bootstrap which adds Kconfig etc. and is used
> by open source projects like Buildroot and OpenEmbedded.

Though I am using Buildroot for my rootfs, I found it more convenient to build other things (kernel, bootstrap, U-Boot) separately, so I never really looked into that one.

> 3) There is normally an AT91bootstrap in the "Softpack's". This is
> different from (1) and (2). It supports the 25xx series SPI flash
> but relies on libraries not normally available in arm-linux
> compilers so you may have to compile it using arm-newlib, IAR or
> Keil.

That's interesting. I'll keep that in mind.

>>> Reason is the SAM-BA S/W which only knows how to erase the complete
>>> NAND flash.
>>
>> Oh, I fixed that ages ago.
>>
>> I added a few lines of code to the nand-flash, data-flash, and
>> serial-flash applets so that they can all erase a region of flash.
>
> Nice, how about sharing!

Sure. The changes to the applets can certainly be shared. I'll have to check with management regarding my sam-ba client replacement.

I just double-checked, and the erase-region command has been added to the nandflash and dataflash applets, but it never got added to the serialflash (AT25xx) applet.

>>> If you plan to program the NAND flash using another method, then of
>>> course, use the NAND flash for everything except bootstrap.
>>
>> We'll initially use "SAM-BA" replacement program to program
>> prototypes. Then for production, the plan is to have the distributor
>> ship them with U-Boot preprogrammed so that we can use the TFTP
>> server in U-Boot to do the rest.
>
> Or, if you have an SD-Connector, you can boot from an SD-Card/
> external SPI flash which programs the internal flash.

That's also an option, but since we'll have to connect an Ethernet cable anyway as part of the normal production test process, we want to use Ethernet as the programming interface as well.

--
Grant Edwards, grant.b.edwards at gmail.com
Yow! Is a tattoo real, like a curb or a battleship? Or are we suffering in Safeway?
> State of the art seems to be to use magic numbers for valid data, and
> destroy the ECC and/or magic numbers for blocks that are gone bad, so
> you can identify them later. That's the "bad block flag".
This seems to be "industry standard" in my experience as well. But IMHO it's not a good solution to the problem. Typical NAND datasheets do not specify the behaviour of bad blocks. The approach you mention relies on certain behaviour from the bad blocks (e.g. the ability to erase or overwrite them). This is why I think it is a bad approach.

Another approach is the following:

The chip is partitioned into data blocks and spare blocks. During mount, all block headers are scanned in a specific order, e.g. ascending order for data blocks, and descending order for spare blocks.

Every data block contains a header which contains its physical block number and a (cryptographic) hash signature. Blocks without a valid hash signature are considered bad or stale (e.g. powerfail during erase). In the first pass, every data block that passes this test is considered valid - until a spare block overrides it.

The spare block header contains its own physical block number, the physical block number of the data block it replaces, and a hash signature as well. If a spare block exists for a data block, the data block is degraded to "bad", no matter what the data block content claims to be. Likewise, if another valid spare block refers to the same data block, it overrides the previously read spare block (thus the block scanning order). After all, what we arbitrarily designated to be "spare" blocks could be bad blocks too.

This method is able to memorize any combination of up to N bad blocks, no matter what the bad block behaviour is. Up to the collision resistance of the hash algorithm, of course. You can achieve any desired reliability by choosing the hash algorithm accordingly.

The key point to understand is that the bad block information should be stored in the good blocks, not in the bad ones. The good blocks are the ones that have their behaviour specified.
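A minimal sketch of the mount scan described above, written in Python for readability rather than as firmware. The header layout (a kind byte plus two 32-bit block numbers), the choice of SHA-256, and the names `make_block`, `parse_block` and `mount` are all assumptions for illustration; the post does not specify any of them.

```python
import hashlib
import struct

# Assumed header: kind (b"D" data / b"S" spare), own block no., replaced block no.
HEADER = struct.Struct("<cII")
DIGEST_LEN = hashlib.sha256().digest_size


def make_block(kind, phys, replaces, payload):
    """Build block contents: header fields, SHA-256 signature, payload."""
    fields = HEADER.pack(kind, phys, replaces)
    return fields + hashlib.sha256(fields + payload).digest() + payload


def parse_block(raw, phys, kind):
    """Return the 'replaces' field if raw is a valid block of this kind
    stored at physical block phys, else None."""
    if len(raw) < HEADER.size + DIGEST_LEN:
        return None
    fields = raw[:HEADER.size]
    digest = raw[HEADER.size:HEADER.size + DIGEST_LEN]
    payload = raw[HEADER.size + DIGEST_LEN:]
    if hashlib.sha256(fields + payload).digest() != digest:
        return None                       # bad or stale: no valid signature
    got_kind, got_phys, replaces = HEADER.unpack(fields)
    if got_kind != kind or got_phys != phys:
        return None                       # signed, but not meant for this place
    return replaces


def mount(blocks, n_data):
    """Map each usable data block number to the physical block holding it."""
    mapping = {}
    for phys in range(n_data):                            # data blocks, ascending
        if parse_block(blocks[phys], phys, b"D") is not None:
            mapping[phys] = phys
    for phys in range(len(blocks) - 1, n_data - 1, -1):   # spares, descending
        replaces = parse_block(blocks[phys], phys, b"S")
        if replaces is not None and replaces < n_data:
            # A spare overrides the data block and any spare scanned earlier.
            mapping[replaces] = phys
    return mapping
```

Nothing in the scan reads or trusts anything a bad block says about itself: a block only counts if its signature verifies, and a replacement is recorded solely in the (good) spare block.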
In article <i7aj23$1qo$1@reader1.panix.com>, invalid@invalid.invalid says...
> On 2010-09-20, Ulf Samuelsson <nospam.ulf@atmel.com> wrote:
> > 2010-09-20 20:10, Grant Edwards skrev:
> >
> >> While the ROM bootloader supports the 25xx series "dataflash" parts we
> >> got sold, there is no support in the AT91 bootstrap, U-Boot, or Linux
> >> -- at least not that I could ever find. I asked about it on the AT91
> >> forum a few months back and got the usual response (IOW, none at all).
> >
> > It is hidden in a disused lavatory in the cellar marked:
> > "Beware of the Leopard".
>
> Ah! I just listened to Stephen Fry's audiobook of THHGTTG last
> weekend while driving home from Chicago.

So you will also have discovered about

"There is no UP for rain to fall from, therefore rainfall of the universe is none"

Let alone no sex...

Says he the geek with CDs of the original Radio series..

--
Paul Carpenter | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate
On Sep 19, 10:06 am, David Brown <david.br...@hesbynett.removethisbit.no> wrote:
> On 19/09/2010 05:09, rickman wrote:
>
> > On Sep 18, 7:34 am, Stefan Reuther <stefan.n...@arcor.de> wrote:
> >> Allan Herriman wrote:
> >>> On Fri, 17 Sep 2010 12:19:56 -0700, rickman wrote:
> >>>> On Sep 17, 7:52 am, Marc Jet <jetm...@hotmail.com> wrote:
> >>>>> People commonly expect bad blocks to have more bit errors than their
> >>>>> ECC copes with. However, nowhere in the datasheets is a guarantee for
> >>>>> this.
> >> [...]
> >>>> Looking back, I never actually used a NAND flash in a design. I
> >>>> understand how the bad bits would be managed. But what about bad
> >>>> blocks? Is this a spec on delivery or is it allowed for blocks to go
> >>>> bad in the field? I can't see how that could be supported without a
> >>>> very complex scheme along the lines of RAID drives.
>
> >>> It's pretty simple actually. When the driver reads a block that has an
> >>> error, it copies the corrected contents to an unused block and sets the
> >>> bad block flag in the original block, preventing its reuse.
> >>> No software will ever clear the bad block flag, which means that the
> >>> effective size of the device decreases as blocks go bad in the field.
>
> >> But where do you store the "bad block" flag? It is pretty common to
> >> store it in the bad block itself. The point Marc is making is that this
> >> is not guaranteed to work.
>
> > Why do you need a bad block flag? If the block has an ECC failure, it
> > is bad and the OS will note that. You may have to read the block ECC
> > the first time it fails, but after that it can be noted in the file
> > system as not part of a file and not part of free space on the drive.
>
> Failures can be intermittent - a partially failed bit could be read
> correctly or incorrectly depending on the data stored, the temperature,
> or the voltage. So if you see that you are getting failures, you make a
> note of them and don't use that block again.
That's fine. But my point is that if the block is "bad", either you can set a bad block flag or the ECC value will be invalid when the media is read. In either case you can flag it in your access system (don't want to call it a file system) and not use that block again until the next reboot. This only has a performance impact at boot time. You don't have to *rely* on a bad block flag since that can also be faulty. But it can be used in addition to detecting an ECC error.

Rick
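A sketch of the boot-time scan this implies: a block is treated as bad if either its flag is set or its ECC check fails, and the result lives only in RAM. The two callables are hypothetical stand-ins for a real NAND driver, and how freshly erased blocks are told apart from failed ones is left out.

```python
def scan_bad_blocks(n_blocks, flag_is_set, ecc_is_ok):
    """Build the in-RAM bad-block set at boot.

    flag_is_set(block) -> True if the block's bad-block flag is set.
    ecc_is_ok(block)   -> False on an uncorrectable ECC failure.
    Both are hypothetical driver hooks supplied by the caller.
    """
    bad = set()
    for block in range(n_blocks):
        # Either signal alone is enough, since the flag itself may be faulty.
        if flag_is_set(block) or not ecc_is_ok(block):
            bad.add(block)
    return bad
```

The cost is one pass over the device per boot, which matches the "performance impact at boot time" mentioned above.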
On Sep 20, 1:01 pm, Stefan Reuther <stefan.n...@arcor.de> wrote:
> [I haven't got rickman's post.]
>
> David Brown wrote:
> > On 19/09/2010 05:09, rickman wrote:
>
> >> Why do you need a bad block flag? If the block has an ECC failure, it
> >> is bad and the OS will note that. You may have to read the block ECC
> >> the first time it fails, but after that it can be noted in the file
> >> system as not part of a file and not part of free space on the drive.
>
> How do you mark it "in the file system" if your file system is actually
> inside the NAND flash? Thought experiment: your bad block table is
> stored in a particular block. That block goes bad. Where do you mark
> that this block is now bad?
Sure, if that is your file system, then it doesn't work very well for NAND flash does it? The bad sector 0 problem is one that hard drives have to this day, don't they? Or maybe the internal controller can remap that "invisibly" now that there are tons of embedded smarts in them. But that is the point. If your device can't "fix" a bad block in the lowest level of the file system on the drive, then it is subject to failure. If on boot, the software does what it has to do to recover the structure of the file system, then it will be robust.
> State of the art seems to be to use magic numbers for valid data, and
> destroy the ECC and/or magic numbers for blocks that are gone bad, so
> you can identify them later. That's the "bad block flag".
Yes, and once you find a "bad block" the "access system" (I really shouldn't call it a file system since you might not be working at that level) will have to remember this block in memory, not on the drive. Each time the system is booted, it will have to either read a valid bad block table, or construct its own. I suppose that each time the device needs a new block to write data it could search for a working block. That would be a very primitive system as well as slow, but it would work and would not require a bad block table.

BTW, I assume that in order to trust a block on a NAND drive each write would need to be verified in some manner. Is that also included in a NAND access system?
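One way the "search for a working block" idea combined with write verification could look, as a sketch only. The driver callables and the in-RAM bad set are hypothetical, and a real driver would normally also check the chip's program status rather than rely on read-back alone.

```python
def write_verified(data, candidates, bad, write_block, read_block):
    """Write data to the first candidate block that reads back intact.

    candidates: free block numbers to try, in order.
    bad: in-RAM set of known-bad blocks; new failures are added to it.
    write_block(block, data) / read_block(block): hypothetical driver hooks.
    Returns the block number used, or raises RuntimeError if none works.
    """
    for block in candidates:
        if block in bad:
            continue
        write_block(block, data)
        if read_block(block) == data:
            return block
        bad.add(block)        # remembered in memory until the next reboot
    raise RuntimeError("no working block available")
```

As noted above, this is primitive and slow, but it needs no bad block table stored on the device.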
> > Failures can be intermittent - a partially failed bit could be read
> > correctly or incorrectly depending on the data stored, the temperature,
> > or the voltage. So if you see that you are getting failures, you make a
> > note of them and don't use that block again.
>
> From what I've seen, those temporary failed bits are still within the
> specs of the NAND flash as long as you're running the part within specs.
> However, when you're way out of spec (say, 30°C over limit), all hell
> breaks loose.
Not sure what you mean by "within the specs". Are you saying the spec allows some level of intermittent failure on reads and/or writes? If so, there is still some level of intermittent failure that would be outside the spec and needs to be flagged as bad.

Rick
On 22/09/2010 18:09, rickman wrote:
> On Sep 19, 10:06 am, David Brown
> <david.br...@hesbynett.removethisbit.no> wrote:
>> On 19/09/2010 05:09, rickman wrote:
>>
>>> On Sep 18, 7:34 am, Stefan Reuther <stefan.n...@arcor.de> wrote:
>>>> Allan Herriman wrote:
>>>>> On Fri, 17 Sep 2010 12:19:56 -0700, rickman wrote:
>>>>>> On Sep 17, 7:52 am, Marc Jet <jetm...@hotmail.com> wrote:
>>>>>>> People commonly expect bad blocks to have more bit errors than their
>>>>>>> ECC copes with. However, nowhere in the datasheets is a guarantee for
>>>>>>> this.
>>>> [...]
>>>>>> Looking back, I never actually used a NAND flash in a design. I
>>>>>> understand how the bad bits would be managed. But what about bad
>>>>>> blocks? Is this a spec on delivery or is it allowed for blocks to go
>>>>>> bad in the field? I can't see how that could be supported without a
>>>>>> very complex scheme along the lines of RAID drives.
>>
>>>>> It's pretty simple actually. When the driver reads a block that has an
>>>>> error, it copies the corrected contents to an unused block and sets the
>>>>> bad block flag in the original block, preventing its reuse.
>>>>> No software will ever clear the bad block flag, which means that the
>>>>> effective size of the device decreases as blocks go bad in the field.
>>
>>>> But where do you store the "bad block" flag? It is pretty common to
>>>> store it in the bad block itself. The point Marc is making is that this
>>>> is not guaranteed to work.
>>
>>> Why do you need a bad block flag? If the block has an ECC failure, it
>>> is bad and the OS will note that. You may have to read the block ECC
>>> the first time it fails, but after that it can be noted in the file
>>> system as not part of a file and not part of free space on the drive.
>>
>> Failures can be intermittent - a partially failed bit could be read
>> correctly or incorrectly depending on the data stored, the temperature,
>> or the voltage. So if you see that you are getting failures, you make a
>> note of them and don't use that block again.
>
> That's fine. But my point is that if the block is "bad", either you
> can set a bad block flag or the ECC value will be invalid when
> the media is read. In either case you can flag it in your access
> system (don't want to call it a file system) and not use that block
> again until the next reboot. This only has a performance impact at
> boot time. You don't have to *rely* on a bad block flag since that
> can also be faulty. But it can be used in addition to detecting an
> ECC error.
>
> Rick
You can track bad blocks in all sorts of different ways. Some will involve more work when the bad block is discovered, others will involve more checking before using a block. But any file system, or "access system" if you like, has to have some way of tracking whether a block is in use or not. If you think of bad blocks as being in use in a special file that can't be accessed normally, then you have got simple and efficient bad block tracking (at least, it's as simple and efficient as the rest of your file system).
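The "special file" idea can be sketched as an ordinary block-ownership table in which bad blocks are owned by a reserved name. The class, the `BAD_FILE` name, and the first-fit allocation are all invented for illustration and are not taken from any particular file system.

```python
BAD_FILE = "<bad>"   # hypothetical reserved owner that can never be opened


class BlockMap:
    """Tracks which file owns each block; None means the block is free."""

    def __init__(self, n_blocks):
        self.owner = [None] * n_blocks

    def allocate(self, name):
        """Give the first free block to file `name` and return its number."""
        for block, owner in enumerate(self.owner):
            if owner is None:
                self.owner[block] = name
                return block
        raise RuntimeError("no free blocks")

    def mark_bad(self, block):
        """Retire a block by handing it to the special bad-block file."""
        self.owner[block] = BAD_FILE

    def free(self, block):
        """Release a block, unless it belongs to the bad-block file."""
        if self.owner[block] != BAD_FILE:
            self.owner[block] = None
```

Because a bad block is simply "in use", the allocator needs no separate bad-block check: it skips those blocks for the same reason it skips any other occupied block.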