Hi Al,
On 1/19/2015 10:24 AM, alb wrote:
> Don Y <this@is.not.me.com> wrote: []
>>> Uhm, wear leveling algorithms do not *hope* anything, actually they
>>> spread the maximum allowed number of E/P cycles all over the device
>>> instead of a single block, since endurance is defined _per block_.
>>
>> In an ideal world (process), all blocks would be identical in terms of
>> performance, durability, etc. In that case, you can trivially cycle
>> blocks through the algorithm in any "fair" order and never see any
>> difference in performance (i.e., durability/error resistance).
>>
>> However, processes aren't ideal. And, data "patterns" only add to the
>> uncertainty. You can't *know* that each block -- having ALL been
>> "recycled" exactly N times -- will exhibit the same sorts of "read
>> failures" (again, just talking about reads to simplify the discussion)
>> when asked to store a particular data pattern (in the context of the
>> *other* data patterns located "around" it).
>>
>> You should be able to test this by deliberately exhausting some sample
>> devices and/or doing partial writes (deliberately cutting the write
>> timer short) to *some* blocks but not others. I.e., don't let
>> entropy average all of that out across the device. *Then*, try to use
>> the device with a "fair" block swapping algorithm and track how long
>> each *actual* (physical) block "survives" (provides reliable data
>> below ECC threshold that suggests a need to rescrub) before your
>> controller decides that it needs to be "recycled" -- erased and
>> rewritten (or, reused to replace another block).
>
> The main point here is about your reference to *entropy*. In the end you
> have to take *entropy* into account no matter what kind of data patterns
> you use. Blocks that show better performance now will be used *more* and
> your /wear-leveling/ is not leveling anymore, inducing more wear in the
> preferred blocks, hence reducing their performance (actually if we limit
> the discussion to the 'read disturb' issue I'm not convinced that
> there's any wear going on.)
The "wear" (from the "read disturb") comes because those blocks eventually
get recycled and incur write-erase cycles.
You can't claim that the "preferred" blocks will see more wear -- there are
other blocks IN USE that are "less preferred". Presumably, they are less
preferred because they trip the ECC *EARLIER*. As such, they will experience
a recycle event *sooner* than the "more preferred" blocks (because the
more preferred blocks trip the ECC less often).
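Just to make the bookkeeping concrete, here's roughly the per-block state I
have in mind. This is only a sketch in C -- the names, field widths and the
array size are invented for illustration, not lifted from any real controller:

  #include <stdint.h>
  #include <stdbool.h>

  #define NUM_BLOCKS 1024   /* size of the physical array -- made up */

  /* Hypothetical per-physical-block bookkeeping.  'reads_since_recycle'
   * is cleared whenever the block is erased/rewritten; 'durability' is a
   * running estimate of how many reads the block historically delivers
   * before tripping the ECC threshold (updated in a later sketch).      */
  typedef struct {
      uint32_t erase_cycles;          /* lifetime write-erase count        */
      uint32_t reads_since_recycle;   /* reads served since last recycle   */
      uint32_t durability;            /* observed reads-per-recycle figure */
      bool     retired;               /* marked permanently bad            */
  } block_stats_t;

  static block_stats_t blocks[NUM_BLOCKS];

The "less preferred" blocks are simply the ones whose durability figure keeps
coming up short.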
>> If you can associate a metric with each physical block that represents
>> its historical performance, you can "level" based on *performance*
>> instead of *use*.
>>
>> I.e., one leveling algorithm just moves recycled blocks to the end of
>> the queue and always picks the first block in the queue for use when
>> some existing block needs to be recycled out of service. This ensures
>> each block gets *erased* the same number of times -- regardless of how
>> long a particular physical block "survived" IN ITS MOST RECENT (or sum
>> of all preceding) USE.
>>
>> You could, instead, track the number of *reads*-before-error-limit for
>> each block and arrange for those with the highest "durability" to be
>> reused *first*. Over time, their performance will degrade (i.e., they
>> will start to exhibit more errors, sooner) so that some other block
>> that had previously been considered "less durable" is regarded as MORE
>> durable than this "worn" block.
>
> Then what would be the benefit w.r.t. equally spreading the load?
Think of it this way: you have N blocks in the device. You need M (M<N).
ASSUME all blocks are NOT created equal (process variation, usage patterns,
etc.). You can now *choose* which blocks you want to use to STORE YOUR DATA
(i.e., that's the goal -- storing information and later being able to
confidently RETRIEVE it!). Would you choose the blocks that tend to exhibit
the MOST errors (over time)? The least errors? Or, with no concern at
all for their past performance??
ASSUME you *can* recover from an ECC event (i.e., the first such event
is a "warning" -- and no data has been lost prior to this event). Do you
*welcome* the possibility of having to take an exception, *now*, and
rewrite that data that you've now been warned about? Esp given that
the rewrite event has some cost *and* probability of failing?
Wouldn't you rather *avoid* this possibility, where possible?
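In code the "choice" is nothing exotic -- just a ranked pick instead of a
FIFO pop. A sketch, reusing the invented block_stats_t from above
(pick_best_spare() and the in_use[] map are mine, purely illustrative):

  /* Pick the spare with the best observed durability (reads-per-recycle).
   * A FIFO leveler would instead hand back whichever block was erased
   * longest ago, with no regard for how each block has actually behaved. */
  static int pick_best_spare(const bool in_use[])
  {
      int best = -1;

      for (int i = 0; i < NUM_BLOCKS; i++) {
          if (in_use[i] || blocks[i].retired)
              continue;
          if (best < 0 || blocks[i].durability > blocks[best].durability)
              best = i;
      }
      return best;    /* -1: spare pool exhausted */
  }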
> Another point is that unfortunately we *cannot* avoid taking into
> account other failure modes (on write/erase cycles), since they *will*
> certainly induce wear.
>
> Again any metric that would /rank/ blocks according to performance would
> lead to uneven wear that you need to compensate for in the end if you want
> to keep memory size /constant/ (i.e. no loss of blocks due to wear).
As a block wears, its performance metric *degrades* (assuming wear causes
ECC events to be triggered more frequently). So, a "preferred block"
eventually loses that status to other blocks that previously had *worse*
"performance". I.e., over time, the *performance* of blocks is what you
end up leveling -- not the number of write-erase cycles!
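A sketch of how that plays out, with the same invented structure as before
(the smoothing constant is arbitrary): each time a block trips the ECC and is
recycled, fold the reads it survived into its running figure. As the block
wears, "survived" shrinks, the average follows it down, and the block quietly
stops being "preferred":

  /* Called when a block has tripped the ECC threshold and is recycled.
   * Simple exponential moving average (alpha = 1/4) of the reads it
   * survived this cycle; the constant is an arbitrary choice.          */
  static void on_recycle(block_stats_t *b)
  {
      uint32_t survived = b->reads_since_recycle;

      b->durability = (3u * b->durability + survived) / 4u;
      b->reads_since_recycle = 0;
      b->erase_cycles++;
  }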
>> [Sorry, I'm being verbose but I don't feel I'm being *clear*. :< ]
>
> I think I got your point, but I'm not seeing the added value of any
> ranking mechanism.
>
>> I'm oversimplifying this as there are other issues related to geometry
>> that play a part. My point is to show that *how* you choose to reuse
>> blocks can be implemented in different ways with different criteria.
>> The point being that you want to minimize the number of "recycle
>> events" that you take on -- especially if the block you *pick* as a
>> replacement block may not write properly... so, you end up with a
>> "double fault" of sorts.
>
> But you need to factor in that behavior anyhow, since betting on the
> best performance is not going to pay off in the long run and sooner or later
> you'll be facing 'double fault' sorts of errors anyhow. If your
> algorithm is not ready to take that into account you'll be in for a
> nasty surprise.
All I'm changing in the algorithm is the "block selection algorithm" -- how
you pick *which* block to use from your overprovisioning store to handle
the *next* "recycle" event. If *it* fails the write, then you pick another
block -- using the same criteria. I.e., how you handle an ECC event *or*
a write-fault doesn't change -- just how you *pick* which block to use
for your recovery attempt after such an event.
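A sketch of that selection step, building on the earlier snippets.
nand_program()/nand_erase() are placeholders for whatever your low-level
flash path looks like -- not a real API:

  #include <stddef.h>

  /* Placeholders for the low-level flash path -- purely illustrative. */
  extern int  nand_program(int block, const uint8_t *data, size_t len);
  extern void nand_erase(int block);

  /* Relocate the data of 'victim' (which just tripped the ECC) into the
   * best available spare.  If the spare fails to program -- the "double
   * fault" -- retire it and pick the next candidate by the same criteria. */
  static int relocate(int victim, const uint8_t *data, size_t len, bool in_use[])
  {
      for (;;) {
          int spare = pick_best_spare(in_use);

          if (spare < 0)
              return -1;                      /* spare pool exhausted       */

          if (nand_program(spare, data, len) == 0) {
              in_use[spare]  = true;
              in_use[victim] = false;
              nand_erase(victim);             /* recycle the worn copy...   */
              on_recycle(&blocks[victim]);    /* ...and update its metric   */
              return spare;
          }
          blocks[spare].retired = true;       /* program failed: retire it  */
      }
  }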
>>> You count the number of E/P cycles each block has accumulated and you
>>> choose to write data in the available blocks with the fewest accumulated
>>> E/P cycles (for example). The E/P cycles available become effectively
>>> multiplied by the number of available blocks.
>>
>> That's the "each block gets erased the same number of times" approach
>> I described above. Wouldn't you rather find a set of blocks that
>> *needs* to be erased the least number of times??
>
> Assuming you find the necessary metric to /rank/ the blocks, what is the
> overall *gain* if you end up wearing those *more* than others and
> effectively losing that gap in performance more quickly?
You're not "wearing" them faster (see above).
Given: You are going to perform N reads in some unit of time/effort.
The goal is to minimize the number of read errors that you encounter.
Read errors lead to write-erase cycles (recycle events) which cost
resources (time/performance) AND reduce the overall durability of
the device. Wouldn't you *naturally* want to pick the blocks that
give you the least number of read errors AND, thus, recycle events?
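Back-of-the-envelope version of that argument (the numbers are invented
purely for illustration):

  #include <stdio.h>

  int main(void)
  {
      double total_reads = 1e7;     /* read budget                         */
      double good_block  = 1e6;     /* typical reads before an ECC trip    */
      double poor_block  = 1e5;     /* ditto, for a "less preferred" block */

      /* expected recycle events ~= reads served / reads-before-ECC */
      printf("good block: ~%.0f recycles\n", total_reads / good_block);
      printf("poor block: ~%.0f recycles\n", total_reads / poor_block);
      return 0;
  }

An order of magnitude fewer recycle events for the same read load -- wear
(and time) you simply don't spend.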
>>> Depending on your ECC there might be a limited choice, i.e. assume
>>> your ECC recovers only one bit flip and reports multiple ones.
>>> Whenever I detect (and correct) a *single* error, I have no other
>>> choice but to act; otherwise, if I wait too long, my ECC won't help me
>>> anymore.
>>
>> This depends on the nature of the source of your errors. E.g., do you
>> replace the RAM in your PC when it reports *any* CORRECTED errors?
>> (I've asked this question before and it's fun to watch people try to
>> rationalize how they use that "feature" -- do they even *know* when
>> there are error events being corrected? when do they decide there are
>> "too many"?)
>
> AFAIK the ECC used in PCs is more like BCH, with low overhead and
> strong correction capability at the expense of a more complex algorithm.
The point of my parenthetical comment was: "How do you decide when the
ECC is *too* active? What criteria do you use to determine that errors
may be slipping through undetected? How do you (PC user) even KNOW
that this is happening??"
> You could set a threshold on the amount of detected and corrected errors
> and take action only when the threshold is crossed. You still have
> reliable data, but you had better correct those errors before the
> capability of your ECC is no longer sufficient.
>
> In my case, in order to reduce complexity in the fabric (the algorithm
> will be implemented in an FPGA), I'll be more inclined towards a simple
> Hamming code that can correct *one* and detect more than *two*. Therefore
> I'll need to take action on *any* error.
>
>>> There are attempts to show some sort of error patterns that are more
>>> likely to show up and hence can be managed accordingly, but I'm
>>> fond of the K.I.S.S. principle and I'm not so much interested in
>>> building a read/write patterns aware wear leveling algorithm :-).
>>
>> The resource that you can't exhaust is write-erase cycles. So, you
>> really want to come up with an algorithm that minimizes the number of
>> times you "give up" on a written block -- in favor of refreshing it
>> here or elsewhere. That's why I was saying (earlier) to rethink your
>> algorithm to track "measured durability" (i.e., how *long* before
>> errors detected) and use that to govern the selection of replacement
>> blocks -- instead of just a FIFO (or even random) algorithm.
>
> The thing is that a metric 'accumulated' on read operations does not
> necessarily say anything about wear level. AFAIK there's no indication that
> your write-erase cycle is *close* to the end of life.
When you run out of write-erase cycles, your "recycle" events (or, write
events) start failing. The point of my "read history" metric is to
keep blocks from going *through* write-erase cycles needlessly (i.e.,
if a particular block isn't giving you ECC errors, why rewrite it?
Why incur that "wear event" if not needed? Similarly, why not pick
blocks that don't need to be rewritten as frequently if you have such
a metric?)
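I.e., the read path only spends a write-erase cycle when the ECC says it has
to. A sketch, continuing the earlier snippets (SCRUB_THRESHOLD and
nand_read_corrected() are placeholders; with a single-error-correcting
Hamming code the threshold is effectively 1):

  #define SCRUB_THRESHOLD 1   /* corrected bits that trigger a recycle */

  /* Placeholder: returns bits corrected on this read, <0 if uncorrectable. */
  extern int nand_read_corrected(int block, uint8_t *buf, size_t len);

  static int read_block(int blk, uint8_t *buf, size_t len, bool in_use[])
  {
      int corrected = nand_read_corrected(blk, buf, len);

      if (corrected < 0)
          return -1;                     /* beyond ECC capability: data lost */

      blocks[blk].reads_since_recycle++;

      if (corrected >= SCRUB_THRESHOLD)  /* block is complaining: move it    */
          return (relocate(blk, buf, len, in_use) < 0) ? -1 : 0;

      return 0;                          /* quiet block: leave it alone      */
  }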
[The point of my initial comment was to suggest that these algorithms
are too complex to implement in dedicated logic. Esp if you are
trying to get the best (most reliable) performance from the array.]
> 'Read disturb' errors are not related to wear, nor are errors induced
> by heavy ions, so it would be difficult to make an algorithm that ranks
> the /strength/ of each block.
Each write-erase cycle alters the characteristics of the cells in the block.
I.e., "wear" is nonreversible. As a cell wears, it is less capable of
preserving the correct state of charge (i.e., storing the desired "data").
This manifests as a greater susceptibility to disturb events (read/write)
and an earlier ECC event. I.e., "time til ECC event" gives you a practical
way of measuring how "reliable" a particular block is GIVEN YOUR USAGE
PATTERNS.
This should be easy for you to test. Pick a block and run a bunch of
write-erase cycles on it. I.e., pre-wear THAT block so it is expected
to have less useful life than other blocks in the array. Now, apply
an algorithm and watch to see how well *it* performs compared to
other "unworn" blocks (accelerated wear testing). See which blocks
get marked as permanently bad *first*, using a "fair" algorithm.
Similarly, track the number of recycle events that you incur.
Then, tweak the algorithm to alter the recycled-block selection choice
and see how the performance changes (do you get more use from the
array? fewer recycle events? more total life before blocks are
considered "bad"?)
>>> The advantage of the microcontroller is not the hardware it
>>> represents but the languages that can be used to instruct it. I can
>>> easily see how moving a buffer from one point to another is a perfect
>>> task for an FPGA, dealing with the low-level protocol of the Flash
>>> Interface, but taking the decision about what to do on a failure is
>>> where the software has an added value and can be 'economical' and
>>> possibly more reliable given that the complexity it can handle is
>>> certainly superior (I doubt I'd be able to write a journalized flash
>>> file system with bad block management, wear leveling, scrubbing and
>>> garbage collection within an FPGA!).
>>
>> Exactly. As I said, hardware is great for parallel operations. But,
>> when the algorithm being implemented is thought of in a serial
>> fashion, it is often easier to resort to a sequencer -- a
>> microcontroller.
>> WHEN this happens, do that... UNLESS this other thing happens, in
>> which case...
>
> I've seen an interesting presentation
[snipped line length]
> on a memory controller implemented on an FPGA. It claims it fits in
> 10% of a 3-Mgate-equivalent FPGA (ProAsic3E A3PE3000), which seems a bit
> less than what I expected.
>
> It would be interesting to see how many resources a small embedded
> microcontroller implementing the FTL in a software stack would take.
The difference that I am highlighting is the range of algorithms that are
EFFECTIVELY available to each type of implementation (software FTL
vs. dedicated hardware). Duality ensures you can implement any of them with
*either* approach. But, getting it *correct* is a different matter...