Multithreaded disk access

Started by Don Y October 15, 2021
As a *rough* figure, what would you expect the bandwidth of
a disk drive (spinning rust) to do as a function of number of
discrete files being accessed, concurrently?

E.g., if you can monitor the rough throughput of each
stream and sum them, will they sum to 100% of the drive's
bandwidth?  90%?  110%?  Etc.

[Note that drives have read-ahead and write caches so
the speed of the media might not bleed through to the
application layer.  And, filesystem code also throws
a wrench in the works.  Assume caching in the system
is disabled/ineffective.]

Said another way, what's a reasonably reliable way of
determining when you are I/O bound by the hardware
and when more threads won't result in more performance?
On 10/15/21 10:08 AM, Don Y wrote:
> As a *rough* figure, what would you expect the bandwidth of
> a disk drive (spinning rust) to do as a function of number of
> discrete files being accessed, concurrently?
> [...]
> Said another way, what's a reasonably reliable way of
> determining when you are I/O bound by the hardware
> and when more threads won't result in more performance?
You know that you can't actually get data off the media faster than the fundamental data rate of the media. As you mention, cache can give an apparent rate faster than the media, but you seem willing to assume that caching doesn't affect your rate and that each chunk will only be returned once.

Pathological access patterns can reduce this rate dramatically, and the worst case can result in rates of only a few percent of that figure if you force significant seeks between each sector read (and overload the buffering so it can't hold larger reads for a given stream). Non-pathological access can often result in near 100% of the access rate.

The best test of whether you are I/O bound: if the I/O system is constantly in use, and every I/O request has another pending when it finishes, then you are totally I/O bound.
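A minimal sketch of that test, assuming a POSIX-ish system where all requests funnel through one submission/completion point. The counters, the 1% threshold, and the on_submit()/on_complete() hooks are invented for illustration, not taken from any real driver:

    /* Bookkeeping for the "always another request pending" test.
     * on_submit()/on_complete() are hypothetical hooks wrapped around
     * the real submission and completion paths. */
    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int  in_flight;    /* requests currently at the drive    */
    static atomic_long completions;  /* total requests completed           */
    static atomic_long idle_events;  /* completions that emptied the queue */

    void on_submit(void)
    {
        atomic_fetch_add(&in_flight, 1);
    }

    void on_complete(void)
    {
        atomic_fetch_add(&completions, 1);
        if (atomic_fetch_sub(&in_flight, 1) == 1)   /* queue went empty: */
            atomic_fetch_add(&idle_events, 1);      /* the drive idled   */
    }

    /* Sampled once per window: if the drive (almost) never idled between
     * requests, the workload was I/O bound for that window. */
    bool io_bound(long done_in_window, long idles_in_window)
    {
        return done_in_window > 0 &&
               idles_in_window * 100 < done_in_window;   /* <1% idle */
    }

The idea is just bookkeeping: if completing a request ever leaves the queue empty, the drive idled, and you were not (at that moment) I/O bound.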
On 10/15/2021 17:08, Don Y wrote:
> As a *rough* figure, what would you expect the bandwidth of
> a disk drive (spinning rust) to do as a function of number of
> discrete files being accessed, concurrently?
> [...]
> [Note that drives have read-ahead and write caches so
> the speed of the media might not bleed through to the
> application layer.  And, filesystem code also throws
> a wrench in the works.  Assume caching in the system
> is disabled/ineffective.]
If caching is disabled things can get really bad quite quickly: think of updating directory entries to reflect modification/access dates, file sizes, scattering, etc.; think also of allocation table accesses. E.g., in dps on a larger disk partition (say >100 gigabytes) the first CAT (cluster allocation table) access after boot takes noticeable time, a second maybe; after that it stops being noticeable at all, as the CAT is updated rarely and on a modified-area basis only (this on a processor capable of 20 Mbytes/second; dps needs the entire CAT to allocate new space in order to do its (enhanced) worst-fit scheme).

IOW, if you torture the disk with constant seeks and scattered accesses you can slow it down anywhere from somewhat to a lot; it depends on way too many factors to be worth wondering about.
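For readers unfamiliar with the term: "worst fit" allocates from the largest free run, which is why the whole table must be consulted. A generic sketch of the policy, not dps code; the extent table is invented:

    #include <stddef.h>

    /* Hypothetical free-extent record: starting cluster and run length. */
    struct extent { long start, length; };

    /* Worst-fit: take the largest free run, so the remaining holes stay
     * big.  Note it must examine EVERY entry -- hence needing the whole
     * allocation table in hand. */
    long worst_fit(const struct extent *tab, size_t n, long need)
    {
        long best = -1, best_len = 0;
        for (size_t i = 0; i < n; i++)
            if (tab[i].length >= need && tab[i].length > best_len) {
                best     = tab[i].start;
                best_len = tab[i].length;
            }
        return best;   /* -1 if no run is big enough */
    }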
>
> Said another way, what's a reasonably reliable way of
> determining when you are I/O bound by the hardware
> and when more threads won't result in more performance?
Just try it out for some time and make your pick. Recently I wrote dfs (a distributed file system, over TCP) for dps and got to watch much of this going on; at some point you reach something between 50 and 100% of the hardware limit, depending on the file sizes you copy and who knows what other overhead you can think of.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 10/15/2021 8:38 AM, Richard Damon wrote:
> You know that you can't actually get data off the media faster than the
> fundamental data rate of the media.
Yes, but you don't know that rate *and* that rate varies based on "where" your accesses land on the physical medium (e.g., ZDR, shingled drives, etc.)
> As you mention, cache can give an apparent rate faster than the media,
> but you seem willing to assume that caching doesn't affect your rate
> and that each chunk will only be returned once.
Cache in the filesystem code will be counterproductive.  Cache in the drive may be a win for some accesses and a loss for others (e.g., if the drive read ahead thinking the next read was going to be sequential with the last -- and that proves to be wrong -- the drive may have missed an opportunity to respond more quickly to the ACTUAL access that follows).

[I'm avoiding talking about reads AND writes just to keep the discussion complexity manageable -- to avoid having to introduce caveats with every statement.]
> Pathological access patterns can reduce this rate dramatically, and the
> worst case can result in rates of only a few percent of that figure if
> you force significant seeks between each sector read (and overload the
> buffering so it can't hold larger reads for a given stream).
Exactly.  But, you don't necessarily know where your next access will take you.  This variation in throughput is what makes defining "I/O bound" tricky: if the access pattern at some instant (instant being the period over which you base your decision) makes the drive look slow, you would opt NOT to spawn a new thread to take advantage of excess throughput.  Similarly, if the drive "looks" serendipitously fast, you may spawn another thread whose accesses eventually conflict with those of the first thread, lowering overall throughput.
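One way to make that decision window less twitchy is to smooth the observed rate before acting on it. A tiny sketch, assuming a once-per-second sample; the alpha value is a guess, not a tuned figure:

    /* Exponentially weighted moving average of observed throughput.
     * alpha trades responsiveness for stability; 0.2 is a guess. */
    static double ewma_bps;                  /* smoothed bytes/second */

    void sample(long long bytes_this_second)
    {
        const double alpha = 0.2;
        ewma_bps = alpha * (double)bytes_this_second
                 + (1.0 - alpha) * ewma_bps;
    }

Spawn or retire threads only when the smoothed figure moves outside a hysteresis band, so one lucky (or unlucky) second doesn't trigger a decision.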
> Non-pathological access can often result in near 100% of the access rate.
>
> The best test of whether you are I/O bound: if the I/O system is
> constantly in use, and every I/O request has another pending when it
> finishes, then you are totally I/O bound.
But, if you make that assessment when the access pattern is "unfortunate", you erroneously conclude the disk is at its capacity. And, vice versa. Without control over the access patterns, it seems like there is no reliable strategy for determining when another thread can be advantageous (?)
On 10/15/2021 8:46 AM, Dimiter_Popoff wrote:
> On 10/15/2021 17:08, Don Y wrote:
>
> If caching is disabled things can get really bad quite quickly: think of
> updating directory entries to reflect modification/access dates, file
> sizes, scattering, etc.; think also of allocation table accesses.
My point re: filesystem cache (not the on-board disk cache) was that the user objects accessed will only be visited once. So, no value to caching *them* in the filesystem's buffers.
> E.g., in dps on a larger disk partition (say >100 gigabytes) the first
> CAT (cluster allocation table) access after boot takes noticeable time,
> a second maybe; after that it stops being noticeable at all, as the CAT
> is updated rarely and on a modified-area basis only (this on a processor
> capable of 20 Mbytes/second; dps needs the entire CAT to allocate new
> space in order to do its (enhanced) worst-fit scheme).
> IOW, if you torture the disk with constant seeks and scattered accesses
> you can slow it down anywhere from somewhat to a lot; it depends on way
> too many factors to be worth wondering about.
I'm trying NOT to be aware of any particulars of the specific filesystem *type* (FAT*, NTFS, *BSD, etc.) and make decisions just from high level observations of disk performance.
>> Said another way, what's a reasonably reliable way of
>> determining when you are I/O bound by the hardware
>> and when more threads won't result in more performance?
>
> Just try it out for some time and make your pick. Recently I wrote dfs
> (a distributed file system, over TCP) for dps and got to watch much of
> this going on; at some point you reach something between 50 and 100% of
> the hardware limit, depending on the file sizes you copy and who knows
> what other overhead you can think of.
I think the cost of any extra complexity in the algorithm (to dynamically try to optimize number of threads) is hard to justify -- given no control over the actual media. I.e., it seems like it's best to just aim for "simple" and live with whatever throughput you get...
On 10/15/2021 19:19, Don Y wrote:
> On 10/15/2021 8:46 AM, Dimiter_Popoff wrote:
> ....
>> Just try it out for some time and make your pick. Recently I wrote dfs
>> (a distributed file system, over TCP) for dps and got to watch much of
>> this going on; at some point you reach something between 50 and 100% of
>> the hardware limit, depending on the file sizes you copy and who knows
>> what other overhead you can think of.
>
> I think the cost of any extra complexity in the algorithm (to dynamically
> try to optimize number of threads) is hard to justify -- given no control
> over the actual media.  I.e., it seems like it's best to just aim for
> "simple" and live with whatever throughput you get...
I meant going the simplest way, not adding algorithms. Just leave it for now and have a few systems running, look at what is going on and pick some sane figure, perhaps try it out either way before you settle.
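In that spirit, a bare-bones probe one could run while watching. It assumes POSIX threads and pread() against a large file; the path, block size, span, and duration are all placeholders, and on a real test you'd open with O_DIRECT or flush the page cache first so caching doesn't inflate the numbers:

    /* Aggregate-throughput probe: run with 1, 2, 4, ... threads and watch
     * where the summed rate stops climbing.  Build: cc -O2 -pthread probe.c */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK   (64 * 1024)
    #define SPAN    (1024LL * 1024 * 1024)  /* region to scatter reads over */
    #define SECONDS 5

    static atomic_llong total_bytes;
    static atomic_bool  stop;

    static void *reader(void *path)
    {
        int fd = open(path, O_RDONLY);
        char *buf = malloc(BLOCK);
        if (fd < 0 || !buf) return NULL;
        while (!atomic_load(&stop)) {
            /* random block within SPAN: deliberately seek-heavy */
            off_t where = (off_t)(rand() % (SPAN / BLOCK)) * BLOCK;
            ssize_t n = pread(fd, buf, BLOCK, where);
            if (n > 0) atomic_fetch_add(&total_bytes, n);
        }
        free(buf);
        close(fd);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int n = (argc > 2) ? atoi(argv[2]) : 1;  /* usage: probe FILE NTHREADS */
        pthread_t tid[64];
        if (n > 64) n = 64;
        for (int i = 0; i < n; i++)
            pthread_create(&tid[i], NULL, reader, argv[1]);
        sleep(SECONDS);
        atomic_store(&stop, true);
        for (int i = 0; i < n; i++)
            pthread_join(tid[i], NULL);
        printf("%d threads: %.1f MB/s aggregate\n", n,
               atomic_load(&total_bytes) / (1e6 * SECONDS));
        return 0;
    }

The knee where aggregate MB/s flattens (or drops) as the thread count grows is the saturation point for that access pattern.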
On 10/15/21 12:00 PM, Don Y wrote:
> On 10/15/2021 8:38 AM, Richard Damon wrote:
>> You know that you can't actually get data off the media faster than
>> the fundamental data rate of the media.
>
> Yes, but you don't know that rate *and* that rate varies based on
> "where" your accesses land on the physical medium (e.g., ZDR,
> shingled drives, etc.)
But all of these still have a 'maximum' rate, so you can still define a maximum. It does mean that the 'expected' rate you can get is more variable.
>
>> As you mention, cache can give an apparent rate faster than the media,
>> but you seem willing to assume that caching doesn't affect your rate
>> and that each chunk will only be returned once.
>
> Cache in the filesystem code will be counterproductive.  Cache in
> the drive may be a win for some accesses and a loss for others
> (e.g., if the drive read ahead thinking the next read was going to
> be sequential with the last -- and that proves to be wrong -- the
> drive may have missed an opportunity to respond more quickly to the
> ACTUAL access that follows).
>
> [I'm avoiding talking about reads AND writes just to keep the
> discussion complexity manageable -- to avoid having to introduce
> caveats with every statement.]
Yes, the drive might try to read ahead and hurt itself, or it might not. That is mostly out of your control.
>> Pathological access patterns can reduce this rate dramatically, and the
>> worst case can result in rates of only a few percent of that figure if
>> you force significant seeks between each sector read (and overload the
>> buffering so it can't hold larger reads for a given stream).
>
> Exactly.  But, you don't necessarily know where your next access will
> take you.  This variation in throughput is what makes defining
> "I/O bound" tricky: if the access pattern at some instant (instant
> being the period over which you base your decision) makes the drive
> look slow, you would opt NOT to spawn a new thread to take
> advantage of excess throughput.  Similarly, if the drive "looks"
> serendipitously fast, you may spawn another thread whose
> accesses eventually conflict with those of the first thread,
> lowering overall throughput.
>
>> Non-pathological access can often result in near 100% of the access rate.
>>
>> The best test of whether you are I/O bound: if the I/O system is
>> constantly in use, and every I/O request has another pending when it
>> finishes, then you are totally I/O bound.
>
> But, if you make that assessment when the access pattern is "unfortunate",
> you erroneously conclude the disk is at its capacity.  And, vice versa.
>
> Without control over the access patterns, it seems like there is no
> reliable strategy for determining when another thread can be
> advantageous (?)
Yes, adding more threads might change the access pattern. It will TEND to make the pattern less sequential, and thus push it toward that pathological case (so more threads actually decrease the rate at which you can do I/O, and thus slow down your I/O-bound rate). It is possible that it just happens to be fortunate and makes things more sequential: if the system can see that one thread wants sector N and another wants sector N+1, something can schedule the reads together and drop a seek.

Predicting that sort of behavior can't be done 'in the abstract'. You need to think about the details of the system.

As a general principle, if the I/O system is saturated, the job is I/O bound. Adding more threads will only help if you have the resources to queue up more requests and can optimize the order of servicing them to be more efficient with I/O. Predicting that means you need to know, and have some control over, the access pattern.

Note, part of this is being able to trade memory to improve I/O speed. If you know that EVENTUALLY you will want the next sector after the one you are reading, reading it now and caching it will be a win, but only if you will be able to use that data before you need to claim that memory for other uses. This sort of improvement really does require knowing the details you are trying to assume you don't know, so you are limiting your ability to make accurate decisions.
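The "sector N and N+1" case is what an elevator-style scheduler exploits. A toy sketch of the idea (the request structure and the commented-out issue() are invented; real schedulers are considerably more careful): keep the pending queue sorted by block address and service it in one sweep, so adjacent requests from different threads coalesce into sequential media access.

    #include <stdlib.h>

    /* Hypothetical pending-request record; lba is the logical block address. */
    struct req { long lba; /* ... buffer, length, completion callback ... */ };

    static int by_lba(const void *a, const void *b)
    {
        long x = ((const struct req *)a)->lba;
        long y = ((const struct req *)b)->lba;
        return (x > y) - (x < y);
    }

    /* One elevator sweep: sort whatever is queued, then issue in LBA order.
     * Requests for blocks N and N+1 -- even from different threads -- now
     * reach the drive back to back, saving a seek. */
    void sweep(struct req *queue, size_t n)
    {
        qsort(queue, n, sizeof queue[0], by_lba);
        /* for (size_t i = 0; i < n; i++) issue(&queue[i]); */
    }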
On 10/15/2021 9:48 AM, Richard Damon wrote:
> On 10/15/21 12:00 PM, Don Y wrote:
>> On 10/15/2021 8:38 AM, Richard Damon wrote:
>>> You know that you can't actually get data off the media faster than
>>> the fundamental data rate of the media.
>>
>> Yes, but you don't know that rate *and* that rate varies based on
>> "where" your accesses land on the physical medium (e.g., ZDR,
>> shingled drives, etc.)
>
> But all of these still have a 'maximum' rate, so you can still define a
> maximum. It does mean that the 'expected' rate you can get is more variable.
But only if you have control over the hardware. How long will a "backup" take on your PC?  Today?  Tomorrow?  Last week? If you removed the disk and put it in another PC, how would those figures change? If you restore (using file access and not sector access) and then backup again, how will the numbers change?
>>> As you mention, cache can give an apparent rate faster than the media,
>>> but you seem willing to assume that caching doesn't affect your rate
>>> and that each chunk will only be returned once.
>>
>> Cache in the filesystem code will be counterproductive.  Cache in
>> the drive may be a win for some accesses and a loss for others
>> (e.g., if the drive read ahead thinking the next read was going to
>> be sequential with the last -- and that proves to be wrong -- the
>> drive may have missed an opportunity to respond more quickly to the
>> ACTUAL access that follows).
>>
>> [I'm avoiding talking about reads AND writes just to keep the
>> discussion complexity manageable -- to avoid having to introduce
>> caveats with every statement.]
>
> Yes, the drive might try to read ahead and hurt itself, or it might not.
> That is mostly out of your control.
Exactly. So, I can't do anything other than OBSERVE the performance I am getting.
>>> Non-pathological access can often result in near 100% of the access rate.
>>>
>>> The best test of whether you are I/O bound: if the I/O system is
>>> constantly in use, and every I/O request has another pending when it
>>> finishes, then you are totally I/O bound.
>>
>> But, if you make that assessment when the access pattern is "unfortunate",
>> you erroneously conclude the disk is at its capacity.  And, vice versa.
>>
>> Without control over the access patterns, it seems like there is no
>> reliable strategy for determining when another thread can be
>> advantageous (?)
>
> Yes, adding more threads might change the access pattern. It will TEND to
> make the pattern less sequential, and thus push it toward that pathological
> case (so more threads actually decrease the rate at which you can do I/O,
> and thus slow down your I/O-bound rate). It is possible that it just
> happens to be fortunate and makes things more sequential: if the system can
> see that one thread wants sector N and another wants sector N+1, something
> can schedule the reads together and drop a seek.
The point of additional threads is that another thread can schedule the next access while the processor is busy processing the previous one. So, the I/O is always kept busy instead of letting it idle between accesses.
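A minimal sketch of that overlap, assuming one reader thread and a two-slot buffer shared with the consumer (the file name and sizes are placeholders): while the consumer chews on one block, the reader is already filling the other, so the drive never sits idle waiting for the CPU.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define NBUF  2
    #define BLOCK (64 * 1024)

    static char   buf[NBUF][BLOCK];
    static size_t len[NBUF];
    static sem_t  sem_empty, sem_full;  /* classic producer/consumer handoff */

    static void *reader(void *arg)      /* this thread's job: keep disk busy */
    {
        FILE *f = arg;
        for (int i = 0; ; i = (i + 1) % NBUF) {
            sem_wait(&sem_empty);                  /* wait for a free slot  */
            len[i] = fread(buf[i], 1, BLOCK, f);   /* next read issued while
                                                      consumer chews on the
                                                      previous block        */
            sem_post(&sem_full);
            if (len[i] == 0)                       /* EOF marker handed over */
                return NULL;
        }
    }

    int main(void)
    {
        sem_init(&sem_empty, 0, NBUF);
        sem_init(&sem_full, 0, 0);
        FILE *f = fopen("data.bin", "rb");         /* placeholder file name */
        if (!f) return 1;
        pthread_t t;
        pthread_create(&t, NULL, reader, f);
        for (int i = 0; ; i = (i + 1) % NBUF) {
            sem_wait(&sem_full);                   /* wait for a filled slot */
            if (len[i] == 0) break;                /* reader hit EOF         */
            /* ... process buf[i]; the next block is already being read ... */
            sem_post(&sem_empty);                  /* recycle the slot       */
        }
        pthread_join(t, NULL);
        fclose(f);
        return 0;
    }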
> Predicting that sort of behavior can't be done 'in the abstract'. You need
> to think about the details of the system.
>
> As a general principle, if the I/O system is saturated, the job is I/O bound.
The goal is to *ensure* the I/O system is completely saturated.
> Adding more threads will only help if you have the resources to queue up
> more requests and can optimize the order of servicing them to be more
> efficient with
Ordering them is an optimization that requires knowledge of how they will interact *in* the drive. However, simply having ANOTHER request ready as soon as the previous one is completed (neglecting the potential for the drive to queue requests internally) is an enhancement to throughput.
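One portable way to always have another request posted when the previous one completes is POSIX AIO with a small queue depth. A sketch, assuming sequential 64 KB reads from an already-open descriptor; DEPTH and BLOCK are arbitrary choices:

    #include <aio.h>
    #include <string.h>
    #include <sys/types.h>

    #define DEPTH 2
    #define BLOCK (64 * 1024)

    /* Keep DEPTH reads posted so the drive always has a successor queued. */
    void stream(int fd)
    {
        static char buf[DEPTH][BLOCK];
        struct aiocb cb[DEPTH];
        off_t next = 0;

        memset(cb, 0, sizeof cb);
        for (int i = 0; i < DEPTH; i++) {          /* prime the queue */
            cb[i].aio_fildes = fd;
            cb[i].aio_buf    = buf[i];
            cb[i].aio_nbytes = BLOCK;
            cb[i].aio_offset = next;
            next += BLOCK;
            aio_read(&cb[i]);
        }
        for (int i = 0; ; i = (i + 1) % DEPTH) {
            const struct aiocb *wait_for[1] = { &cb[i] };
            aio_suspend(wait_for, 1, NULL);        /* oldest request done?  */
            ssize_t n = aio_return(&cb[i]);
            if (n <= 0)
                break;                             /* EOF or error          */
            /* ... consume buf[i]; its successor is already in flight ...   */
            cb[i].aio_offset = next;               /* repost this slot      */
            next += BLOCK;
            aio_read(&cb[i]);
        }
    }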
> I/O. Predicting that means you need to know, and have some control over,
> the access pattern.
>
> Note, part of this is being able to trade memory to improve I/O speed. If
> you know that EVENTUALLY you will want the next sector after the one you
> are reading, reading it now and caching it will be a win, but only if you
> will be able to use that data before you need to claim that memory for
> other uses. This sort of improvement really does require knowing the
> details you are trying to assume you don't know, so you are limiting your
> ability to make accurate decisions.
Moving the code to another platform (something that the user can do in a heartbeat) will invalidate any assumptions I have made about the performance on my original platform. Hence the desire to have the *code* sort out what it *can* do to increase performance by observing its actual performance on TODAY'S actual hardware.
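That argues for a feedback loop rather than baked-in constants. A hill-climbing sketch, where measure_mbps() and set_workers() are hypothetical hooks into the real system and the 5% threshold is a guess: add a worker, keep it only if the measured aggregate rate actually improves, otherwise back off.

    /* Hill-climbing controller, evaluated once per measurement interval.
     * measure_mbps() and set_workers() are hypothetical hooks into the
     * real system; a production version would also re-probe periodically. */
    double measure_mbps(void);   /* aggregate rate over the last interval */
    void   set_workers(int n);   /* grow/shrink the worker-thread pool    */

    void tune(void)
    {
        int    workers = 1;
        double best    = 0.0;

        set_workers(workers);
        for (;;) {
            double rate = measure_mbps();
            if (rate > best * 1.05) {        /* real gain: try one more  */
                best = rate;
                set_workers(++workers);
            } else if (workers > 1) {        /* no gain: back off toward */
                best = rate;                 /* the knee and sit there   */
                set_workers(--workers);
            }
        }
    }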
Don Y <blockedofcourse@foo.invalid> wrote:
> As a *rough* figure, what would you expect the bandwidth of
> a disk drive (spinning rust) to do as a function of number of
> discrete files being accessed, concurrently?
> [...]
> Said another way, what's a reasonably reliable way of
> determining when you are I/O bound by the hardware
> and when more threads won't result in more performance?
Roughly speaking, a drive spinning at 7500 rpm divided by 60 is 125 revolutions a second; a seek takes half a revolution, and the next file is another half a revolution away on average, which gets you 125 files a second, roughly, depending on the performance of the drive, if my numbers are not too far off.

This is plenty to support a dozen Windows VMs on average, were it not for Windows updates that saturate the disks with hundreds of little file updates at once, causing Microsoft SQL timeouts for the VMs.
On 10/17/2021 1:27 PM, Brett wrote:
> Don Y <blockedofcourse@foo.invalid> wrote:
>> As a *rough* figure, what would you expect the bandwidth of
>> a disk drive (spinning rust) to do as a function of number of
>> discrete files being accessed, concurrently?
>> [...]
>> Said another way, what's a reasonably reliable way of
>> determining when you are I/O bound by the hardware
>> and when more threads won't result in more performance?
>
> Roughly speaking, a drive spinning at 7500 rpm divided by 60 is 125
> revolutions a second; a seek takes half a revolution, and the next file
> is another half a revolution away on average, which gets you 125 files a
> second, roughly, depending on the performance of the drive, if my numbers
> are not too far off.
You're assuming files are laid out contiguously -- that no seeks are needed "between sectors". You're also assuming moving to another track (seek time) is instantaneous (or fits within the half-revolution rotational delay).

For a 7200 rpm drive (some are as slow as 5400, some as fast as 15K), AVERAGE rotational delay is 8.33 ms / 2 = ~4.2 ms. But seek time can be 10, 15, or more ms. (On my enterprise drives, it's 4; but average rotational delay is 2.) And, if the desired sector lies on a "distant" cylinder, you can scale that almost linearly.

I.e., looking at the disk's specs is largely useless unless you know how the data on it is laid out. The only way to know that is to *look* at it. But, looking at part of the data doesn't mean you can extrapolate to ALL of the data.

So, I'm back to my assumption that you can't really alter your approach -- with any degree of predictable success -- beforehand. E.g., I can keep spawning threads until I find them queuing (more than one deep) on the disk driver. But, even then, a moment from now the backlog can clear. Or, it can get worse (which means I've wasted the resources that the threads consume AND added complexity to the algorithm with no direct benefit).
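For concreteness, the seek-plus-rotation arithmetic above worked out (the 10 ms seek is the illustrative desktop-class figure from the text, not a datasheet number):

    #include <stdio.h>

    /* Random-access IOPS estimate: one average seek plus half a rotation
     * per independent access.  Compare with the 125/sec estimate above. */
    int main(void)
    {
        double rpm        = 7200.0;
        double rot_ms     = 60000.0 / rpm;   /* 8.33 ms per revolution     */
        double avg_rot_ms = rot_ms / 2.0;    /* ~4.17 ms average latency   */
        double seek_ms    = 10.0;            /* desktop-class average seek */

        double per_io_ms = seek_ms + avg_rot_ms;
        printf("%.1f ms/access -> ~%.0f random accesses/sec\n",
               per_io_ms, 1000.0 / per_io_ms);   /* ~14.2 ms -> ~70/sec */
        return 0;
    }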
Below, you're also assuming Windows. And, for writes, shingled drives throw all of that down the toilet.

> This is plenty to support a dozen Windows VMs on average, were it not
> for Windows updates that saturate the disks with hundreds of little file
> updates at once, causing Microsoft SQL timeouts for the VMs.