Multithreaded disk access

Started by Don Y October 15, 2021
As a *rough* figure, what would you expect the bandwidth of
a disk drive (spinning rust) to do as a function of number of
discrete files being accessed, concurrently?

E.g., if you can monitor the rough throughput of each
stream and sum them, will they sum to 100% of the drive's
bandwidth?  90%?  110%?  Etc.

[Note that drives have read-ahead and write caches so
the speed of the media might not bleed through to the
application layer.  And, filesystem code also throws
a wrench in the works.  Assume caching in the system
is disabled/ineffective.]

Said another way, what's a reasonably reliable way of
determining when you are I/O bound by the hardware
and when more threads won't result in more performance?
On 10/15/21 10:08 AM, Don Y wrote:
> As a *rough* figure, what would you expect the bandwidth of
> a disk drive (spinning rust) to do as a function of number of
> discrete files being accessed, concurrently?
> [...]
> Said another way, what's a reasonably reliable way of
> determining when you are I/O bound by the hardware
> and when more threads won't result in more performance?
You know that you can't actually get data off the media faster than the fundamental data rate of the media. As you mention, cache can give an apparent rate faster than the media, but you seem willing to assume that caching doesn't affect your rate and that each chunk will only be returned once.

Pathological access patterns can reduce this rate dramatically, and the worst case can result in rates of only a few percent of that figure if you force significant seeks between each sector read (and overload the buffering so it can't hold larger reads for a given stream). Non-pathological access can often result in near 100% of the access rate.

The best test of whether you are I/O bound: if the I/O system is constantly in use, and every I/O request has another pending when it finishes, then you are totally I/O bound.
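A minimal sketch of that test, assuming a POSIX-ish system where all requests funnel through one submission/completion point. The counters, the 1% threshold, and the on_submit()/on_complete() hooks are invented for illustration, not taken from any real driver:

    /* Bookkeeping for the "always another request pending" test.
     * on_submit()/on_complete() are hypothetical hooks wrapped around
     * the real submission and completion paths. */
    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int  in_flight;    /* requests currently at the drive    */
    static atomic_long completions;  /* total requests completed           */
    static atomic_long idle_events;  /* completions that emptied the queue */

    void on_submit(void)
    {
        atomic_fetch_add(&in_flight, 1);
    }

    void on_complete(void)
    {
        atomic_fetch_add(&completions, 1);
        if (atomic_fetch_sub(&in_flight, 1) == 1)   /* queue went empty: */
            atomic_fetch_add(&idle_events, 1);      /* the drive idled   */
    }

    /* Sampled once per window: if the drive (almost) never idled between
     * requests, the workload was I/O bound for that window. */
    bool io_bound(long done_in_window, long idles_in_window)
    {
        return done_in_window > 0 &&
               idles_in_window * 100 < done_in_window;   /* <1% idle */
    }

The idea is just bookkeeping: if completing a request ever leaves the queue empty, the drive idled, and you were not (at that moment) I/O bound.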
On 10/15/2021 17:08, Don Y wrote:
> As a *rough* figure, what would you expect the bandwidth of
> a disk drive (spinning rust) to do as a function of number of
> discrete files being accessed, concurrently?
> [...]
> [Note that drives have read-ahead and write caches so
> the speed of the media might not bleed through to the
> application layer.  And, filesystem code also throws
> a wrench in the works.  Assume caching in the system
> is disabled/ineffective.]
If caching is disabled things can get really bad quite quickly: think of updating directory entries to reflect modification/access dates, file sizes, scattering, etc.; think also of allocation table accesses. E.g., in dps on a larger disk partition (say >100 gigabytes) the first CAT (cluster allocation table) access after boot takes noticeable time, a second maybe; after that it stops being noticeable at all, as the CAT is updated rarely and on a modified-area basis only (this on a processor capable of 20 Mbytes/second; dps needs the entire CAT to allocate new space in order to do its (enhanced) worst-fit scheme).

IOW, if you torture the disk with constant seeks and scattered accesses you can slow it down anywhere from somewhat to a lot; it depends on way too many factors to be worth wondering about.
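For readers unfamiliar with the term: "worst fit" allocates from the largest free run, which is why the whole table must be consulted. A generic sketch of the policy, not dps code; the extent table is invented:

    #include <stddef.h>

    /* Hypothetical free-extent record: starting cluster and run length. */
    struct extent { long start, length; };

    /* Worst-fit: take the largest free run, so the remaining holes stay
     * big.  Note it must examine EVERY entry -- hence needing the whole
     * allocation table in hand. */
    long worst_fit(const struct extent *tab, size_t n, long need)
    {
        long best = -1, best_len = 0;
        for (size_t i = 0; i < n; i++)
            if (tab[i].length >= need && tab[i].length > best_len) {
                best     = tab[i].start;
                best_len = tab[i].length;
            }
        return best;   /* -1 if no run is big enough */
    }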
>
> Said another way, what's a reasonably reliable way of
> determining when you are I/O bound by the hardware
> and when more threads won't result in more performance?
Just try it out for some time and make your pick. Recently I wrote dfs (a distributed file system, over TCP) for dps and got to watch much of this going on; at some point you reach something between 50 and 100% of the hardware limit, depending on the file sizes you copy and who knows what other overhead you can think of.

Dimiter

======================================================
Dimiter Popoff, TGI             http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
On 10/15/2021 8:38 AM, Richard Damon wrote:
> You know that you can't actually get data off the media faster than the
> fundamental data rate of the media.
Yes, but you don't know that rate *and* that rate varies based on "where" your accesses land on the physical medium (e.g., ZDR, shingled drives, etc.)
> As you mention, cache can give an apparent rate faster than the media,
> but you seem willing to assume that caching doesn't affect your rate
> and that each chunk will only be returned once.
Cache in the filesystem code will be counterproductive.  Cache in the drive may be a win for some accesses and a loss for others (e.g., if the drive read ahead thinking the next read was going to be sequential with the last -- and that proves to be wrong -- the drive may have missed an opportunity to respond more quickly to the ACTUAL access that follows).

[I'm avoiding talking about reads AND writes just to keep the discussion complexity manageable -- to avoid having to introduce caveats with every statement.]
> Pathological access patterns can reduce this rate dramatically, and the
> worst case can result in rates of only a few percent of that figure if
> you force significant seeks between each sector read (and overload the
> buffering so it can't hold larger reads for a given stream).
Exactly.  But, you don't necessarily know where your next access will take you.  This variation in throughput is what makes defining "I/O bound" tricky: if the access pattern at some instant (instant being the period over which you base your decision) makes the drive look slow, you would opt NOT to spawn a new thread to take advantage of excess throughput.  Similarly, if the drive "looks" serendipitously fast, you may spawn another thread whose accesses eventually conflict with those of the first thread, lowering overall throughput.
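One way to make that decision window less twitchy is to smooth the observed rate before acting on it. A tiny sketch, assuming a once-per-second sample; the alpha value is a guess, not a tuned figure:

    /* Exponentially weighted moving average of observed throughput.
     * alpha trades responsiveness for stability; 0.2 is a guess. */
    static double ewma_bps;                  /* smoothed bytes/second */

    void sample(long long bytes_this_second)
    {
        const double alpha = 0.2;
        ewma_bps = alpha * (double)bytes_this_second
                 + (1.0 - alpha) * ewma_bps;
    }

Spawn or retire threads only when the smoothed figure moves outside a hysteresis band, so one lucky (or unlucky) second doesn't trigger a decision.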
> Non-pathological access can often result in near 100% of the access rate.
>
> The best test of whether you are I/O bound: if the I/O system is
> constantly in use, and every I/O request has another pending when it
> finishes, then you are totally I/O bound.
But, if you make that assessment when the access pattern is "unfortunate", you erroneously conclude the disk is at its capacity. And, vice versa. Without control over the access patterns, it seems like there is no reliable strategy for determining when another thread can be advantageous (?)
On 10/15/2021 8:46 AM, Dimiter_Popoff wrote:
> On 10/15/2021 17:08, Don Y wrote:
>
> If caching is disabled things can get really bad quite quickly: think of
> updating directory entries to reflect modification/access dates, file
> sizes, scattering, etc.; think also of allocation table accesses.
My point re: filesystem cache (not the on-board disk cache) was that the user objects accessed will only be visited once. So, no value to caching *them* in the filesystem's buffers.
> E.g., in dps on a larger disk partition (say >100 gigabytes) the first
> CAT (cluster allocation table) access after boot takes noticeable time,
> a second maybe; after that it stops being noticeable at all, as the CAT
> is updated rarely and on a modified-area basis only (this on a processor
> capable of 20 Mbytes/second; dps needs the entire CAT to allocate new
> space in order to do its (enhanced) worst-fit scheme).
> IOW, if you torture the disk with constant seeks and scattered accesses
> you can slow it down anywhere from somewhat to a lot; it depends on way
> too many factors to be worth wondering about.
I'm trying NOT to be aware of any particulars of the specific filesystem *type* (FAT*, NTFS, *BSD, etc.) and make decisions just from high level observations of disk performance.
>> Said another way, what's a reasonably reliable way of
>> determining when you are I/O bound by the hardware
>> and when more threads won't result in more performance?
>
> Just try it out for some time and make your pick. Recently I wrote dfs
> (a distributed file system, over TCP) for dps and got to watch much of
> this going on; at some point you reach something between 50 and 100% of
> the hardware limit, depending on the file sizes you copy and who knows
> what other overhead you can think of.
I think the cost of any extra complexity in the algorithm (to dynamically try to optimize number of threads) is hard to justify -- given no control over the actual media. I.e., it seems like it's best to just aim for "simple" and live with whatever throughput you get...
On 10/15/2021 19:19, Don Y wrote:
> On 10/15/2021 8:46 AM, Dimiter_Popoff wrote:
> ....
>> Just try it out for some time and make your pick. Recently I wrote dfs
>> (a distributed file system, over TCP) for dps and got to watch much of
>> this going on; at some point you reach something between 50 and 100% of
>> the hardware limit, depending on the file sizes you copy and who knows
>> what other overhead you can think of.
>
> I think the cost of any extra complexity in the algorithm (to dynamically
> try to optimize number of threads) is hard to justify -- given no control
> over the actual media.  I.e., it seems like it's best to just aim for
> "simple" and live with whatever throughput you get...
I meant going the simplest way, not adding algorithms. Just leave it for now and have a few systems running, look at what is going on and pick some sane figure, perhaps try it out either way before you settle.
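In that spirit, a bare-bones probe one could run while watching. It assumes POSIX threads and pread() against a large file; the path, block size, span, and duration are all placeholders, and on a real test you'd open with O_DIRECT or flush the page cache first so caching doesn't inflate the numbers:

    /* Aggregate-throughput probe: run with 1, 2, 4, ... threads and watch
     * where the summed rate stops climbing.  Build: cc -O2 -pthread probe.c */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK   (64 * 1024)
    #define SPAN    (1024LL * 1024 * 1024)  /* region to scatter reads over */
    #define SECONDS 5

    static atomic_llong total_bytes;
    static atomic_bool  stop;

    static void *reader(void *path)
    {
        int fd = open(path, O_RDONLY);
        char *buf = malloc(BLOCK);
        if (fd < 0 || !buf) return NULL;
        while (!atomic_load(&stop)) {
            /* random block within SPAN: deliberately seek-heavy */
            off_t where = (off_t)(rand() % (SPAN / BLOCK)) * BLOCK;
            ssize_t n = pread(fd, buf, BLOCK, where);
            if (n > 0) atomic_fetch_add(&total_bytes, n);
        }
        free(buf);
        close(fd);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int n = (argc > 2) ? atoi(argv[2]) : 1;  /* usage: probe FILE NTHREADS */
        pthread_t tid[64];
        if (n > 64) n = 64;
        for (int i = 0; i < n; i++)
            pthread_create(&tid[i], NULL, reader, argv[1]);
        sleep(SECONDS);
        atomic_store(&stop, true);
        for (int i = 0; i < n; i++)
            pthread_join(tid[i], NULL);
        printf("%d threads: %.1f MB/s aggregate\n", n,
               atomic_load(&total_bytes) / (1e6 * SECONDS));
        return 0;
    }

The knee where aggregate MB/s flattens (or drops) as the thread count grows is the saturation point for that access pattern.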
On 10/15/21 12:00 PM, Don Y wrote:
> On 10/15/2021 8:38 AM, Richard Damon wrote:
>> You know that you can't actually get data off the media faster than
>> the fundamental data rate of the media.
>
> Yes, but you don't know that rate *and* that rate varies based on
> "where" your accesses land on the physical medium (e.g., ZDR,
> shingled drives, etc.)
But all of these still have a 'maximum' rate, so you can still define a maximum. It does mean that the 'expected' rate you can get is more variable.
>
>> As you mention, cache can give an apparent rate faster than the media,
>> but you seem willing to assume that caching doesn't affect your rate
>> and that each chunk will only be returned once.
>
> Cache in the filesystem code will be counterproductive.  Cache in
> the drive may be a win for some accesses and a loss for others
> (e.g., if the drive read ahead thinking the next read was going to
> be sequential with the last -- and that proves to be wrong -- the
> drive may have missed an opportunity to respond more quickly to the
> ACTUAL access that follows).
>
> [I'm avoiding talking about reads AND writes just to keep the
> discussion complexity manageable -- to avoid having to introduce
> caveats with every statement.]
Yes, the drive might try to read ahead and hurt itself, or it might not. That is mostly out of your control.
>> Pathological access patterns can reduce this rate dramatically, and the
>> worst case can result in rates of only a few percent of that figure if
>> you force significant seeks between each sector read (and overload the
>> buffering so it can't hold larger reads for a given stream).
>
> Exactly.  But, you don't necessarily know where your next access will
> take you.  This variation in throughput is what makes defining
> "I/O bound" tricky: if the access pattern at some instant (instant
> being the period over which you base your decision) makes the drive
> look slow, you would opt NOT to spawn a new thread to take
> advantage of excess throughput.  Similarly, if the drive "looks"
> serendipitously fast, you may spawn another thread whose
> accesses eventually conflict with those of the first thread,
> lowering overall throughput.
>
>> Non-pathological access can often result in near 100% of the access rate.
>>
>> The best test of whether you are I/O bound: if the I/O system is
>> constantly in use, and every I/O request has another pending when it
>> finishes, then you are totally I/O bound.
>
> But, if you make that assessment when the access pattern is "unfortunate",
> you erroneously conclude the disk is at its capacity.  And, vice versa.
>
> Without control over the access patterns, it seems like there is no
> reliable strategy for determining when another thread can be
> advantageous (?)
Yes, adding more threads might change the access pattern. It will TEND to make the pattern less sequential, and thus push it toward that pathological case (so more threads actually decrease the rate at which you can do I/O, and thus slow down your I/O-bound rate). It is possible that it just happens to be fortunate and makes things more sequential: if the system can see that one thread wants sector N and another wants sector N+1, something can schedule the reads together and drop a seek.

Predicting that sort of behavior can't be done 'in the abstract'. You need to think about the details of the system.

As a general principle, if the I/O system is saturated, the job is I/O bound. Adding more threads will only help if you have the resources to queue up more requests and can optimize the order of servicing them to be more efficient with I/O. Predicting that means you need to know, and have some control over, the access pattern.

Note, part of this is being able to trade memory to improve I/O speed. If you know that EVENTUALLY you will want the next sector after the one you are reading, reading it now and caching it will be a win, but only if you will be able to use that data before you need to claim that memory for other uses. This sort of improvement really does require knowing the details you are trying to assume you don't know, so you are limiting your ability to make accurate decisions.
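The "sector N and N+1" case is what an elevator-style scheduler exploits. A toy sketch of the idea (the request structure and the commented-out issue() are invented; real schedulers are considerably more careful): keep the pending queue sorted by block address and service it in one sweep, so adjacent requests from different threads coalesce into sequential media access.

    #include <stdlib.h>

    /* Hypothetical pending-request record; lba is the logical block address. */
    struct req { long lba; /* ... buffer, length, completion callback ... */ };

    static int by_lba(const void *a, const void *b)
    {
        long x = ((const struct req *)a)->lba;
        long y = ((const struct req *)b)->lba;
        return (x > y) - (x < y);
    }

    /* One elevator sweep: sort whatever is queued, then issue in LBA order.
     * Requests for blocks N and N+1 -- even from different threads -- now
     * reach the drive back to back, saving a seek. */
    void sweep(struct req *queue, size_t n)
    {
        qsort(queue, n, sizeof queue[0], by_lba);
        /* for (size_t i = 0; i < n; i++) issue(&queue[i]); */
    }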
On 10/15/2021 9:48 AM, Richard Damon wrote:
> On 10/15/21 12:00 PM, Don Y wrote:
>> On 10/15/2021 8:38 AM, Richard Damon wrote:
>>> You know that you can't actually get data off the media faster than
>>> the fundamental data rate of the media.
>>
>> Yes, but you don't know that rate *and* that rate varies based on
>> "where" your accesses land on the physical medium (e.g., ZDR,
>> shingled drives, etc.)
>
> But all of these still have a 'maximum' rate, so you can still define a
> maximum. It does mean that the 'expected' rate you can get is more variable.
But only if you have control over the hardware. How long will a "backup" take on your PC?  Today?  Tomorrow?  Last week? If you removed the disk and put it in another PC, how would those figures change? If you restore (using file access and not sector access) and then backup again, how will the numbers change?
>>> As you mention, cache can give an apparent rate faster than the media,
>>> but you seem willing to assume that caching doesn't affect your rate
>>> and that each chunk will only be returned once.
>>
>> Cache in the filesystem code will be counterproductive.  Cache in
>> the drive may be a win for some accesses and a loss for others
>> (e.g., if the drive read ahead thinking the next read was going to
>> be sequential with the last -- and that proves to be wrong -- the
>> drive may have missed an opportunity to respond more quickly to the
>> ACTUAL access that follows).
>>
>> [I'm avoiding talking about reads AND writes just to keep the
>> discussion complexity manageable -- to avoid having to introduce
>> caveats with every statement.]
>
> Yes, the drive might try to read ahead and hurt itself, or it might not.
> That is mostly out of your control.
Exactly. So, I can't do anything other than OBSERVE the performance I am getting.
>>> Non-pathological access can often result in near 100% of the access rate.
>>>
>>> The best test of whether you are I/O bound: if the I/O system is
>>> constantly in use, and every I/O request has another pending when it
>>> finishes, then you are totally I/O bound.
>>
>> But, if you make that assessment when the access pattern is "unfortunate",
>> you erroneously conclude the disk is at its capacity.  And, vice versa.
>>
>> Without control over the access patterns, it seems like there is no
>> reliable strategy for determining when another thread can be
>> advantageous (?)
>
> Yes, adding more threads might change the access pattern. It will TEND to
> make the pattern less sequential, and thus push it toward that pathological
> case (so more threads actually decrease the rate at which you can do I/O,
> and thus slow down your I/O-bound rate). It is possible that it just
> happens to be fortunate and makes things more sequential: if the system can
> see that one thread wants sector N and another wants sector N+1, something
> can schedule the reads together and drop a seek.
The point of additional threads is that another thread can schedule the next access while the processor is busy processing the previous one. So, the I/O is always kept busy instead of letting it idle between accesses.
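A minimal sketch of that overlap, assuming one reader thread and a two-slot buffer shared with the consumer (the file name and sizes are placeholders): while the consumer chews on one block, the reader is already filling the other, so the drive never sits idle waiting for the CPU.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define NBUF  2
    #define BLOCK (64 * 1024)

    static char   buf[NBUF][BLOCK];
    static size_t len[NBUF];
    static sem_t  sem_empty, sem_full;  /* classic producer/consumer handoff */

    static void *reader(void *arg)      /* this thread's job: keep disk busy */
    {
        FILE *f = arg;
        for (int i = 0; ; i = (i + 1) % NBUF) {
            sem_wait(&sem_empty);                  /* wait for a free slot  */
            len[i] = fread(buf[i], 1, BLOCK, f);   /* next read issued while
                                                      consumer chews on the
                                                      previous block        */
            sem_post(&sem_full);
            if (len[i] == 0)                       /* EOF marker handed over */
                return NULL;
        }
    }

    int main(void)
    {
        sem_init(&sem_empty, 0, NBUF);
        sem_init(&sem_full, 0, 0);
        FILE *f = fopen("data.bin", "rb");         /* placeholder file name */
        if (!f) return 1;
        pthread_t t;
        pthread_create(&t, NULL, reader, f);
        for (int i = 0; ; i = (i + 1) % NBUF) {
            sem_wait(&sem_full);                   /* wait for a filled slot */
            if (len[i] == 0) break;                /* reader hit EOF         */
            /* ... process buf[i]; the next block is already being read ... */
            sem_post(&sem_empty);                  /* recycle the slot       */
        }
        pthread_join(t, NULL);
        fclose(f);
        return 0;
    }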
> Predicting that sort of behavior can't be done 'in the abstract'. You need
> to think about the details of the system.
>
> As a general principle, if the I/O system is saturated, the job is I/O bound.
The goal is to *ensure* the I/O system is completely saturated.
> Adding more threads will only help if you have the resources to queue up
> more requests and can optimize the order of servicing them to be more
> efficient with
Ordering them is an optimization that requires knowledge of how they will interact *in* the drive. However, simply having ANOTHER request ready as soon as the previous one is completed (neglecting the potential for the drive to queue requests internally) is an enhancement to throughput.
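One portable way to always have another request posted when the previous one completes is POSIX AIO with a small queue depth. A sketch, assuming sequential 64 KB reads from an already-open descriptor; DEPTH and BLOCK are arbitrary choices:

    #include <aio.h>
    #include <string.h>
    #include <sys/types.h>

    #define DEPTH 2
    #define BLOCK (64 * 1024)

    /* Keep DEPTH reads posted so the drive always has a successor queued. */
    void stream(int fd)
    {
        static char buf[DEPTH][BLOCK];
        struct aiocb cb[DEPTH];
        off_t next = 0;

        memset(cb, 0, sizeof cb);
        for (int i = 0; i < DEPTH; i++) {          /* prime the queue */
            cb[i].aio_fildes = fd;
            cb[i].aio_buf    = buf[i];
            cb[i].aio_nbytes = BLOCK;
            cb[i].aio_offset = next;
            next += BLOCK;
            aio_read(&cb[i]);
        }
        for (int i = 0; ; i = (i + 1) % DEPTH) {
            const struct aiocb *wait_for[1] = { &cb[i] };
            aio_suspend(wait_for, 1, NULL);        /* oldest request done?  */
            ssize_t n = aio_return(&cb[i]);
            if (n <= 0)
                break;                             /* EOF or error          */
            /* ... consume buf[i]; its successor is already in flight ...   */
            cb[i].aio_offset = next;               /* repost this slot      */
            next += BLOCK;
            aio_read(&cb[i]);
        }
    }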
> I/O. Predicting that means you need to know, and have some control over,
> the access pattern.
>
> Note, part of this is being able to trade memory to improve I/O speed. If
> you know that EVENTUALLY you will want the next sector after the one you
> are reading, reading it now and caching it will be a win, but only if you
> will be able to use that data before you need to claim that memory for
> other uses. This sort of improvement really does require knowing the
> details you are trying to assume you don't know, so you are limiting your
> ability to make accurate decisions.
Moving the code to another platform (something that the user can do in a heartbeat) will invalidate any assumptions I have made about the performance on my original platform. Hence the desire to have the *code* sort out what it *can* do to increase performance by observing its actual performance on TODAY'S actual hardware.
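That argues for a feedback loop rather than baked-in constants. A hill-climbing sketch, where measure_mbps() and set_workers() are hypothetical hooks into the real system and the 5% threshold is a guess: add a worker, keep it only if the measured aggregate rate actually improves, otherwise back off.

    /* Hill-climbing controller, evaluated once per measurement interval.
     * measure_mbps() and set_workers() are hypothetical hooks into the
     * real system; a production version would also re-probe periodically. */
    double measure_mbps(void);   /* aggregate rate over the last interval */
    void   set_workers(int n);   /* grow/shrink the worker-thread pool    */

    void tune(void)
    {
        int    workers = 1;
        double best    = 0.0;

        set_workers(workers);
        for (;;) {
            double rate = measure_mbps();
            if (rate > best * 1.05) {        /* real gain: try one more  */
                best = rate;
                set_workers(++workers);
            } else if (workers > 1) {        /* no gain: back off toward */
                best = rate;                 /* the knee and sit there   */
                set_workers(--workers);
            }
        }
    }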
Don Y <blockedofcourse@foo.invalid> wrote:
> As a *rough* figure, what would you expect the bandwidth of
> a disk drive (spinning rust) to do as a function of number of
> discrete files being accessed, concurrently?
> [...]
> Said another way, what's a reasonably reliable way of
> determining when you are I/O bound by the hardware
> and when more threads won't result in more performance?
Roughly speaking, a drive spinning at 7500 rpm divided by 60 is 125 revolutions a second; a seek takes half a revolution, and the next file is another half a revolution away on average, which gets you 125 files a second, roughly, depending on the performance of the drive, if my numbers are not too far off.

This is plenty to support a dozen Windows VMs on average, were it not for Windows updates that saturate the disks with hundreds of little file updates at once, causing Microsoft SQL timeouts for the VMs.
On 10/17/2021 1:27 PM, Brett wrote:
> Don Y <blockedofcourse@foo.invalid> wrote:
>> As a *rough* figure, what would you expect the bandwidth of
>> a disk drive (spinning rust) to do as a function of number of
>> discrete files being accessed, concurrently?
>> [...]
>> Said another way, what's a reasonably reliable way of
>> determining when you are I/O bound by the hardware
>> and when more threads won't result in more performance?
>
> Roughly speaking, a drive spinning at 7500 rpm divided by 60 is 125
> revolutions a second; a seek takes half a revolution, and the next file
> is another half a revolution away on average, which gets you 125 files a
> second, roughly, depending on the performance of the drive, if my numbers
> are not too far off.
You're assuming files are laid out contiguously -- that no seeks are needed "between sectors". You're also assuming moving to another track (seek time) is instantaneous (or fits within the half-revolution rotational delay).

For a 7200 rpm drive (some are as slow as 5400, some as fast as 15K), AVERAGE rotational delay is 8.33 ms / 2 = ~4.2 ms. But seek time can be 10, 15, or more ms. (On my enterprise drives, it's 4; but average rotational delay is 2.) And, if the desired sector lies on a "distant" cylinder, you can scale that almost linearly.

I.e., looking at the disk's specs is largely useless unless you know how the data on it is laid out. The only way to know that is to *look* at it. But, looking at part of the data doesn't mean you can extrapolate to ALL of the data.

So, I'm back to my assumption that you can't really alter your approach -- with any degree of predictable success -- beforehand. E.g., I can keep spawning threads until I find them queuing (more than one deep) on the disk driver. But, even then, a moment from now the backlog can clear. Or, it can get worse (which means I've wasted the resources that the threads consume AND added complexity to the algorithm with no direct benefit).
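For concreteness, the seek-plus-rotation arithmetic above worked out (the 10 ms seek is the illustrative desktop-class figure from the text, not a datasheet number):

    #include <stdio.h>

    /* Random-access IOPS estimate: one average seek plus half a rotation
     * per independent access.  Compare with the 125/sec estimate above. */
    int main(void)
    {
        double rpm        = 7200.0;
        double rot_ms     = 60000.0 / rpm;   /* 8.33 ms per revolution     */
        double avg_rot_ms = rot_ms / 2.0;    /* ~4.17 ms average latency   */
        double seek_ms    = 10.0;            /* desktop-class average seek */

        double per_io_ms = seek_ms + avg_rot_ms;
        printf("%.1f ms/access -> ~%.0f random accesses/sec\n",
               per_io_ms, 1000.0 / per_io_ms);   /* ~14.2 ms -> ~70/sec */
        return 0;
    }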
Below, you're also assuming Windows. And, for writes, shingled drives throw all of that down the toilet.

> This is plenty to support a dozen Windows VMs on average, were it not
> for Windows updates that saturate the disks with hundreds of little file
> updates at once, causing Microsoft SQL timeouts for the VMs.