On Tuesday, November 4, 2014 2:09:00 PM UTC+2, David Brown wrote:
> On 04/11/14 12:43, Dimiter_Popoff wrote:
> > On 04.11.2014 г. 12:08, David Brown wrote:
> >> ......
> >>>> One is the type of filesystem. Some, such as FAT, have a very simple
> >>>> structure that is prone to fragmentation.
> >>>
> >>> This is wrong. FAT has nothing to do with fragmentation. It is the
> >>> allocation strategy the OS employs which is responsible. Nothing is
> >>> stopping the OS from doing worst fit on a FAT disk image (I have done
> >>> it, for example, when emulating FAT disks under DPS - the latter does
> >>> enhanced worst fit, so it was transferred automatically to FAT images
> >>> as well).
> >>
> >> It is certainly possible to use FAT without fragmenting, as you have
> >> done. As I said, filesystem type is only one aspect - filesystem
> >> implementation and usage patterns are also important (I haven't said
> >> anything about which factors are /most/ important, since the answer
> >> varies too much).
> >
> > So explain how FAT is "prone to fragmentation" in a way different to any
> > other system.
> >
> >> Please don't start telling me what I know or don't know - I would rather
> >> not get into a fight again because you needlessly (and perhaps
> >> unknowingly) write in provocative terms.
> >
> > So stick in your posts to things you do understand.
> > I will go no further; let us first see to what length you will go
> > to defend your "FAT being prone to fragmentation because of its
> > simplicity" claim.
>
> Since you failed to read what I wrote in either post, but leapt at it as
> a chance to attack me again, I see little point in posting anything
> more in reply to you.

Ah, so you now know you were wrong. Some progress is being made here, though
you pretend to be offended. I did not aim to attack you at all, just pointed
you to something wrong you were saying in an authoritative voice. You are
doing yourself a disservice by posting ignorant claims and then pretending to
be offended when someone notices. Me noticing is of no consequence to you,
but I am not the only reader of this group. No need to thank me...

Dimiter
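(For the curious: "worst fit" here just means always carving new allocations
out of the largest free run, so large contiguous holes survive as long as
possible. A toy sketch of the idea, with a made-up extent list:

  # free extents as "start length" pairs; allocate 'need' clusters worst-fit
  printf '100 8\n300 64\n500 16\n' |
  awk -v need=10 '
    $2 > max { max = $2; at = $1 }
    END { if (max >= need) print "allocate at", at
          else print "no single run fits - the file must fragment" }'

Nothing in the FAT on-disk format prevents an implementation from allocating
this way; the format only records which cluster follows which.)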
Disk imaging strategy
Started by ●November 2, 2014
Reply by ●November 4, 2014
Reply by ●November 4, 2014
On Tue, 4 Nov 2014 04:42:27 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>In comp.arch.embedded Robert Wessel <robertwessel2@yahoo.com> wrote:
>
>(snip)
>
>> I've seen that before, unfortunately he's off on some of the details.
>> First it's a 3390 module (an "HDA", which was actually two drives or
>> "actuators"), so it's from no earlier than 1989 (not late 70s/early
>> 80s), and it's not 10MB, it's about 1.89GB (assuming it's a model-1),
>> or 3.78GB (if it's a double density model-2), and there were
>> additional models of higher capacity later. It's also not worth $250K
>> - you could buy an entire -B28 for $275K at the time, and that
>> contained six double density modules (HDAs) of the type he
>> disassembled. You'd usually buy a "string" of three units (one -Axx
>> and two -Bxx units), for a total of 16 HDAs, which would set you back
>> about $750K. So the value is more along the lines of $50K (assuming
>> the enclosure is free).
>
>I think that there was also a 3390 model that ran at 1200 RPM, instead
>of the usual 3600 RPM, and stored three logical tracks on one
>physical track. Access time is longer, so it doesn't always help.

It's been the reduction in seek times over the years that's driven the
increase in rotational rates. 2400 RPM was a semi-standard for much of
the sixties, when average seek times were 40+ms (at which point an
average 12.5ms latency is not the major contributor to access time).
In the seventies and eighties IBM (and others) went to 3600 RPM (8.4ms
latency), as seek times slowly dropped (30ms average on the 3330 in
1973, to 17ms on the 3380 in 1983, and then to 9.5ms on the single
density, and 4200 RPM, 3390s in 1989).

On the flip side, the biggest contribution to seek time reduction since
the 80s has been the reduction in disk diameter. Prior to that the 14
inch size was very common on large systems; the 3390s reduced that to
11 inches. Modern 15K RPM drives usually have 2.5 (or so) inch
platters, even when they're in a nominal 3.5 inch chassis.

3390s are still the DASD of choice in MVS/zOS, but are now, of course,
emulated on top of ordinary fixed-block disks. The performance
characteristics are quite different, and larger 3390 model volumes are
not slower (although they may have more contention for an I/O path if
parallel access volumes are not in use). So these days most people use
3390-27s or -54s (which were never real devices, but the common names do
indicate the relative capacities), unless they're using EAV, which
allows more than 64K cylinders - but the 3390 track geometry is still
maintained.
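(Those latency figures are just half a revolution: average rotational
latency is 30000/RPM milliseconds. A throwaway one-liner to reproduce them,
for what it's worth:

  for rpm in 2400 3600 4200 15000; do
    awk -v r=$rpm 'BEGIN { printf "%5d RPM -> %.2f ms average latency\n", r, 30000 / r }'
  done

which prints 12.50, 8.33, 7.14 and 2.00 ms respectively, matching the
numbers quoted above.)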
Reply by ●November 4, 2014
Hi David,

On 11/4/2014 1:29 AM, David Brown wrote:
>>> I am thoroughly confused as to what you are trying to do here.
>>
>> Because I'm trying NOT to let the thread drift!  :>
>
> When you post to comp.arch.embedded, thread drift is a risk - when you
> cross-post to sci.electronics.design, thread drift is guaranteed or your
> money back :-)

Yeah, well... I can *try*!  :>

>>> On the one hand, you want a filesystem aware process so that you can
>>> image only real files, not empty space (or leftovers in deleted space).
>>
>> That conclusion doesn't necessarily follow. What I want to do is not
>> waste "image space" on "dead data" (deleted files) -- WITHOUT EXPLICITLY
>> KNOWING WHAT IS DEAD (because I have no metadata from a "file system" to
>> tell me what is live/dead)
>
> I think there are four ways to identify the data that you need to store:
>
> 1. Metadata to tell you what is live.
> 2. Metadata to tell you what is dead.
> 3. Identify the live data from its contents.
> 4. Identify the dead data from its contents.
>
> You have ruled out 1 and 2 by not allowing the imaging system to know
> anything about the filesystem. I doubt that 3 is practical - certainly not
> for a general purpose filesystem. That leaves number 4 - identify the dead
> data from its contents (alternatively, fill it with highly compressible
> zeros).

Agreed. Though doing any "in-band" signaling opens the door to
coincidentally encountering that same data in the LIVE data. You're trying
to store a sector's worth of data *plus* one extra bit -- "this data is
live" -- in a sector's worth of data (WITHOUT that extra bit!) In that
case, you risk silently corrupting the restored image if you opt NOT to
restore it (on the assumption that it is DEAD and, thus, "don't care").
I.e., the contents of that sector may no longer coincide with what they
were -- yet restoring the metadata elsewhere on the medium will tell the
recreated filesystem that those contents are actually "valid"/live!

Note that you can use anything to mark the sector (when you "fill" it with
dummy files) -- as long as you recognize that "pattern" as indicative of
"this is one of the sectors that I filled -- and then deleted -- so it is
(probably) empty".

> It's easy enough to zero out the whole disk before using it, which gives
> you a start. Writing out a large file full of zeros, as you suggested
> before, is the only general-purpose way to put zeros into the empty space.
> I gave some disadvantages of that earlier, but it may be the only general
> solution given the restraints you have given yourself.

Doing this before "building" the system is counterproductive (it was my
initial approach). Far easier/quicker to write a tight routine that just
pumps zeroes (or any data) out than it is to try to co-operate with an OS
that has been installed. But the software installation process (esp.
Windows) dirties far too much of the medium (temporary files, etc.). So,
you have to go back and fill the "empty space" (defined by "wherever the OS
lets you create a new file!").

> On Linux, an alternative technique would be to look at the "fstrim"
> command - intercept the generated SATA TRIM commands and replace them with
> commands to write zero blocks. I believe that should be safe, and it
> should let you zero out all the dead data.
> You could even just store the fstrim output and use that as a list of
> dead blocks, if you have some way of getting the information to the
> imaging software - but be sure nothing more is written to the filesystem
> after the fstrim.

The appeal of the fs-agnostic approach is that it (should) work universally.
And "restores" would be the same effort regardless of platform ("imaging"
would vary from platform to platform, as file creation, etc. would have to
operate *in* the host environment).

>>> First, how many machines are we talking about?
>>
>> Personally, about 30 or 40 drives (e.g., some "machines" have multiple
>> drives). Note that a "machine" does not have to be a PC. Nor a
>> SPARCstation. Etc.
>>
>> For my pro bono work, probably 200-400 yearly (but that will hopefully
>> only be 20-40 different "model numbers", 10 or more instances of each).
>>
>>> What sort of different systems varieties do you have in the OS's and
>>> filesystems?
>>
>> Personally, three different flavors of Solaris, three Windows, three
>> NetBSD, a couple of oddball "OTS" systems (Jaluna, Inferno, etc.) and
>> probably a dozen different "appliance"/proprietary implementations
>> (effectively black boxes).
>>
>> Pro bono is much easier. They'll either be PCs or Macs. But their OS's
>> will largely be defined by whatever happens to *run* on that particular
>> hardware (donations may be of various "ages"). I'm guessing three
>> different Windows (though within that, there can be minor variations
>> like Home, Pro, Business, etc. editions -- possibly even on the same
>> make/model hardware). Probably two different OSX versions (??).
>
> This suggests two different setups to me. One should be a clonezilla
> server system that will work for the windows systems (and also, I think,
> Macs and "simple" Linux or BSD setups). This will cover the bulk of your
> systems, especially all of your pro bono stuff. The Windows systems will
> all be identified automatically, and since the filesystem is known, there
> is no problem imaging just the live data.

The pro bono systems have to be recoverable without my involvement. And, as
I mentioned, expecting a (homeless) student to keep track of a "restore DVD"
(or three) is an invitation to ongoing involvement ("Can't you make me a new
DVD?" "Yeah, and how long before you misplace THIS one?") Jaded??  :>

So, I would have to install Clonezilla on the "recovery/restore partition"
on each of these machines. Then dumb it down so it was a turn-key operation
for them. So, you put this big OS in place that tries to bring up all the
hardware in the machine (instead of just the disk), then take away all of
that functionality -- just to be fs-aware...

The laptops that I have in front of me have four (MBR) partitions, at least
two of which are not recognized by Clonezilla (CZ). So, it just does the
dd | gzip trick when commanded to image/restore those. [I'd also have to
script CZ so that it would restore all three of the "non-recovery"
partitions.]

And I'm not sure how the UEFI (secure boot) issue will come to bite CZ *and*
"my proposed solution" when W8 machines start appearing in donations! (a
couple of years, tops?)

As for the machines that I have here, the SPARCs are problematic for CZ.
And the (x86) *BSD machines have a different disklabel(8) approach which
doesn't expose them via the MBR. I.e., I can't even image (ALL) my
"computers", here, with CZ! [That doesn't even address the appliances!]

> Then you have another system for your "weird" systems.
> You may find that these ones are small enough that you can just image the
> entire disk with "dd". Yes, it is a little inefficient - but it will be
> safe, reliable, and easy to understand and implement.

I think the printers are probably small enough (<100G) that it may be easier
just to pull the drives and dd(1)-image them. Converting all of this "code"
to PS would just be a silly exercise. The NAS boxes (all, regrettably,
different) would be a different issue...

> And if you walk under a bus, or decide to retire, there is a chance that
> someone else will understand how it all works.

When I'm gone, my systems are coming with me!  :>

As for the pro bono stuff, my "absence" has already shown that entropy
quickly governs! E.g., they can't even keep track of the *two* "privileged"
account passwords on the systems I built for them last year! And their
(contract) IT guy is only a 20W lightbulb... as evidenced by his solutions
to trivial problems: "I need to install some new printer drivers on these
machines" when, in fact, the "problem" was a loose cable on the back of the
printer! (Gee, the machines WERE working. Then ALL of them stopped being
able to print. Their configurations are LOCKED DOWN. Wouldn't you go
looking for something OUTSIDE the machines -- like the PRINTER -- for the
cause of the problem??)

>>> What about the different developers and users - are they at similar
>>> levels and are they cooperative/competent, or are you going to have to
>>> do these backups and images because they are not good at following
>>> version control routines?
>>
>> They aren't "backups" (see my thread to George). They are "restore
>> images". They aren't regularly performed (like backups would be).
>> Rather, a machine is imaged (typically *once*) and the image saved in
>> order to recreate the machine's state at a future time (if it gets
>> munged).
>>
>> So, I expect this to be far more involved than the routine "backups" I
>> do for my working files/configuration. But I expect the "restore" to be
>> far simpler (UX) -- "push this button and wait".
>
> For unusual systems, images will be taken much more often than they will
> be restored - the image is for emergencies.

That may not be the case. E.g., I would only image the printers once:
install all the cruft that they "need", image them, then USE them in that
configuration thereafter. (Getting stuff onto them is more tedious than
"real computers", so the advantage of the image can pay off big if I have
to replace or upgrade a disk.)

> For duplicating (nearly) identical machines, images will be "restored"
> much more often than they are made. These are different purposes, and I
> believe you should be looking at different setups for these purposes.

>>> Is there something special about your needs that makes such a system
>>> impractical?
>
> OK, I've got a lot better picture of your aims now. And as I said above,
> I really think you should separate this into two systems - I believe each
> will be much easier than trying to make a general system that covers
> both.

Off to vote...
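For what it's worth, the fs-agnostic fill-then-image workflow being
discussed boils down to a few shell commands. A minimal sketch, where
/mnt/target and /dev/sdX1 are placeholder names and the filesystem should be
quiescent before imaging:

  # fill free space with zeros so dead sectors compress to almost nothing
  dd if=/dev/zero of=/mnt/target/zerofill bs=1M   # runs until the disk fills
  sync
  rm /mnt/target/zerofill
  umount /mnt/target

  # raw, filesystem-agnostic compressed image of the partition
  dd if=/dev/sdX1 bs=1M | gzip -c > restore-image.gz

  # restore later with:
  #   gzip -dc restore-image.gz | dd of=/dev/sdX1 bs=1M

Because nothing is skipped - the zeros are stored and restored, merely
compressed well - this particular scheme sidesteps the in-band-signaling
corruption risk discussed above; the fill pattern only buys compression, not
sector elision.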
Reply by ●November 4, 2014
On 11/4/2014 2:46 AM, Jan Panteltje wrote:
> <snipped rant>

(sigh)  And it is obvious that you didn't look through the sources.

> Look, you want to compress a partition image without knowing what
> filesystem is on it, that leaves gzip (or maybe zip).
>
> Get a life.

Given the above, I suggest you "get an education".
Reply by ●November 4, 2014
In comp.arch.embedded Robert Wessel <robertwessel2@yahoo.com> wrote:

(snip, I wrote)
>>I think that there was also a 3390 model that ran at 1200 RPM, instead
>>of the usual 3600 RPM, and stored three logical tracks on one
>>physical track. Access time is longer, so it doesn't always help.

> It's been the reduction in seek times over the years that's driven the
> increase in rotational rates. 2400 RPM was a semi-standard for much of
> the sixties, when average seek times were 40+ms (at which point an
> average 12.5ms latency is not the major contributor to access time).
> In the seventies and eighties IBM (and others) went to 3600 RPM (8.4ms
> latency), as seek times slowly dropped (30ms average on the 3330 in
> 1973, to 17ms on the 3380 in 1983, and then to 9.5ms on the single
> density, and 4200 RPM, 3390s in 1989).

But note that increasing the rotational rate decreases capacity if you
can't increase the bit rate. Conversely, at a constant bit rate,
decreasing the rotational rate increases capacity.

> On the flip side the biggest contribution to seek time reduction since
> the 80s has been the reduction in disk diameter. Prior to that the 14
> inch size was very common on large systems, the 3390s reduced that to
> 11 inches. Modern 15K RPM drives are usually 2.5 (or so) inch
> platters, even when they're in a nominal 3.5 inch chassis.

Along with putting a cache local to the drive.

> 3390s are still the DASD of choice in MVS/zOS, but are now, of course,
> emulated on top of ordinary fixed-block disks, and the performance
> characteristics are quite different, and larger 3390 model volumes are
> not slower (although they may have more contention for an I/O path if
> parallel access volumes are not in use). So these days, most people
> use 3390-27s or -54s (which were never real devices, but the common
> names do indicate the relative capacities), unless they're using EAV,
> which allows more than 64K cylinders, but the 3390 track geometry is
> still maintained.

As I understand it, the 3390 always emulated CKD using fixed-sized blocks,
but in earlier models they were internal to the drive and controller.
Later on, they were emulated using ordinary FB-512 drives. (I believe
this is visible in the blocks-per-track calculation: the physical gaps
between C, K, and D don't exist like on previous drives.)

But there was discussion - and right now I don't see any reference - of
a physical 1200 RPM drive, which I believe was the 3390-27, to increase
capacity without increasing the bit rate, by slowing the drive. (At some
point, bit rate is limited by head inductance and other physical
factors.)

But yes, with FB-512 emulation, that may have been an early favorite
for emulation, with its nice large size.

-- glen
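Glen's constant-bit-rate point is easy to put numbers on: one revolution
takes 60/RPM seconds, so at a fixed head data rate the bytes passing under
the head per track scale inversely with spindle speed. A throwaway sketch
(the 3 MB/s rate is an assumption for illustration, not a 3390 spec):

  awk 'BEGIN {
    rate = 3e6                  # assumed head data rate, bytes/second
    for (rpm = 3600; rpm >= 1200; rpm -= 1200)
      printf "%4d RPM -> %.0f KB per physical track\n", rpm, rate * 60 / rpm / 1024
  }'

The 1200 RPM line comes out at exactly three times the 3600 RPM line, which
matches the three-logical-tracks-per-physical-track arrangement described
above.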
Reply by ●November 4, 2014
On a sunny day (Tue, 04 Nov 2014 08:27:39 -0700) it happened Don Y
<this@is.not.me.com> wrote in <m3ar9i$j8l$1@speranza.aioe.org>:

>On 11/4/2014 2:46 AM, Jan Panteltje wrote:
>
>> <snipped rant>
>
>(sigh) And it is obvious that you didn't look through the sources.
>
>> Look, you want to compress a partition image without knowing what
>> filesystem is on it, that leaves gzip (or maybe zip).
>>
>> Get a life.
>
>Given the above, I suggest you "get an education".

No, you are clueless, and try to invent things that have already been
invented. Try:

  gzip -c /dev/sdaX > my_gzipped_partition.gz

If ANY regular structure is present on that device, then it will be
replaced with some token. You're rude too; well, no worry, I'm sure you
will not invent a better gzip... You are clueless!
Reply by ●November 4, 2014
On Tue, 04 Nov 2014 08:31:57 -0600, Robert Wessel
<robertwessel2@yahoo.com> Gave us:

> Modern 15K RPM drives are usually 2.5 (or so) inch
>platters, even when they're in a nominal 3.5 inch chassis.

1.8 inch. Usually in a 2.5" form factor case.

On-the-drive caching has also made up for a lot of the problems
fragmentation DID cause for FAT in the early years. Fragmentation WAS a
problem, back when all those seek transitions across those expansive
platters added up.

Now, with full sector caching, and even full cylinder caching, all this
goes away. Fragmentation is NOT a problem today, even when a drive IS
fragmented; it ONLY poses a small problem for the pro set, and they
manage it out 100% on a daily basis.
Reply by ●November 4, 2014
On Mon, 03 Nov 2014 21:48:56 -0700, Don Y <this@is.not.me.com> wrote:

>IIRC, the Bullet Server could (did?) create contiguously stored
>files. But, that was largely possible because of its "write once"
>semantics (size declared a priori).

It could, but usually did not, because the client usually could not give
it a size. Typically, a Bullet file was allocated in one or more largish
extents and then consolidated when the file was closed. Any copies made
of an existing file - e.g., to/from a remote server - always were
contiguously stored.

George
Reply by ●November 4, 2014
On Tue, 4 Nov 2014 15:39:35 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>In comp.arch.embedded Robert Wessel <robertwessel2@yahoo.com> wrote:
>
>(snip, I wrote)
>>>I think that there was also a 3390 model that ran at 1200 RPM, instead
>>>of the usual 3600 RPM, and stored three logical tracks on one
>>>physical track. Access time is longer, so it doesn't always help.
>
>> It's been the reduction in seek times over the years that's driven the
>> increase in rotational rates. 2400 RPM was a semi-standard for much of
>> the sixties, when average seek times were 40+ms (at which point an
>> average 12.5ms latency is not the major contributor to access time).
>> In the seventies and eighties IBM (and others) went to 3600 RPM (8.4ms
>> latency), as seek times slowly dropped (30ms average on the 3330 in
>> 1973, to 17ms on the 3380 in 1983, and then to 9.5ms on the single
>> density, and 4200 RPM, 3390s in 1989).
>
>But note that increasing rotational rate decreases capacity if you
>can't increase the bit rate. Also, at constant bit rate, decreasing
>rotational rate increases capacity.

True, but it's been the management of the physical magnetic spot size
that's been the main driver in recent decades, not so much the actual
data rate.

The slow (one-third speed) 3390-9s are an interesting case. The
normal-speed 3390s (-1/2/3s) were already maxing out the data rate of
parallel channels, so 3390s with triple the physical track capacities
would have been exceptionally poor performers, needing all I/Os to go
completely through the speed matching buffers. OTOH, ESCON was announced
less than a year before the 3390-9s, and that *could* have handled the
higher data rate, but 3390-9s could, IIRC, still be attached to parallel
channels. OTTH, tripling the data rate would likely have required a much
bigger electronics upgrade in the 3390 (and the 3990 controller). So
while I don't know for sure, I've always assumed that the -9s were more a
product of opportunity, building on the existing 3390 hardware and having
to live within some of the limits imposed by that, and not really any
demonstration of the state of the art of disk technology (which had
already passed to fixed-block devices by that time anyway - IBM itself
shipped the 0681 5.25 inch drive, with 850MB, just a few months after the
950MB 3390-1s were announced).

>> On the flip side the biggest contribution to seek time reduction since
>> the 80s has been the reduction in disk diameter. Prior to that the 14
>> inch size was very common on large systems, the 3390s reduced that to
>> 11 inches. Modern 15K RPM drives are usually 2.5 (or so) inch
>> platters, even when they're in a nominal 3.5 inch chassis.
>
>Along with putting a cache local to the drive.

Cached accesses don't really count as "normal" accesses with seek time
and rotational latency. FSVO "local", even 3380s and 3390s could be
attached to controllers with cache.

>> 3390s are still the DASD of choice in MVS/zOS, but are now, of course,
>> emulated on top of ordinary fixed-block disks, and the performance
>> characteristics are quite different, and larger 3390 model volumes are
>> not slower (although they may have more contention for an I/O path if
>> parallel access volumes are not in use). So these days, most people
>> use 3390-27s or -54s (which were never real devices, but the common
>> names do indicate the relative capacities), unless they're using EAV,
>> which allows more than 64K cylinders, but the 3390 track geometry is
>> still maintained.
>
>As I understand it, 3390 always emulated CKD using fixed-sized blocks,
>but in earlier models they were internal to the drive and controller.
>Later on, they were emulated using ordinary FB-512 drives. (I believe
>this is visible in the blocks-per-track calculation: the physical gaps
>between C, K, and D don't exist like on previous drives.)

3380s and 3390s both physically used fixed-size cells on disk. In the
case of 3390s, these were 34 bytes in size. There was still
(considerable) overhead for the key and data segments, but it all rounded
to an integer number of 34-byte cells. The size (in cells) calculation
for 3390s is on page 10 of:

http://bitsavers.trailing-edge.com/pdf/ibm/dasd/reference_summary/GX26-4577-0_3390_Reference_Summary_Jun89.pdf

But even if there was not a physical, 2314-style gap between the count,
key and data sections, there was still overhead. You needed about ~4%
more nominal cell bytes than your key or record, *plus* there was a fixed
overhead of nine cells for each segment (plus ten cells for the count, in
all cases), which is certainly a gap of sorts. The net result is that
there were gaps; the exact calculation is different, but the general form
is the same. On the 3350, for example, you assumed overhead of 185 or
267 bytes (the latter if you had keys) for each record - that's pretty
much the same concept as 19 or 29 cells of overhead per record on 3390s.
We can quibble over whether or not any of that is emulation. 3380s were
similar, although with 32-byte cells and a somewhat simpler overhead
calculation.

>But there was discussion, and right now I don't see any reference,
>to a physical 1200 RPM drive, which I believe was the 3390-27, to
>increase capacity without increasing bit rate, by slowing the drive.
>(At some point, bit rate is limited by head inductance and other
>physical factors.)

My post would have been clearer had I not managed to clip the first
paragraph: "The 'real' single, double and triple density 3390s
(model-1/-2/-3) all ran at 4200 RPM (faster than the preceding 3380s at
3600 RPM). The 9X density 3390-9s were the ones that ran at a third that
speed." There were never "real" -27s or -54s; it was the -9s which were
slow.

>But yes, with FB-512 emulation, that may have been an early favorite
>for emulation, with its nice large size.

All of the 3390s had the same track geometry, and pretty much all of the
3390 emulations work on a per-track basis. IOW, they simulate, in some
form, what you can store in the 1749 physical 34-byte cells of the
(visible) track on any 3390 device. 3390-1/2/3/9s all had ~56KB tracks
and 15 of those per cylinder, as far as the OS could tell; emulation of a
particular model really just altered how many cylinders were on the
emulated volume. The large volumes came as the OS's cleaned up their
support for larger numbers of cylinders.
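As a back-of-the-envelope check on that cell arithmetic, here is a minimal
sketch - mine, not the exact GX26-4577 formula - that just applies the ~4%
encoding overhead, nine cells per segment and ten count cells quoted above
to unkeyed 4KB records:

  CELL=34 TRACK=1749 DATA=4096
  # data cells = ceil(1.04 * DATA / CELL); add 9 segment cells + 10 count cells
  cells=$(( 10 + 9 + (DATA * 104 / 100 + CELL - 1) / CELL ))
  echo "cells/record = $cells, records/track = $(( TRACK / cells ))"

That prints 145 cells per record and 12 records per track, which at least
agrees with the usual figure of twelve 4KB blocks on a 3390 track.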
Reply by ●November 4, 2014
David Brown wrote:
> On 04/11/14 03:10, DecadentLinuxUserNumeroUno wrote:
>> On Tue, 4 Nov 2014 01:31:50 +0000 (UTC), Andrew Smallshaw
>>> No, on _any_ general purpose file system. It is inevitable. You
>>> seem to be of the impression that Linux is somehow the ultimate
>>> system,
>>
>> I was "under the impression" that the ext series of file systems, as
>> well as MS' NTFS, fought such occurrences with a different operation and
>> management paradigm than the old, FAT type method.
>
> There are three major things that affect the fragmentation rate on a
> filesystem, and a number of things that influence the effect of the
> fragmentation.
>
> One is the type of filesystem. Some, such as FAT, have a very simple
> structure that is prone to fragmentation. Others, such as ext4 or xfs,
> have features such as "extents" and "allocation groups" that greatly
> reduce the fragmentation.

Actually, one can even argue the other way around. The argument used to
explain why ext2 does not fragment was that it over-allocates files, so
that if you write two files in parallel, they won't step on each other's
toes all the time the way they do with a simple, stupid FAT implementation
that just uses the next available cluster. But that optimisation can
easily be applied to a FAT implementation as well; it is nothing intrinsic
to ext2 or FAT.

However, when you write a single large file to a blank FAT partition, it
will not fragment. All data will arrive on the disk without any gaps in
between. In contrast, in ext2 the file *will* have gaps - one could say
it is fragmented - because of the division of the disk into block groups.
That is, every so many megabytes, the file has a gap which is used by
block group headers, inodes, block bitmaps, etc. And *that* is intrinsic
to the ext2 and FAT disk layouts. (Not sure whether this still applies to
ext3/ext4.)

Stefan
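If you want to watch the parallel-write effect Stefan describes, one hedged
way to do it on Linux (filefrag is part of e2fsprogs; run this in a scratch
directory on an ext* mount):

  # write two files concurrently, then count each file's extents
  dd if=/dev/zero of=a bs=1M count=64 &
  dd if=/dev/zero of=b bs=1M count=64 &
  wait
  filefrag a b    # more extents reported = more fragmentation

How interleaved the two files end up depends on how aggressively the
filesystem over-allocates around each writer - which is exactly the
optimisation being argued about above.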







