On Sun, 02 Nov 2014 10:42:55 -0700, Don Y <this@is.not.me.com> Gave us:

>So, you can easily have just 5 or 10% of the disk "in use" but 90% of it
>"dirty". Dirt doesn't compress nearly as well as "virgin media" :>

I think you have the wrong "idear" about how they work. Compression of ANY file has NOTHING to do with the state the disk's file tables or previously written sectors are in. Absolutely NOTHING.

Now, IF you are compressing a volume, you should use a method that does not fuss with empty sectors. It defines their location and stores THAT info, NOT "empty but dirty" sector data. You lie, and you likely do not even know why.
Disk imaging strategy
Started by ●November 2, 2014
Reply by ●November 2, 2014
Reply by ●November 2, 2014
On 11/2/2014 12:21 PM, Jan Panteltje wrote:
> On a sunny day (Sun, 02 Nov 2014 12:06:10 -0700) it happened Don Y
> <this@is.not.me.com> wrote in <m35vb1$3qp$1@speranza.aioe.org>:
>
>>> Not quite sure what you want, but I have done this a lot:
>>> start some rescue disk, plug in some USB disk.
>>> mount the partition you want, then:
>>> tar -zcvf partition_sda1_image.tgz /dev/sda1
>>
>> The problem is creating "partition_sda1_image.tgz" *without* being concerned
>> with the underlying filesystem. So, you have no knowledge (from the filesystem
>> layer) of the "valid" contents of the volume (vs. blank/deleted content).
>
> Sure you can dd that partition, but now you really are in trouble.
> It's safer to tar a filesystem (that should NOT be currently running, else you
> are in trouble too), you can always untar it into another filesystem (ext2,
> ext4, reiserfs, etc) that is compatible.

By imaging the volume OFF-LINE (i.e., when the OS is NOT in control of the hardware), you ensure that the filesystem's state is self-consistent (assuming you shut the OS down cleanly, etc.). This neatly avoids the issues of "locked files" and privilege that might otherwise interfere with your imaging of the *entire* filesystem (e.g., even those parts not directly accessible to "applications" -- regardless of privilege).

But, this comes at the cost of not knowing which parts (sectors) of the medium actually have "content" that must be preserved! So, you have to include every sector in your image!

[we'll ignore any chicanery that tries to hide information in unused sectors]

If, just prior to creating the image, an application has constructed a large file full of completely random (i.e., incompressible) data AND THEN DELETED THAT FILE, the sectors previously occupied by that file are now full of incompressible data -- EVEN THOUGH THE "SYSTEM" WILL NEVER *LOOK* AT THAT "data" AGAIN (because it has been "deleted").

[Of course, if the filesystem/hardware automatically scrubs deleted sectors, then the issue changes -- not necessarily better OR worse!]

So, your "image" will contain a chunk of data virtually identical in size (and content!) to the "deleted data" present on these dirty parts of the medium (because you MUST preserve it, as you don't know if it is "live data" or not without knowledge of the filesystem's structure).

If, instead, you deliberately create a huge file (one that effectively consumes ALL of the unused space on the medium) and fill that file with VERY compressible data (YOU define what is maximally compressible by knowing how YOUR compressor will operate!) and *then* delete the file, you still have the problem of having to image "the sectors previously occupied by that file" -- BUT, now those sectors contain highly compressible data! (assuming the system doesn't get around to overwriting them with cruft between the time you delete them and the time the system comes to a halt).

> The other thing I have noticed is that when copying partitions of disk
> images to bluray with my LG writer, some bytes at the start seem to get
> changed, could be an error, but this does not seem to happen when copying
> to a filesystem.
> The other thing is in case restore of an image to a similar device,
> I found that for example 8 GB card 1 (as source) has a different _real_
> size than 8 GB card 2 (same make, same type, same specified size, bought
> at the same time) this could be related to bad sector managing of FLASH.

Experience teaches you to always leave a bit of the medium "unused" (i.e., not present in ANY partition) to accommodate "small" variations between drives. Many megabytes on a 100G drive is "noise". Likewise, a GB on a TB drive is similarly "noise". I've found this easier than worrying about resizing partitions (which typically requires the partition to have been "defragmented" beforehand to ensure there is nothing present at the tail end of the partition).

> If the new card (or disk) that you copy to, is smaller, then you are in trouble too.
> I _always_ run a full byte by byte compare after burning a bluray...
> wrote a program for that, dvdimagecmp.
> I use it to check FLASH images too.
> All things of importance these days go to bluray kept in a dark metal case.
> It has close to a thousand disks in it.
>
>> Then, restore it *without* the rescue disk.
>
> Na, never work on a running filesystem, UNLESS its just a file or you know
> _exactly_ what is going on.

That is exactly the point: you're running on bare metal knowing exactly what that metal GUARANTEES to the bootstrap loader. No need to worry that some flakey driver is going to have to be accommodated, etc. No need for ANY "OS" at all! Every opcode that executes is one that you wrote (aside from the "well behaved" BSP/BIOS/OFW/etc.).
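A minimal sketch of the fill-then-delete trick described above, assuming a POSIX shell on the running system; the mount point, pattern, and file name here are arbitrary placeholders:

    # Write a repeating, highly compressible pattern until the volume fills up
    # (yes stops once writes start failing with "no space left"), flush it to
    # the medium, then unlink the file so the space is "free" again -- but now
    # holds predictable, compressor-friendly data instead of old cruft.
    yes "0123456789abcdef" > /mnt/target/fill.tmp 2>/dev/null
    sync                      # make sure the pattern actually reaches the disk
    rm /mnt/target/fill.tmp
    sync

Whether the freed blocks really end up holding that pattern (rather than being remapped by an SSD's flash translation layer, say) depends on the filesystem and the device, so treat it as best-effort, per the caveats above.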
Reply by ●November 2, 2014
On 02/11/14 21:00, Don Y wrote:
> Hi Matt,
>
> On 11/2/2014 12:08 PM, mroberds@att.net wrote:
>> In sci.electronics.design Don Y <this@is.not.me.com> wrote:
>>> I.e., without knowledge of the specific filesystem(s) involved, you
>>> don't know how to recognize live data from deleted data.
>>
>> You could use a program that already has this knowledge, like Partimage
>> http://www.partimage.org/ or (I'm pretty sure) Acronis True Image. I
>
> Yes, my Clonezilla approach to date builds on partimage/partclone.
> But, you shouldn't *need* to know anything about the filesystem
> as that just affects "efficiency" (compressibility?)

You obviously know at least a little about Clonezilla - what do you find to be the problem with it? If you don't want to make a direct copy of the entire disk image, then you need to know the filesystem (or autodetect it) - that's the only way to be able to tell the important parts from the unimportant parts. And once you have that information, surely partimage/partclone will do the job?
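For reference, partclone's basic save/restore usage is roughly as follows -- a sketch from memory, not checked against any particular version, with the ext4 variant and the device/file names as placeholders:

    # Save only the blocks the filesystem marks as allocated.
    partclone.ext4 -c -s /dev/sda1 -o /mnt/backup/sda1.img

    # Write the saved image back onto the partition.
    partclone.ext4 -r -s /mnt/backup/sda1.img -o /dev/sda1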
Reply by ●November 2, 2014
With an unprotected system like MSDOS booted on a floppy or a flash disk, a disk editor can copy the sectors on one partition to another. Simtel\msdos has those editors, I believe, but searching for a simtel site is required. The simplest procedure is to format the first half of a disk and use the other half for the backup image.

Hul

In comp.arch.embedded Don Y <this@is.not.me.com> wrote:
> Hi,
>
> I'm writing a bit of code to image disk contents REGARDLESS OF THE
> FILESYSTEM(s) contained thereon.
>
> This doesn't have to be "ideal" (defined as "effortless", "minimal
> image size", etc.) but should be pretty close.
>
> It is not intended to be performed often -- "write once, read multiple"
> (i.e., RESTORE *far* more often than IMAGE).
>
> The challenge comes in the filesystem(s) neutral aspect. E.g., I
> should be able to image a disk containing FAT32, NTFS, FFSv1/2, QFS,
> individual RAID* volumes, little/BIG endian, etc. -- with the same
> executable!
>
> A naive approach to this would be to plumb dd to a compressor -- running
> both OUTSIDE the native OS. But, for large/dirty volumes, this gives you
> an unacceptably large resulting image -- because you end up having to store
> "discarded data" which could potentially be HUGE (consider a large volume
> that has seen lots of write/delete cycles) esp in comparison with the
> actual precious data!
>
> [I'd like to be able to store the image on a (set of) optical media and/or
> an unused "partition" somewhere]
>
> I.e., without knowledge of the specific filesystem(s) involved, you don't
> know how to recognize live data from deleted data.
>
> The *hack* that I am currently evaluating is to invoke a trivial executable
> UNDER THE NATIVE OS that simply creates large "blank" (i.e., highly
> compressible) files until the volume is "full", then unlinks them all.
> Doing this while the system is reasonably quiescent isn't guaranteed to
> "vacuum" all available space but would make a big dent in it (if the
> system is brought down shortly thereafter).
>
> [Yes, I understand privilege constraints in various OS's, quotas, etc.
> Those are all easy to work around...]
>
> Then, dd | compress (on bare iron).
>
> Again, not ideal but probably the best bang for the least buck?
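The "plumb dd to a compressor" approach from the quoted post, as a minimal sketch run from a rescue environment (device and file names are illustrative; gzip stands in for whatever compressor is actually used):

    # Image: read every sector of the raw device and compress the stream.
    dd if=/dev/sda bs=1M | gzip -c > /mnt/backup/sda.img.gz

    # Restore: decompress and write the image back to a same-sized device.
    gunzip -c /mnt/backup/sda.img.gz | dd of=/dev/sda bs=1M

This is exactly the case where deleted-but-dirty sectors bloat the image, which is what the fill-and-delete hack described in the quoted post tries to mitigate.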
Reply by ●November 2, 2014
On 02/11/14 16:25, Don Y wrote:
> Hi,
>
> I'm writing a bit of code to image disk contents REGARDLESS OF THE
> FILESYSTEM(s) contained thereon.
>
> This doesn't have to be "ideal" (defined as "effortless", "minimal
> image size", etc.) but should be pretty close.
>
> It is not intended to be performed often -- "write once, read multiple"
> (i.e., RESTORE *far* more often than IMAGE).
>
> The challenge comes in the filesystem(s) neutral aspect. E.g., I
> should be able to image a disk containing FAT32, NTFS, FFSv1/2, QFS,
> individual RAID* volumes, little/BIG endian, etc. -- with the same
> executable!
>
> A naive approach to this would be to plumb dd to a compressor -- running
> both OUTSIDE the native OS. But, for large/dirty volumes, this gives you
> an unacceptably large resulting image -- because you end up having to store
> "discarded data" which could potentially be HUGE (consider a large volume
> that has seen lots of write/delete cycles) esp in comparison with the
> actual precious data!
>
> [I'd like to be able to store the image on a (set of) optical media and/or
> an unused "partition" somewhere]
>
> I.e., without knowledge of the specific filesystem(s) involved, you don't
> know how to recognize live data from deleted data.

Try detecting the filesystem (as a hack, do a mount from Linux without specifying the filesystem, and see what system was mounted - it will autodetect most types).

> The *hack* that I am currently evaluating is to invoke a trivial executable
> UNDER THE NATIVE OS that simply creates large "blank" (i.e., highly
> compressible) files until the volume is "full", then unlinks them all.
> Doing this while the system is reasonably quiescent isn't guaranteed to
> "vacuum" all available space but would make a big dent in it (if the
> system is brought down shortly thereafter).

This has many disadvantages. On many systems, you'll create a sparse file, taking no space on the disk. On some SSDs (ones that don't have compression or other zero detection), you will cripple the disk's performance. And on disks where you actually end up writing zeros to the disk, it will take forever.

> [Yes, I understand privilege constraints in various OS's, quotas, etc.
> Those are all easy to work around...]
>
> Then, dd | compress (on bare iron).
>
> Again, not ideal but probably the best bang for the least buck?
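One way to do the autodetection described above, assuming a Linux rescue system with util-linux available (device name illustrative):

    # blkid probes the superblock and prints the filesystem type it detects.
    blkid -o value -s TYPE /dev/sda1

    # Or let mount autodetect the type, then see what it picked.
    mount /dev/sda1 /mnt
    grep ' /mnt ' /proc/mounts
    umount /mnt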
Reply by ●November 2, 2014
On 02/11/14 18:42, Don Y wrote:
> On 11/2/2014 10:09 AM, glen herrmannsfeldt wrote:
>
>>> A naive approach to this would be to plumb dd to a compressor -- running
>>> both OUTSIDE the native OS. But, for large/dirty volumes, this gives you
>>> an unacceptably large resulting image -- because you end up having to store
>>> "discarded data" which could potentially be HUGE (consider a large volume
>>> that has seen lots of write/delete cycles) esp in comparison with the
>>> actual precious data!
>>
>> For older disks, that are usually relatively small, that is probably
>> the best choice.
>
> The problem comes with newer disks. E.g., I keep ~1T on each workstation
> and *only* drag "current projects" onto them counting on the file servers
> to maintain most of my stuff "semi-offline".
>
> So, you can easily have just 5 or 10% of the disk "in use" but 90% of it
> "dirty". Dirt doesn't compress nearly as well as "virgin media" :>

How about replacing the 1 TB harddisk with an 80 MB SSD? Then you enforce the rule that only a small amount is on the disk locally, you can use a simple dd with compression, and everything works faster.
Reply by ●November 2, 2014
Hi David,

On 11/2/2014 3:10 PM, David Brown wrote:
> On 02/11/14 21:00, Don Y wrote:
>> Hi Matt,
>>
>> On 11/2/2014 12:08 PM, mroberds@att.net wrote:
>>> In sci.electronics.design Don Y <this@is.not.me.com> wrote:
>>>> I.e., without knowledge of the specific filesystem(s) involved, you
>>>> don't know how to recognize live data from deleted data.
>>>
>>> You could use a program that already has this knowledge, like Partimage
>>> http://www.partimage.org/ or (I'm pretty sure) Acronis True Image. I
>>
>> Yes, my Clonezilla approach to date builds on partimage/partclone.
>> But, you shouldn't *need* to know anything about the filesystem
>> as that just affects "efficiency" (compressibility?)
>
> You obviously know at least a little about Clonezilla - what do you find to be
> the problem with it? If you don't want to make a direct copy of the entire
> disk image, then you need to know the filesystem (or autodetect it) - that's
> the only way to be able to tell the important parts from the unimportant
> parts. And once you have that information, surely partimage/partclone will do
> the job?

To avoid thread drift (and the pissing and moaning that comes with it), this doesn't address the question I asked... (i.e., remove the FS-specific code from partclone and then tell me how to make it work!)
Reply by ●November 2, 2014
Hi David,

On 11/2/2014 3:16 PM, David Brown wrote:
> On 02/11/14 18:42, Don Y wrote:
>> On 11/2/2014 10:09 AM, glen herrmannsfeldt wrote:
>>
>>>> A naive approach to this would be to plumb dd to a compressor -- running
>>>> both OUTSIDE the native OS. But, for large/dirty volumes, this gives you
>>>> an unacceptably large resulting image -- because you end up having to store
>>>> "discarded data" which could potentially be HUGE (consider a large volume
>>>> that has seen lots of write/delete cycles) esp in comparison with the
>>>> actual precious data!
>>>
>>> For older disks, that are usually relatively small, that is probably
>>> the best choice.
>>
>> The problem comes with newer disks. E.g., I keep ~1T on each workstation
>> and *only* drag "current projects" onto them counting on the file servers
>> to maintain most of my stuff "semi-offline".
>>
>> So, you can easily have just 5 or 10% of the disk "in use" but 90% of it
>> "dirty". Dirt doesn't compress nearly as well as "virgin media" :>
>
> How about replacing the 1 TB harddisk with an 80 MB SSD? Then you enforce the
> rule that only a small amount is on the disk locally, you can use a simple dd
> with compression, and everything works faster.

How does that give me 1T of storage?
Reply by ●November 2, 2014
On 11/2/2014 3:13 PM, Hul Tytus wrote:
> With an unprotected system like MSDOS booted on a floppy or a flash disk,
> a disk editor can copy the sectors on one partition to another. Simtel\msdos
> has those editors, I believe, but searching for a simtel site is required.
> The simplest procedure is to format the first half of a disk and use the
> other half for the backup image.

What I currently do is similar -- except no DOS, etc. (just write a boot loader that effectively does the decompress & copy without the overhead of a "real OS").

Not using compression is highly wasteful of disk space (for the "restore image"). If the image is to co-reside on the medium with the live data, then it'd be nice not to have to "throw away" half the medium for this "feature".

E.g., the laptops that I build for students tend to have ~160G drives that I can cut into a "system" partition (which I want to be able to restore on-demand) as well as a "data" partition (which I will leave to the student to maintain... if their data gets clobbered, that's THEIR problem; at least the machine will still be runnable after recovery).
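A sketch of that kind of layout using parted; the sizes, device name, and filesystem types are made up, and the point is simply the separate "system" and "data" partitions plus a little unallocated space at the end of the drive (per the earlier note about nominally identical drives differing slightly in real size):

    # ~160G student drive: restorable system partition, student data partition,
    # and a deliberately unallocated tail to absorb drive-to-drive size variation.
    parted -s /dev/sda mklabel msdos
    parted -s /dev/sda mkpart primary ext4 1MiB 60GiB
    parted -s /dev/sda mkpart primary ext4 60GiB 158GiB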
Reply by ●November 2, 2014
Hi David,

On 11/2/2014 3:13 PM, David Brown wrote:
> On 02/11/14 16:25, Don Y wrote:
>> I'm writing a bit of code to image disk contents REGARDLESS OF THE
>> FILESYSTEM(s) contained thereon.
>>
>> This doesn't have to be "ideal" (defined as "effortless", "minimal
>> image size", etc.) but should be pretty close.
>>
>> It is not intended to be performed often -- "write once, read multiple"
>> (i.e., RESTORE *far* more often than IMAGE).
>>
>> The challenge comes in the filesystem(s) neutral aspect. E.g., I
>> should be able to image a disk containing FAT32, NTFS, FFSv1/2, QFS,
>> individual RAID* volumes, little/BIG endian, etc. -- with the same
>> executable!
>>
>> A naive approach to this would be to plumb dd to a compressor -- running
>> both OUTSIDE the native OS. But, for large/dirty volumes, this gives you
>> an unacceptably large resulting image -- because you end up having to store
>> "discarded data" which could potentially be HUGE (consider a large volume
>> that has seen lots of write/delete cycles) esp in comparison with the
>> actual precious data!
>>
>> [I'd like to be able to store the image on a (set of) optical media and/or
>> an unused "partition" somewhere]
>>
>> I.e., without knowledge of the specific filesystem(s) involved, you don't
>> know how to recognize live data from deleted data.
>
> Try detecting the filesystem (as a hack, do a mount from Linux without
> specifying the filesystem, and see what system was mounted - it will
> autodetect most types).

That's not the issue. The problem statement is:
- provide a means of imaging a volume WITHOUT KNOWLEDGE OF THE FILESYSTEM(S)
  THAT IT MIGHT (or might NOT!) contain
- do so with a goal of minimizing the image preserved

>> The *hack* that I am currently evaluating is to invoke a trivial executable
>> UNDER THE NATIVE OS that simply creates large "blank" (i.e., highly
>> compressible) files until the volume is "full", then unlinks them all.
>> Doing this while the system is reasonably quiescent isn't guaranteed to
>> "vacuum" all available space but would make a big dent in it (if the
>> system is brought down shortly thereafter).
>
> This has many disadvantages. On many systems, you'll create a sparse file,
> taking no space on the disk.

Only if you are writing /dev/zero to the medium. E.g., an OS can't decide that *arbitrary* (yet highly repeatable/compressible!) data is "blank" without SOME way of recording what that data *was*. So, if you want to assume zeroes can be ignored, then you *must* record "fives" when they are presented to you (or "supercalifragilisticexpialidocious", etc.)

> On some SSD's (ones that don't have compression or other zero detection),
> you will cripple the disk's performance. And on disks where you actually
> end up writing zeros to the disk, it will take forever.

There are tradeoffs with every design decision. I can tar(1) to a PPT device. It wouldn't be *practical* to do this for a MB file! (but, that shouldn't preclude me doing it for a smaller file -- where *I* define my own idea of "smaller"... perhaps even hundreds of KB!)

The point is to leave the policy decision to the user, not the person who codes it ("Oh, this would be bad for XYZ so I won't allow it to work with XYZ").

>> [Yes, I understand privilege constraints in various OS's, quotas, etc.
>> Those are all easy to work around...]
>>
>> Then, dd | compress (on bare iron).
>>
>> Again, not ideal but probably the best bang for the least buck?
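The sparse-file point from the exchange above is easy to demonstrate, assuming GNU coreutils (file names arbitrary; a throwaway illustration only):

    # truncate gives a large *apparent* size with no blocks allocated;
    # dd from /dev/zero actually writes the data. The first column of
    # "ls -ls" shows the blocks each file really occupies on the disk.
    truncate -s 1G sparse.bin
    dd if=/dev/zero of=real.bin bs=1M count=1024
    ls -ls sparse.bin real.bin
    rm sparse.bin real.bin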







