On Sun, 02 Nov 2014 23:11:52 -0600, Robert Wessel <robertwessel2@yahoo.com> Gave us:

>On Sun, 02 Nov 2014 10:42:55 -0700, Don Y <this@is.not.me.com> wrote:
>
>>On 11/2/2014 10:09 AM, glen herrmannsfeldt wrote:
>>
>>>> A naive approach to this would be to plumb dd to a compressor -- running both OUTSIDE the native OS.  But, for large/dirty volumes, this gives you an unacceptably large resulting image -- because you end up having to store "discarded data" which could potentially be HUGE (consider a large volume that has seen lots of write/delete cycles) esp in comparison with the actual precious data!
>>>
>>> For older disks, that are usually relatively small, that is probably the best choice.
>>
>>The problem comes with newer disks.  E.g., I keep ~1T on each workstation and *only* drag "current projects" onto them counting on the file servers to maintain most of my stuff "semi-offline".
>>
>>So, you can easily have just 5 or 10% of the disk "in use" but 90% of it "dirty".  Dirt doesn't compress nearly as well as "virgin media"  :>
>>
>>>> [I'd like to be able to store the image on a (set of) optical media and/or an unused "partition" somewhere]
>>>
>>>> I.e., without knowledge of the specific filesystem(s) involved, you don't know how to recognize live data from deleted data.
>>>
>>>> The *hack* that I am currently evaluating is to invoke a trivial executable UNDER THE NATIVE OS that simply creates large "blank" (i.e., highly compressible) files until the volume is "full", then unlinks them all.  Doing this while the system is reasonably quiescent isn't guaranteed to "vacuum" all available space but would make a big dent in it (if the system is brought down shortly thereafter).
>>>
>>> Be sure to fsck or chkdsk first.
>>
>>Yes, of course.  The point is that I am willing to expend a fair bit of effort -- including "unscripted" actions -- to get the initial "master" disk image "Correct".  But, want most of that effort to be in the native OS instead of having to implement hooks for every conceivable file system type.
>
>
>FWIW, Windows has had an option ("/w") on the "cipher" command to wipe all unused areas on an NTFS volume since the W2K days, and it will do that on a live volume.
>
>I'm not sure what it leaves in the empty space, but at least a decade ago it took several passes (writing zero, writing ones, writing random numbers, etc.) to all of the unallocated space on the volume.  For your purposes* it would hopefully not have that random data pass as the last one.  I also don't know if that ever worked on non-NTFS volumes.
>
>IIRC, this was an add-on you had to download from MS in W2K, and part of the standard installation of *some* XP versions (Pro and server, I think), and has been standard on all Vista, Win7 and Win8 versions.
>
>
>*The purpose of the command/option is to prevent people from recovering data from deallocated space, not prepping a volume for (image) compression.

Hey, you could image your drive like this guy did.

He made a video "image" of his drive.  Hehehehe...  BRL!

He doesn't know how to count heads or platters though.

Don't know if I have seen bigger idiots.  Well... there is Sloman.

They inscribed 10 MB onto a clear roll of shipping tape to demonstrate how a laser cube storage medium would work.  Two intersecting lasers scan each layer in a single pass, and the entire datagram (whole page) is read.
Too fragile, and it would require a rigid, miniature optical bench inside a canister about the size of an old 5.25" full-height form factor, just for a 1" cube device.  Way too fragile.
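For concreteness, the "plumb dd to a compressor" approach discussed in the quoted posts comes down to something like this, run from a rescue environment rather than the installed OS (the device and file names here are illustrative, not from the thread):

  # image the whole raw device, padding unreadable blocks rather than aborting
  dd if=/dev/sda bs=1M conv=noerror,sync | gzip -c > /mnt/backup/sda.img.gz

  # restore -- the target disk must be at least as large as the original
  gunzip -c /mnt/backup/sda.img.gz | dd of=/dev/sda bs=1M

As the quoted posts note, every sector -- live, deleted, or never used -- ends up in the image, which is exactly why "dirty" free space bloats it.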
Disk imaging strategy
Started by ●November 2, 2014
Reply by ●November 3, 2014
Reply by ●November 3, 2014
On Sun, 02 Nov 2014 22:08:06 -0800, DecadentLinuxUserNumeroUno <DLU1@DecadentLinuxUser.org> Gave us:

[snip]

> Hey, you could image your drive like this guy did.
>
> He made a video "image" of his drive.  Hehehehe...  BRL!

DAMN!!!  I forgot the link again!

http://www.youtube.com/watch?v=Y9Z8vF46fXo

I think that might be Sloman...  Bwuahahaha!  BRL!!
Reply by ●November 3, 2014
On Sun, 02 Nov 2014 22:08:06 -0800, DecadentLinuxUserNumeroUno <DLU1@DecadentLinuxUser.org> Gave us:

> Hey, you could image your drive like this guy did.
>
> He made a video "image" of his drive.  Hehehehe...  BRL!

Hey guys!  Image THIS quarter million dollar hard drive!

http://www.youtube.com/watch?v=CBjoWMA5d84

I really wish I had it.  Damned Aussie lucky dogs!

He talks funny too.  :-)
Reply by ●November 3, 2014
On a sunny day (Sun, 02 Nov 2014 15:07:33 -0700) it happened Don Y <this@is.not.me.com> wrote in <m369v4$tj$1@speranza.aioe.org>:>On 11/2/2014 12:21 PM, Jan Panteltje wrote: >> On a sunny day (Sun, 02 Nov 2014 12:06:10 -0700) it happened Don Y >> <this@is.not.me.com> wrote in <m35vb1$3qp$1@speranza.aioe.org>: >> >>>> Not quite sure whatyouwant, but I have done this a lot: >>>> start some recue disk, plug in some USB disk. >>>> mount the partition you want, then: >>>> tar -zcvf partition_sda1_image.tgz /dev/sda1 >>> >>> The problem is creating "partition_sda1_image.tgz" *without* being concerned >>> with the underlying filesystem. So, you have no knowledge (from the filesystem >>> layer) of the "valid" contents of the volume (vs. blank/deleted content). >> >> Sure you can dd that partition, but now you really are in trouble. >> Its safer to tar a filesystem (that should NOT be currently running, else you are in trouble too), >> you can always untar it into an other filesystem (ext2, ext4, reiserfs, etc) that is compatible. > >By imaging the volume OFF-LINE (i.e., when the OS is NOT in control of the >hardware), you ensure that the filesystem's state is self-consistent >(assuming you shut the OS down cleanly, etc.). This neatly avoids the >issues of "locked files" and privilege that might otherwise interfere >with your imaging of the *entire* filesystem (e.g., even those parts not >directly accessible to "applications" -- regardless of privilege) > >But, this comes at the cost of not knowing which parts (sectors) of the >medium actually have "content" that must be preserved! So, you have to >include every sector in your image!Not if you tar a filesystem (on that partition) see the script I posted. So mount partition, say: mount /dev/sdd1 /mnt/sdd1 tar that filesystem: tar -zcvf my_sdd1_backup.tgz /mnt/sdd1/* If the partition has no files the tgz will be very very small. If you want it back, create any filesystem, and tar -zxvf my_sdd1_backup.tgz All links and timestamps sare preservd too that way.>> The other thing I have noticed is that when copying partitions of disk >> images to bluray with my LG writer, some bytes at the start seem to get >> changed, could be an error, but this does not seem to happen when copying >> to a filesystem. >> The other thing is in case restore of an image to a similar device, >> I found that for example 8 GB card 1 (as source) has a different _real_ >> size than 8 GB card 2 (same make, same type, same specified size, bought >> at the same time) this could be related to bad sector managing of FLASH. > >Experience teaches you to always leave a bit of the medium "unused" >(i.e., not present in ANY partition) to accommodate "small" variations >between drives. Many megabytes on a 100G drive is "noise". Likewise, >a GB on a TB drive is similarly "noise". > >I've found this easier than worrying about resizing partitions (which >typically requires the partition to have been "defragmented" beforehand >to ensure there is nothing present at the tail end of the partition).I think I have not 'defragmented' anything ever in my life in Linux, there is no need.>> If the new card (or disk) that you copy to, is smaller, then you are in trouble too. >> I _always_ run a full byte by byte compare after burning a bluray... >> wrote a program for that dvdimagecmp. >> I use it to check FLASH images too. >> All things of importance these days go to bluray kept in a dark metal case. >> It has close to a thousand disks in it. >> >>> Then, restore it *without* the rescue disk. 
>> Na, never work on a running filesystem, UNLESS its just a file or you know
>> _exactly_ what is going on.
>
> That is exactly the point: you're running on bare metal knowing exactly
> what that metal GUARANTEES to the bootstrap loader.  No need to worry
> that some flakey driver is going to have to be accommodated, etc.  No
> need for ANY "OS" at all!  Every opcode that executes is one that you wrote
> (aside from the "well behaved" BSP/BIOS/OFW/etc.)

Sorry, I cannot follow you there....
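For reference, the tar-based route Jan describes amounts to roughly this (the paths and the mkfs filesystem type are illustrative; using -C with "." also picks up dotfiles that a shell glob would miss):

  mount /dev/sdd1 /mnt/sdd1
  tar -zcvf /mnt/usb/my_sdd1_backup.tgz -C /mnt/sdd1 .
  umount /mnt/sdd1

  # restore onto any freshly made, compatible filesystem
  mkfs.ext4 /dev/sdd1
  mount /dev/sdd1 /mnt/sdd1
  tar -zxvf /mnt/usb/my_sdd1_backup.tgz -C /mnt/sdd1

It only stores live files (so deleted space costs nothing), but -- as Don points out further down -- it depends on mount(8) and filesystem-specific support for every volume type involved.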
Reply by ●November 3, 2014
On 03/11/14 02:39, Don Y wrote:> Hi David, > > On 11/2/2014 5:08 PM, David Brown wrote: >> On 02/11/14 23:30, Don Y wrote: >>> On 11/2/2014 3:16 PM, David Brown wrote: >>>> On 02/11/14 18:42, Don Y wrote: >>>>> On 11/2/2014 10:09 AM, glen herrmannsfeldt wrote: >>>>> >>>>>>> A naive approach to this would be to plumb dd to a >>>>>>> compressor -- running both OUTSIDE the native OS. But, >>>>>>> for large/dirty volumes, this gives you an unacceptably >>>>>>> large resulting image -- because you end up having to >>>>>>> store "discarded data" which could potentially be HUGE >>>>>>> (consider a large volume that has seen lots of >>>>>>> write/delete cycles) esp in comparison with the actual >>>>>>> precious data! >>>>>> >>>>>> For older disks, that are usually relatively small, that >>>>>> is probably the best choice. >>>>> >>>>> The problem comes with newer disks. E.g., I keep ~1T on >>>>> each workstation and *only* drag "current projects" onto them >>>>> counting on the file servers to maintain most of my stuff >>>>> "semi-offline". >>>>> >>>>> So, you can easily have just 5 or 10% of the disk "in use" >>>>> but 90% of it "dirty". Dirt doesn't compress nearly as well >>>>> as "virgin media" :> >>>> >>>> How about replacing the 1 TB harddisk with an 80 MB SSD? Then >>>> you enforce the rule that only a small amount is on the disk >>>> locally, you can use a simple dd with compression, and >>>> everything works faster. >>> >>> How does that give me 1T of storage? >> >> It doesn't give you 1 TB on each machine - that's the point. Keep >> your main data safe in some big system (with raid, backups, etc., >> in whatever way you see fit) and just have the software and >> /necessary/ working sets on the local machines. So instead of >> having 5-10% of 1 TB "in use" and the rest "dirty" or >> "semi-offline", you have 90% of 100 MB in use. > > That's exactly what *I* do: > > The problem comes with newer disks. E.g., I keep ~1T on each > workstation and *only* drag "current projects" onto them counting on > the file servers to maintain most of my stuff "semi-offline". > > Executables (and their documentation, support, etc.) are typically in > the ~100G ballpark. The balance of the 1T is for whatever documents > and "originals", libraries, etc. that I happen to be working on at > the time. > > [dynamically loading executables from an off-line store just doesn't > work on many machines. And, none of this would work for a student's > laptop!] > > The point of my 1T example (pick ANY number for "total system > capacity") is that most of the sectors can be "dirty" -- have "seen" > data at some point in the past -- so you can't assume that "empty" > would mean "compress readily" (as would be the case for a solution > that was FS *aware*!) > > I.e., the advantage of a FS-aware approach is you know which portions > of the medium are significant -- "worth preserving" -- so the balance > can compress to take NO space in the image. > >> Such a system won't suit everyone, but when it works, it >> simplifies backup and machine independence significantly. It also >> makes it a lot easier to track versions and "current" data, instead >> of having local copies and server copies that are a bit different - >> you have /only/ server copies and tracked backups of them. 
> That's exactly what I do -- though I may keep multiple branches on
> the machine while I am working on it so I can spin down the archive
> until I really need to check something back *in* (assuming I *will*
> do so)

I am thoroughly confused as to what you are trying to do here.

On the one hand, you want a filesystem-aware process so that you can image only real files, not empty space (or leftovers in deleted space).  On the other hand, you want something completely independent of the filesystem.  On the one hand, you have only a small amount of real data in use, and on the other hand you might have large amounts of data copies that you also want to image.

So let's get back to basics, and try to understand your setup.

First, how many machines are we talking about?  What variety of systems do you have in terms of OSes and filesystems?  What about the different developers and users - are they at similar levels, and are they cooperative/competent, or are you going to have to do these backups and images because they are not good at following version control routines?

The way I would organise this all is that the server (or servers) are masters.  You have full control of these - you use RAID to protect against hardware failures, and regular snapshot-style backups (such as with rsync or btrfs snapshots).  /All/ your data is there.  Where practical, it is in the form of version control repositories.  Other data, especially more static data, may be just an area of shared files, which is backed up with snapshots.

On local machines, you have only temporary copies of any data while working.  Losing that data is an inconvenience, but should not be a disaster.  Users check out from the repositories, do their work, and check in changes.  If you have need of data that is not part of the repositories but should be kept safe, it is either accessed directly as part of the server's shared files, or you use rsync or similar backup strategies to copy from the local machine to a safe area on the server.

Imaging of the local machines is just a convenience to get the system running again faster if there is a hardware failure.  It is usually done after setting up the basic system and installing key programs, and perhaps on occasion afterwards after major upgrades or installations.  It is not about data backup, but merely saving time.  Usually something like Clonezilla or Norton Ghost to an external disk will be fine - if something goes wrong with the main disk, you can simply put in the imaged copy.  Imaging can also be useful if you have multiple systems with the same setup - you might want images stored on a central server in this case.  But you don't image the data - you only image the OS, programs, and setup.

Is there something special about your needs that makes such a system impractical?
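A minimal sketch of the "server is master, workstations are scratch" flow described above, assuming an rsync-capable server and a btrfs-backed archive area (host and path names are illustrative):

  # push the local working copy to its area on the server
  rsync -aHx --delete /work/current-project/ server:/srv/projects/current-project/

  # on the server: periodic read-only snapshots of the shared area
  btrfs subvolume snapshot -r /srv /srv/.snapshots/2014-11-03

Version-controlled material would go through the repository instead; the rsync step covers the "shared files" category mentioned above.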
Reply by ●November 3, 2014
Hi David,

On 11/3/2014 3:36 AM, David Brown wrote:

[snip]

> I am thoroughly confused as to what you are trying to do here.

Because I'm trying NOT to let the thread drift!
:>> On the one hand, you want a filesystem aware process so that you can > image only real files, not empty space (or leftovers in deleted space).That conclusion doesn't necessarily follow. What I want to do is not waste "image space" on "dead data" (deleted files) -- WITHOUT EXPLICITLY KNOWING WHAT IS DEAD (because I have no metadata from a "file system" to tell me what is live/dead)> On the other hand, you want something completely independent of the > filesystem.Correct.> On the one hand, you have only a small amount of real data in use, and > on the other hand you might have large amounts of data copies that you > also want to image.I want to image the entire disk -- because I can't know what's live/dead. But, I don't want to waste space/time on "dead content". So, want a scheme (which may include "procedures" and not just "code") that will effectively give me that information without EXPLICITLY seeking it. E.g., as I proposed, if I create files filled with some highly compressible data ("Dear Compressor, when you encounter me in your input stream, please represent me in your compressed output stream by the special token 'BIG_CANNED_STRING'. In doing so, you will know exactly how to reconstruct me without wasting much space on my actual content. Chances are, you will encounter me many, many, many times as you scan through the blocks of this drive..."), that data gets moved onto real platters (when disk cache is flushed, etc.). Once I "run out of room" in the filesystem (more or less), I will have consumed the previous "dead space" (free space) with files of this type. I can then unlink all of these files thereby recreating the "dead space". But, while the previous content of this dead space was unrecognizable (without knowledge of the filesystem), it can now be recognized as such a filesystem agnostic piece of code (later).> So lets get back to basics, and try to understand your setup. > > First, how many machines are we talking about?Personally, about 30 or 40 drives (e.g., some "machines" have multiple drives). Note that a "machine" does not have to be a PC. Nor a SPARCstation. etc. For my pro bono work, probably 200 - 400 yearly (but, that will hopefully only be 20-40 different "model numbers", 10 or more instances of each)> What sort of different systems varieties do you have in the OS's and filesystems?Personally, three different flavors of Solaris, three Windows, three NetBSD, a couple of oddball "OTS" systems (Jaluna, Inferno, etc.) and probably a dozen different "appliance"/proprietary implementations (effectively black boxes). Pro bono is much easier. They'll either be PC's or Macs. But, their OS's will largely be defined by whatever happens to *run* on that particular hardware (donations may be of various "ages"). I'm guessing three different Windows (though within that, there can be minor variations like Home, Pro, Business, etc. editions -- possibly even on the same make/model hardware). Probably two different OSX versions (??).> What about > the different developers and users - are they at similar levels and are > they cooperative/competent, or are you going to have to do these backups > and images because they are not good at following version control routines?They aren't "backups" (see my thread to George). They are "restore images". They aren't regularly performed (like backups would be). Rather, a machine is imaged (typically *once*) and the image saved in order to recreate the machine's state at a future time (if it gets munged). 
So, I expect this to be far more involved than the routine "backups" I do for my working files/configuration. But, I expect the "restore" to be far simpler (UX) -- "push this button and wait". For example, when I build a new system, here, I image the disk at various stages in that process. This lets me quickly return to one of those points in time (if, for example, I make some annoying mistake in a subsequent stage and want to "undo" it). Prior to putting the system into daily use, I have a final snapshot image. I.e., I can reproduce the software installation and configuration process very quickly if I have to at a later date (because a disk died, because some app scribbled somewhere that it shouldn't, etc.). Instead of DAYS to rebuild the system (individually installing and configuring each application, etc.), I can do it all in a matter of minutes. And not worry about the things that an incremental approach might fail to address (have I removed ALL the cruft? have I added all the changes back in? etc.) For the students, they tend to be careless users. And, there's always a certain amount of "I didn't pay for it so I take it for granted" attitude involved. ("If it breaks, I'll just ask for a new one. THAT won't cost (me) anything, either!") Originally, I was pursuing a "build a set of CD's/DVD's for each machine" that would allow them to restore their machines (without my involvement). But, these would get lost, misplaced, etc. No real incentive for the student to keep track of them (most are also homeless so that would be one more thing for them to keep track of, "just in case"). You'd be naive to imagine that this WOULDN'T turn into "I lost my restore DVD. Can you just make me another one (DVD)?" "Well, if you can't do that, can I just trade this machine in and get ANOTHER machine?" If, instead, the restore mechanism is on the disk (just like a factory restore partition -- but, with the *final* disk image instead of the *initial*/factory image), the student has no excuse to claim he's lost the DVD or "doesn't know how to repair/restore the machine". Additionally, if the student feels his machine may have been "compromised" (perhaps an AV update points to the presence of a virus on his machine), he can "clean it" himself. (Hysterically, this has resulted in machines being returned and the system being rebuilt from scratch. *I* have no desire to be in that business -- ESP in an unpaid capacity! :> )> The way I would organise this all is that the server (or servers) are > masters. You have full control of these - you use raid to protect > against hardware failures, and regular snapshot-style backups (such as > with rsync or btrfs snapshots). /All/ your data is there. Where > practical, it is in the form of version control repositories. Other > data, especially more static data, may be just an area of shared files, > which is backed up with snapshots.That;s what I do for my personal machines. But, that doesn't mean it is easily usable in that form! E.g., I have ISOs of every CD/DVD I've purchased. But, if I have to go through the trouble of *installing* it to be able to *use* it, then having the original is just a small part of the solution. Typically, I build specific machines for specific roles/purposes. Once built, I image their drives and preserve the images on a bunch of (removable) SATA/PATA/SCA/FW drives (depending on how I will ultimately need to restore that image) -- along with the installation log that documents every step in that particular build process. 
This just saves me from having to repeat all that labor (install & configure) in a hardware failure, screwup on my part or if I just want to upgrade the local drives to larger ones, etc.> On local machines, you have only temporary copies of any data whileThe "data" is the key, here. I can drag the documentation, schematics, PCB artwork, sources, etc. for a project out of the repository, work on it and then discard or recommit any changes -- PAINLESSLY. Because it's just "data" and not "workstation executables". There is no installation or configuration involved. I can erase it and know that there are no vestiges hiding somewhere unseen. cd /Playpen rm -r *> working. Losing that data is an inconvenience, but should not be a > disaster. Users check out from the repositories, do their work, and > check in changes. If you have need of data that is not part of the > repositories, but should be kept safe, it is either accessed directly as > part of the servers shared files, or you use rsync or similar backup > strategies to copy from the local machine to a safe area on the server. > > Imaging of the local machines is just a convenience to get the system > running again faster if there is a hardware failure. It is usually done > after setting up the basic system and installing key programs, andExactly.> perhaps on occasions afterwards after major upgrades or installations.I image each machine exactly once. I rarely update applications, especially if the reason for the update is solely security related. When the machine is upgraded, I move on to the newer applications which get folded into the image for that newer machine.> It is not about data backup, but merely saving time. Usually something > like clonezilla or Norton Ghost to an external disk will be fine - if > something goes wrong with the main disk, you can simply put in the > imaged copy. Imaging can also be useful if you have multiple systems > with the same setup - you might want images stored on a central server > in this case. But you don't image the data - you only image the OS, > programs, and setup.Exactly. That is the case with the pro bono effort: archive images for each type (make/model) of machine encountered. Install the COMPLETE image (including the "recovery partition") from the server. If I encounter another "identical" (make/model) machine in the future (often!), then I don't have to bother recreating a suitable image for that machine. And, thereafter, the user (student) can restore the "system partition" (but none of their data -- because the recovery partition has no knowledge of their data!) at will instead of relying on me to perform that task for them. I.e., they no longer have an "excuse" to ask (expect) someone to solve THEIR problem (because, chances are, the reason their computer is "all gunked up" is because of poor practices on their part!) Those that try to "beat the system" by actually BREAKING their computer are "rewarded" by going to the end of the line: "Gee, it's too bad you dropped your laptop off the bus! We'll see if we can salvage any PARTS off of it. And, put you in line to get a replacement. But, there are 187 people ahead of you so you probably won't get one before sometime in the NEXT school year!" [This is not an exaggeration. :< ]> Is there something special about your needs that makes such a system > impractical?
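A minimal sketch of the "fill, then unlink" hack described upthread, run inside the native OS shortly before shutting down to image the raw device (the mount point and file names are illustrative; as noted, it won't vacuum every last block):

  i=0
  while dd if=/dev/zero of=/mnt/target/zerofill.$i bs=1M count=1024 2>/dev/null
  do
      i=$((i + 1))          # keep creating 1 GB zero files until the volume fills up
  done
  sync                       # push the zeros out of the cache and onto the platters
  rm -f /mnt/target/zerofill.*
  sync

After a clean shutdown, the ex-free space is all zeros and compresses to almost nothing in the off-line image, with no filesystem knowledge needed by the imaging tool.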
Reply by ●November 3, 2014
On Mon, 03 Nov 2014 10:09:48 GMT, Jan Panteltje <pNaonStpealmtje@yahoo.com> wrote:

>I think I have not 'defragmented' anything ever in my life in Linux,
>there is no need.

That isn't entirely true - at least not with inode filesystems.  The n-way tree structuring and inode caching reduce the need to defragment, but where sequential read performance is important, it still pays to defragment.

George
Reply by ●November 3, 2014
On 11/3/2014 3:09 AM, Jan Panteltje wrote:> On a sunny day (Sun, 02 Nov 2014 15:07:33 -0700) it happened Don Y > <this@is.not.me.com> wrote in <m369v4$tj$1@speranza.aioe.org>: > >> On 11/2/2014 12:21 PM, Jan Panteltje wrote: >>> On a sunny day (Sun, 02 Nov 2014 12:06:10 -0700) it happened Don Y >>> <this@is.not.me.com> wrote in <m35vb1$3qp$1@speranza.aioe.org>: >>> >>>>> Not quite sure whatyouwant, but I have done this a lot: >>>>> start some recue disk, plug in some USB disk. >>>>> mount the partition you want, then: >>>>> tar -zcvf partition_sda1_image.tgz /dev/sda1 >>>> >>>> The problem is creating "partition_sda1_image.tgz" *without* being concerned >>>> with the underlying filesystem. So, you have no knowledge (from the filesystem >>>> layer) of the "valid" contents of the volume (vs. blank/deleted content). >>> >>> Sure you can dd that partition, but now you really are in trouble. >>> Its safer to tar a filesystem (that should NOT be currently running, else you are in trouble too), >>> you can always untar it into an other filesystem (ext2, ext4, reiserfs, etc) that is compatible. >> >> By imaging the volume OFF-LINE (i.e., when the OS is NOT in control of the >> hardware), you ensure that the filesystem's state is self-consistent >> (assuming you shut the OS down cleanly, etc.). This neatly avoids the >> issues of "locked files" and privilege that might otherwise interfere >> with your imaging of the *entire* filesystem (e.g., even those parts not >> directly accessible to "applications" -- regardless of privilege) >> >> But, this comes at the cost of not knowing which parts (sectors) of the >> medium actually have "content" that must be preserved! So, you have to >> include every sector in your image! > > > Not if you tar a filesystem (on that partition) see the script I posted. > So > mount partition, say: > mount /dev/sdd1 /mnt/sdd1mount(8) brings filesystem specific code into the environment. Tell me how you are going to do this WITHOUT invoking the mount command!> tar that filesystem: > tar -zcvf my_sdd1_backup.tgz /mnt/sdd1/* > > If the partition has no files the tgz will be very very small.Try gzip'ing /dev/sdd1 and look at the size of the resulting file! (i.e., /dev/sdd1 being the raw/block device without ANY knowledge of the filesystem it is currently supporting!)> If you want it back, > create any filesystem, and tar -zxvf my_sdd1_backup.tgz > > All links and timestamps sare preservd too that way. > >>> The other thing I have noticed is that when copying partitions of disk >>> images to bluray with my LG writer, some bytes at the start seem to get >>> changed, could be an error, but this does not seem to happen when copying >>> to a filesystem. >>> The other thing is in case restore of an image to a similar device, >>> I found that for example 8 GB card 1 (as source) has a different _real_ >>> size than 8 GB card 2 (same make, same type, same specified size, bought >>> at the same time) this could be related to bad sector managing of FLASH. >> >> Experience teaches you to always leave a bit of the medium "unused" >> (i.e., not present in ANY partition) to accommodate "small" variations >> between drives. Many megabytes on a 100G drive is "noise". Likewise, >> a GB on a TB drive is similarly "noise". >> >> I've found this easier than worrying about resizing partitions (which >> typically requires the partition to have been "defragmented" beforehand >> to ensure there is nothing present at the tail end of the partition). 
> > I think I have not 'defragmented' anything ever in my life in Linux, > there is no need.Then, you can't arbitrarily shrink a "filesystem" because you don't know where the "live data" resides on it, currently. A file could sit in the last N sectors of the partition and you wouldn't know it. Shrinking the partition by M>N sectors means your file gets cut off the end! (you have to explicitly or implicitly MOVE the file to ensure it doesn't fall past the end of the trimmed partition)>>> If the new card (or disk) that you copy to, is smaller, then you are in trouble too. >>> I _always_ run a full byte by byte compare after burning a bluray... >>> wrote a program for that dvdimagecmp. >>> I use it to check FLASH images too. >>> All things of importance these days go to bluray kept in a dark metal case. >>> It has close to a thousand disks in it. >>> >>>> Then, restore it *without* the rescue disk. >>> >>> Na, never work on a running filesystem, UNLESS its just a file or you know >>> _exactly_ what is going on. >> >> That is exactly the point: you're running on bare metal knowing exactly >> what that metal GUARANTEES to the bootstrap loader. No need to worry >> that some flakey driver is going to have to be accommodated, etc. No >> need for ANY "OS" at all! Every opcode that executes is one that you wrote >> (aside from the "well behaved" BSP/BIOS/OFW/etc. > > Sorry cannot follow you there....I'm going to give you a RAW disk. It has data on it and "deleted data". I don't want to waste space preserving the "deleted data" in the image that I create. When you install that disk in your machine, you are going to discover that you can't "mount" it! I have changed the partition ID to some wacky value that the system from which I pulled it recognizes as "Customized FFSv2 Partition". The only thing that is really "customized" about it is this oddball partition type identifier *and* a macro wrapper that causes each reference to an inode to refer, instead, to "~inode". The system on which the drive was mounted (containing these two changes) has no problem creating, accessing and deleting data on that medium. With virtually identical performance to a "genuine" FFSv2 filesystem. But, YOUR tools won't recognize its contents. (I suspect simply changing the magic number assigned as the partition type would be enough to cause problems!) [Of course, this is a hypothetical machine. I pose this to illustrate the case for ANY FILESYSTEM TYPE NOT CURRENTLY KNOWN TO YOUR IMAGING TOOLS!] By contrast, the scheme that I outlined (upthread) will allow me to "fill" unused areas of the drive with "predictable, highly compressible content" USING THE NORMAL USER TOOLS PRESENT ON THAT ORIGINAL SYSTEM. Then, unlink those files. And, finally, run my executable OUTSIDE the scope of that OS (as it only needs to deal with the raw disk hardware). You, OTOH, can best hope to do something like: dd if=/dev/raw_drive | gzip > image.gz And, your image.gz will typically be much larger because it will not be able to determine which is "deleted data" in the raw disk contents.
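As an aside, the partition-ID part of that hypothetical is easy to see on an MBR disk: the type byte is just a label, and changing it costs nothing while potentially confusing tools that key on it, which is the "cause problems" suspicion above.  A hedged illustration (device name illustrative; the hex codes are the standard MBR type IDs):

  fdisk -l /dev/sdX      # the "Id" column: 83 = Linux, 07 = NTFS/exFAT, 0c = FAT32 (LBA), ...
  # fdisk /dev/sdX, then: 't' to change a partition's type, enter an unassigned hex code,
  # and 'w' to write -- the filesystem data itself is untouched, only the label changes.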
Reply by ●November 3, 2014
On Mon, 03 Nov 2014 12:11:28 -0500, George Neuner <gneuner2@comcast.net> Gave us:>On Mon, 03 Nov 2014 10:09:48 GMT, Jan Panteltje ><pNaonStpealmtje@yahoo.com> wrote: > > >>I think I have not 'defragmented' anything ever in my life in Linux, >>there is no need. > >That isn't entirely true - at least not with inode filesystems. The >n-way tree structuring and inode caching reduce the need to >defragment, but where sequential read performance is important, it >still pays to defragment. > >GeorgeThere would be no fragmentation unless those sequentially read files were constantly being opened and added to, and even THOSE file writes are full commits, free of fragmentation on those file systems. Kind of like saying "inconceivable". "I do not think that word means what you think it means." Sequential read performance is ONLY degraded on FILE reads of fragmented files. So unless you are operating a database, and keep all your dynamic data on the same volumes as you system and static files, you would see the same number, even if the volume does have some fragmented files on it. But again, you speak of the file system with seeming good intimacy. But I was under the impression that this file system operates in such a way that fragmentation like that which occurs on a FAT type system, never happens. You are saying EXT fs DO fragment files? I think the actual file sizes might play into one's thinking here too. Sequentially reading large scattered chunks is not that hard either. It is the database file that has had 50 0.5 kB commits done on it in the last hour that fragment a FAT drive. Unchanging files do not fragment. The "holes" between them and the deleted files do not pose a huge problem either. It is that ONE file that has so many segmented locations to string together in a single "read". Still... I did not know that ext fs drives fragment.
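For what it's worth, the question is easy to check directly on ext filesystems with the e2fsprogs tools (the paths are illustrative):

  filefrag -v /var/log/syslog     # print the extent list for a single file
  e4defrag -c /home               # ext4 only: report a fragmentation score without changing anything

Files that grow by small appends (logs, mail spools, databases) do fragment on ext filesystems, though the allocator keeps it far milder than on FAT.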
Reply by ●November 3, 2014
On Mon, 03 Nov 2014 10:17:18 -0700, Don Y <this@is.not.me.com> Gave us:

[snip]

>You, OTOH, can best hope to do something like:
>  dd if=/dev/raw_drive | gzip > image.gz
>And, your image.gz will typically be much larger because it will not
>be able to determine which is "deleted data" in the raw disk contents.

You need to author your own version of a forensic duplicator.

You are not duplicating a volume or its contents.  You are duplicating every last sector on the drive, and IF you insist on not "mounting" a volume the ONLY type of success you will get would be an entire copy, including deleted data.  Bit for bit, then compress that datagram.

If ANY "deterministic" cues are used to decide what is or is not deleted data, you ARE looking at files and you ARE looking at them via the file system, and WILL have to mount the drive and use its tables to do so.







