On 12.11.2014 г. 19:21, Don Y wrote:
> Hi Dimiter,
>
> On 11/3/2014 11:50 PM, Dimiter_Popoff wrote:
>> On 03.11.2014 г. 23:22, Don Y wrote:
>
>>> First, can a user create files having arbitrary names and contents
>>> under your FS?
>>
>> Yes of course. Pretty much everything you would expect from a
>> filesystem.
>
> To be clear, by "user" I mean can a human being walk up and
> enter "arbitrary text" into a file, etc.?  I.e., I had assumed
> your filesystem handled files that the *instrument* created
> (e.g., observational data, instrument generated reports, etc.).
> Could a purchaser store his email addresses in a file called
> MYADDRS.TXT?  Could he, likewise, create a file filled with
> repeated strings of "Kilroy was here!"?

Hey, of course you can do that. To cut things short, all of my
programming is done under DPS, using its text editor, writing
dps shell scripts etc. etc. The emails we exchange have never
been here on a disk other than a dps one.
*All* my design & programming work is done under dps, I can
happily survive the day when windows and unix disappear from
the face of the Earth :-).

>>> Can he copy & rename files?
>>
>> Yes, what filesystem would it be without that :D .
>
> Again, my questions are meant to clarify that a *user* can
> do these things on demand -- not just the *instrument* deciding
> that it needs to do a "COPY", etc. (for its own purposes)

Well yes, though mostly using command line (shell). I have started
a file browser thing (began to make sense to have one once I
introduced the longnamed directories) but at the moment it is on
hold, other tasks of higher priority are in the way. Hopefully I
can resume next month.

>>> E.g., could he <somehow> introduce a file having some particular
>>> contents (like "DELETEDDELETEDDELETEDDELETED...") to your FS?
>>
>> I'd be tempted to go to all-0 files for the "highly compressed"
>> pattern - for no good reason really, except perhaps because disks
>> come as all 0 from the factory. But you will want to fill them
>> up anyway so this is not a consideration.
>>
>>> Then, could he replicate it many times?  (copy to a different
>>> filename)
>>>
>>> Having done that until the copy failed ("No space left on device"),
>>> presumably, he could delete each of them?  (perhaps made simpler
>>> by creating them all in a single subdirectory/folder and then
>>> just deleting the folder AND its contents)
>>
>> Yes, all of the above. Making multiple copies of a single file will
>> take 2-3 lines of script, to increment the name somehow. Deleting
>> all in a directory goes the usual del * way, if you want recursion
>> there is a script doing it (rm path/ -R). I have deliberately
>> kept recursive disk operations in scripts: makes new bugs show up,
>> costs no overhead to speak of, can be retried/resumed, prompts me
>> to write necessary extensions when there is some new need, etc.
>
> OK.  So, I *could* fill YOUR disk with files containing a specific
> 512 character string (or larger).  Then, once the OS complains
> "no space left on device", I could delete them all thereby freeing
> up all that space -- yet, leaving that 512 character string on the
> media (in the deleted files)

Yes, this would work OK.

Dimiter

(sorry for cutting it somewhat short, those tasks of "higher
priority" have me at the moment....
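The "fill the free space with a known pattern, then delete" trick discussed above can be sketched in a few lines. This is an illustration only, not Dimiter's DPS code: the function name and parameters are invented for the example, and it assumes an OS that reports "no space left" as an `OSError` on write.

```python
import os

def fill_free_space(directory, filename="zerofill.tmp",
                    chunk_size=1 << 20, max_bytes=None):
    """Fill free space under `directory` with zeros, then delete the file.

    After this runs, most free/deleted blocks on the underlying device
    read back as zeros, so a subsequent raw block image of the device
    (e.g. gzip -c /dev/sdX) compresses extremely well.
    `max_bytes` caps the amount written (None = write until ENOSPC).
    Returns the number of bytes written.
    """
    path = os.path.join(directory, filename)
    chunk = b"\0" * chunk_size
    written = 0
    try:
        with open(path, "wb") as f:
            while max_bytes is None or written < max_bytes:
                try:
                    f.write(chunk)
                except OSError:          # "No space left on device"
                    break
                written += chunk_size
    finally:
        if os.path.exists(path):
            os.remove(path)              # free the space again
    return written
```

The point of the trick is exactly what the exchange above describes: the filesystem's *used* blocks are untouched, but everything it considers free becomes a single highly compressible pattern, without the imaging tool needing to understand the filesystem's metadata at all.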
Disk imaging strategy
Started by ●November 2, 2014
Reply by ●November 12, 2014
Reply by ●November 13, 2014
On 11/4/2014 11:56 AM, Stefan Reuther wrote:
> Don Y wrote:
>> On 11/3/2014 10:56 AM, Stefan Reuther wrote:
>>>> Sure you can dd that partition, but now you really are in trouble.
>>>> Its safer to tar a filesystem (that should NOT be currently running,
>>>> else you are in trouble too),
>>>> you can always untar it into an other filesystem (ext2, ext4,
>>>> reiserfs, etc) that is compatible.
>>>
>>> It depends.
>>>
>>> 'dd'ing the raw partition is almost guaranteed to produce a working
>>> image after unpacking. If you 'tar' a mounted file system, the
>>> operating system you run the 'tar' on must support all nuances of the
>>> file system you want to clone. Back in Win95 days, cloning (or
>>> backup/restore) a Win95 installation using 'tar' from Linux did not
>>> work, because it did not restore all required file attributes. I
>>> wouldn't expect Linux 'tar' to capture all NTFS attributes (like
>>> "compressed", ACLs, ADS) either. Copying the partition blockwise
>>> would not have all these problems.
>>
>> Exactly.  But, you don't want your image to HAVE TO BE as large as the
>> original.  Esp as most disks have a fair bit of unused space.
>>
>> Hence the problem I posed: how do you sort out what is "unused space"
>> from "used space" -- in a manner that allows you to ignore the actual
>> metadata/etc. imposed by the particular filesystem implementation.
>
> I have often used the "fill the file system with a file that is all
> zeroes" trick you already mentioned, so I cannot add anything new for
> that, other than a "+1, yes this works".

There are many different ways of "conditioning" the media to minimize
the size of the image that you obtain (obviously, tailoring the imaging
process to the conditioning that you employed).

I started out by characterizing the content of each of the machines
that I have here along with those to which I have access.  Then,
exploring how well various archivers (an obvious choice for an imaging
solution!) process each of these (I've encountered some *spectacular*
compression rates with home-grown archivers that far outpace what can
be done with generic, OTS companders... rather easy to achieve when you
control the data that is being compressed!  ;> )

As I'm after a "restore" solution and not a "backup AND restore"
solution, far more effort can be expended in creating the initial image
if the restore can be relatively clean.  Just like an
archiver/compander, I can look at the content, evaluate multiple
different imaging approaches (different companders), then choose the
appropriate one to implement "portably" to give me a common "restore"
algorithm that handles multiple different targets with comparable
results on each.  I.e., store an "image type" code in the image that
then drives the restorer!

Being stuck living with the "GCD" of operators that are "always"
exported by a disk operating system means you can really only safely
create/delete and read/write files -- and possibly subdirectories.
Unless you want to tie your solution to a particular OS/filesystem
(which, in my case, would mean solving the same problem a dozen
different times -- especially as the next batch of donated machines
might introduce some new, "proprietary" filesystem that must be
reverse-engineered/accommodated).

Of course, anything you want to do to the medium *before* the OS and
applications are in place gives you free rein!  :>  Similarly, the
restore operation wants to run on bare metal, so it needs to be well
defined without any supporting framework.

So far, it looks like I can achieve image sizes on a par with those of
archivers (which KNOW about the OS on which their archives were
created) WITHOUT specific knowledge of the OS/filesystem.  (Of course,
*I* don't have to deal with unconstrained data/environments so it's an
unfair comparison!)
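The "image type code drives the restorer" idea is easy to sketch. Everything here is invented for illustration (the magic string, the type codes, the codecs); the thread does not describe Don's actual on-media format. The shape is what matters: a tiny self-describing header, then a dispatch table in the restorer.

```python
import zlib

# Hypothetical image layout: 4-byte magic, 1-byte image-type code,
# then the payload. The type code selects the restore algorithm, so
# one restorer binary can handle images produced by different
# (per-machine) imaging strategies.
MAGIC = b"IMG1"

RESTORERS = {
    0x00: lambda payload: payload,                   # raw block dump
    0x01: lambda payload: zlib.decompress(payload),  # deflate-packed dump
}

def make_image(image_type, payload):
    """Wrap a payload with the (hypothetical) header."""
    return MAGIC + bytes([image_type]) + payload

def restore(image):
    """Dispatch on the embedded type code to recover the raw blocks."""
    if image[:4] != MAGIC:
        raise ValueError("not an image file")
    return RESTORERS[image[4]](image[5:])
```

The design point: the imaging side can be as clever and target-specific as it likes, provided every cleverness it uses is registered under a code the common restorer knows how to undo.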
Reply by ●November 13, 2014
Hi Dimiter,

On 11/12/2014 3:15 PM, Dimiter_Popoff wrote:
> On 12.11.2014 г. 19:21, Don Y wrote:
>> On 11/3/2014 11:50 PM, Dimiter_Popoff wrote:
>>> On 03.11.2014 г. 23:22, Don Y wrote:
>>
>>>> First, can a user create files having arbitrary names and contents
>>>> under your FS?
>>>
>>> Yes of course. Pretty much everything you would expect from a
>>> filesystem.
>>
>> To be clear, by "user" I mean can a human being walk up and
>> enter "arbitrary text" into a file, etc.?  I.e., I had assumed
>> your filesystem handled files that the *instrument* created
>> (e.g., observational data, instrument generated reports, etc.).
>> Could a purchaser store his email addresses in a file called
>> MYADDRS.TXT?  Could he, likewise, create a file filled with
>> repeated strings of "Kilroy was here!"?
>
> Hey, of course you can do that. To cut things short, all of my
> programming is done under DPS, using its text editor, writing
> dps shell scripts etc. etc. The emails we exchange have never
> been here on a disk other than a dps one.
> *All* my design & programming work is done under dps, I can
> happily survive the day when windows and unix disappear from
> the face of the Earth :-).

Grrr... my bad!  I kept thinking solely about the netmca and
how *it* is used.  Forgot entirely about your development
environment!  :-/

[My comments/reservations should make more sense in the netmca
context...]
Reply by ●November 13, 2014
On 11/4/2014 8:50 AM, Jan Panteltje wrote:
> On a sunny day (Tue, 04 Nov 2014 08:27:39 -0700) it happened Don Y
> <this@is.not.me.com> wrote in <m3ar9i$j8l$1@speranza.aioe.org>:
>
>> On 11/4/2014 2:46 AM, Jan Panteltje wrote:
>>
>>> <snipped rant>
>>
>> (sigh)  And it is obvious that you didn't look through the sources.
>>
>>> Look, you want to compress a partition image without knowing what
>>> filesystem is on it, that leaves gzip (or maybe zip).
>>>
>>> Get a life.
>>
>> Given the above, I suggest you "get an education".
>
> No, you are clueless, and try to invent things that have already been
> invented.  Try:  gzip -c /dev/sdaX > my_gzipped_partition.gz
>
> If ANY regular structure is present on that device, then it will be
> replaced with some token.
>
> You're rude too, well no worry I'm sure you will not invent a better
> gzip...
>
> You are clueless!

(sigh)  You *really* should do your homework before shooting your
mouth off.

And, if you *think* about it, it is EASY to come up with a "better
gzip"!  THINK ABOUT IT before you stick your foot in your mouth.  If
you can't come up with a compressor that achieves rates of 4000:1 ON
THE BACK OF A NAPKIN then you shouldn't be writing code.  ANY code!

(Remember, *you* can pick the data to be compressed!  gzip has to live
with whatever data it *encounters*!  Be wary of "assumptions" as
they'll always trip you up!)

I'll wait for you to post your 4000-fold compression algorithm...
Reply by ●November 13, 2014
On 13.11.2014 г. 08:45, Don Y wrote:
> Hi Dimiter,
>
> On 11/12/2014 3:15 PM, Dimiter_Popoff wrote:
>> On 12.11.2014 г. 19:21, Don Y wrote:
>>> On 11/3/2014 11:50 PM, Dimiter_Popoff wrote:
>>>> On 03.11.2014 г. 23:22, Don Y wrote:
>>>
>>>>> First, can a user create files having arbitrary names and contents
>>>>> under your FS?
>>>>
>>>> Yes of course. Pretty much everything you would expect from a
>>>> filesystem.
>>>
>>> To be clear, by "user" I mean can a human being walk up and
>>> enter "arbitrary text" into a file, etc.?  I.e., I had assumed
>>> your filesystem handled files that the *instrument* created
>>> (e.g., observational data, instrument generated reports, etc.).
>>> Could a purchaser store his email addresses in a file called
>>> MYADDRS.TXT?  Could he, likewise, create a file filled with
>>> repeated strings of "Kilroy was here!"?
>>
>> Hey, of course you can do that. To cut things short, all of my
>> programming is done under DPS, using its text editor, writing
>> dps shell scripts etc. etc. The emails we exchange have never
>> been here on a disk other than a dps one.
>> *All* my design & programming work is done under dps, I can
>> happily survive the day when windows and unix disappear from
>> the face of the Earth :-).
>
> Grrr... my bad!  I kept thinking solely about the netmca and
> how *it* is used.  Forgot entirely about your development
> environment!  :-/
>
> [My comments/reservations should make more sense in the netmca
> context...]

Hah, nothing that bad about forgetting something, come on.

BTW the netmca runs a complete DPS on it, shell windows and all.
It even has the development software on it (not the data of course),
not that users do use it a lot, not to my knowledge at least.
It just has no display controller, relies on the network to
be VNC accessible.

Dimiter
Reply by ●November 13, 2014
On 11/13/2014 12:06 AM, Dimiter_Popoff wrote:
> On 13.11.2014 г. 08:45, Don Y wrote:
>> Grrr... my bad!  I kept thinking solely about the netmca and
>> how *it* is used.  Forgot entirely about your development
>> environment!  :-/
>>
>> [My comments/reservations should make more sense in the netmca
>> context...]
>
> Hah, nothing that bad about forgetting something, come on.

<frown>  I've been most concerned with *appliances*, here, because
they have the most restricted (human) interfaces.

E.g., I can create, write and delete (somewhat) arbitrary files on the
disks in my *printers*... but can't run executables there (well, this
is a small lie but not a practical one!).

The same sort of thing is true of my NAS boxes... I can freely and
easily -- even programmatically -- create and delete files (e.g., from
a remote host mounting them as foreign filesystems).  But, executing
an arbitrary executable DIRECTLY on those boxes isn't possible (not
the least bit because the OS isn't openly documented -- just like the
OS on the printers).

> BTW the netmca runs a complete DPS on it, shell windows and all.

Ah, OK.  So, you don't have an "embedded" version of it with
reduced capabilities/features.

> It even has the development software on it (not the data of course),
> not that users do use it a lot, not to my knowledge at least.
> It just has no display controller, relies on the network to
> be VNC accessible.
Reply by ●November 13, 2014
On 13.11.2014 г. 09:16, Don Y wrote:
> On 11/13/2014 12:06 AM, Dimiter_Popoff wrote:
> ...
>
>> BTW the netmca runs a complete DPS on it, shell windows and all.
>
> Ah, OK.  So, you don't have an "embedded" version of it with
> reduced capabilities/features.

No need for that. Much of the functionality even fits in 2M flash....
how thinkable is that (about 1.5 to 2M lines of VPA code, which is not
generous with CRLF, unlike certain HLL-s :-) ).  But booting off flash
is intended just to be able to restore your HDD via the net if you
mess it up.

I have smaller versions of course, e.g. I am now tortured by a small
coldfire (mcf52211) which has a tiny derivative of dps (mainly the
scheduler and some library calls, about 7 kilobytes total).  Bloody
thing won't go into low power mode, which is sort of specified to at
least halve the consumption; nothing of the sort, *zero* effect of
entering that mode by the core.  Cost me two days so far to zero
result.  Not that I can't live without that mode, but why it does not
work drives me mad.

Dimiter
Reply by ●November 13, 2014
On Tue, 04 Nov 2014 15:50:31 GMT, Jan Panteltje
<pNaonStpealmtje@yahoo.com> wrote:

> I'm sure [Don] will not invent a better gzip...

FYI: gzip is _not_ the last word in general purpose compression.

gzip uses on-the-fly LZH dictionary compression.  gzip is pretty good,
but 7z's LZMA usually does better.

However, no on-the-fly compressor can do as well as a tool that
performs batch analysis of the file(s) prior to compression and
creates a dictionary customized for the batch.

There used to be a number of batch oriented compression tools, but
their 2-pass approach made them ever less suitable for handling ever
larger batches.  When LZ was introduced, streaming compression became
"good enough" for general purpose use, and so the batch approach fell
out of favor.

While reserving judgment on whether Don could beat gzip for general
purpose, he certainly should be able to beat it for his specialized
purpose.

George
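George's "dictionary customized for the batch" idea is easy to try today: zlib (the library behind gzip's DEFLATE format) supports a caller-supplied preset dictionary. A minimal sketch, assuming the dictionary bytes were produced by some prior analysis pass over the batch (the sample dictionary string here is made up):

```python
import zlib

# Strings a (hypothetical) batch-analysis pass found to recur in the data.
# Most likely matches should sit at the END of a preset dictionary.
ZDICT = b"No space left on deviceKilroy was here!"

def pack(data, zdict=ZDICT):
    """DEFLATE-compress with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def unpack(blob, zdict=ZDICT):
    """Decompress; the decompressor must be given the same dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()
```

The win comes from the *first* occurrence of each common string: a plain streaming compressor must emit it literally, while the preset dictionary lets even the first occurrence become a short back-reference. The cost is exactly the one George names: both ends must agree on the dictionary, which couples the restorer to the analysis pass.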
Reply by ●November 13, 2014
Hi George,

On 11/4/2014 10:41 AM, George Neuner wrote:
> On Mon, 03 Nov 2014 21:48:56 -0700, Don Y <this@is.not.me.com> wrote:
>
>> IIRC, the Bullet Server could (did?) create contiguously stored
>> files.  But, that was largely possible because of its "write once"
>> semantics (size declared a priori).
>
> It could, but usually did not because the client usually could not
> give it a size.  Typically, a Bullet file was allocated in one or
> more largish extents and then consolidated when the file was closed.

So, the disk was used as cache, temporarily, while the file was being
built... perhaps a different *part* of the media (so as not to
interfere with files that were being built "correctly"?)

Yet another case of initial assumptions ("gobs of memory") being
"off"!  :>

> Any copies made of an existing file - e.g., to/from a remote server -
> always were contiguously stored.
Reply by ●November 13, 2014
On 11/13/2014 11:30 AM, George Neuner wrote:
> On Tue, 04 Nov 2014 15:50:31 GMT, Jan Panteltje
> <pNaonStpealmtje@yahoo.com> wrote:
>
>> I'm sure [Don] will not invent a better gzip...
>
> FYI: gzip is _not_ the last word in general purpose compression.
>
> gzip uses on-the-fly LZH dictionary compression.  gzip is pretty
> good, but 7z's LZMA usually does better.
>
> However, no on-the-fly compressor can do as well as a tool that
> performs batch analysis of the file(s) prior to compression and
> creates a dictionary customized for the batch.

Exactly.  If you are compressing *once* -- and decompressing "often"
(often > 1) -- AND can afford the time "up front", you can achieve
better compression rates.  E.g., brotli, zopfli, etc.

*AND*, if you can choose the data that you want to compress, you can
obviously design a compander that exploits that knowledge for higher
compression rates!

> There used to be a number of batch oriented compression tools, but
> their 2-pass approach made them ever less suitable for handling ever
> larger batches.  When LZ was introduced, streaming compression became
> "good enough" for general purpose and so the batch approach fell out
> of favor.
>
> While reserving judgment on whether Don could beat gzip for general
> purpose, he certainly should be able to beat it for his specialized
> purpose.

Goal isn't to be general purpose.  Rather, to be good at *this*
application!

E.g., gzip can't do better than ~1000:1 (on carefully constructed data
sets).  If you can choose the data that you expect to be encountering
(e.g., even the same data that gzip compresses to 1000:1!), you can
easily beat that!  gzip has to be all things to everyone and make
tradeoffs because it ASSUMES it has no knowledge of the data.  There's
no reason to similarly encumber yourself when you *have* control of
the data!  Then, fall back on gzip (or any other suitable archiver)
for the data over which you have *no* control!

The mix of controllable and uncontrolled data determines your OVERALL
compression rate.  If the controllable data is plentiful enough, then
it beats gzip when applied "overall".

[In a few minutes, you can write a trivial compressor that will beat
gzip (or any similar compander) even when the balance of controlled to
uncontrolled is small!  We're just waiting for Jan to take the time to
write that code and come to that realization...]
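The "few minutes" claim is easy to make concrete. Below is a napkin-grade run-length encoder: useless on general data (it can *expand* it), but on data you are allowed to choose -- say, the all-zero free space produced by the fill-and-delete trick earlier in the thread -- it far exceeds both the ~1000:1 DEFLATE ceiling and the 4000:1 challenge figure. The encoding (big-endian uint32 run length plus the repeated byte) is invented for this sketch.

```python
import struct

def rle_pack(data):
    """Encode as (uint32 run-length, byte-value) pairs.

    A run of N identical bytes becomes 5 bytes, so chosen data that is
    one long run compresses at roughly N:5 -- e.g. 1 MiB of zeros
    becomes 5 bytes (~200000:1), where DEFLATE tops out near 1000:1.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out += struct.pack(">IB", j - i, data[i])
        i = j
    return bytes(out)

def rle_unpack(blob):
    """Invert rle_pack: expand each (count, value) pair."""
    out = bytearray()
    for k in range(0, len(blob), 5):
        count, value = struct.unpack(">IB", blob[k:k + 5])
        out += bytes([value]) * count
    return bytes(out)
```

This is exactly the asymmetry the post describes: gzip must behave sanely on *any* input, so it pays overhead everywhere; a compander that controls its input pays nothing for that generality and only has to fall back to a general tool for the uncontrolled remainder.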







