
Windows registry diffs

Started by Don Y November 13, 2016
On 15.11.2016 г. 15:02, Don Y wrote:
> On 11/14/2016 4:19 AM, Dimiter_Popoff wrote:
>> On 14.11.2016 г. 01:52, Clifford Heath wrote:
>>> On 14/11/16 07:35, Dimiter_Popoff wrote:
>>>> Recently I introduced a sort of equivalent of it
>>> ...
>>>> I have implemented it on top of the dps file system,
>>>
>>> I don't know DPS, but that doesn't sound completely nuts,
>>> as long as DPS is journalled (though I expect, only on
>>> the directory structure, not the contents).
>>
>> In dps a directory can just be copied like any other file
>> and it will still point to the correct file entries at that
>> moment; it is not done automatically but I have done it in
>> the past. Then one can copy the directory file and set the
>> type of the copy to be non-directory so it won't confuse
>> the system by its duplicate pointers etc.
>> It is probably quite different from other filesystems as I
>> have done it without looking at many of them.
>
> I suspect you'd have the same sort of problem I'm currently
> encountering with the Windows Registry: how could you
> "painlessly" track changes made to your "global store",
> "who" (which process) made them AND easily "undo" them.
Hi Don, well, maybe. But being on top of the filesystem gives me more options, e.g. I have a "diff" script which compares all files in a directory to those in another; I could make it recursive like some others I have already (e.g. byte count: bc *.sa <top path>, etc.). It lists differing files and lists "not found" ones in the second directory (the second one on the command line). I don't think it will be a huge issue to locate a change this way; it may be that the changes are too many and I have to navigate through all the detected ones.
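[To illustrate the kind of recursive directory diff described above - a minimal sketch on a conventional filesystem, since dps itself is not public; the paths and the report format are stand-ins:]

    import os
    import filecmp

    def diff_trees(left, right, rel=""):
        """Compare two directory trees recursively: list files whose
        contents differ, and files present under `left` but missing
        under `right` (the second tree on the command line)."""
        ldir, rdir = os.path.join(left, rel), os.path.join(right, rel)
        for name in sorted(os.listdir(ldir)):
            lpath, rpath = os.path.join(ldir, name), os.path.join(rdir, name)
            relname = os.path.join(rel, name)
            if not os.path.exists(rpath):
                print("not found:", relname)
            elif os.path.isdir(lpath):
                diff_trees(left, right, relname)      # recurse into subdirs
            elif not filecmp.cmp(lpath, rpath, shallow=False):
                print("differs:  ", relname)          # byte-for-byte compare

    diff_trees("/store/snapshot", "/store/current")   # hypothetical paths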
> I've been amused to discover that this *is* possible under
> Windows; but, much harder under my formal RDBMS implementation!
> Especially the "undo them" (well, maybe I could build a giant
> "transaction" around everything but I'm willing to bet the
> RDBMS would die when faced with an open-ended issue like that!)
If I want to be able to undo something I would just make a copy of the "global store" at some point I might need to return to; making each operation "undoable" is not worth the overhead it would cost, I suppose.
> A solution for *you* might be to physically make a recursive
> copy of your global store's root folder "off to the side,
> somewhere" -- then, use that to replace the modified store
> later, to return it to its original contents.  (?)
Oh yes, the root and other directories. I have done that and various other things recovering from disasters; mainly of my own doing: I often write system code and run it on the same machine... I have had my moments, of course (I am tempted to say "not in the last 5 or 10 years" but I'd rather not pull the devil by the tail).
>>> Personally if I had to use a filesystem I'd use a
>>> "content store" of immutable objects like the back end
>>> of GIT though.
>>
>> I believe my equivalent to this would be to use the directory
>> entry to store data into rather than a pointer to a file.
>> A directory entry in dps can store two SDW (segment descriptor
>> word, starting_block:length each, a total of 4 longwords); I
>> did this ages ago in order to be able to access files which
>> are in up to two pieces directly from the directory entry;
>> if more pieces are involved then the directory entry points
>> to a RIB (retrieve information block, a list of SDW-s).
>
> This approach is used to store "symbolic links"; i.e., use
> the directory entry to store a text pathname to the linked
> file (if the pathname is short enough to fit *in* a
> directory entry) without incurring other filesystem costs
> (i.e., a real data block).
>
> E.g., if a directory entry is supposed to symbolically reference
> some other point in the filesystem hierarchy:
>     file1
>     folder/
>         file2 -> /file1
>         file3
> the "link" to "/file1" is stored in the dirent for "file2"
> *as* "/file1" so it carries no real overhead beyond that of its
> name ("file2")
I have seen the "link" type directory entries on unix machines, but if I were to do the same I'd have to store the name in a file: the directory entry is not long enough to hold the name of the target file (up to 16 bytes - the 2 SDW-s - plus another 8 in the EOF entry, while the name length can be up to 255 characters), let alone a complete path. But it is a practical thing to have, I may add it one of these days.
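[The "fast symlink" scheme Don describes, reduced to a toy sketch: keep the target path inline when it fits in the fixed entry, spill to a data block only when it does not. The 24-byte budget mirrors the dps figures above; everything else here is invented:]

    INLINE_BUDGET = 24      # 16 bytes of SDWs + 8 in the EOF entry

    def make_symlink(target: str):
        """Return a (kind, payload) pair for a directory entry: the
        target path is stored inline when it fits, otherwise in an
        external block (as a unix "fast symlink" would)."""
        raw = target.encode("ascii")
        if len(raw) <= INLINE_BUDGET:
            return ("inline-link", raw)        # no data block consumed
        return ("external-link", allocate_block(raw))

    def allocate_block(data: bytes) -> int:
        """Stand-in for real block allocation; returns a fake block no."""
        return hash(data) & 0xFFFF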
>> Now this would save me the 64 bytes for a file; but I'll have
>> to introduce another file type (there is room for that but
>> obviously I am cautious with such a step) which would
>> be treated as "null file". Not such a huge step now that I
>> think of it. But I'll leave it for later, it will not be
>> hard to change to it when I want to get rid of the 64 byte
>> file contents (the longnamed directory entry is very efficient
>> in that it is of variable length, depending on the name length,
>> e.g. a 7 character name takes 3 longwords, an 11 char. name
>> takes 4 etc.).
>>
>>> However, by far the most widespread solution used to this
>>> incredibly common problem is to use SQLite. Never mind that
>>> it's SQL, the underlying storage technology is some of the
>>> most heavily tested and reliable code that has ever been
>>> written. It's fully journalled, and the automated test
>>> platform simulates *every* point of failure (every error
>>> path) in the entire codebase, for every platform, on every
>>> release.
>>>
>>> SQLite is small enough that it gets used in almost every
>>> phone app, as well as larger things like web browsers.
>>> I don't think I've ever heard of a failure that resulted
>>> initially from corruption of such a database, and that's
>>> almost incredible by itself.
>>>
>>> Read more about their testing strategy here, it's admirable:
>>> <https://www.sqlite.org/testing.html>
>>
>> Thanks for the pointer (to SQL), I'll look into what it does
>> for ideas. I won't use it - so far all dps code is my own and
>> I want it to stay like this for now - but I can certainly have
>> a look at it to see how other people do it.
>
> There are other "lightweight, connectionless" schemes to maintain
> simple data in a "file-like hive". Some ~20 years ago, I was
> fond of "db" (and dbm/ndbm). But, they didn't have the full
> relational capabilities of more modern implementations
> (I'm not sure that capability would benefit you in your usage;
> it's helpful for me as very few actions do NOT involve a variety
> of JOINs)
I looked into SQL as Clifford suggested and it turned out to be a language for relational databases of sorts. I can see the appeal of that, reading here and in other posts of yours what you are doing (your song/artist/album etc. example), but I do not need all that for my "global store"; it does not need to be searchable very efficiently. Generally it is more like a locker room: you know where your locker is if you have to deal with it. It can be searched on top of what it is, of course - by indexing etc. for acceleration as needed - but at the moment I do not need that.

Dimiter
On 16/11/16 09:39, Dimiter_Popoff wrote:
> I looked into SQL as Clifford suggested and it turned out to be
> a language for relational databases of sorts.
I suggested SQLite. The SQL language it uses is a *distraction* and you don't need it... however... What it seems you need is the kind of reliable storage that SQLite provides - and it does so better than *anything* else you'll find, with a code size that is smaller than anything of comparable reliability.

Clifford Heath.
Hi Dimiter,

On 11/15/2016 3:39 PM, Dimiter_Popoff wrote:

>>> In dps a directory can just be copied like any other file
>>> and it will still point to the correct file entries at that
>>> moment; it is not done automatically but I have done it in
>>> the past. Then one can copy the directory file and set the
>>> type of the copy to be non-directory so it won't confuse
>>> the system by its duplicate pointers etc.
>>> It is probably quite different from other filesystems as I
>>> have done it without looking at many of them.
>>
>> I suspect you'd have the same sort of problem I'm currently
>> encountering with the Windows Registry: how could you
>> "painlessly" track changes made to your "global store",
>> "who" (which process) made them AND easily "undo" them.
>
> well, maybe. But being on top of the filesystem gives me more
> options, e.g. I have a "diff" script which compares all files
> in a directory to those in another; I could make it recursive
> like some others I have already (e.g. byte count: bc *.sa
> <top path>, etc.). It lists differing files and lists "not
> found" ones in the second directory (the second one on the
> command line). I don't think it will be a huge issue to locate
> a change this way; it may be that the changes are too many and
> I have to navigate through all the detected ones.
There are several issues involved:
- finding the change
- reporting it in a meaningful way
- identifying the "culprit"

The Windows registry just supports a few different data types:
- binary
- string
- dword
- qword

So, there is very little "information" conveyed if you report that a dword changed from 0x1234 to 0x4343. OTOH, if a new key is added, that might convey some information (as they HOPEFULLY have descriptive names).

The advantage to a "real" database (I am playing fast and loose with my definition of "real") is that you tend to have more explicit types. And, the datum (field) indicates its type.

So, if a byte in my "persistent store" (RDBMS) changes from 0x12 to 0x13, I can see that this was part of a MAC address... or, a "currency" value, or a "text string", or a "book title" (if I define a type that is used to represent book titles!) or a UPC code, etc. And, I can identify who (process) changed it -- as well as knowing who CAN'T have changed it (due to the ACLs in place for that object).

(i.e., byte 27 changed from 0x88 to 0x73 probably doesn't help you recognize *significant* changes)
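[A concrete, if simplified, illustration of "finding the change": snapshot a registry subtree into a dict and diff two snapshots. This sketch uses Python's standard winreg module; the subtree path is chosen arbitrarily:]

    import winreg

    def snapshot(root, path):
        """Recursively read all values under a registry key into a
        flat dict mapping 'subkey\\name' -> (type, data)."""
        result = {}
        with winreg.OpenKey(root, path) as key:
            nsubkeys, nvalues, _ = winreg.QueryInfoKey(key)
            for i in range(nvalues):
                name, data, vtype = winreg.EnumValue(key, i)
                result[path + "\\" + name] = (vtype, data)
            for i in range(nsubkeys):
                sub = winreg.EnumKey(key, i)
                result.update(snapshot(root, path + "\\" + sub))
        return result

    before = snapshot(winreg.HKEY_CURRENT_USER, r"Software\SomeVendor")
    # ... install or run the suspect application here ...
    after = snapshot(winreg.HKEY_CURRENT_USER, r"Software\SomeVendor")

    for k in after.keys() - before.keys():
        print("added:  ", k, after[k])
    for k in before.keys() & after.keys():
        if before[k] != after[k]:
            print("changed:", k, before[k], "->", after[k])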
>> I've been amused to discover that this *is* possible under
>> Windows; but, much harder under my formal RDBMS implementation!
>> Especially the "undo them" (well, maybe I could build a giant
>> "transaction" around everything but I'm willing to bet the
>> RDBMS would die when faced with an open-ended issue like that!)
>
> If I want to be able to undo something I would just make a copy
> of the "global store" at some point I might need to return to;
> making each operation "undoable" is not worth the overhead it
> would cost, I suppose.
In my case, taking a snapshot of the database is expensive (because it is hundreds of tables, each of which can occupy many files, etc.). So, a "before and after" comparison isn't really practical. By contrast, the Windows registry is reasonably self-contained.
>>>> SQLite is small enough that it gets used in almost every
>>>> phone app, as well as larger things like web browsers.
>>>> I don't think I've ever heard of a failure that resulted
>>>> initially from corruption of such a database, and that's
>>>> almost incredible by itself.
>>>>
>>>> Read more about their testing strategy here, it's admirable:
>>>> <https://www.sqlite.org/testing.html>
>>>
>>> Thanks for the pointer (to SQL), I'll look into what it does
>>> for ideas. I won't use it - so far all dps code is my own and
>>> I want it to stay like this for now - but I can certainly have
>>> a look at it to see how other people do it.
>>
>> There are other "lightweight, connectionless" schemes to maintain
>> simple data in a "file-like hive". Some ~20 years ago, I was
>> fond of "db" (and dbm/ndbm). But, they didn't have the full
>> relational capabilities of more modern implementations
>> (I'm not sure that capability would benefit you in your usage;
>> it's helpful for me as very few actions do NOT involve a variety
>> of JOINs)
>
> I looked into SQL as Clifford suggested and it turned out to be
> a language for relational databases of sorts.
Yes, though you can use it in flat "databases" just as a means of binding "names" to "values".

In my case, as I had already opted to bear the cost of the RDBMS, it was silly NOT to avail myself of other features that it provides. If I had provided a filesystem interface, applications would undoubtedly just go about creating a bunch of ad hoc data files with very little in common -- in terms of formats, syntax, parsing code, etc.

Instead, I make it easier for applications to assign *meanings* to the data they store -- and to retrieve that information without having to do all the legwork to ensure the data hasn't been corrupted (by a user with a text editor who is careless or ignorant of the rules for THIS particular file). And, hopefully, allow applications to build on the mechanisms and meanings of the data created and maintained by others.

E.g., the music database example could easily be augmented by an application that tracks how OFTEN you play each song. Or, the most recent *time* that you played it, etc. Had the music "database" been some ad hoc file created by the "music player" application, the "music player TRACKER" application would have a harder time being implemented (poor documentation on the file formats, incomplete parsing algorithms for THAT file format vs. the "video player" application, etc.)
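[That flat name/value use in miniature, with Python's bundled sqlite3; the file name, table, and keys are invented for the example:]

    import sqlite3

    db = sqlite3.connect("settings.db")   # hypothetical store file
    db.execute("CREATE TABLE IF NOT EXISTS settings"
               " (name TEXT PRIMARY KEY, value)")

    def put(name, value):
        # INSERT OR REPLACE gives simple last-writer-wins semantics
        with db:                           # commits, rolls back on error
            db.execute("INSERT OR REPLACE INTO settings VALUES (?, ?)",
                       (name, value))

    def get(name, default=None):
        row = db.execute("SELECT value FROM settings WHERE name = ?",
                         (name,)).fetchone()
        return row[0] if row else default

    put("network/mtu", 1500)
    print(get("network/mtu"))              # -> 1500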
> I can see the appeal of that, reading here and in other posts
> of yours what you are doing (your song/artist/album etc.
> example), but I do not need all that for my "global store";
> it does not need to be searchable very efficiently. Generally
> it is more like a locker room: you know where your locker is
> if you have to deal with it. It can be searched on top of what
> it is, of course - by indexing etc. for acceleration as needed -
> but at the moment I do not need that.
As I said, I've been leveraging my decision to use the RDBMS to the point where I no longer have "const data" in my applications. All of that stuff gets fetched from tables at run-time.

Want to know what a '0' looks like vs. an 'O'? Load the templates for each of them! Want to know how *Bob* draws an 'O' vs. the way *Tom* draws it? Load Bob's O-template and compare it to Tom's!

This changes the performance criteria on the store as now it's part of algorithms with timeliness guarantees (instead of just "initialization" data).

Rescued some more toys, today, so a long night sorting stuff out... :>
On 16.11.2016 г. 04:30, Clifford Heath wrote:
> On 16/11/16 09:39, Dimiter_Popoff wrote:
>> I looked into SQL as Clifford suggested and it turned out to be
>> a language for relational databases of sorts.
>
> I suggested SQLite. The SQL language it uses is a *distraction*
> and you don't need it... however...
I looked at it (its Wikipedia entry) now and I see it is a different animal indeed.
> What it seems you need is the kind of reliable storage that
> SQLite provides
Sort of, yes, but I am more after a hierarchical thing, like a directory tree. I guess I'll be fine with it as I have made it; if I have to take steps to compress it further than it is now, I know how to do it. But from your feedback - and that of Don - it seems the extra few bytes spilled per entry do not bother you much.
> ... - and does so better than *anything* else you'll
> find, with a code size that is smaller than anything of
> comparable reliability.
I did not read enough to get to the code size; could you please post some figures? Just for reference - I'd be curious to know, and other people reading this might be as well.

Dimiter
On 16.11.2016 г. 05:44, Don Y wrote:
> Hi Dimiter,
>
> On 11/15/2016 3:39 PM, Dimiter_Popoff wrote:
>
>>>> In dps a directory can just be copied like any other file
>>>> and it will still point to the correct file entries at that
>>>> moment; it is not done automatically but I have done it in
>>>> the past. Then one can copy the directory file and set the
>>>> type of the copy to be non-directory so it won't confuse
>>>> the system by its duplicate pointers etc.
>>>> It is probably quite different from other filesystems as I
>>>> have done it without looking at many of them.
>>>
>>> I suspect you'd have the same sort of problem I'm currently
>>> encountering with the Windows Registry: how could you
>>> "painlessly" track changes made to your "global store",
>>> "who" (which process) made them AND easily "undo" them.
>>
>> well, maybe. But being on top of the filesystem gives me more
>> options, e.g. I have a "diff" script which compares all files
>> in a directory to those in another; I could make it recursive
>> like some others I have already (e.g. byte count: bc *.sa
>> <top path>, etc.). It lists differing files and lists "not
>> found" ones in the second directory (the second one on the
>> command line). I don't think it will be a huge issue to locate
>> a change this way; it may be that the changes are too many and
>> I have to navigate through all the detected ones.
>
> There are several issues involved:
> - finding the change
> - reporting it in a meaningful way
> - identifying the "culprit"
Finding the change would be easy. Reporting it in a meaningful way depends on the entity it is reported to, i.e. on what it finds "meaningful" - if it is a human, this will depend on their knowledge. Identifying the culprit may or may not be possible - logging every event means logging the logging events involved, and so on into infinity, so we should draw the line some place :-).
> The Windows registry just supports a few different data types:
> - binary
> - string
> - dword
> - qword
I also think I have seen them store strings. The dps global store understands all the types and units an "indexed parameter" has known for 15+ years now (hopefully not 20 yet but I am not sure). Looking at the source I actually see it IS 20 years old now.... (all capitals, i.e. it has been written for my asm32 which ran on a different machine....). Here:

**************************************
*                                    *
*            Transgalactic           *
*            Instruments             *
*                                    *
**************************************
*
* PARAMETER OBJECT RELATED EQUATES
*
         ORG     0
*
PAR$FL0  DO.B    1       FLAG 0 (R/W ETC.)
         DO.B    1       RESERVED
PAR$DIX  DO.L    1       DEVICE DEPENDENT INDEX
PAR$DIM  DO.W    1       DIMENSION
PAR$MULT DO.W    1       POWER OF 10 (SIGNED) MULTIPLYER
PAR$TYPE DO.W    1       PARAMETER TYPE (DATA TYPE, BYTE,REAL,TEXT, ETC.)
PAR$DATA EQU     *       DATA FOLLOWING
*
* PAR$FL0 BIT DEFINITIONS
*
PR0$RD   EQU     0       CAN BE READ
PR0$WR   EQU     1       CAN BE WRITTEN TO
*
         IFUDF   PT$BU
*
* DEVICE DRIVER (OR OTHER) INTERACTION TYPE DEFINITIONS
* TYPE PASSED/RETURNED IN D5,(INDEX IN D6),DATA IN D1 UP TO D4,AS
* MUCH AS IT TAKES IF NUMERIC; IF LIST OF OBJECTS, A5 ->
*
         ORG     0
PT$LIST  DO.B    1       OBJECT LIST AT (A5)
PT$VAR   DO.B    1       A2 -> VAR NAME, D1=@VAR
PT$BU    DO.B    1       UNSIGNED BYTE
PT$BS    DO.B    1       SIGNED BYTE
PT$WU    DO.B    1       UNSIGNED WORD
PT$WS    DO.B    1       SIGNED WORD
PT$LU    DO.B    1       UNSIGNED LONG
PT$LS    DO.B    1       SIGNED LONG
PT$DU    DO.B    1       UNSIGNED DUAL
PT$DS    DO.B    1       SIGNED DUAL (64-BIT)
PT$QU    DO.B    1       UNSIGNED QUAD
PT$QS    DO.B    1       SIGNED QUAD
PT$FS    DO.B    1       FP SINGLE PRECISION
PT$FD    DO.B    1       FP DUAL PRECISION
PT$FX    DO.B    1       FP .X
pt$alst  do.b    1       allocated list (same as pt$list but can be deallocated)
*
         ENDC
*
* units
*
         ORG     0
DIM$NUMB DO.B    1       UNDEFINED - JUST A NUMBER
DIM$SEC  DO.B    1       SECONDS
DIM$MIN  DO.B    1       MINUTES
DIM$HOUR DO.B    1       HOURS
DIM$DAY  DO.B    1       DAY
DIM$WEEK DO.B    1       WEEK
DIM$MON  DO.B    1       MONTH
DIM$YEAR DO.B    1       YEAR; FURHTER WITH MULTIPLIER
DIM$METR DO.B    1       METER
DIM$DEG  DO.B    1       DEGREE
DIM$RAD  DO.B    1       RADIAN
DIM$GRAD DO.B    1       GRAD (100GR=90 DEG)
DIM$VOLT DO.B    1       VOLT
DIM$AMP  DO.B    1       AMPERE
DIM$OHM  DO.B    1       OHM
DIM$FAR  DO.B    1       FARAD
DIM$HEN  DO.B    1       HENRY
DIM$BQ   DO.B    1       BECQUEREL
DIM$CU   DO.B    1       CURIE
DIM$ROE  DO.B    1       ROENTGEN
DIM$SLIC DO.B    1       SYSTEM TIME SLICES
DIM$CELS DO.B    1       DEGREE CELSIUS
DIM$FRNH DO.B    1       DEGREE FAHRENHEIT
dim$hz   do.b    1       hertz
*
         END

The "parameter object" has been abandoned ages ago, probably never used since it was first introduced. But maybe some code using it is still in use. The types and units are widely deployed.

The "units" (called "dim" because I had not thought of the correct English word; in Bulgarian a "unit" is called a "dimension") contain entries which were never used, but it is a 16-bit field so no problem is expected soon out of that.

The "list" type is quite generic, it can be any sequence of dps inherent objects (lowest level objects, like horizontal line, text string etc., not the "object" I refer to elsewhere which is the basis of the dps runtime object system; the latter is "extobj", one of the many low level objects). But I mostly use it for text strings, which can be "pasted" (de-encapsulated and written to some memory address).

What comes with all the types is the check for overflow (hence the signed and unsigned types); setting a parameter will fail if the supplied type and unit are not as expected. In the global store this is not the case, the type/unit will be overwritten with the latest (I think...).
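[For readers who don't speak 68k-style assembly: the equates above describe a 12-byte header preceding the data. A purely illustrative Python rendering, assuming big-endian layout and the field sizes in the listing:]

    import struct

    # PAR$FL0 (byte), reserved (byte), PAR$DIX (long), PAR$DIM (word),
    # PAR$MULT (signed word), PAR$TYPE (word) -- big-endian, per above
    PAR_HEADER = struct.Struct(">BBIHhH")

    PR0_RD, PR0_WR = 0x01, 0x02        # bit 0 / bit 1 of PAR$FL0

    hdr = PAR_HEADER.pack(PR0_RD | PR0_WR,  # readable and writable
                          0,                # reserved
                          42,               # device dependent index
                          1,                # dimension, e.g. DIM$SEC
                          -3,               # power-of-10 multiplier: milli
                          4)                # parameter type, e.g. uns. word
    print(PAR_HEADER.size, hdr.hex())       # 12-byte header before PAR$DATA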
> So, there is very little "information" conveyed if you report
> that a dword changed from 0x1234 to 0x4343. OTOH, if a new
> key is added, that might convey some information (as they
> HOPEFULLY have descriptive names).
Paths and names are what I rely on for meaning, ownership etc. indeed.
> The advantage to a "real" database (I am playing fast and
> loose with my definition of "real") is that you tend to have
> more explicit types. And, the datum (field) indicates its type.
>
> So, if a byte in my "persistent store" (RDBMS) changes from
> 0x12 to 0x13, I can see that this was part of a MAC address...
> or, a "currency" value, or a "text string", or a "book title"
> (if I define a type that is used to represent book titles!)
> or a UPC code, etc.
>
> And, I can identify who (process) changed it -- as well as
> knowing who CAN'T have changed it (due to the ACLs in place
> for that object).
Hmmm, identifying the process which changed it may be useful but maybe not so straightforward for that purpose. What if it has been modified by a process (task) which was killed and then another ran in its place? In dps this is countered by identifying tasks not just by their task descriptor ID (the offset to access it, really) but in addition by their spawn moment (system time). This does not survive reset though... one would have to include the starting moment of the boot session.
> Rescued some more toys, today, so a long night sorting stuff out... :>
Hah, sounds like you will have some fun :-).

Dimiter
On 11/16/2016 4:43 AM, Dimiter_Popoff wrote:
> On 16.11.2016 г. 05:44, Don Y wrote:
>> On 11/15/2016 3:39 PM, Dimiter_Popoff wrote:
>>
>>>>> In dps a directory can just be copied like any other file
>>>>> and it will still point to the correct file entries at that
>>>>> moment; it is not done automatically but I have done it in
>>>>> the past. Then one can copy the directory file and set the
>>>>> type of the copy to be non-directory so it won't confuse
>>>>> the system by its duplicate pointers etc.
>>>>> It is probably quite different from other filesystems as I
>>>>> have done it without looking at many of them.
>>>>
>>>> I suspect you'd have the same sort of problem I'm currently
>>>> encountering with the Windows Registry: how could you
>>>> "painlessly" track changes made to your "global store",
>>>> "who" (which process) made them AND easily "undo" them.
>>>
>>> well, maybe. But being on top of the filesystem gives me more
>>> options, e.g. I have a "diff" script which compares all files
>>> in a directory to those in another; I could make it recursive
>>> like some others I have already (e.g. byte count: bc *.sa
>>> <top path>, etc.). It lists differing files and lists "not
>>> found" ones in the second directory (the second one on the
>>> command line). I don't think it will be a huge issue to locate
>>> a change this way; it may be that the changes are too many and
>>> I have to navigate through all the detected ones.
>>
>> There are several issues involved:
>> - finding the change
>> - reporting it in a meaningful way
>> - identifying the "culprit"
>
> Finding the change would be easy. Reporting it in a meaningful way
> depends on the entity it is reported to, i.e. on what it finds
> "meaningful" - if it is a human, this will depend on their knowledge.
In my case (the reason for the post), it's a way of identifying changes made to the system that I might not EXPECT to have occurred (e.g., an application remapping certain file extensions to its own handler instead of the handler that I'd previously "been happy with"). It's easier to just *see* what it has added/changed than to stumble across the consequences of those changes -- maybe days or weeks/months later (then having to sort out how to undo them and any *further* changes).

Then, to see how ANOTHER application potentially mucks with the settings put in place by the first. Or, how a *newer* version of an application changes settings from an earlier version, etc.

In Windows, this sucks because there are so few data types and the documentation for each registry setting is usually nonexistent for any particular application.
> Identifying the culprit may or may not be possible - logging every
> event means logging the logging events involved, and so on into
> infinity, so we should draw the line some place :-).
Again, a difference in our expectations of the store. In my case, as it is the sole means of storing stuff, it is *huge* (terabytes). E.g., every executable is retrieved from the store and loaded, on demand, at runtime. Every song you want to play, the time of every incoming phone call, voice recordings of those calls (think: answering machine), surveillance video, etc.

A notion of "tablespaces" (i.e., store THIS table on THAT physical medium) lets me present a unified interface to the store yet still move data objects around "behind the scenes". E.g., settings that change frequently should be backed by BBSRAM; OTOH, silly to store music (which is largely immutable) there -- NAND FLASH would be a better choice; and, surveillance video on magnetic disks, etc.

So, logging "process ID", "time of change", and "change" isn't a huge resource hog. :> And, a log need not be boundless; you can elect to just save the last N transactions, etc.

But, having ACLs in place means I can already narrow down the list of potential offenders: who had *permission* to make that change?
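[A toy sketch of that kind of bounded who/when/what change log, again on sqlite3; the field names and the N=1000 cap are invented for the example:]

    import os, time, sqlite3

    db = sqlite3.connect("store.db")
    db.executescript("""
    CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value);
    CREATE TABLE IF NOT EXISTS audit (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        who INTEGER, at REAL, name TEXT, oldval, newval);
    """)

    MAX_LOG = 1000   # keep only the last N transactions

    def put(name, value):
        with db:     # one atomic transaction: change + log + prune
            old = db.execute("SELECT value FROM settings WHERE name=?",
                             (name,)).fetchone()
            db.execute("INSERT OR REPLACE INTO settings VALUES (?,?)",
                       (name, value))
            db.execute("INSERT INTO audit (who, at, name, oldval, newval)"
                       " VALUES (?,?,?,?,?)",
                       (os.getpid(), time.time(), name,
                        old[0] if old else None, value))
            db.execute("DELETE FROM audit WHERE id <= "
                       "(SELECT max(id) FROM audit) - ?", (MAX_LOG,))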
>> The Windows registry just supports a few different data types:
>> - binary
>> - string
>> - dword
>> - qword
>
> I also think I have seen them store strings. The dps global store
> understands all the types and units an "indexed parameter" has
> known for 15+ years now (hopefully not 20 yet but I am not sure).
> Looking at the source I actually see it IS 20 years old now....
> (all capitals, i.e. it has been written for my asm32 which ran
> on a different machine....). Here:
>
> **************************************
> *                                    *
> *            Transgalactic           *
> *            Instruments             *
> *                                    *
> **************************************
> *
> * PARAMETER OBJECT RELATED EQUATES
> *
>          ORG     0
Is this a hack to effectively define:

    PAR$FL0  EQU 0    * byte = 1 byte
    <unused> EQU 1    * byte = 1 byte
    PAR$DIX  EQU 2    * long = 4 bytes
    PAR$DIM  EQU 6    * word = 2 bytes
    PAR$MULT EQU 8
    ...

*Or*, are these all "members" of a "parameter object"?
> PAR$FL0  DO.B    1       FLAG 0 (R/W ETC.)
>          DO.B    1       RESERVED
> PAR$DIX  DO.L    1       DEVICE DEPENDENT INDEX
> PAR$DIM  DO.W    1       DIMENSION
> PAR$MULT DO.W    1       POWER OF 10 (SIGNED) MULTIPLYER
> PAR$TYPE DO.W    1       PARAMETER TYPE (DATA TYPE, BYTE,REAL,TEXT, ETC.)
> PAR$DATA EQU     *       DATA FOLLOWING
> *
> * PAR$FL0 BIT DEFINITIONS
> *
> PR0$RD   EQU     0       CAN BE READ
> PR0$WR   EQU     1       CAN BE WRITTEN TO
> *
>          IFUDF   PT$BU
> *
> * DEVICE DRIVER (OR OTHER) INTERACTION TYPE DEFINITIONS
> * TYPE PASSED/RETURNED IN D5,(INDEX IN D6),DATA IN D1 UP TO D4,AS
> * MUCH AS IT TAKES IF NUMERIC; IF LIST OF OBJECTS, A5 ->
> *
>          ORG     0
E.g., here, it looks like you are using this as a "trick" to define a bunch of mutually exclusive values (LIST, VAR, BU, BS...) as constants.
> PT$LIST  DO.B    1       OBJECT LIST AT (A5)
> PT$VAR   DO.B    1       A2 -> VAR NAME, D1=@VAR
> PT$BU    DO.B    1       UNSIGNED BYTE
> PT$BS    DO.B    1       SIGNED BYTE
> PT$WU    DO.B    1       UNSIGNED WORD
> PT$WS    DO.B    1       SIGNED WORD
> PT$LU    DO.B    1       UNSIGNED LONG
> PT$LS    DO.B    1       SIGNED LONG
> PT$DU    DO.B    1       UNSIGNED DUAL
> PT$DS    DO.B    1       SIGNED DUAL (64-BIT)
> PT$QU    DO.B    1       UNSIGNED QUAD
> PT$QS    DO.B    1       SIGNED QUAD
> PT$FS    DO.B    1       FP SINGLE PRECISION
> PT$FD    DO.B    1       FP DUAL PRECISION
> PT$FX    DO.B    1       FP .X
> pt$alst  do.b    1       allocated list (same as pt$list but can be
>                          deallocated)
> *
>          ENDC
> *
> * units
> *
>          ORG     0
> DIM$NUMB DO.B    1       UNDEFINED - JUST A NUMBER
> DIM$SEC  DO.B    1       SECONDS
> DIM$MIN  DO.B    1       MINUTES
> DIM$HOUR DO.B    1       HOURS
> DIM$DAY  DO.B    1       DAY
> DIM$WEEK DO.B    1       WEEK
> DIM$MON  DO.B    1       MONTH
> DIM$YEAR DO.B    1       YEAR; FURHTER WITH MULTIPLIER
> DIM$METR DO.B    1       METER
> DIM$DEG  DO.B    1       DEGREE
> DIM$RAD  DO.B    1       RADIAN
> DIM$GRAD DO.B    1       GRAD (100GR=90 DEG)
> DIM$VOLT DO.B    1       VOLT
> DIM$AMP  DO.B    1       AMPERE
> DIM$OHM  DO.B    1       OHM
> DIM$FAR  DO.B    1       FARAD
> DIM$HEN  DO.B    1       HENRY
> DIM$BQ   DO.B    1       BECQUEREL
> DIM$CU   DO.B    1       CURIE
> DIM$ROE  DO.B    1       ROENTGEN
> DIM$SLIC DO.B    1       SYSTEM TIME SLICES
> DIM$CELS DO.B    1       DEGREE CELSIUS
> DIM$FRNH DO.B    1       DEGREE FAHRENHEIT
> dim$hz   do.b    1       hertz
> *
>          END
>
> The "parameter object" has been abandoned ages ago, probably
> never used since it was first introduced. But maybe some code
> using it is still in use. The types and units are widely deployed.
The RDBMS that I'm using supports a variety of "natural" types:
<https://www.postgresql.org/docs/9.5/static/datatype.html#DATATYPE-TABLE>
and adjusts the storage required (as well as the sanity checks that are applied to values -- e.g., a "date" has different rules than a "point") accordingly.

But, I can freely augment the list of data types with types of my own. E.g., I have a "Bezier" type that is used to represent cubic bezier curves. Another that is used to represent ISBN identifiers. Etc.

Additionally, I can define operators that apply to those particular data types. E.g., publisher() yields the publisher code of a particular ISBN identifier:
<https://en.wikipedia.org/wiki/List_of_group-0_ISBN_publisher_codes>
And, is_line() tells me if a particular Bezier is actually a straight line segment, etc.

Additionally, I can impose constraints on data items that the RDBMS will enforce. I.e., "ensure the time of an incoming phone call is LATER than the time of the call that preceded it". As such, the RDBMS can be seen as a contract enforcer for its clients.

I consider the store to be incorruptible; if something is accepted by the store, it will always return that same value (or, the most recent value for that "thing"). So, clients don't have to check the sanity of anything coming FROM the store.
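[The "contract enforcer" idea in miniature - not Don's PostgreSQL custom types, just a sqlite3 CHECK constraint standing in for the same principle; the MAC-address table is invented:]

    import sqlite3

    db = sqlite3.connect(":memory:")
    # The CHECK clause makes the store itself reject malformed values,
    # so readers never need to re-validate what they fetch.
    db.execute("""
    CREATE TABLE nic (
        name TEXT PRIMARY KEY,
        mac  TEXT CHECK (mac GLOB
             '[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:'
          || '[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]')
    )""")

    db.execute("INSERT INTO nic VALUES ('eth0', '00:1a:2b:3c:4d:5e')")
    try:
        db.execute("INSERT INTO nic VALUES ('eth1', 'not-a-mac')")
    except sqlite3.IntegrityError as e:
        print("rejected by the store:", e)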
> The "units" (called "dim" because I have not thought of the correct > English word, in Bulgarian a "unit" is called a "dimension") contains > entries which were never used but it is a 16-bit field so no > problem expected soon out of that. > > The "list" type is quite generic, it can be any sequence of > dps inherent objects (lowest level objects, like horizontal line, > text string etc., not the "object" I refer to elsewhere which > is the basis of the dps runtime object system, the latter is > "extobj", one of the many low level objects). But I mostly use > it for text strings, these can be "pasted" (de-encapsulated > and written to some memory address).
I can't support "bags" (groups of objects of inconsistent types) but can support an array of any standard type: <https://www.postgresql.org/docs/9.5/static/arrays.html>
> What comes with all the types is the check for overflow (hence
> the signed and unsigned types); setting a parameter will
> fail if the supplied type and unit are not as expected. In the
> global store this is not the case, the type/unit will be
> overwritten with the latest (I think...).
I can specify a valid range of values for certain types:
<https://www.postgresql.org/docs/9.5/static/rangetypes.html>
For other types, I have to bear some cost for creating the constraints that apply to that type. E.g., if I wanted to ensure a particular IP address was in a particular subnet...
>> So, there is very little "information" conveyed if you report
>> that a dword changed from 0x1234 to 0x4343. OTOH, if a new
>> key is added, that might convey some information (as they
>> HOPEFULLY have descriptive names).
>
> Paths and names are what I rely on for meaning, ownership etc.
> indeed.
Yes. And, for the identifiers (and positions in the "namespace hierarchy") that YOU choose, this will probably work. But, in my case, I can't rely on a future developer (or USER!) to pick good names. So, I want to facilitate that effort by imposing the fewest impediments to picking arbitrary names (within the confines of the SQL language)
>> The advantage to a "real" database (I am playing fast and
>> loose with my definition of "real") is that you tend to have
>> more explicit types. And, the datum (field) indicates its type.
>>
>> So, if a byte in my "persistent store" (RDBMS) changes from
>> 0x12 to 0x13, I can see that this was part of a MAC address...
>> or, a "currency" value, or a "text string", or a "book title"
>> (if I define a type that is used to represent book titles!)
>> or a UPC code, etc.
>>
>> And, I can identify who (process) changed it -- as well as
>> knowing who CAN'T have changed it (due to the ACLs in place
>> for that object).
>
> Hmmm, identifying the process which changed it may be useful
> but maybe not so straightforward for that purpose. What if
> it has been modified by a process (task) which was killed and
> then another ran in its place? In dps this is countered by
> identifying tasks not just by their task descriptor ID (the
> offset to access it, really) but in addition by their spawn
> moment (system time). This does not survive reset though...
> one would have to include the starting moment of the boot
> session.
Yes. In my case, if I start a "job" many times over the lifetime of the system, identifying which instance is the culprit is problematic. But, you'd really only need this ability as a diagnostic tool; "who screwed with this?". It would be nigh on impossible to catch a one-time event. But, if it is a repeatable "bug", you should be able to manually fix a setting (object) and then watch (instrument) to see how/when it changes thereafter. I can conceivably write a trigger that does this watching for me (if I know what to look for) and turns on a red light when it is tripped, etc.
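[Such a "tripwire" can be sketched with an ordinary SQL trigger; here on sqlite3, with the watched key and the alerts table invented for the example:]

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE settings (name TEXT PRIMARY KEY, value);
    CREATE TABLE alerts (at TEXT DEFAULT CURRENT_TIMESTAMP,
                         name TEXT, oldval, newval);

    -- the "red light": record any change to the watched setting
    CREATE TRIGGER watch_setting AFTER UPDATE ON settings
    WHEN NEW.name = 'file_assoc/.pdf'
    BEGIN
        INSERT INTO alerts (name, oldval, newval)
        VALUES (NEW.name, OLD.value, NEW.value);
    END;
    """)

    db.execute("INSERT INTO settings VALUES ('file_assoc/.pdf', 'good_viewer')")
    db.execute("UPDATE settings SET value = 'evil_viewer'"
               " WHERE name = 'file_assoc/.pdf'")
    print(db.execute("SELECT * FROM alerts").fetchall())  # tripwire fired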
>> Rescued some more toys, today, so a long night sorting stuff out... :>
>
> Hah, sounds like you will have some fun :-).
<frown>  Until I have to decide what to DISCARD to accommodate the NEW ADDITIONS!  :-/  "Simplify"  <big frown>

I rescued a second (i.e., "spare") 2KW UPS:
<https://www.amazon.com/APC-SMT2200RM2U-2200VA-120V-Smart-UPS/dp/B004F09D0O>
for my automation system. Coupled with an "expansion battery pack" (48V @ 15AHr), I should be able to keep the system "up" for a few hours without shedding loads... the better part of a *day* with active load management! (beyond that, you've got bigger problems! :> )
On 11/16/2016 4:03 AM, Dimiter_Popoff wrote:
> On 16.11.2016 г. 04:30, Clifford Heath wrote:
>> On 16/11/16 09:39, Dimiter_Popoff wrote:
>> What it seems you need is the kind of reliable storage that
>> SQLite provides
>
> Sort of, yes, but I am more after a hierarchical thing, like a
> directory tree. I guess I'll be fine with it as I have made it;
> if I have to take steps to compress it further than it is now,
> I know how to do it. But from your feedback - and that of Don -
> it seems the extra few bytes spilled per entry do not bother
> you much.
In the case of PostgreSQL, it's not JUST an "extra few bytes"! :< There's a fair bit of (data) overhead that gets dragged in with the data. Especially if you want to optimize access to that data or tie it to other data (primary/foreign keys).
>> ... - and does so better than *anything* else you'll
>> find, with a code size that is smaller than anything of
>> comparable reliability.
>
> I did not read enough to get to the code size; could you please
> post some figures? Just for reference - I'd be curious to know,
> and other people reading this might be as well.
I think about a quarter megabyte -- depending on which features you include (exclude). For PostgreSQL, that number increases about 5-fold. [Given that "investment", I try to leverage as much functionality out of the RDBMS as is conceivable!]

But, these are addressing different sorts of problems. I suspect an even "simpler" (name,value) store (e.g., ndbm) would be considerably smaller -- and less feature-full.

E.g., do you have to support concurrent readers/writers and guarantee atomic access? Or, can you afford to do that in a "wrapper" around the "global store" (a "monitor", of sorts). Do you have to support "transactions" and be able to roll back operations on the store based on conflicts encountered with "later" operations in that same transaction? Etc.

Sizewise, your approach will almost always "win" -- because you're just special-casing the filesystem code (e.g., to handle your namespace). If you wanted to ensure no two clients (threads/processes) could compete for a particular "setting/parameter", you could implement that mechanism with simple file locking: pend on lock, make change (to value or directory), release lock. Etc.

By contrast, SQLite/PostgreSQL don't really use the underlying filesystem for anything more than "bulk storage".
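[That lock/change/release wrapper, sketched portably - the lockfile convention, timeout, and setting path are invented; on POSIX one could use fcntl.flock instead:]

    import os, time, tempfile

    class SettingLock:
        """Advisory per-setting lock: an O_EXCL lockfile beside the value."""
        def __init__(self, path, timeout=5.0):
            self.lockpath = path + ".lock"
            self.timeout = timeout
        def __enter__(self):
            deadline = time.monotonic() + self.timeout
            while True:
                try:    # O_EXCL creation is atomic: only one winner
                    os.close(os.open(self.lockpath,
                                     os.O_CREAT | os.O_EXCL))
                    return self
                except FileExistsError:
                    if time.monotonic() > deadline:
                        raise TimeoutError("lock held: " + self.lockpath)
                    time.sleep(0.01)          # pend on lock
        def __exit__(self, *exc):
            os.remove(self.lockpath)          # release lock

    def set_value(path, data: bytes):
        with SettingLock(path):
            # write-then-rename so readers never see a half-written value
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
            os.write(fd, data)
            os.close(fd)
            os.replace(tmp, path)

    set_value("mtu.setting", b"1500")         # hypothetical setting file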
On 17/11/16 04:33, Don Y wrote:
> On 11/16/2016 4:03 AM, Dimiter_Popoff wrote:
>> On 16.11.2016 г. 04:30, Clifford Heath wrote:
>>> On 16/11/16 09:39, Dimiter_Popoff wrote:
>
>>> What it seems you need is the kind of reliable storage that
>>> SQLite provides
>>
>> Sort of, yes, but I am more after a hierarchical thing, like a
>> directory tree. I guess I'll be fine with it as I have made it;
>> if I have to take steps to compress it further than it is now,
>> I know how to do it. But from your feedback - and that of Don -
>> it seems the extra few bytes spilled per entry do not bother
>> you much.
>
> In the case of PostgreSQL, it's not JUST an "extra few bytes"! :<
> There's a fair bit of (data) overhead that gets dragged in
> with the data. Especially if you want to optimize access to
> that data or tie it to other data (primary/foreign keys).
>
>>> ... - and does so better than *anything* else you'll
>>> find, with a code size that is smaller than anything of
>>> comparable reliability.
>>
>> I did not read enough to get to the code size; could you please
>> post some figures? Just for reference - I'd be curious to know,
>> and other people reading this might be as well.
They have an entire web page for that - 2nd hit in Google: <https://www.sqlite.org/footprint.html>
> I think about a quarter megabyte -- depending on which features
> you include (exclude).
A quarter (all OMIT options) up to half a megabyte (no OMITs). Yes, it's quite a lot. Yes, it would be possible to do the same in a much smaller library. No, no-one has actually done that in a widely-used, well-tested library. db/ndbm comes close.

FWIW, almost all such systems since 1990 were built by following the instructions from just one book: "Transaction Processing: Concepts and Techniques" by Gray and Reuter. An incredibly influential book. This kind of reliability requires techniques that are very non-obvious - much more so if you want to maximise concurrent processing - and those techniques were closely-guarded trade secrets until this book was published. I can highly recommend it to anyone interested in making any composite action appear to be atomic (this is the core idea of a "transaction").
> For PostgreSQL, that number increases about 5-fold.
Postgres is not suitable for this. It requires periodic "vacuum"ing and other maintenance.
> E.g., do you have to support concurrent readers/writers and
> guarantee atomic access? Or, can you afford to do that in a
> "wrapper" around the "global store" (a "monitor", of sorts).
SQLite uses a global lock, so each transaction is single-threaded. That's why it's so much smaller than e.g. Postgres. However, it's still *really* hard to guarantee atomicity across failure restarts. The method that gives you this also gives you roll-back for free.
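[What "atomic across failure restarts" buys you, in miniature: with sqlite3, either both writes below land or neither does - on an exception here, or (via the journal) on a crash mid-transaction. The account rows are invented:]

    import sqlite3

    db = sqlite3.connect("ledger.db")
    db.execute("CREATE TABLE IF NOT EXISTS acct"
               " (name TEXT PRIMARY KEY, cents INTEGER)")
    db.executemany("INSERT OR IGNORE INTO acct VALUES (?, ?)",
                   [("savings", 10_000), ("checking", 0)])

    try:
        with db:   # BEGIN ... COMMIT; any failure rolls the whole thing back
            db.execute("UPDATE acct SET cents = cents - 2500"
                       " WHERE name = 'savings'")
            raise RuntimeError("power fails here")   # simulated mid-txn death
            db.execute("UPDATE acct SET cents = cents + 2500"
                       " WHERE name = 'checking'")   # never reached
    except RuntimeError:
        pass

    print(db.execute("SELECT * FROM acct ORDER BY name").fetchall())
    # -> [('checking', 0), ('savings', 10000)] -- the half-done debit undone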
> Sizewise, your approach will almost always "win" -- because
> you're just special-casing the filesystem code (e.g., to handle
> your namespace).
But you're then *totally* reliant on the filesystem to provide atomicity across unexpected restarts - and that is almost certainly not the case.
> By contrast, SQLite/PostgreSQL don't really use the underlying
> filesystem for anything more than "bulk storage".
And even as "bulk storage" those are significant possible points of failure. Consider that a database "page" which might be 8K is made up of multiple sectors, and a power fail can result in a block that has been only part written (so you get a so-called "torn page"). Torn page detection involves writing a page checksum to the start and end of each page, and checking both against the actual data on every read. Drives (including SSD) do so much caching and write rescheduling that you cannot really know which "completed" writes have actually completed. Newer drives provide special modes and operations which can be used to increase reliability for DBMS, in addition to global flush all writes type features. Clifford Heath
On 11/13/2016 11:34 AM, Don Y wrote:
> Is there a *simple* means (I've already found a complicated one)
> of noting the changes made to the registry by installing X vs. Y?
>
> (Consider the different cases: install X, then Y vs. Y then X)
Well, the simplest solution is to install X in a sandbox and trap the registry (as well as filesystem) changes. Then, Y in a sandbox *within* that sandbox for similar reasons. Dump both sandboxes and repeat, swapping the order of X and Y.
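[One low-tech way to dump and compare each sandbox's registry state is Windows' own "reg export" plus a text diff; a sketch, with the hive path and file names arbitrary:]

    import subprocess, difflib

    def dump(hive_path, outfile):
        # 'reg export' writes a .reg text rendering of the subtree
        subprocess.run(["reg", "export", hive_path, outfile, "/y"],
                       check=True)

    dump(r"HKCU\Software", "before.reg")  # snapshot, then install X (or Y)
    dump(r"HKCU\Software", "after.reg")

    # .reg exports are typically UTF-16 with a BOM
    with open("before.reg", encoding="utf-16") as a, \
         open("after.reg", encoding="utf-16") as b:
        for line in difflib.unified_diff(a.readlines(), b.readlines(),
                                         "before.reg", "after.reg"):
            print(line, end="")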
On Tuesday, November 15, 2016 at 8:02:04 AM UTC-5, Don Y wrote:
[]
> I suspect you'd have the same sort of problem I'm currently
> encountering with the Windows Registry: how could you
> "painlessly" track changes made to your "global store",
> "who" (which process) made them AND easily "undo" them.
>
> I've been amused to discover that this *is* possible under
> Windows; but, much harder under my formal RDBMS implementation!
> Especially the "undo them" (well, maybe I could build a giant
> "transaction" around everything but I'm willing to bet the
> RDBMS would die when faced with an open-ended issue like that!)
Hi Don,

Since you are working in an RDBMS, I am surprised you have not yet tried a transaction history. About 15 years ago, I was contracting on an Oracle project. The system build managed a set of transaction tables that logged who, what, and when changes were made. We were able to use that tracking storage for the work I was doing, which was "fixing" data errors.

The initial data populated into the database was of pretty bad quality, coming from an old ISAM-style system. Mainly address data, and rural addresses at that. So we iteratively identified address corrections, applied them, and then found where a change caused other issues and could back out the whole change, or even just subsets.

It does not necessarily have to cost a lot more storage: a binary mapping of what changes in a row of your working table, and a separate table of those changed columns. Timestamps and user IDs and such can be in the tracking tables also.

I am guessing the RDBMS exists on larger nodes in your system that have more resources. But if you are resource limited, then your options are limited also and this may not be a possible option. But I hope it helps.

Ed
