On 11/29/2020 2:16 AM, George Neuner wrote:
> On Sat, 28 Nov 2020 00:47:00 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
> 
>> On 11/27/2020 5:40 AM, Wojciech Zabolotny wrote:
>>
>>> Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
>>> The indentation has been completely removed from the Python code.
>>
>> Indentation and whitespace /tend/ to be insignificant to the operation
>> of the code.  Of course, presence in string literals is a different
>> story -- where even replacing tabs with spaces is a hazard.
> 
> In Python, indentation is required syntax: in general, it is an error
> for code in the same scope not to be vertically aligned.

Sorry, I didn't even examine the "content" of the archive; rather,
concentrated on the "SHAR wrapper" as it was quite obviously
corrupted.

> However, with a nested 'if-else', logic actually depends on the
> indentation:
> 
>    if <expr1>:
>      <statements1>
>      if <expr2>:
>        <statements2>
>      else:
> 
> is very different from
> 
>    if <expr1>:
>      <statements1>
>      if <expr2>:
>        <statements2>
>    else:
> 
> In C the 'else' goes to the nearest 'if' regardless of whitespace.  In
> Python, the 'else' goes to the nearest 'if' with which it is vertically
> aligned.

Yes.  I dislike Python as my naming and coding styles rely on long
logical lines.  I'd rather let a pretty-printer clean up my
code to my own coding standards (indents, braces, function templates,
etc.) than let the language dictate what my code HAS TO look like.

[I most often don't write in an IDE so can't rely on the "editor"
to "correct" formatting for me if, for example, I prepend an "if"
to a block of code or wrap it into some other explicit block]

> Significant whitespace sucks!

There are still places where a space is not a space and you have to
deal with it.  I frequently find tabs and spaces interchanged for
each other when cutting and pasting across systems; the machine
sees things that the human doesn't care about.  Try CONCLUSIVELY
sorting out whether you're looking at " \t", "     " or "\t " (or
variations thereof) from a paper printout!
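
One way around the printout problem is to render whitespace visibly before printing.  A minimal Python sketch; the markers ("^I" for a tab, as GNU `cat -T` shows it, and "_" for a space) are arbitrary choices, and a real tool would pick markers that can't occur in the text itself:

```python
def show_whitespace(line: str) -> str:
    # Replace each tab with "^I" and each space with "_" so every
    # whitespace character is distinguishable on paper.
    # Tabs are handled first; "^I" contains no space, so the order is safe.
    return line.replace("\t", "^I").replace(" ", "_")


for sample in [" \t", "     ", "\t "]:
    # repr() also exposes tabs (as \t), but runs of spaces stay hard to count.
    print(repr(sample), "->", show_whitespace(sample))
```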

But, there are also annoyances with things as banal as typefaces
that needlessly confound.

Or, displays that have opted to use particular glyphs that
can't readily be resolved as being right side up or upside
down.  Is "529" five hundred and twenty nine?  Or, six hundred
and twenty five?

On 11/29/2020 1:57 AM, George Neuner wrote:
> On Sat, 28 Nov 2020 00:53:43 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
> 
>> On 11/26/2020 9:14 PM, George Neuner wrote:
>>>
>>> On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
>>> <blockedofcourse@foo.invalid> wrote:
>>
>> Hi George!
>>
>> Have not heard from you in a while -- was beginning to think that you
>> may have been coviderated!  Hopefully, that's not the case (?)
> 
> Nope. I had a viral flu in early 2018 that had eerily similar symptoms
> to what is claimed for Covid-19: I was really sick with respiratory
> problems for ~5 weeks, and it was ~14 weeks before I really felt well
> again.  I was never hospitalized, so that virus was never identified,
> but I'm hoping that was a coronavirus because some studies in Europe
> found that prior exposure to other coronaviruses *may* give some
> increased resistance to this one.
> 
> In any event, I don't have your current email.

<frown>  I was evaluating lawyers and their ilk (good use
of that word in that context) a few months back and "consumed"
several email addresses in the process -- giving them out
"temporarily" and then canceling the accounts once I'd made
up my mind to cut off further communication from the
"undesirables" (Q:  are ANY of them "desirables"  :> )

I thought I'd picked accounts that I wasn't actively using.
But, may have screwed up.  I'll check my mail archive to
see what you were using to see if it was affected.

In either case, you should have a couple of addresses for me (?)

>>>> Are you sure the "corruption" can't be stripped from the post
>>>> with a filter (script)?
>>>
>>> The junk is HTML formatting.  The worry is that things like C++ source
>>> legitimately may contain angle bracket delimited text.  You'd need a
>>> smart filter that understands HTML tags.
>>
>> Or, scrape the posts manually?  E.g., highlight text in browser,
>> copy, paste?
> 
> Laborious if someone posted a long program.

Of course.  My point was that the "content" isn't really "lost",
just less easily accessed!

(I had to resort to scans of much of my earliest work to get
them back into electronic form)

>> If posted as an "image" of text (to deliberately hinder capture),
>> a screen capture program feeding an OCR... and manual touch-up.
> 
> Yuck! On average OCR still makes ~1 mistake per line.

I've not seen that sort of problem with good images.  It's much worse
with scanned stuff (esp. if scanned at too low a resolution).

In any case, it appears that many of the delimiters that SHAR introduces
are arbitrarily removed from those posts.  Perhaps Google thinks
a leading nonspace character is indicative of an indent level
in quoting?  (You can specify which character to use in many MUAs.)

>> Though, having seen Wojciech's example, it appears that there is
>> more involved than just eliding HTML tags!  I've not actively studied
>> the (apparent) transformation to try to codify the rules that may
>> have been applied...
> 
> The problem there is Python. For almost any other language, your idea
> of scraping it manually would work.  For Python, you have to
> understand the logic to reinstate the required indentation.
> 
> I have always been opposed to significant whitespace in a language.

George Neuner <gneuner2@comcast.net> writes:
> Significant whitespace sucks!

You'll love my new language "Point Blank".  Its file extension is a
space character.

There is also my Haskell dialect for embedded microprocessors.  It is
called Control-H.  Its file extension is a backspace.

;-)

On Sat, 28 Nov 2020 00:47:00 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 11/27/2020 5:40 AM, Wojciech Zabolotny wrote:
> 
>> Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
>> The indentation has been completely removed from the Python code.
>
>Indentation and whitespace /tend/ to be insignificant to the operation
>of the code.  Of course, presence in string literals is a different
>story -- where even replacing tabs with spaces is a hazard.

In Python, indentation is required syntax: in general, it is an error
for code in the same scope not to be vertically aligned.

However, with a nested 'if-else', logic actually depends on the
indentation:

  if <expr1>:
    <statements1>
    if <expr2>:
      <statements2>
    else:

is very different from

  if <expr1>:
    <statements1>
    if <expr2>:
      <statements2>
  else:

In C the 'else' goes to the nearest 'if' regardless of whitespace.  In
Python, the 'else' goes to the nearest 'if' with which it is vertically
aligned.
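
A minimal runnable sketch of the two layouts, assuming hypothetical boolean flags a and b in place of <expr1> and <expr2>, with each <statements> block replaced by appending a marker to a list.  The schematic leaves the 'else' body empty, so "s3" is a made-up stand-in:

```python
def else_with_inner_if(a: bool, b: bool) -> list[str]:
    # 'else' aligned with 'if b': it belongs to the inner 'if'.
    ran = []
    if a:
        ran.append("s1")
        if b:
            ran.append("s2")
        else:
            ran.append("s3")
    return ran


def else_with_outer_if(a: bool, b: bool) -> list[str]:
    # Same lines, but 'else' aligned with 'if a': it belongs to the outer 'if'.
    ran = []
    if a:
        ran.append("s1")
        if b:
            ran.append("s2")
    else:
        ran.append("s3")
    return ran


for a in (True, False):
    for b in (True, False):
        print(a, b, else_with_inner_if(a, b), else_with_outer_if(a, b))
```

The two functions agree only when both flags are true.  In C, either layout would get the inner-'if' reading.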

Significant whitespace sucks!
George

On Sat, 28 Nov 2020 00:53:43 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 11/26/2020 9:14 PM, George Neuner wrote:
>> 
>> On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
>> <blockedofcourse@foo.invalid> wrote:
>
>Hi George!
>
>Have not heard from you in a while -- was beginning to think that you
>may have been coviderated!  Hopefully, that's not the case (?)

Nope. I had a viral flu in early 2018 that had eerily similar symptoms
to what is claimed for Covid-19: I was really sick with respiratory
problems for ~5 weeks, and it was ~14 weeks before I really felt well
again.  I was never hospitalized, so that virus was never identified,
but I'm hoping that was a coronavirus because some studies in Europe
found that prior exposure to other coronaviruses *may* give some
increased resistance to this one.

In any event, I don't have your current email.

>>> Are you sure the "corruption" can't be stripped from the post
>>> with a filter (script)?
>> 
>> The junk is HTML formatting.  The worry is that things like C++ source
>> legitimately may contain angle bracket delimited text.  You'd need a
>> smart filter that understands HTML tags.
>
>Or, scrape the posts manually?  E.g., highlight text in browser,
>copy, paste?

Laborious if someone posted a long program.

>If posted as an "image" of text (to deliberately hinder capture),
>a screen capture program feeding an OCR... and manual touch-up.

Yuck! On average OCR still makes ~1 mistake per line.

>Though, having seen Wojciech's example, it appears that there is
>more involved than just eliding HTML tags!  I've not actively studied
>the (apparent) transformation to try to codify the rules that may
>have been applied...

The problem there is Python. For almost any other language, your idea
of scraping it manually would work.  For Python, you have to
understand the logic to reinstate the required indentation.
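
A minimal Python sketch of why the flattened text can't simply be run: the compiler notices that blocks are missing, but nothing in the flattened text says where each block ends.  The "mangling" here (stripping all leading spaces and tabs from every line) is an assumption about what Google does, based on Wojciech's description:

```python
original = (
    "def f(x):\n"
    "    if x:\n"
    "        return 'yes'\n"
    "    return 'no'\n"
)

# Simulate the damage: strip all leading spaces and tabs from every line.
flattened = "".join(
    line.lstrip(" \t") for line in original.splitlines(keepends=True)
)

compile(original, "<original>", "exec")  # compiles without complaint

try:
    compile(flattened, "<flattened>", "exec")
except IndentationError as err:
    # Python knows a block is missing, but not which lines belong in it:
    # "return 'no'" could equally well sit inside or outside the 'if'.
    print("flattened source rejected:", err.msg)
```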

I have always been opposed to significant whitespace in a language.

George

On 11/26/2020 9:14 PM, George Neuner wrote:
> 
> On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:

Hi George!

Have not heard from you in a while -- was beginning to think that you
may have been coviderated!  Hopefully, that's not the case (?)

>> On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
>>> A few Usenet groups allowed users to post their source code as shar archives.
>>> The Google Groups website supported access to those groups, viewing the
>>> message in the original (raw) format and unpacking the sources.
>>> Unfortunately, the latest update of Google Groups has dropped the possibility
>>> of accessing the originals of the Usenet posts.
>>> The "formatted" (in fact corrupted) version of the message does not
>>> allow one to unpack the (now damaged) shar archive.
>>
>> I don't understand what you mean by "corrupted"?  Do you have a
>> pointer to an example that I can examine (without a google login)?
>>
>> Are you sure the "corruption" can't be stripped from the post
>> with a filter (script)?
> 
> The junk is HTML formatting.  The worry is that things like C++ source
> legitimately may contain angle bracket delimited text.  You'd need a
> smart filter that understands HTML tags.

Or, scrape the posts manually?  E.g., highlight text in browser,
copy, paste?

If posted as an "image" of text (to deliberately hinder capture),
a screen capture program feeding an OCR... and manual touch-up.

Though, having seen Wojciech's example, it appears that there is
more involved than just eliding HTML tags!  I've not actively studied
the (apparent) transformation to try to codify the rules that may
have been applied...

> And there may be a *lot* of it. I've seen usenet messages sent (or
> forwarded) from Google Groups with ... not kidding! ... ~10,000 lines
> of deeply nested HTML surrounding ~10 lines of text.
> 
>>> Does it mean that all sources that were posted to Usenet are now
>>> lost for us forever?
>>> Is there any other way to access the old Usenet messages in their original
>>> form?
> 
> Since Google has removed the option to see the raw message, the only
> way to get things unmangled is from some other source.

For a small-ish post, I'd wager you could scrape (as above) and
manually edit the resulting text into something that's faithful to
the original intent.  Tedious and potentially error-prone, but it
refutes "lost forever".

> Unfortunately few NNTP servers go back further than about 10 years,
> and ftp.uu.net (the original usenet archive) is no longer operating.
> 
> You can try
>    https://usenetarchives.com/ or
>    https://www.crunchbase.com/organization/the-usenet-archive.
> 
> Many (most?) of the historically popular groups are available, and that
> includes pretty much everything in the comp.* and sci.* hierarchies.
> But searching is not easy, and if you're looking for something
> esoteric you may not find it.

Some of the better-known sources are also available on FTP servers.
E.g., I think Vixie's cron(8) is available that way.

On 11/27/2020 5:40 AM, Wojciech Zabolotny wrote:
> On Friday, 27 November 2020 at 03:21:39 UTC+1, Don Y wrote:
>> On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
>>> A few Usenet groups allowed users to post their source code as shar archives.
>>> The Google Groups website supported access to those groups, viewing the
>>> message in the original (raw) format and unpacking the sources.
>>> Unfortunately, the latest update of Google Groups has dropped the possibility
>>> of accessing the originals of the Usenet posts.
>>> The "formatted" (in fact corrupted) version of the message does not
>>> allow one to unpack the (now damaged) shar archive.
>> I don't understand what you mean by "corrupted"? Do you have a
>> pointer to an example that I can examine (without a google login)?
> 
> Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
> The indentation has been completely removed from the Python code.

Indentation and whitespace /tend/ to be insignificant to the operation
of the code.  Of course, presence in string literals is a different
story -- where even replacing tabs with spaces is a hazard.
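
To illustrate the string-literal hazard, a minimal Python sketch, assuming a hypothetical program that splits records on tabs, showing what happens once an editor or transport expands the tab in a literal to spaces:

```python
def parse_record(line: str) -> list[str]:
    # Hypothetical parser: fields are separated by a single tab.
    return line.split("\t")


as_written = "name\tvalue"                  # the literal as the author typed it
after_expansion = as_written.expandtabs(8)  # tab replaced by spaces (8-column stops)

print(repr(as_written), parse_record(as_written))
print(repr(after_expansion), parse_record(after_expansion))
```

The two strings look identical on screen, yet the second no longer splits into two fields.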

>> Are you sure the "corruption" can't be stripped from the post
>> with a filter (script)?
> 
> No, the indentation spaces are simply removed. There is no way to recover them.

From a quick look, it seems like the problem goes beyond that.
Note that the leading 'X' is stripped from most -- but not all -- lines
of "encoded" files.
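
For context on the 'X': many shar generators typically prefix every line of an embedded file with 'X' and unpack it with something like `sed 's/^X//'`, so that lines beginning with whitespace or other awkward characters survive transport.  A minimal Python sketch of that unpacking step, using hypothetical sample lines, that also flags lines missing the prefix (the kind of damage visible in the Google Groups copy):

```python
def unshar_body(lines: list[str]) -> tuple[list[str], list[int]]:
    """Strip the per-line 'X' prefix, as `sed 's/^X//'` would.

    Also returns the 1-based numbers of lines that lack the prefix.
    sed passes such lines through unchanged; in an X-prefixed archive
    they are a sign that something along the way damaged it.
    """
    restored: list[str] = []
    missing: list[int] = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("X"):
            restored.append(line[1:])
        else:
            restored.append(line)  # left as-is, like sed would
            missing.append(number)
    return restored, missing


# Hypothetical archive body; the third line lost its 'X' and its indentation.
body = [
    "Xdef f(x):",
    "X    if x:",
    "return 'yes'",
    "X    return 'no'",
]
restored, missing = unshar_body(body)
print("\n".join(restored))
print("lines missing the 'X' prefix:", missing)
```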

>>> Does it mean that all sources that were posted to Usenet are now
>>> lost for us forever?
>>> Is there any other way to access the old Usenet messages in their original
>>> form?

As you appear to be the owner of the file (and presumably have another
copy stashed away), you might try uuencoding it first and then reposting
it as a SHAR.  I suspect that would be more robust against whatever
pretty-printing algorithm Google is trying to impose.
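
A minimal sketch of the uuencoding step using Python's standard binascii module.  It encodes only the body lines and omits the "begin"/"end" lines of a complete uuencoded file.  Passing backtick=True makes zero bits encode as '`' rather than a space, so no encoded line contains whitespace for a reformatter to strip.  Whether that actually survives Google's rendering is untested:

```python
import binascii


def uuencode_lines(data: bytes) -> list[str]:
    # binascii.b2a_uu accepts at most 45 bytes per call and returns one
    # encoded line (length character, data, newline) for each chunk.
    # backtick=True encodes zero bits as '`' instead of ' ', so the
    # encoded lines contain no spaces at all.
    return [
        binascii.b2a_uu(data[i:i + 45], backtick=True).decode("ascii")
        for i in range(0, len(data), 45)
    ]


def uudecode_lines(lines: list[str]) -> bytes:
    # a2b_uu decodes one line at a time; the length character at the
    # start of each line tells it how many bytes to keep.
    return b"".join(binascii.a2b_uu(line) for line in lines)


source = b"def f(x):\n    if x:\n        return 'yes'\n    return 'no'\n"
encoded = uuencode_lines(source)
print("".join(encoded), end="")
assert uudecode_lines(encoded) == source  # indentation survives the round trip
```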

Or, just keep a copy on some other public archive.

On Friday, 27 November 2020 at 03:21:39 UTC+1, Don Y wrote:
> On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
> > A few Usenet groups allowed users to post their source code as shar archives.
> > The Google Groups website supported access to those groups, viewing the
> > message in the original (raw) format and unpacking the sources.
> > Unfortunately, the latest update of Google Groups has dropped the possibility
> > of accessing the originals of the Usenet posts.
> > The "formatted" (in fact corrupted) version of the message does not
> > allow one to unpack the (now damaged) shar archive.
> I don't understand what you mean by "corrupted"? Do you have a 
> pointer to an example that I can examine (without a google login)? 
> 

Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
The indentation has been completely removed from the Python code.

> Are you sure the "corruption" can't be stripped from the post 
> with a filter (script)?

No, the indentation spaces are simply removed. There is no way to recover them.

> > Does it mean that all sources that were posted to Usenet are now 
> > lost for us forever? 
> > Is there any other way to access the old Usenet messages in their original 
> > form?

On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
>> A few Usenet groups allowed users to post their source code as shar archives.
>> The Google Groups website supported access to those groups, viewing the
>> message in the original (raw) format and unpacking the sources.
>> Unfortunately, the latest update of Google Groups has dropped the possibility
>> of accessing the originals of the Usenet posts.
>> The "formatted" (in fact corrupted) version of the message does not
>> allow one to unpack the (now damaged) shar archive.
>
>I don't understand what you mean by "corrupted"?  Do you have a
>pointer to an example that I can examine (without a google login)?
>
>Are you sure the "corruption" can't be stripped from the post
>with a filter (script)?

The junk is HTML formatting.  The worry is that things like C++ source
legitimately may contain angle bracket delimited text.  You'd need a
smart filter that understands HTML tags.

And there may be a *lot* of it. I've seen usenet messages sent (or
forwarded) from Google Groups with ... not kidding! ... ~10,000 lines
of deeply nested HTML surrounding ~10 lines of text.
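
A minimal sketch of such a tag-aware filter, using Python's standard html.parser module.  It keeps only text content, decodes character references, and treats <br> as a line break (an assumption about how the page marks line ends).  It only works if the page escaped the source's angle brackets as &lt; and &gt;; an unescaped "<vector>" is parsed as a tag and lost, which is exactly the worry above:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes &lt; &gt; &amp; etc. in the text.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    def handle_starttag(self, tag, attrs) -> None:
        # Treat <br> as a line break; every other tag is simply dropped.
        if tag == "br":
            self.parts.append("\n")


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


sample = "<div><span>#include &lt;vector&gt;</span><br>std::vector&lt;int&gt; v;</div>"
print(html_to_text(sample))
# Unescaped brackets are taken for a tag, so "<vector>" disappears:
print(repr(html_to_text("#include <vector>")))
```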

>> Does it mean that all sources that were posted to Usenet are now
>> lost for us forever?
>> Is there any other way to access the old Usenet messages in their original
>> form?

Since Google has removed the option to see the raw message, the only
way to get things unmangled is from some other source.

Unfortunately few NNTP servers go back further than about 10 years,
and ftp.uu.net (the original usenet archive) is no longer operating.

You can try 
  https://usenetarchives.com/ or
  https://www.crunchbase.com/organization/the-usenet-archive.

Many (most?) of the historically popular groups are available, and that
includes pretty much everything in the comp.* and sci.* hierarchies.
But searching is not easy, and if you're looking for something
esoteric you may not find it.

George

On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
> A few Usenet groups allowed users to post their source code as shar archives.
> The Google Groups website supported access to those groups, viewing the
> message in the original (raw) format and unpacking the sources.
> Unfortunately, the latest update of Google Groups has dropped the possibility
> of accessing the originals of the Usenet posts.
> The "formatted" (in fact corrupted) version of the message does not
> allow one to unpack the (now damaged) shar archive.

I don't understand what you mean by "corrupted"?  Do you have a
pointer to an example that I can examine (without a google login)?

Are you sure the "corruption" can't be stripped from the post
with a filter (script)?

> Does it mean that all sources that were posted to Usenet are now
> lost for us forever?
> Is there any other way to access the old Usenet messages in their original
> form?