EmbeddedRelated.com
The 2024 Embedded Online Conference

Changes in Google Groups - sources posted to Usenet lost forever?

Started by Wojciech Zabołotny November 26, 2020
A few Usenet groups allowed users to post their source code as shar archives.
The Google Groups website supported access to those groups, viewing the
message in the original (raw) format and unpacking the sources.
Unfortunately, the last update of Google Groups has dropped the ability
to access the original of the Usenet posts.
The "formatted" (in fact corrupted) version of the message does not
allow unpacking the (now damaged) shar archive.
Does it mean that all sources that were posted to Usenet are now
lost for us forever?
Is there any other way to access the old Usenet messages in their original
form?

TIA & Regards,
Wojtek


On Thursday, November 26, 2020 at 6:11:13 PM UTC-5, Wojciech Zabołotny wrote:
> A few Usenet groups allowed users to post their source code as shar archives.
> The Google Groups website supported access to those groups, viewing the
> message in the original (raw) format and unpacking the sources.
> Unfortunately, the last update of Google Groups has dropped the ability
> to access the original of the Usenet posts.
> The "formatted" (in fact corrupted) version of the message does not
> allow unpacking the (now damaged) shar archive.
> Does it mean that all sources that were posted to Usenet are now
> lost for us forever?
> Is there any other way to access the old Usenet messages in their original
> form?
>
> TIA & Regards,
> Wojtek

I think you are referring to the indication Google gives that you can't view the "original message" because of email protections or something similar. Seems very goofy, but there it is.

But bear in mind that Google Groups is not usenet. However, retention of usenet posts varies with the access provider and is seldom "forever".

--
Rick C.
- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209
On 26.11.2020 Rick C <gnuarm.deletethisbit@gmail.com> wrote:
> On Thursday, November 26, 2020 at 6:11:13 PM UTC-5, Wojciech Zabołotny wrote:
>> A few Usenet groups allowed users to post their source code as shar archives.
>> The Google Groups website supported access to those groups, viewing the
>> message in the original (raw) format and unpacking the sources.
>> Unfortunately, the last update of Google Groups has dropped the ability
>> to access the original of the Usenet posts.
>> The "formatted" (in fact corrupted) version of the message does not
>> allow unpacking the (now damaged) shar archive.
>> Does it mean that all sources that were posted to Usenet are now
>> lost for us forever?
>> Is there any other way to access the old Usenet messages in their original
>> form?
>>
>> TIA & Regards,
>> Wojtek
>
> I think you are referring to the indication Google gives that you can't view the "original message" because of email protections or something similar. Seems very goofy, but there it is.
>
> But bear in mind that Google Groups is not usenet. However, retention of usenet posts varies with the access provider and is seldom "forever".
>

I'll just quote the thread: https://support.google.com/groups/thread/61391913?hl=en&msgid=61725204

"Google bought the Usenet archive from Dejanews. Having now "banned" these groups there is no way to access the historically important posts spanning decades.
Pointing at another Usenet service that provides access to current posts is irrelevant.
If Google are unwilling to continue to host the archive they should donate it to the internet archive or a similar group.
Erasing history is not acceptable."

With best regards,
Wojtek
On Thursday, November 26, 2020 at 6:35:07 PM UTC-5, Wojciech Zabołotny wrote:
> On 26.11.2020 Rick C <gnuarm.del...@gmail.com> wrote:
> > On Thursday, November 26, 2020 at 6:11:13 PM UTC-5, Wojciech Zabołotny wrote:
> >> A few Usenet groups allowed users to post their source code as shar archives.
> >> The Google Groups website supported access to those groups, viewing the
> >> message in the original (raw) format and unpacking the sources.
> >> Unfortunately, the last update of Google Groups has dropped the ability
> >> to access the original of the Usenet posts.
> >> The "formatted" (in fact corrupted) version of the message does not
> >> allow unpacking the (now damaged) shar archive.
> >> Does it mean that all sources that were posted to Usenet are now
> >> lost for us forever?
> >> Is there any other way to access the old Usenet messages in their original
> >> form?
> >>
> >> TIA & Regards,
> >> Wojtek
> >
> > I think you are referring to the indication Google gives that you can't view the "original message" because of email protections or something similar. Seems very goofy, but there it is.
> >
> > But bear in mind that Google Groups is not usenet. However, retention of usenet posts varies with the access provider and is seldom "forever".
> >
> I'll just quote the thread: https://support.google.com/groups/thread/61391913?hl=en&msgid=61725204
>
> "Google bought the Usenet archive from Dejanews. Having now "banned" these groups there is no way to access the historically important posts spanning decades.
> Pointing at another Usenet service that provides access to current posts is irrelevant.
> If Google are unwilling to continue to host the archive they should donate it to the internet archive or a similar group.
> Erasing history is not acceptable."
>
> With best regards,
> Wojtek

Google may have bought "a" usenet archive, but my understanding is there is no one archive of usenet.

https://www.fastusenet.org/blog/what-is-dejanews-where-did-it-go.html

They provide 12 years of retention. Seems like it should be easy to not delete anything, but I guess the use of usenet is not one of those things that is growing exponentially, making the previous usage small in comparison.

--
Rick C.
+ Get 1,000 miles of free Supercharging
+ Tesla referral code - https://ts.la/richard11209
On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
> A few Usenet groups allowed users to post their source code as shar archives.
> The Google Groups website supported access to those groups, viewing the
> message in the original (raw) format and unpacking the sources.
> Unfortunately, the last update of Google Groups has dropped the ability
> to access the original of the Usenet posts.
> The "formatted" (in fact corrupted) version of the message does not
> allow unpacking the (now damaged) shar archive.

I don't understand what you mean by "corrupted"? Do you have a pointer to an example that I can examine (without a google login)?

Are you sure the "corruption" can't be stripped from the post with a filter (script)?

> Does it mean that all sources that were posted to Usenet are now
> lost for us forever?
> Is there any other way to access the old Usenet messages in their original
> form?
On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
>> A few Usenet groups allowed users to post their source code as shar archives.
>> The Google Groups website supported access to those groups, viewing the
>> message in the original (raw) format and unpacking the sources.
>> Unfortunately, the last update of Google Groups has dropped the ability
>> to access the original of the Usenet posts.
>> The "formatted" (in fact corrupted) version of the message does not
>> allow unpacking the (now damaged) shar archive.
>
>I don't understand what you mean by "corrupted"? Do you have a
>pointer to an example that I can examine (without a google login)?
>
>Are you sure the "corruption" can't be stripped from the post
>with a filter (script)?
The junk is HTML formatting. The worry is that things like C++ source legitimately may contain angle bracket delimited text. You'd need a smart filter that understands HTML tags.

And there may be a *lot* of it. I've seen usenet messages sent (or forwarded) from Google Groups with ... not kidding! ... ~10,000 lines of deeply nested HTML surrounding ~10 lines of text.
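
A minimal sketch of what such a tag-aware filter might look like, using only Python's standard `html.parser`. The names `TextExtractor` and `strip_html` and the sample fragment are made up for illustration. It assumes the page escapes source-code angle brackets as `&lt;` and `&gt;`, which is what lets a parser tell them apart from markup; if the code had been embedded unescaped, something like `vector<int>` would be taken for a tag and dropped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns &lt;, &gt;, &amp; back into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Treat <br> as a line break; every other tag is simply dropped
        if tag == "br":
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.chunks)


# Made-up fragment: C++ text wrapped in markup, with escaped angle brackets
sample = (
    '<div><span class="q">std::vector&lt;int&gt; v;</span><br>'
    "if (a &lt; b &amp;&amp; c &gt; d) return;</div>"
)
print(strip_html(sample))
```

A filter like this only removes markup; it cannot bring back characters that were already dropped before the HTML was generated.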
>> Does it mean that all sources that were posted to Usenet are now
>> lost for us forever?
>> Is there any other way to access the old Usenet messages in their original
>> form?

Since Google has removed the option to see the raw message, the only way to get things unmangled is from some other source. Unfortunately few NNTP servers go back further than about 10 years, and ftp.uu.net (the original usenet archive) is no longer operating.

You can try
https://usenetarchives.com/ or
https://www.crunchbase.com/organization/the-usenet-archive.

Many (most?) of the historically popular groups are available, and that includes pretty much everything in the comp.* and sci.* hierarchies. But searching is not easy, and if you're looking for something esoteric you may not find it.

George
On Friday, November 27, 2020 at 03:21:39 UTC+1, Don Y wrote:
> On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
> > A few Usenet groups allowed users to post their source code as shar archives.
> > The Google Groups website supported access to those groups, viewing the
> > message in the original (raw) format and unpacking the sources.
> > Unfortunately, the last update of Google Groups has dropped the ability
> > to access the original of the Usenet posts.
> > The "formatted" (in fact corrupted) version of the message does not
> > allow unpacking the (now damaged) shar archive.
> I don't understand what you mean by "corrupted"? Do you have a
> pointer to an example that I can examine (without a google login)?
>

Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
The Python code has had its indentation completely removed.

> Are you sure the "corruption" can't be stripped from the post
> with a filter (script)?

No, the indentation spaces are simply removed. There is no way to recover them.

> > Does it mean that all sources that were posted to Usenet are now
> > lost for us forever?
> > Is there any other way to access the old Usenet messages in their original
> > form?
On 11/27/2020 5:40 AM, Wojciech Zabolotny wrote:
> On Friday, November 27, 2020 at 03:21:39 UTC+1, Don Y wrote:
>> On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
>>> A few Usenet groups allowed users to post their source code as shar archives.
>>> The Google Groups website supported access to those groups, viewing the
>>> message in the original (raw) format and unpacking the sources.
>>> Unfortunately, the last update of Google Groups has dropped the ability
>>> to access the original of the Usenet posts.
>>> The "formatted" (in fact corrupted) version of the message does not
>>> allow unpacking the (now damaged) shar archive.
>> I don't understand what you mean by "corrupted"? Do you have a
>> pointer to an example that I can examine (without a google login)?
>
> Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
> The Python code has had its indentation completely removed.
Indentation and whitespace /tend/ to be insignificant to the operation of the code. Of course, presence in string literals is a different story -- where even replacing tabs with spaces is a hazard.
>> Are you sure the "corruption" can't be stripped from the post
>> with a filter (script)?
>
> No, the indentation spaces are simply removed. There is no way to recover them.
From a quick look, it seems like the problem goes beyond that. Note that the leading 'X' is stripped from most -- but not all -- lines of "encoded" files.
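
For context, classic shar archives typically prefix every line of a packed file with `X` and unpack it through something like `sed 's/^X//'`, precisely so that leading whitespace survives transport. A rough sketch of how the damaged lines could at least be located, assuming that convention; `unpack_shar_body` and both sample inputs are hypothetical, not taken from the linked post:

```python
def unpack_shar_body(lines, prefix="X"):
    """Strip the shar prefix from each line; report lines that lack it."""
    restored, damaged = [], []
    for number, line in enumerate(lines, start=1):
        if line.startswith(prefix):
            restored.append(line[len(prefix):])
        else:
            # Prefix already gone: this line was altered somewhere in transit
            damaged.append(number)
            restored.append(line)
    return restored, damaged


intact = ["Xdef f(x):", "X    return x + 1"]
mangled = ["Xdef f(x):", "return x + 1"]  # prefix and indentation both lost

print(unpack_shar_body(intact))
print(unpack_shar_body(mangled))
```

This only flags which lines were touched. A line that kept its `X` but lost the spaces after it would pass unnoticed, and nothing here can restore the missing whitespace.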
>>> Does it mean that all sources that were posted to Usenet are now
>>> lost for us forever?
>>> Is there any other way to access the old Usenet messages in their original
>>> form?
As you appear to be the owner of the file (and presumably have another copy stashed away), you might try reposting it as a SHAR but uuencoded, first. I would suspect that would be more robust wrt whatever pretty-printing algorithm google is trying to impose. Or, just keep a copy on some other public archive.
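
A small sketch of the uuencoding idea, using the standard `binascii` module. The function names, the file name `archive.shar`, and the sample payload are placeholders. It passes `backtick=True` so that zero values are written as a backtick rather than a space, which keeps spaces out of the encoded lines entirely. Whether that is enough to survive Google's rendering is an assumption, not something tested here.

```python
import binascii


def uuencode(data: bytes, name: str = "archive.shar", mode: int = 0o644) -> str:
    """Return data as uuencoded text with begin/end framing."""
    lines = [f"begin {mode:o} {name}"]
    for i in range(0, len(data), 45):  # uuencode packs at most 45 bytes per line
        chunk = binascii.b2a_uu(data[i:i + 45], backtick=True)
        lines.append(chunk.decode("ascii").rstrip("\n"))
    lines += ["`", "end"]  # zero-length line, then the end marker
    return "\n".join(lines) + "\n"


def uudecode(text: str) -> bytes:
    """Decode text produced by uuencode() above."""
    out = bytearray()
    for line in text.splitlines()[1:]:  # skip the "begin" header
        if line == "end":
            break
        out += binascii.a2b_uu(line)
    return bytes(out)


source = b"def f(x):\n    return x + 1\n"
encoded = uuencode(source)
print(encoded)
print(uudecode(encoded) == source)
```

The indentation travels as part of the encoded bytes, so collapsing whitespace in the post cannot remove it; other kinds of rewriting could still break the encoded lines.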
On 11/26/2020 9:14 PM, George Neuner wrote:
>
> On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:

Hi George!

Have not heard from you in a while -- was beginning to think that you may have been coviderated! Hopefully, that's not the case (?)

>> On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:
>>> A few Usenet groups allowed users to post their source code as shar archives.
>>> The Google Groups website supported access to those groups, viewing the
>>> message in the original (raw) format and unpacking the sources.
>>> Unfortunately, the last update of Google Groups has dropped the ability
>>> to access the original of the Usenet posts.
>>> The "formatted" (in fact corrupted) version of the message does not
>>> allow unpacking the (now damaged) shar archive.
>>
>> I don't understand what you mean by "corrupted"? Do you have a
>> pointer to an example that I can examine (without a google login)?
>>
>> Are you sure the "corruption" can't be stripped from the post
>> with a filter (script)?
>
> The junk is HTML formatting. The worry is that things like C++ source
> legitimately may contain angle bracket delimited text. You'd need a
> smart filter that understands HTML tags.

Or, scrape the posts manually? E.g., highlight text in browser, copy, paste?

If posted as an "image" of text (to deliberately hinder capture), a screen capture program feeding an OCR... and manual touch-up.

Though, having seen Wojciech's example, it appears that there is more involved than just eliding HTML tags! I've not actively studied the (apparent) transformation to try to codify the rules that may have been applied...

> And there may be a *lot* of it. I've seen usenet messages sent (or
> forwarded) from Google Groups with ... not kidding! ... ~10,000 lines
> of deeply nested HTML surrounding ~10 lines of text.
>
>>> Does it mean that all sources that were posted to Usenet are now
>>> lost for us forever?
>>> Is there any other way to access the old Usenet messages in their original
>>> form?
>
> Since Google has removed the option to see the raw message, the only
> way to get things unmangled is from some other source.
For a small-ish post, I'd wager you could scrape (as above) and manually edit the resulting text to something that's faithful to the original intent. Tedious and potentially error prone but denies "lost forever".
> Unfortunately few NNTP servers go back further than about 10 years,
> and ftp.uu.net (the original usenet archive) is no longer operating.
>
> You can try
> https://usenetarchives.com/ or
> https://www.crunchbase.com/organization/the-usenet-archive.
>
> Many (most?) of the historically popular groups are available, and that
> includes pretty much everything in the comp.* and sci.* hierarchies.
> But searching is not easy, and if you're looking for something
> esoteric you may not find it.
Some of the better known sources are also available on FTP servers. E.g., I think Vixie's cron(8) is available like this.
On Sat, 28 Nov 2020 00:53:43 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 11/26/2020 9:14 PM, George Neuner wrote:
>>
>> On Thu, 26 Nov 2020 19:01:33 -0700, Don Y
>> <blockedofcourse@foo.invalid> wrote:
>
>Hi George!
>
>Have not heard from you in a while -- was beginning to think that you
>may have been coviderated! Hopefully, that's not the case (?)

Nope. I had a viral flu in early 2018 that had eerily similar symptoms to what is claimed for Covid-19: I was really sick with respiratory problems for ~5 weeks, and it was ~14 weeks before I really felt well again.

I was never hospitalized, so that virus was never identified, but I'm hoping that was a coronavirus because some studies in Europe found that prior exposure to other coronaviruses *may* give some increased resistance to this one.

In any event, I don't have your current email.

>>> Are you sure the "corruption" can't be stripped from the post
>>> with a filter (script)?
>>
>> The junk is HTML formatting. The worry is that things like C++ source
>> legitimately may contain angle bracket delimited text. You'd need a
>> smart filter that understands HTML tags.
>
>Or, scrape the posts manually? E.g., highlight text in browser,
>copy, paste?
Laborious if someone posted a long program.
>If posted as an "image" of text (to deliberately hinder capture),
>a screen capture program feeding an OCR... and manual touch-up.
Yuck! On average OCR still makes ~1 mistake per line.
>Though, having seen Wojciech's example, it appears that there is
>more involved than just eliding HTML tags! I've not actively studied
>the (apparent) transformation to try to codify the rules that may
>have been applied...

The problem there is Python. For almost any other language, your idea of scraping it manually would work. For Python, you have to understand the logic to reinstate the required indentation.

I have always been opposed to significant whitespace in a language.

George
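
To illustrate why the indentation cannot be reinstated mechanically, here is a toy example (invented for this purpose, not taken from the damaged post). The same five flattened lines are valid under three different indentations, and each one computes a different result:

```python
# Five lines as they might look after all leading whitespace is stripped
flat = [
    "total = 0",
    "for n in range(3):",
    "if n % 2 == 0:",
    "total += n",
    "total += 10",
]


def run(indents):
    """Re-indent the flat lines with the given depths, execute, return total."""
    source = "\n".join(" " * 4 * depth + line for depth, line in zip(indents, flat))
    scope = {}
    exec(source, scope)
    return scope["total"]


print(run([0, 0, 1, 2, 2]))  # last line inside the if: 22
print(run([0, 0, 1, 2, 1]))  # last line inside the loop only: 32
print(run([0, 0, 1, 2, 0]))  # last line after the loop: 12
```

All three variants are syntactically valid, so only someone who knows what the program was meant to do can pick the right one.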
