> If you look at the *content* of the Atmel PDF, you will see that a
> LOT of it (> 50MB!) is "Document Overhead" -- stuff that isn't
> pertinent to the actual delivered *content*. I suspect the authoring
> tools that they used for the illustrations AND text have added
> "private data" that they did not make an effort to STRIP from the
> final document. Things like "previews"/thumbnails for the
> illustrations, etc. I haven't looked in detail at the content
> to identify the tools that they actually used for these things...
> ("it's not my job..." :> )
Out of curiosity, I started looking at some of the PDFs that I have
prepared over the years. I had to search for something that was
"nonproprietary" that I could publish without having to research any
potential legal constraints placed upon me, etc.
The one I settled on is a description of a little project I undertook
to prepare a revolving globe animation (using tools readily available
to me, without investing any serious time or money -- it was a pro bono
effort).
Looking at the sources for the document, the rough sizes are:
    Text              64K
    Formatting...    160K
    Fig 1           7105K
    Fig 2           7357K
    Fig 3              7K
    Fig 4             13K (7+6)
    Fig 5             20K
    Fig 6            260K
    Fig 7            260K
    Fig 8            260K
    Fig 9             84K
    Fig 10             7K
    Fig 11             7K
    Fig 12            20K
    Fig 13            23K
    Fig 14           132K
    Fig 23            12K (6+6)
    Fig 24             3K (2+1)
    Fig 25            12K (6+6)
    Fig 26             8K (7+1)
    Fig 27             2K (1+1)
    Fig 28             9K (2+3+2+2)
    Fig 29            25K (8+8+9)
Other items (figures and tables not explicitly mentioned) are included
in the Text and Formatting metrics.
This shows a cumulative size of about 15700K for the sources. Note that
I do not preprocess any of the sources prior to embedding them in the
document! (i.e., any compression happens in the chosen tools themselves).
The resulting PDF is about 1792K. Of this, a space audit reveals:
    Thumbnails       44  (something I chose to make available)
    Images          400
    Bookmarks        10  (ditto)
    Content        1200
    Fonts            40  (primarily the header font)
    Links             8
    Destinations     80
    Overhead         43  (1720 if scaled ~40-fold to Atmel's size!)
    Colorspace        5
    Xrefs            35
(note that there are many "invisible" links that I didn't elect to
advertise in my layout choices. E.g., things you can click on that
don't casually appear to be clickable)
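As a rough cross-check of the audit above, the figures can be tallied to see where the bytes go. This is only a sketch: it assumes the audit values are in K (the unit is not stated explicitly), and the names `audit_k` and `share` are made up for illustration.

```python
# Space audit figures copied from the list above (assumed to be in K).
audit_k = {
    "Thumbnails": 44,
    "Images": 400,
    "Bookmarks": 10,
    "Content": 1200,
    "Fonts": 40,
    "Links": 8,
    "Destinations": 80,
    "Overhead": 43,
    "Colorspace": 5,
    "Xrefs": 35,
}

total_k = sum(audit_k.values())


def share(category):
    """Return the category's share of the audited total, in percent."""
    return 100.0 * audit_k[category] / total_k


if __name__ == "__main__":
    # Largest categories first.
    for name, size in sorted(audit_k.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<13}{size:>6}K {share(name):5.1f}%")
    print(f"{'Total':<13}{total_k:>6}K")
```

The listed items add up to 1865K, a little more than the ~1792K quoted for the file, which is consistent with the figures being rough. Content and Images together make up well over 80% of the audited total, while Overhead is only a couple of percent.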
Note that the downsampling I have specified for the images still allows
them to scale pretty well -- to at least 300% before their inherent
graininess becomes apparent.
If I add a 30MB animated GIF to the PDF (not present in the file
referenced here), I must first convert it to a (270MB) AVI for
embedding, and the final file increases by about 40MB. (I'm not
looking to optimize here... just trying to QUICKLY see what the tools
naturally do!)
The PDF can be found at:
http://www.mediafire.com/view/tybgm763jfbqp8y/AZRevolvingZGlobeZAnimation.pdf
Replace each 'Z' in the above URL with a '_'
Reply by Don Y ● August 21, 2014
Hi Simon,
On 8/21/2014 6:11 AM, Simon Clubley wrote:
> On 2014-08-20, Don Y<this@is.not.me.com> wrote:
>> On 8/20/2014 12:59 PM, Simon Clubley wrote:
>>>
>>> However you look at it, the Atmel PDFs are massively oversized
>>> (especially since I remember the size they _used_ to be. :-()
>>>
>>> BTW, pdfinfo (which I used along with some bash shell scripting to
>>> produce the above table) reveals both Atmel and Microchip use
>>> FrameMaker and Acrobat Distiller to produce the PDFs.
>>
>> As I said previously, they are clearly doing something "wrong" -- or,
>> have an entirely different set of expectations for the document that
>> isn't apparent to the casual user. Pointy-clicky interfaces tend to
>> produce users who think ALL they need to do is "point and click"!
>>
>> Atmel-8285-8-bit-AVR-Microcontroller-ATmega165A_PA_325A_PA_3250A_PA_645A_P_6450A_P_datasheet.pdf
>> has a
>> downloaded size of 62,477,528 bytes.
>>
>> I created a version that is 7,102,901 bytes. Had I access to the
>> original "sources" -- and, more motivation (beyond a 90 second
>> investment) -- I am sure *something* smaller than 60MB is easily
>> possible!
>
> Thanks for doing that experiment Don.
>
> That ~7MB size is pretty much the kind of size I would consider
> reasonable based on the historical Atmel PDF sizes.
If you look at the *content* of the Atmel PDF, you will see that a
LOT of it (> 50MB!) is "Document Overhead" -- stuff that isn't
pertinent to the actual delivered *content*. I suspect the authoring
tools that they used for the illustrations AND text have added
"private data" that they did not make an effort to STRIP from the
final document. Things like "previews"/thumbnails for the
illustrations, etc. I haven't looked in detail at the content
to identify the tools that they actually used for these things...
("it's not my job..." :> )
I would guesstimate that the "ideal" delivered size would be in that
~7M ballpark -- perhaps a bit higher for some of the dynamic aspects
of the document.
>> Off for the dreaded weekly shopping excursion :<
>
> Hope you had fun. :-)
Never. Worst part of my week -- EVERY week! :< Unfortunately,
"gotta eat"!
Reply by Simon Clubley ● August 21, 2014
On 2014-08-20, Don Y <this@is.not.me.com> wrote:
> On 8/20/2014 12:59 PM, Simon Clubley wrote:
>>
>> However you look at it, the Atmel PDFs are massively oversized
>> (especially since I remember the size they _used_ to be. :-()
>>
>> BTW, pdfinfo (which I used along with some bash shell scripting to
>> produce the above table) reveals both Atmel and Microchip use
>> FrameMaker and Acrobat Distiller to produce the PDFs.
>
> As I said previously, they are clearly doing something "wrong" -- or,
> have an entirely different set of expectations for the document that
> isn't apparent to the casual user. Pointy-clicky interfaces tend to
> produce users who think ALL they need to do is "point and click"!
>
> Atmel-8285-8-bit-AVR-Microcontroller-ATmega165A_PA_325A_PA_3250A_PA_645A_P_6450A_P_datasheet.pdf
> has a
> downloaded size of 62,477,528 bytes.
>
> I created a version that is 7,102,901 bytes. Had I access to the
> original "sources" -- and, more motivation (beyond a 90 second
> investment) -- I am sure *something* smaller than 60MB is easily
> possible!
>
Thanks for doing that experiment Don.
That ~7MB size is pretty much the kind of size I would consider
reasonable based on the historical Atmel PDF sizes.
> Off for the dreaded weekly shopping excursion :<
Hope you had fun. :-)
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply by Don Y ● August 20, 2014
On 8/20/2014 12:59 PM, Simon Clubley wrote:
> On 2014-08-20, Anders.Montonen@kapsi.spam.stop.fi.invalid<Anders.Montonen@kapsi.spam.stop.fi.invalid> wrote:
>> Simon Clubley<clubley@remove_me.eisner.decus.org-earth.ufp> wrote:
>>> Microchip still manage to do what Atmel used to do and represent similar
>>> amounts of information in much smaller PDFs. The datasheet for the
>>> PIC32MX2xx range is 330 pages and just over 6MBytes in size.
>>
>> That's not really comparable, as Microchip put most of the detailed
>> information into the family reference manuals.
>
> Actually, it's comparable in so far as it's the same type of text and
> diagram layouts between the Atmel and Microchip documents.
>
> However, your mention of the PIC32 FRM caused me to take a closer look
> at it. I have two versions of the FRM to hand: an old monolithic version
> from 2008 and the PDFs making up the segmented version which I downloaded
> about 2.5 years ago going by the date stamps.
>
> The old monolithic version is 12484111 bytes and contains 1138 pages.
>
> The combined full size of the PDFs I have to hand from the segmented
> version is ~20MBytes and contains 1155 pages in total. Here's the
> breakdown:
>
> Pages: 60 1333485 61104E.pdf
> Pages: 38 542626 61105E.pdf
> Pages: 60 766764 61106G.pdf
> Pages: 43 819879 61107F.pdf
> Pages: 26 499771 61108F.pdf
> Pages: 12 162893 61109F.pdf
> Pages: 18 386578 61110E.pdf
> Pages: 42 898097 61111E.pdf
> Pages: 42 753430 61112G.pdf
> Pages: 84 1018966 61113D.pdf
> Pages: 16 211886 61114F.pdf
> Pages: 40 517551 61115F.pdf
> Pages: 58 973471 61116E.pdf
> Pages: 60 916284 61117F.pdf
> Pages: 16 479610 61118F.pdf
> Pages: 30 875700 61119E.pdf
> Pages: 22 404513 61120E.pdf
> Pages: 22 242560 61121E.pdf
> Pages: 24 381784 61122F.pdf
> Pages: 8 116072 61124F.pdf
> Pages: 34 392210 61125E.pdf
> Pages: 72 949130 61126F.pdf
> Pages: 8 130893 61127D.pdf
> Pages: 56 1002481 61128F.pdf
> Pages: 22 300819 61129E.pdf
> Pages: 14 166593 61130F.pdf
> Pages: 98 2031416 61154B.pdf
> Pages: 106 2645428 61155B.pdf
> Pages: 24 478674 61167A.pdf
>
> However you look at it, the Atmel PDFs are massively oversized
> (especially since I remember the size they _used_ to be. :-()
>
> BTW, pdfinfo (which I used along with some bash shell scripting to
> produce the above table) reveals both Atmel and Microchip use
> FrameMaker and Acrobat Distiller to produce the PDFs.
As I said previously, they are clearly doing something "wrong" -- or,
have an entirely different set of expectations for the document that
isn't apparent to the casual user. Pointy-clicky interfaces tend to
produce users who think ALL they need to do is "point and click"!
Atmel-8285-8-bit-AVR-Microcontroller-ATmega165A_PA_325A_PA_3250A_PA_645A_P_6450A_P_datasheet.pdf
has a downloaded size of 62,477,528 bytes.
I created a version that is 7,102,901 bytes. Had I access to the
original "sources" -- and, more motivation (beyond a 90 second
investment) -- I am sure *something* smaller than 60MB is easily
possible!
Off for the dreaded weekly shopping excursion :<
Reply by Simon Clubley ● August 20, 2014
On 2014-08-20, Anders.Montonen@kapsi.spam.stop.fi.invalid <Anders.Montonen@kapsi.spam.stop.fi.invalid> wrote:
> Simon Clubley <clubley@remove_me.eisner.decus.org-earth.ufp> wrote:
>> Microchip still manage to do what Atmel used to do and represent similar
>> amounts of information in much smaller PDFs. The datasheet for the
>> PIC32MX2xx range is 330 pages and just over 6MBytes in size.
>
> That's not really comparable, as Microchip put most of the detailed
> information into the family reference manuals.
>
Actually, it's comparable in so far as it's the same type of text and
diagram layouts between the Atmel and Microchip documents.
However, your mention of the PIC32 FRM caused me to take a closer look
at it. I have two versions of the FRM to hand: an old monolithic version
from 2008 and the PDFs making up the segmented version which I downloaded
about 2.5 years ago going by the date stamps.
The old monolithic version is 12484111 bytes and contains 1138 pages.
The combined full size of the PDFs I have to hand from the segmented
version is ~20MBytes and contains 1155 pages in total. Here's the
breakdown:
    Pages:  60  1333485  61104E.pdf
    Pages:  38   542626  61105E.pdf
    Pages:  60   766764  61106G.pdf
    Pages:  43   819879  61107F.pdf
    Pages:  26   499771  61108F.pdf
    Pages:  12   162893  61109F.pdf
    Pages:  18   386578  61110E.pdf
    Pages:  42   898097  61111E.pdf
    Pages:  42   753430  61112G.pdf
    Pages:  84  1018966  61113D.pdf
    Pages:  16   211886  61114F.pdf
    Pages:  40   517551  61115F.pdf
    Pages:  58   973471  61116E.pdf
    Pages:  60   916284  61117F.pdf
    Pages:  16   479610  61118F.pdf
    Pages:  30   875700  61119E.pdf
    Pages:  22   404513  61120E.pdf
    Pages:  22   242560  61121E.pdf
    Pages:  24   381784  61122F.pdf
    Pages:   8   116072  61124F.pdf
    Pages:  34   392210  61125E.pdf
    Pages:  72   949130  61126F.pdf
    Pages:   8   130893  61127D.pdf
    Pages:  56  1002481  61128F.pdf
    Pages:  22   300819  61129E.pdf
    Pages:  14   166593  61130F.pdf
    Pages:  98  2031416  61154B.pdf
    Pages: 106  2645428  61155B.pdf
    Pages:  24   478674  61167A.pdf
However you look at it, the Atmel PDFs are massively oversized
(especially since I remember the size they _used_ to be. :-()
BTW, pdfinfo (which I used along with some bash shell scripting to
produce the above table) reveals both Atmel and Microchip use
FrameMaker and Acrobat Distiller to produce the PDFs.
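For anyone wanting to build a similar pages/size table, here is a minimal sketch in Python rather than bash. It is not the script used for the table above; it assumes Poppler's `pdfinfo` is on the PATH, and the helper names `parse_field` and `summarize` are invented for illustration. The same `parse_field` helper can pull out the "Creator" and "Producer" lines, which is where pdfinfo reports the authoring tool and the PDF generator.

```python
import os
import re
import subprocess
import sys


def parse_field(pdfinfo_output, field):
    """Return the value of a 'Field: value' line from pdfinfo output, or None."""
    pattern = rf"^{re.escape(field)}:[ \t]*(.*)$"
    match = re.search(pattern, pdfinfo_output, re.MULTILINE)
    return match.group(1).strip() if match else None


def summarize(path):
    """Run pdfinfo on one file and return (pages, size_in_bytes, file_name)."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    pages = int(parse_field(result.stdout, "Pages") or 0)
    return pages, os.path.getsize(path), os.path.basename(path)


if __name__ == "__main__":
    total_pages = total_bytes = 0
    for path in sys.argv[1:]:
        pages, size, name = summarize(path)
        print(f"Pages: {pages:>3} {size:>8}  {name}")
        total_pages += pages
        total_bytes += size
    print(f"Total: {total_pages:>3} {total_bytes:>8}")
```

Run it with the PDF file names as arguments; the last line gives the combined page count and byte total.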
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply by Don Y ● August 20, 2014
Hi Ivan,
On 8/20/2014 6:37 AM, Ivan Shmakov wrote:
>>>>>> Don Y<this@is.not.me.com> writes:
>
> > The resolution of an image is relatively easily controlled. There's
> > a point at which it doesn't make sense to support "infinite zoom".
> > (e.g., a photo of an author on a dust jacket)
>
> > Text, OTOH, if done properly, *can* be zoomed indefinitely -- as it
> > is rendered from a *model* ("font") and not stored as an image.
>
> The diagrams, if done properly, can be zoomed indefinitely just
> as well. Simply because PDF allows for not only "model-based"
> fonts, but also rather arbitrary graphics.
Yes, but you have to choose how the diagram/image is represented.
E.g., a TIFF will, eventually, become pixelated. OTOH, vector art
will scale just like a "font" (same approach). E.g., use something
like Illustrator to prepare your "figures", not "Paint".
> > Of course, this assumes the document preparer knows how to separate
> > text annotations from the "images" to which they apply!
>
> To me, the most apparent issue with Atmel PDFs was the design
> change; FWIW, I can no longer be sure that I get a nice result
> when printing one on a monochrome laser printer. But I hope to
> check the sizes shortly, too, -- who knows, the issue may be
> easy to solve just by "converting" PDF to PDF with suitable
> GhostScript settings. (Or perhaps I'll try PDF::API2.)
You can use color in documents -- but, you have to consider *which*
colors and in which combinations. You need sufficient "value change"
to give acceptable contrast. E.g., using yellow callouts on a white
field is probably going to disappear when rendered monochromatically.
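The "value change" point can be put into rough numbers. The sketch below converts 8-bit RGB colours to a gray level using the Rec. 601 luma weights and compares foreground against background. It is an approximation only: a given printer driver may convert colour to gray differently, and the function names are made up for illustration.

```python
def luma(rgb):
    """Approximate gray level (0-255) of an 8-bit RGB colour, Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_difference(foreground, background):
    """Difference in gray level once both colours are rendered in monochrome."""
    return abs(luma(foreground) - luma(background))


WHITE = (255, 255, 255)
YELLOW = (255, 255, 0)
BLACK = (0, 0, 0)

if __name__ == "__main__":
    for name, colour in (("yellow", YELLOW), ("black", BLACK)):
        diff = gray_difference(colour, WHITE)
        print(f"{name} on white: gray difference {diff:.0f} of 255")
```

Yellow on white differs by only about 29 gray levels out of 255, against essentially the full range for black on white, which is why the yellow callout all but vanishes.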
This shouldn't be rocket science. Prepare document. Print it.
See what it looks like. Revise color choices/layout/etc.
Ditto for file size.
Adobe products have lots of redundancy in their file formats.
Previews, etc. Make sure the stuff you actually need -- and no
more! -- is dragged into the PDF.
Also, think about what is creating the PDF. Sometimes, a "PDF printer
driver" produces a smaller PDF!
Reply by Ivan Shmakov ● August 20, 2014
>>>>> Don Y <this@is.not.me.com> writes:
>>>>> On 8/19/2014 11:42 AM, rickman wrote:
>>>>> On 8/19/2014 9:43 AM, nahum.bush wrote:
>>> Yes, I think the images in the datasheet maybe is too large, and
>>> then the settings of the PDF convert/ create tools is not tuned
>>> properly.
>> Be careful what you ask for, you might just get it. I'd rather have
>> images in a PDF file that were large, but with sufficient detail
>> that I can zoom in and see them clearly than to have a smaller file
>> and not be able to read the fine print. I can't tell you how many
>> times I have not been able to read text contained in a diagram or
>> sometimes even the caption/title of the figure.
> Two different issues!
> The resolution of an image is relatively easily controlled. There's
> a point at which it doesn't make sense to support "infinite zoom".
> (e.g., a photo of an author on a dust jacket)
> Text, OTOH, if done properly, *can* be zoomed indefinitely -- as it
> is rendered from a *model* ("font") and not stored as an image.
The diagrams, if done properly, can be zoomed indefinitely just
as well. Simply because PDF allows for not only "model-based"
fonts, but also rather arbitrary graphics.
> Of course, this assumes the document preparer knows how to separate
> text annotations from the "images" to which they apply!
To me, the most apparent issue with Atmel PDFs was the design
change; FWIW, I can no longer be sure that I get a nice result
when printing one on a monochrome laser printer. But I hope to
check the sizes shortly, too, -- who knows, the issue may be
easy to solve just by "converting" PDF to PDF with suitable
GhostScript settings. (Or perhaps I'll try PDF::API2.)
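A PDF-to-PDF pass of that kind might look like the following sketch. It only builds and runs a Ghostscript command line using the `pdfwrite` device and one of its `-dPDFSETTINGS` presets (`/screen`, `/ebook`, `/printer`, `/prepress`, `/default`). Whether any preset actually helps with the Atmel files is untested here, and the more aggressive presets downsample images, which may cost legibility in diagrams. The function name and the choice of `/ebook` as default are assumptions for illustration.

```python
import subprocess
import sys


def ghostscript_command(src, dst, preset="/ebook"):
    """Build a Ghostscript command that rewrites src as a new PDF at dst."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",        # write PDF output
        f"-dPDFSETTINGS={preset}",  # image/font handling preset
        "-dNOPAUSE",                # do not pause between pages
        "-dBATCH",                  # exit when the input is processed
        "-dQUIET",                  # suppress routine messages
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python shrink.py input.pdf output.pdf  (script name is hypothetical)
    subprocess.run(ghostscript_command(sys.argv[1], sys.argv[2]), check=True)
```

Comparing the input and output sizes afterwards, and then zooming into a few diagrams, is the quickest way to see whether the result is acceptable.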
--
FSF associate member #7257 http://boycottsystemd.org/ 3013 B6A0 230E 334A
Reply by ● August 20, 2014
Simon Clubley <clubley@remove_me.eisner.decus.org-earth.ufp> wrote:
> Microchip still manage to do what Atmel used to do and represent similar
> amounts of information in much smaller PDFs. The datasheet for the
> PIC32MX2xx range is 330 pages and just over 6MBytes in size.
That's not really comparable, as Microchip put most of the detailed
information into the family reference manuals.
-a
Reply by Simon Clubley ● August 19, 2014
On 2014-08-19, rickman <gnuarm@gmail.com> wrote:
> On 8/19/2014 9:43 AM, nahum.bush wrote:
>> Yes, I think the images in the datasheet maybe is too large, and then the
>> settings of the PDF convert/ create tools is not tuned properly.
>
> Be careful what you ask for, you might just get it. I'd rather have
> images in a PDF file that were large, but with sufficient detail that I
> can zoom in and see them clearly than to have a smaller file and not be
> able to read the fine print. I can't tell you how many times I have not
> been able to read text contained in a diagram or sometimes even the
> caption/title of the figure.
>
All I can tell you is that new revisions of the same AVR datasheets have
been jumping in size over the last few years, but this jump to 35Mbytes
is a new low for Atmel.
That's 35Mbytes for 660 pages. I still have an ATmega168 datasheet from
2007 which is 376 pages and that is under 5Mbytes (and all the diagrams
are perfectly readable).
Microchip still manage to do what Atmel used to do and represent similar
amounts of information in much smaller PDFs. The datasheet for the
PIC32MX2xx range is 330 pages and just over 6MBytes in size.
The user manual for the LPC81x range (cute little MCU BTW) is 370 pages
and just under 2Mbytes in size (although there appear not to be as many
diagrams in that case).
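Putting those figures on a per-page basis makes the comparison easier to see. This is a back-of-the-envelope sketch only: the sizes are the rounded values quoted above ("under 5", "just over 6", "just under 2"), 1 MB is taken as 1024 KB, and the labels and function name are made up for illustration.

```python
# (label, pages, approximate size in MB), using the rounded figures quoted above
documents = [
    ("AVR datasheet under discussion", 660, 35),
    ("ATmega168 datasheet (2007)", 376, 5),
    ("PIC32MX2xx datasheet", 330, 6),
    ("LPC81x user manual", 370, 2),
]


def kb_per_page(pages, size_mb):
    """Average size per page in KB, taking 1 MB as 1024 KB."""
    return size_mb * 1024 / pages


if __name__ == "__main__":
    for label, pages, size_mb in documents:
        print(f"{label:<32}{kb_per_page(pages, size_mb):6.1f} KB/page")
```

The 35Mbyte file works out to roughly 54 KB per page, while the other three land somewhere between about 5 and 19 KB per page.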
Oh, and Atmel dumping that stupid distracting blue bar at the top of
each page of its current datasheets would be a good thing.
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply by Don Y ● August 19, 2014
On 8/19/2014 11:42 AM, rickman wrote:
> On 8/19/2014 9:43 AM, nahum.bush wrote:
>> Yes, I think the images in the datasheet maybe is too large, and then the
>> settings of the PDF convert/ create tools is not tuned properly.
>
> Be careful what you ask for, you might just get it. I'd rather have
> images in a PDF file that were large, but with sufficient detail that I
> can zoom in and see them clearly than to have a smaller file and not be
> able to read the fine print. I can't tell you how many times I have not
> been able to read text contained in a diagram or sometimes even the
> caption/title of the figure.
Two different issues!
The resolution of an image is relatively easily controlled. There's
a point at which it doesn't make sense to support "infinite zoom".
(e.g., a photo of an author on a dust jacket)
Text, OTOH, if done properly, *can* be zoomed indefinitely -- as it
is rendered from a *model* ("font") and not stored as an image.
Of course, this assumes the document preparer knows how to separate
text annotations from the "images" to which they apply!