Optimizing PDF Document Features

Bookmarks

Datalogics::PDFL::PDFOptimizerDiscardBookmarks

PDF documents often use bookmarks to help readers navigate from one section or page to another. The bookmarks appear on the left side of the viewer window, in the form of a table of contents, and are commonly attached to headings within the document. A user can click any entry in the table of contents to move directly to the part of the page that the bookmark points to. Bookmarks can also be linked to cross references on other pages in the same document.

Bookmarks have no effect on the appearance of a PDF document, but they do take up some space. You can make a PDF document smaller by removing its bookmarks.

Default Value PDFOptimizerDiscardBookmarks ON

Page Thumbnails

Datalogics::PDFL::PDFOptimizerDiscardThumbNails

A thumbnail is a small graphic image of a page in a PDF document, used to preview that page and shown in a panel on the left side of the viewer window. Thumbnails aid in navigating through a document, as a user can scroll through them to find a particular page. Thumbnails are useful, but they no longer need to be stored in PDF documents because most viewers generate thumbnails automatically. If a PDF document does contain stored thumbnails, however, you might save considerable space by deleting them.

Default Value PDFOptimizerDiscardThumbnails ON

Output Intent

Datalogics::PDFL::PDFOptimizerDiscardOutputIntent

An output intent is an entry in the OutputIntents array of a PDF document. It describes how the destination device for the document, most likely a printer, reproduces the colors in the document. Specifically, the output intent identifies the ICC color space to use for rendering the document. The ICC profile is stored in the PDF document itself. If an output intent is present, the document is rendered to that profile.

As the ICC profile can be quite large, you might want to remove it to reduce the size of the PDF document, if you don’t have plans to print the document in the future. But if it is important to preserve color fidelity, it should not be removed.

Default Value PDFOptimizerDiscardOutputIntent OFF

Page Labels

Datalogics::PDFL::PDFOptimizerDiscardPageLabels

You can add “names,” or labels, to the pages in a PDF document, to describe those pages, rather than simply relying on page numbers. The page label replaces the page number provided on the navigator bar.

You can add any text you like to a page label, including a page number, and apply it to any range of pages in the document that you select. In Adobe Acrobat, the page labels feature is found on the Page Numbers window, accessed from the Thumbnail viewing panel (right click and select “Number Pages”). The page label text appears for each thumbnail in the Thumbnail viewing panel on the left side of the viewer window.
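
How a viewer might turn page label settings into the text shown for each page can be sketched as follows. This is a minimal illustration, not Datalogics or Acrobat code: the function names `to_roman` and `page_label` and the tuple layout are assumptions made for this example, and only two numbering styles (decimal and lowercase roman) are modeled. Each range supplies a numbering style, an optional prefix, and a starting number, loosely following the page label entries defined in the PDF specification.

```python
def to_roman(n):
    """Convert a positive integer to lowercase roman numerals."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(ranges, page_index):
    """Return the label for a zero-based page index.

    ranges is a list of (first_page_index, style, prefix, start) tuples,
    sorted by first_page_index. style is "D" (decimal), "r" (lowercase
    roman), or None (prefix only). This layout is hypothetical.
    """
    current = None
    for r in ranges:
        if r[0] <= page_index:
            current = r  # the last range starting at or before this page
    if current is None:
        # No labels defined: fall back to the plain page number.
        return str(page_index + 1)
    first, style, prefix, start = current
    number = start + (page_index - first)
    if style == "D":
        numeral = str(number)
    elif style == "r":
        numeral = to_roman(number)
    else:
        numeral = ""
    return prefix + numeral


# Hypothetical document: front matter i-iv, body 1-3, appendix A-8 onward.
ranges = [(0, "r", "", 1), (4, "D", "", 1), (7, "D", "A-", 8)]
labels = [page_label(ranges, i) for i in range(9)]
```

With an empty list of ranges, which corresponds to a document whose page labels have been discarded, the sketch falls back to plain page numbers.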

You can delete page labels without affecting the appearance of the PDF document, to make the document smaller. As these labels only appear when a document is opened in Adobe Acrobat, and the labels are applied to a widget used within Adobe Acrobat, you may want to remove them.

Default Value PDFOptimizerDiscardPageLabels ON

Name Trees

Datalogics::PDFL::PDFOptimizerDiscardNameTrees

A name tree is a data structure, similar to a dictionary, that is often used in PDF files. Unlike a standard dictionary, whose keys are name objects, a name tree uses strings as keys, kept in sorted order, to map to data objects. Each name tree in a document maps its names to objects of one particular type.
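
As a rough illustration of the lookup a name tree supports, the sketch below models tree nodes as plain Python dictionaries. The Names, Kids, and Limits entries follow the PDF specification; the function `name_tree_lookup`, the sample tree, and its destination names are hypothetical, and nothing here uses the Datalogics API.

```python
def name_tree_lookup(node, key):
    """Look up a string key in a name-tree node modeled as a dict."""
    # Leaf nodes hold a flat Names array: [key1, value1, key2, value2, ...]
    names = node.get("Names")
    if names is not None:
        for i in range(0, len(names), 2):
            if names[i] == key:
                return names[i + 1]
        return None
    # Intermediate nodes hold Kids; each kid states the smallest and
    # largest key it contains in Limits, so other subtrees can be skipped.
    for kid in node.get("Kids", []):
        limits = kid.get("Limits")
        if limits is None or limits[0] <= key <= limits[1]:
            found = name_tree_lookup(kid, key)
            if found is not None:
                return found
    return None


# Hypothetical tree mapping named destinations to page indexes.
dests = {
    "Kids": [
        {"Limits": ["appendix", "chapter2"],
         "Names": ["appendix", 40, "chapter1", 2, "chapter2", 17]},
        {"Limits": ["glossary", "index"],
         "Names": ["glossary", 55, "index", 60]},
    ]
}

page = name_tree_lookup(dests, "chapter2")
```

A link that refers to a destination by name depends on this kind of lookup succeeding, so a lookup against a missing tree has nothing to return.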

If you discard a name tree in a PDF document you might save considerable space. But doing so risks removing navigation information, which could cause links to other documents and to embedded documents to fail. So don’t remove the name tree from a document unless you know that future readers will be willing to tolerate the result.

Default Value PDFOptimizerDiscardNameTrees ON

Piece Data

Datalogics::PDFL::PDFOptimizerDiscardPieceData

The creator of a PDF document can add private data to the document using a page-piece dictionary. This data, known as piece data, can be used by external applications to work with the PDF document in ways other than simply rendering the PDF in a viewing tool or sending it to a printer. Adobe Illustrator, for example, can write content to a page-piece dictionary in a PDF that only Adobe Illustrator can read. Piece data commonly describes the structure of the PDF; Adobe Photoshop, for instance, can add information to a PDF document to describe how layers are used in that document.

You can enable PDFOptimizerDiscardPieceData to remove this data from a PDF document. Because the piece data is only relevant to the application that added it to the PDF document, it can be removed without affecting the way the PDF renders. But the values will no longer be available to applications that might need to access them again later.

Default Value PDFOptimizerDiscardPieceData ON

Structure Trees

Datalogics::PDFL::PDFOptimizerDiscardStructureTrees

The structure of a document is described by a hierarchy of objects called the structure tree. As with piece data, applications that create PDF documents can add values, or tags, to the structure tree dictionary and access them later. Also as with piece data, you can enable the DiscardStructureTrees option to remove this data from a PDF document, making it smaller without affecting how the PDF document is rendered. Many PDF documents do not have structure trees, but if a structure tree is removed from a PDF document the loss is more likely to be noticed, both by viewing tools used to display the document and by custom applications.

Default Value PDFOptimizerDiscardStructureTrees OFF
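
The default values listed in the sections above can be collected in one place to see which features are discarded unless you change a setting. The sketch below is only a model of those defaults: it does not call the Datalogics library, the option names are copied from the "Default Value" lines in this document, and the helpers `effective_options` and `kept_by_default` are hypothetical names introduced for illustration.

```python
# Defaults as stated in this document: True means ON (feature is discarded).
DISCARD_DEFAULTS = {
    "PDFOptimizerDiscardBookmarks": True,
    "PDFOptimizerDiscardThumbnails": True,
    "PDFOptimizerDiscardOutputIntent": False,
    "PDFOptimizerDiscardPageLabels": True,
    "PDFOptimizerDiscardNameTrees": True,
    "PDFOptimizerDiscardPieceData": True,
    "PDFOptimizerDiscardStructureTrees": False,
}


def effective_options(overrides=None):
    """Return the defaults with any caller overrides applied."""
    options = dict(DISCARD_DEFAULTS)
    for name, value in (overrides or {}).items():
        if name not in options:
            raise KeyError("unknown option: " + name)
        options[name] = bool(value)
    return options


def kept_by_default(options):
    """List the options that are OFF, meaning the feature is preserved."""
    return sorted(name for name, on in options.items() if not on)


# Example: keep the name trees so that named links keep working.
options = effective_options({"PDFOptimizerDiscardNameTrees": False})
preserved = kept_by_default(options)
```

Starting from the defaults and overriding only the settings you care about keeps the riskier removals, such as name trees, an explicit decision rather than an accident.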