PDF OPTIMIZER

Setting up your Profile

Your JSON profile file (or files) should include a list of settings that define exactly what kinds of changes you want to apply to your PDF document. You can make your custom profile file as long or as short as you need, depending on the types of changes you plan to apply to optimize your PDF document.

PDF OPTIMIZER offers many different methods to optimize a PDF document. The options you select will depend on your goals and on how you want to change your output PDF documents.

Suppose you have a PDF document that is 18 MB and you want to make it smaller so that the file will be easier to distribute online. If you expect that your readers will be opening the file in a browser window, and it doesn’t matter if the photographs and diagrams in the document appear with a lower resolution, you could make the document smaller by compressing the graphics included in the file.

On the other hand, if you are working with a large PDF document that your customers are likely to want to print, but you want to make it smaller so that it downloads and prints more quickly, you probably want to leave the graphics alone. They will need to appear as sharp as possible. But you don’t need interactive content, like form fields, bookmarks, comments, or digital signatures. You can use PDF OPTIMIZER to remove everything from the PDF document that will not appear on paper.

Or maybe you are building a PDF document that you intend for people to read on smartphones and other mobile devices. In this case you want to make the document as small as possible so that it opens quickly. You would reduce the size of the images here as well. Because phone screens are much smaller than a laptop or desktop monitor, you can reduce the image resolution further than you would for a document intended for a browser window.

All of the settings for optimizing a PDF document in PDF OPTIMIZER are optional and turned off by default. A setting is applied only if it is included in the JSON file. Flag settings must be set to “ON.” Settings that are turned off do not need to appear in the JSON profile file. So you could create a custom JSON file with only a single setting, such as one that compresses images. Your JSON file might hold only five or six lines of text.

Only use lowercase characters for the keys and values you add to the JSON file.
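As a quick illustration, the following Python sketch builds a profile with a single setting and checks the lowercase rule before printing the JSON. The sketch assumes the `color` object can sit at the top level of the file. The fragments later in this guide show it followed by a comma, so in a real profile it probably belongs inside an enclosing object that this guide does not show. The `all_lowercase` helper is hypothetical and is not part of PDF OPTIMIZER.

```python
import json


def all_lowercase(node) -> bool:
    """Return True if every key and string value in the profile is lowercase."""
    if isinstance(node, dict):
        return all(key == key.lower() and all_lowercase(value)
                   for key, value in node.items())
    if isinstance(node, list):
        return all(all_lowercase(item) for item in node)
    if isinstance(node, str):
        return node == node.lower()
    return True  # numbers, booleans, and null have no case


# A single setting: recompress color images as medium-quality JPEG.
# Assumption: "color" is shown at the top level; it may need an enclosing object.
profile = {
    "color": {
        "recompress": {"type": "jpeg", "quality": "medium"}
    }
}

if not all_lowercase(profile):
    raise ValueError("profile keys and values must be lowercase")

print(json.dumps(profile, indent=4))
```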

The methods you can use to optimize a PDF document are sorted into six categories:

Images When a PDF document includes photographs, diagrams, or drawings, the image data from the original graphic file, such as a JPEG photograph or a PNG image, is saved in the PDF document itself. You can enter settings in the JSON profile file to compress these color, grayscale, or black and white images.
Fonts PDF documents carry the fonts they need to render text properly. A set of font files (often .ttf or .otf) is saved within the PDF itself. That way, no matter what machine is used to open a PDF file, the document looks the same, and the viewing tool does not need to look for substitute fonts installed on the local desktop or laptop. But these embedded font files can make the PDF larger, possibly much larger, if the document needs characters from an Asian font, such as Chinese or Japanese. You can enter settings in your JSON profile file to remove font characters or duplicate fonts that you don’t need, reducing the size of the PDF file.
Transparency It is possible to stack objects, such as graphics, images, text boxes, and form fields, on top of each other on a PDF page. These objects can be partially or fully transparent, so they can interact in various ways with the objects behind them. When transparent objects are stacked in a PDF file, each one contributes to the final result that appears on the page. For example, their colors blend together into a single final color.

To make a PDF document simpler—and usually, smaller—you can flatten these transparencies.  The flattening process combines the layers of content on a PDF page, or a stack of transparent images or colors, and renders the result as a single image, blended color, or set of text.  For example, if a digital signature is flattened, the digital certificate key and related properties are removed from the signature field. The name of the person who signed the document and related information, such as the date and time stamp and the signer’s email address, appear on the page as text, but the signature field is no longer interactive.

Objects Besides graphic images and font files, a variety of other objects can be saved within a PDF document.

  • Blocks of JavaScript code
  • Thumbnail images
  • Bookmarks
  • Tags
  • Alternate graphics images

PDF OPTIMIZER is designed to allow you to remove any of these objects from a PDF document. This serves to make the document smaller and easier to distribute.

User Data It is possible to edit PDF documents using Adobe Acrobat and other viewing and editing tools.  For example, when reviewing the content in a PDF document, a user might want to add a comment.  It is also possible to attach external files to a PDF document so that the file is saved as a part of the PDF, or embed a hyperlink to a web page.  Finally, a user could add metadata.  PDF OPTIMIZER can remove any of this content. It can also remove form fields, such as text boxes, check boxes, and radio buttons.
Cleanup Use the Cleanup features in PDF OPTIMIZER to set compression options for a PDF document. Think of compression as zipping a file to reduce its size. You can compress the entire PDF document or parts of its content, choose a compression method, remove redundant content, and make other changes designed to help a PDF document open more quickly.

Images

Resampling Images

If you have photographs or other kinds of graphics images embedded in a PDF document that you want to make smaller, and you know that these images don’t need to have a high resolution in the output file, you can reduce the resolution of these images. You can also compress these images within the file. Both steps will reduce the final size of the PDF document.

In PDF OPTIMIZER, this process is called resampling. You can choose to resample color images in a PDF document, or grayscale, or monochrome (black & white). The settings for reducing the resolution for these three kinds of images in a PDF document must be added separately to the JSON profile file. Each type of graphic can have its own settings and resolution values. So you could, for example, enable resampling to only apply to the color images in a PDF document. Or you could include only grayscale and black and white images.

Downsampling and Recompression

The resampling process that PDF OPTIMIZER uses involves downsampling and recompression. Downsampling reduces the size of an image directly by reducing its resolution. In recompression, compressed images in a document are expanded and then compressed again. One recompression setting chooses the compression algorithm, such as ZIP (also called Flate), JPEG, or JPEG2000. Another sets the image quality after recompression. The available quality levels depend on the compression method used.

An image must be decompressed before it can be downsampled, so downsampling always decompresses an image and then recompresses it. If you add settings to the JSON profile file to downsample images, PDF OPTIMIZER also recompresses those images, whether or not you provide recompression settings.

If you do not add recompression settings to the JSON profile, PDF OPTIMIZER recompresses each downsampled image using the compression algorithm already applied to that image. For example, suppose you provide downsample settings but no recompression settings, and apply that profile to a document that holds only JPEG images. PDF OPTIMIZER will use the JPEG compression method. It will also use the highest quality setting available (“maximum”) so that recompression does not reduce the quality of the images.

On the other hand, you might leave downsample settings out of your JSON profile file but add recompression settings. In that case, PDF OPTIMIZER recompresses the images using the algorithm you specify while keeping their resolution (DPI) the same. Note that if you add recompression settings, you must include both values in the JSON file: the compression algorithm and the quality level.

Image Resolution

The resolution of an image describes how many pixels it contains. A total pixel count can be expressed in megapixels. For an image placed in a PDF document, however, resolution is usually expressed in dots per inch (DPI): the image’s width and height in pixels divided by the width and height, in inches, at which it appears on the page. Downsampling changes the width and height of an image in pixels to reach a given target resolution. PDF OPTIMIZER calculates the resolution of every image in the document. Keep in mind that the resolution values used for downsampling are distinct from the image quality settings used for recompression.

If the same image appears multiple times within a single PDF document, and the document is downsampled, PDF OPTIMIZER downsamples every copy of this image. The system uses the version of the image with the highest resolution as the reference point for downsampling for all of the other copies of this same image.
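The arithmetic behind these two paragraphs can be sketched in Python. The sketch assumes the standard PDF unit of 72 points per inch. It also assumes each image is scaled uniformly, so only its width is needed. The function names are hypothetical and do not describe PDF OPTIMIZER’s internal code. The example shows why the reference copy matters: the copy drawn smallest on the page has the highest effective resolution.

```python
from typing import List


def effective_dpi(width_px: int, placed_width_pt: float) -> float:
    """Resolution of an image as drawn on the page.

    PDF measures page coordinates in points; 72 points make one inch.
    """
    return width_px / (placed_width_pt / 72.0)


def reference_dpi(width_px: int, placed_widths_pt: List[float]) -> float:
    """Highest effective resolution among all copies of the same image."""
    return max(effective_dpi(width_px, w) for w in placed_widths_pt)


def downsampled_width_px(width_px: int, current_dpi: float, target_dpi: float) -> int:
    """New pixel width needed to bring current_dpi down to target_dpi."""
    return max(1, round(width_px * target_dpi / current_dpi))


# A 3000-pixel-wide image drawn 5 inches wide (360 pt) and 2.5 inches wide (180 pt)
copies_pt = [360.0, 180.0]
ref = reference_dpi(3000, copies_pt)               # 1200.0, from the smaller copy
new_width = downsampled_width_px(3000, ref, 300)   # 750 pixels
print(ref, new_width)
for w in copies_pt:
    print(f"{w / 72:.1f} in wide -> {effective_dpi(new_width, w):.0f} DPI")
```

With a 300 DPI target, the smaller copy lands exactly on the target. The larger copy, which shares the same pixels, ends up at 150 DPI.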

For each type of image, you can specify a target resolution (target-dpi) and a trigger resolution (trigger-dpi) for downsampling. If you decide to downsample a type of image, you must include both settings in your profile file. The target resolution defines the goal: the resolution that each downsampled image will have. So if you set the target resolution to 600 DPI, PDF OPTIMIZER will downsample images above the trigger resolution to 600 DPI, unless an image is already at 600 DPI or less.

The trigger resolution defines the threshold PDF OPTIMIZER uses to decide which images to downsample. Any image with a resolution greater than the trigger resolution will be downsampled. PDF OPTIMIZER ignores any image at or below the trigger resolution.

Suppose you set the trigger resolution to 800 DPI and the target resolution to 400 DPI. You want every image in the PDF document downsampled to 400 DPI, but only if its resolution is above 800 DPI to begin with. In other words, you are telling PDF OPTIMIZER to look only for the very high-resolution images (those above 800 DPI) and to downsample just those images to 400 DPI.

If the trigger resolution is 500 DPI and the target resolution is 400 DPI, PDF OPTIMIZER will not downsample an image at 480 DPI, because 480 DPI is below the trigger. With the same settings, an image at 680 DPI is downsampled to 400 DPI.
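The trigger and target rule can be expressed as a small Python function. This is a sketch of the behavior described above, not PDF OPTIMIZER’s own code, and the function name is hypothetical. Consistent with the description of the target resolution, it never raises an image’s resolution.

```python
def resampled_dpi(image_dpi: float, trigger_dpi: float, target_dpi: float) -> float:
    """Resolution of an image after applying the trigger/target rule.

    Hypothetical helper: images above the trigger are downsampled to the
    target; everything else keeps its resolution. It never upsamples.
    """
    if image_dpi > trigger_dpi and image_dpi > target_dpi:
        return target_dpi
    return image_dpi


# The 500/400 examples from the text, plus two cases for the 800/400 settings
print(resampled_dpi(480, trigger_dpi=500, target_dpi=400))  # 480: below the trigger
print(resampled_dpi(680, trigger_dpi=500, target_dpi=400))  # 400
print(resampled_dpi(900, trigger_dpi=800, target_dpi=400))  # 400
print(resampled_dpi(800, trigger_dpi=800, target_dpi=400))  # 800: not above the trigger
```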

A Few Examples of JSON Profile Settings

This example shows settings used to downsample and recompress color JPEG images in a PDF document:

        "color": {
            "downsample": {
                "trigger-dpi": 225,
                "target-dpi": 150
            },
            "recompress": {
                "type": "jpeg",
                "quality": "medium"
            }
        },

Here, images with a resolution above 225 DPI are downsampled to 150 DPI and recompressed with the JPEG compression algorithm at “medium” quality. The quality level controls how much image detail the JPEG encoder discards. Lower quality produces smaller files.

In this example, we don’t provide recompress settings.

        "color": {
            "downsample": {
                "trigger-dpi": 225,
                "target-dpi": 150
            }
        },

PDF OPTIMIZER will keep the same compression algorithm, JPEG, because the color images in this document are JPEG images. But the quality setting will be “maximum,” because without recompression settings the software avoids reducing the quality of the original JPEG images any further.

Finally, consider this example:

        "color": {
            "downsample": {
                "trigger-dpi": 225,
                "target-dpi": 150
            },
            "recompress": {
                "type": "zip",
                "quality": "medium"
            }
        },

This refers to the same type of image in the PDF document, color JPEG images. PDF OPTIMIZER will downsample the color images to 150 DPI if they are above 225 DPI, but uses the ZIP compression algorithm, rather than using the default JPEG compression method.
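To make the three examples concrete, this Python sketch parses each fragment and applies the recompression rules described under Downsampling and Recompression. Each fragment is wrapped in braces so that it forms a complete JSON document. The function `recompression_plan` is hypothetical. It assumes the images were originally JPEG compressed, as in the examples, and it models only which algorithm and quality would be used.

```python
import json
from typing import Optional, Tuple


def recompression_plan(settings: dict, original_type: str) -> Optional[Tuple[str, str]]:
    """Return (algorithm, quality) for one image type, or None if untouched.

    Hypothetical helper modeling the rules in this guide, not PDF OPTIMIZER code.
    """
    recompress = settings.get("recompress")
    if recompress is not None:
        # Explicit settings win; "same" keeps the image's original algorithm.
        algorithm = recompress["type"]
        if algorithm == "same":
            algorithm = original_type
        return algorithm, recompress["quality"]
    if "downsample" in settings:
        # Downsampling always recompresses: original algorithm, highest quality.
        return original_type, "maximum"
    return None  # no image settings: images are left as they are


# The three color examples above, wrapped in braces to form complete JSON documents
examples = [
    '{"color": {"downsample": {"trigger-dpi": 225, "target-dpi": 150},'
    ' "recompress": {"type": "jpeg", "quality": "medium"}}}',
    '{"color": {"downsample": {"trigger-dpi": 225, "target-dpi": 150}}}',
    '{"color": {"downsample": {"trigger-dpi": 225, "target-dpi": 150},'
    ' "recompress": {"type": "zip", "quality": "medium"}}}',
]
for text in examples:
    color = json.loads(text)["color"]
    print(recompression_plan(color, original_type="jpeg"))
```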

Images: Color

downsample Specifies a target resolution and a trigger resolution for downsampling color images.
trigger-dpi All color images above this resolution will be resampled.
target-dpi The new resolution of resampled color images.
recompress Sets the type and quality of compression used to resample color images. JPEG compression is a compression format used for rendering photographs as image files. It is also known as DCT (Discrete Cosine Transform) compression.
type
  same Keep the original compression algorithm used in the images themselves
  zip Use ZIP compression
  jpeg Use JPEG compression
  jpeg2000 Use JPEG2000 compression
quality These values are valid for JPEG and JPEG2000 compression only:
  minimum
  low
  medium
  high
  maximum
  lossless Original quality of the graphic is preserved (see Monochrome below). Only available for JPEG2000.

Images: Grayscale

downsample Specifies a target resolution and a trigger resolution for downsampling grayscale images.
trigger-dpi All grayscale images above this resolution will be resampled.
target-dpi The new resolution of resampled grayscale images.
recompress Sets the type and quality of compression used to resample grayscale images. JPEG compression is a compression format used for rendering photographs as image files. It is also known as DCT (Discrete Cosine Transform) compression.
type
  same Keep the original compression algorithm used in the images themselves
  zip Use ZIP compression
  jpeg Use JPEG compression
  jpeg2000 Use JPEG2000 compression
quality These values are valid for JPEG and JPEG2000 compression only:
  minimum
  low
  medium
  high
  maximum
  lossless Original quality of the graphic is preserved (see Monochrome below). Only available for JPEG2000.

Images: Monochrome

downsample Specifies a target resolution and a trigger resolution for downsampling monochrome images.
trigger-dpi All monochrome images above this resolution will be resampled.
target-dpi The new resolution of resampled monochrome images.
recompress Sets the type and quality of compression used to resample monochrome images. The available types are JBIG2 and CCITT compression.

JBIG2 is a compression algorithm designed for binary images, in which each pixel can have only one of two possible colors. In PDF OPTIMIZER, JBIG2 is used for black and white images. It supports both lossy and lossless compression.

CCITT Group 3 and Group 4 are lossless compression schemes for black and white images, defined by the International Telegraph and Telephone Consultative Committee (CCITT), now the ITU-T. Many fax and document imaging file formats support them.

Lossy and lossless describe two approaches to compressing data. With lossless compression, all of the data in the image is preserved: the quality of the image does not change, and it can be decompressed to its exact original state. Lossy compression permanently discards some image detail. The image keeps its pixel dimensions, but the decompressed result is not identical to the original. Files reduced with lossy compression are usually considerably smaller, but they may not print or display as well as files compressed with lossless compression.

type
  same Keep the original compression algorithm used in the images themselves
  jbig2 Use JBIG2 compression
  ccittg3 Use CCITT Group 3 compression
  ccittg4 Use CCITT Group 4 compression
quality
  lossy Valid for jbig2 only
  lossless Valid for jbig2 only
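The three tables above can be turned into a simple check that catches typos before a profile is used. This Python sketch rests on some assumptions. The `grayscale` and `monochrome` key names are hypothetical, because only `color` appears in the examples. The sketch checks only that both downsample resolutions are present and that any `type` or `quality` value appears in the tables. It does not enforce which quality values each algorithm accepts, such as `lossless` being available only for JPEG2000.

```python
from typing import Dict, List, Set

COLOR_OR_GRAY: Dict[str, Set[str]] = {
    "types": {"same", "zip", "jpeg", "jpeg2000"},
    "qualities": {"minimum", "low", "medium", "high", "maximum", "lossless"},
}

# Hypothetical key names for the three image types.
IMAGE_RULES: Dict[str, Dict[str, Set[str]]] = {
    "color": COLOR_OR_GRAY,
    "grayscale": COLOR_OR_GRAY,
    "monochrome": {
        "types": {"same", "jbig2", "ccittg3", "ccittg4"},
        "qualities": {"lossy", "lossless"},
    },
}


def validate_image_settings(kind: str, settings: dict) -> List[str]:
    """Return a list of problems found in the settings for one image type."""
    rules = IMAGE_RULES[kind]
    problems: List[str] = []
    downsample = settings.get("downsample")
    if downsample is not None:
        # Both resolutions are required when downsampling.
        for key in ("trigger-dpi", "target-dpi"):
            if key not in downsample:
                problems.append(f"{kind}: downsample is missing {key}")
    recompress = settings.get("recompress")
    if recompress is not None:
        algorithm = recompress.get("type")
        if algorithm not in rules["types"]:
            problems.append(f"{kind}: missing or unknown type {algorithm!r}")
        quality = recompress.get("quality")
        if quality is not None and quality not in rules["qualities"]:
            problems.append(f"{kind}: unknown quality {quality!r}")
    return problems


print(validate_image_settings("monochrome", {"recompress": {"type": "jpeg", "quality": "lossy"}}))
print(validate_image_settings("color", {"downsample": {"target-dpi": 150}}))
```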

Images: Color, Grayscale, or Monochrome

Optional. Enable any of these settings by adding it to the profile and setting it to ON.

optimize-only-if-reduction-in-size Set this value to ON if you want PDF OPTIMIZER to only downsample an image found in a PDF document if the newly downsampled image is in fact smaller than the original. When the downsampling process is combined with recompression, the output file that results can actually expand in size.  If the process yields an image that is the same size as the original, or larger, PDF OPTIMIZER will leave the image alone.
consolidate-duplicate-image-and-forms Remove duplicate copies of images and forms. This feature merges identical forms or images, as determined by an MD5 hash of their content. The MD5 algorithm creates a 128-bit hash value that serves as a digest of data of any length, and it is commonly used to check that data was not corrupted during transmission. Identical content always produces the same hash value, so PDF OPTIMIZER can compare the hash values of the images and forms in a PDF document to identify the ones that are identical.
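Both settings can be illustrated with a short Python sketch. The helper names are hypothetical. The sketch works on raw byte strings rather than real PDF image or form objects. Random bytes stand in for an image that is already well compressed, so recompressing it with zlib makes it larger and the original is kept. `hashlib.md5` groups identical streams in the way the consolidation setting is described.

```python
import hashlib
import os
import zlib
from collections import defaultdict
from typing import Dict, List


def keep_if_smaller(original: bytes, candidate: bytes) -> bytes:
    """Keep the re-encoded data only if it is strictly smaller."""
    return candidate if len(candidate) < len(original) else original


def find_duplicates(streams: Dict[str, bytes]) -> List[List[str]]:
    """Group stream names whose content has the same MD5 digest."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for name, data in streams.items():
        by_digest[hashlib.md5(data).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


already_compressed = os.urandom(1000)            # incompressible stand-in data
recompressed = zlib.compress(already_compressed)
print(len(recompressed) > len(already_compressed))                              # True
print(keep_if_smaller(already_compressed, recompressed) is already_compressed)  # True

streams = {"Im1": b"logo pixels", "Im2": b"photo pixels", "Im3": b"logo pixels"}
print(find_duplicates(streams))                  # [['Im1', 'Im3']]
```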

Fonts

Optional. Enable any of these settings by adding it to the profile and setting it to ON.

subset-embedded-fonts Subsetting fonts removes unused characters from font files embedded in the PDF.

It is a best practice to embed all of the fonts used in a PDF document into the document itself. That way, the viewing tool (like Acrobat) does not have to look for a font on the local system or choose a substitute. But embedding font files can make the PDF quite large, especially if the PDF includes a font for an Asian language, such as Chinese, with tens of thousands of characters. To avoid this, only a subset of the characters in the font can be saved in the PDF document. The subset font includes only the characters actually used on the pages of that document, which often leads to a much smaller PDF.

Note that with a subset font, the reader of the PDF file might not be able to edit the text in the file, for example with the Adobe Acrobat editing tools. Characters that the reader types might not be included in the subset font.
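The idea behind subsetting can be sketched in Python with plain sets of characters. This is a simplification: real subsetting works on the glyphs inside a font file rather than on Unicode characters, and the function name is hypothetical. The last line shows why editing can fail, because a character that was not used on any page is not in the subset.

```python
from typing import Iterable, Set


def subset_characters(font_chars: Set[str], page_texts: Iterable[str]) -> Set[str]:
    """Characters of the font that actually appear in the document's text."""
    used: Set[str] = set()
    for text in page_texts:
        used.update(text)
    return font_chars & used


font = set("abcdefghijklmnopqrstuvwxyz ")   # a tiny stand-in "font"
pages = ["hello world", "a short page"]
subset = subset_characters(font, pages)
print(len(font), "->", len(subset), "characters")
print("z" in subset)  # False: typing a "z" while editing needs a character the subset lacks
```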

consolidate-duplicate-fonts Remove multiple copies of the same font file.

Fonts are commonly embedded in a document to make sure that the PDF can be rendered on any platform, as described above.  The fonts travel with the PDF, so the file will open and display properly whether the same fonts are installed on the local machine or not. It is also possible to save space by including a subset of a font in a PDF document.  Sometimes, however, PDF documents are created with multiple copies of the same font, either as multiple subsets or multiple, fully embedded copies of a font file. When multiple copies of the same font appear, they may be merged into a single font.

Sometimes different versions of a font in a PDF document share the same name. In that case, those fonts are not merged.

Transparency

quality The resolution level to use when flattening transparent objects. The higher the level of quality, the better the final output in print or in a browser window. But the resulting PDF document will also be larger.
low-quality Line art and text 288 DPI, gradients 144 DPI
medium-quality Line art and text 300 DPI, gradients 150 DPI
high-quality Line art and text 1200 DPI, gradients 300 DPI
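To see why higher flattening quality produces larger files, it helps to work out how many pixels a flattened area needs. This Python sketch uses the line art and text values from the table above for an area the size of a full US Letter page (8.5 × 11 inches). Whether PDF OPTIMIZER rasterizes a given area at all depends on the page content, so treat the numbers as an upper-bound illustration.

```python
from typing import Tuple

# Line art and text resolution (DPI) for each flattening quality level, from the table
LINE_ART_DPI = {"low-quality": 288, "medium-quality": 300, "high-quality": 1200}


def raster_size(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions needed to render an area at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


for level, dpi in LINE_ART_DPI.items():
    w, h = raster_size(8.5, 11, dpi)
    print(f"{level}: {w} x {h} = {w * h:,} pixels")
```

Because pixel count grows with the square of the resolution, the high-quality level (1200 DPI) needs 16 times as many pixels as the medium-quality level (300 DPI) for the same area.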

Objects

Optional. Enable any of these settings by adding it to the profile and setting it to ON.

discard-javascript-actions Removes JavaScript content. Blocks of JavaScript code can be added to a PDF to complete a function or calculate a value, such as a user’s age when that person enters his or her birth date in a form field.
discard-alternate-images Removes alternate versions of the same image found in the PDF document. A PDF document can be set up to specify alternate images, or multiple versions of one image within the same document. These images can be used to meet different needs. For example, a PDF could present one image with a lower resolution for display on a monitor, and an alternate image with a higher resolution to use when the PDF document is sent to a printer. These images can make a PDF document very large.  Today alternate images are rarely used.
discard-thumbnails Removes document thumbnails.  Thumbnail images are used to preview pages in a PDF document and appear in a panel on the left side of the viewer window.  A user could scroll through a series of thumbnails to find a page he or she is looking for.
discard-document-tags Removes document tags. Tags record the logical structure of a document, such as headings, paragraphs, tables, and reading order. Screen readers and other assistive technologies rely on them, and other applications can use them to reflow or reuse the content, so removing tags can make a document less accessible.
discard-bookmarks Removes document bookmarks. Bookmarks make navigation easier.  They appear on the left side of the viewer window, in the form of a Table of Contents, and are commonly attached to headings within the document. A user can click on any value in the Table of Contents and move directly to the part of the page where the bookmark is found.  And bookmarks can be used apart from a table of contents to mark a place in a PDF document to navigate to.

User Data

Optional. Enable any of these settings by adding it to the profile and setting it to ON.

discard-comments-forms-multimedia Removes comments, form fields, and multimedia content added to the PDF document, such as a reviewer’s comments or completed form fields.
discard-document-information-and-metadata Removes document descriptions and metadata
discard-file-attachments Removes files attached to the document
discard-external-crossreferences Removes references to external data. This would include links to external resources, like a photograph or another PDF document that could be downloaded from a web page or FTP site.  This option effectively removes the hyperlinks to these items.
discard-private-data Removes data specific to the application that created the file. Some applications, like Adobe Illustrator, add their own unique values to a PDF document when generating it. These values are useful to the original software product if the PDF is opened and edited in that product again, but they can also be removed.
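Returning to the print example from the beginning of this guide, a profile that strips interactive content could be assembled as in this Python sketch. The section names `objects` and `user-data` are assumptions, and so is the lowercase flag value `on`. This guide says that flags are set to ON and that keys and values should be lowercase, but it does not show the enclosing JSON structure. The flag names themselves come from the tables above.

```python
import json
from typing import Dict, Iterable

FLAG_ON = "on"  # assumption: lowercase form of ON


def enable(flags: Iterable[str]) -> Dict[str, str]:
    """Turn a list of flag names into {"flag": "on"} entries."""
    return {flag: FLAG_ON for flag in flags}


# Hypothetical section names; the flag names come from the tables in this guide.
print_profile = {
    "objects": enable([
        "discard-javascript-actions",
        "discard-bookmarks",
        "discard-thumbnails",
    ]),
    "user-data": enable([
        "discard-comments-forms-multimedia",
        "discard-file-attachments",
    ]),
}

print(json.dumps(print_profile, indent=4))
```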

Cleanup

compression Selects the compression action for the file.
compress-entire-file Compress document as a single unit
compress-document-structure Compress the document structure only
remove-compression Removes compression from file streams

Optional. Enable any of these settings by adding it to the profile and setting it to ON.

flate-encoded-unencoded-streams Compress uncompressed streams using Flate.
convert-lzw-to-flate Recompress LZW-compressed streams using flate.
optimize-page-content Removes redundant content from page content streams, the data that draws the text and graphics on each page.
optimize-for-fast-web-view Place all the information needed to render the first page of the document near the beginning of the file.

Flate, or Deflate, compression is an open standard (RFC 1951) widely used in ZIP files and in PDF files. It is also used for PNG image files, and it is much more widely used than LZW.

LZW (Lempel-Ziv-Welch) is a universal data compression algorithm that was once widely used on Unix platforms. It appears in some older PDF documents but is rarely used today.
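Flate’s lossless round trip is easy to see with Python’s standard `zlib` module, which produces the zlib-wrapped Deflate data that the PDF FlateDecode filter uses. The content stream below is made up for illustration. Highly repetitive streams like this one compress far better than typical page content.

```python
import zlib

# A made-up page content stream: the same text-drawing operators repeated
stream = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200

compressed = zlib.compress(stream, 9)   # level 9: best compression
restored = zlib.decompress(compressed)

assert restored == stream               # lossless: every byte comes back
print(f"{len(stream)} bytes -> {len(compressed)} bytes")
```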