Optimizing Images

Generally, when the Adobe PDF Library optimizes a PDF document, the system properly converts all the images in that document so that the colors remain consistent wherever the document is printed or displayed.

If an embedded image has a resolution that differs from the default resolution available at a target device, the Library will convert the image to the resolution for that target device. When creating a PDF document, then, it is best to include the highest resolution images that a potential rendering device can accept. That way, if the PDF needs to be downsampled for other renderings, these other renderings start from the original high resolution image. So if you plan to send a PDF document to a printer with 1200 dots per inch resolution, you might want to use that resolution for images in your PDF document.

If you plan to send a PDF document as an email attachment, however, or transfer it using the File Transfer Protocol (FTP), you will probably want to make the PDF document as small as you can. If you know that the images in the PDF document no longer need a high resolution (for example, because they are unlikely to be printed again), you may want to reduce the resolution you use for the document.

This section discusses downsampling and recompression of images in PDF documents. Downsampling reduces the resolution of an image. Recompression uses a different compression method to reduce the size of an image more efficiently. Because downsampling and recompression depend on color type, PDF Optimizer provides three classes for these functions: color, gray scale, and black & white images.

We describe the parameters for optimizing images in this section.

Removing Alternate Images

Datalogics::PDFL::PDFOptimizerDiscardAlternateImages

A PDF document can be set up to specify alternate images, or multiple versions of one image within the same document. These images can be used to meet different needs. In a common example, a PDF document could present one image with a lower resolution for display on a monitor, and an alternate image with a higher resolution to use when the PDF document is sent to a printer.

But alternate images can take a lot of space within the document, making the PDF a lot bigger. If you know that higher resolution alternate images are no longer needed, because the PDF document will not need to be printed, these images can be removed from the file to reduce its size.

When enabled, all alternate images will be discarded, and only the primary image will be retained. Note that this process may discard higher resolution images in favor of images with lower resolutions.

This feature is seldom used. Few PDF documents need alternate images any longer, because most rendering systems can now adjust image resolution to match a display. When alternate images are present, discarding them can save considerable space, but the rendered quality of an image may decline.

Default Value PDFOptimizerDiscardAlternateImages ON

Downsampling

Datalogics::PDFL::PDFOptimizerDownsampleColor
Datalogics::PDFL::PDFOptimizerDownsampleGray
Datalogics::PDFL::PDFOptimizerDownsampleBW
Datalogics::PDFL::PDFOptimizerDownSampleRecompressOnlyIfSmaller

If you know that you do not need high resolution images embedded in a PDF document, you can downsample these images to a lower resolution using the three downsampling options for color, gray scale, and black & white images. This reduces the size of the PDF document itself.

Note that an image in a PDF document does not have a resolution as such. It is simply a certain number of pixels wide and pixels high. A single image used in three places in a document can have three different effective resolutions, depending on the size at which it is placed each time. The downsampling process changes the width and height of an image in pixels in order to reach a given target resolution.

If an image appears multiple times within a single PDF document, and the document is downsampled, every copy of this image will be downsampled. The version that has the highest resolution will be used as the reference point for downsampling all of the other copies of this same image.
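The idea of an effective resolution can be illustrated with a short calculation. This is a minimal sketch in Python, not a call into the Adobe PDF Library. It assumes the default PDF user space, where one unit is 1/72 inch, and the function name effective_dpi is hypothetical.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 unit = 1/72 inch


def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Return (horizontal_dpi, vertical_dpi) for one placement of an image."""
    horizontal = pixel_width * POINTS_PER_INCH / placed_width_pt
    vertical = pixel_height * POINTS_PER_INCH / placed_height_pt
    return horizontal, vertical


# One 600 x 400 pixel image placed at three different sizes on the page.
placements_pt = [(144, 96), (288, 192), (432, 288)]
for width_pt, height_pt in placements_pt:
    print(effective_dpi(600, 400, width_pt, height_pt))
```

The same pixel data yields a different effective resolution for each placement, which is why the resolution belongs to the placement rather than to the image itself.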

PDF Optimizer provides three classes for images: color, gray scale, and black & white (monochrome). The product assigns each image in a PDF document to one of these classes. You can enable downsampling for images in a PDF document separately for each class, and with separate resolution limits. So you could, for example, enable downsampling only for the color images in a PDF document. You can also specify a maximum resolution (DPI) for each of these classes, so that PDF Optimizer downsamples any color image in a document with an effective resolution greater than the resolution specified for the color image class.

PDF Optimizer also calculates the resolution for every image in the document wherever it appears, and you can set a target resolution (target DPI) for the document. For each image in the document that is downsampled, the resolution of that image can be reduced to match this target resolution. If an image already has a resolution less than the target resolution, PDF Optimizer does not downsample that image. If an image appears more than once in a document, and the effective resolutions of these versions of the image do not match, PDF Optimizer finds the highest effective resolution for this set of images and downsamples them all so that the effective resolution of each image matches the target resolution.
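The handling of an image that appears at several effective resolutions can be sketched as follows. This Python fragment is an illustration only. It assumes that the pixel data is scaled so that the placement with the highest effective resolution lands on the target DPI, which is one reading of the description above. The function name downsampled_size is hypothetical.

```python
def downsampled_size(pixel_width, pixel_height, effective_dpis, target_dpi):
    """Return the new (width, height) in pixels for an image used several times.

    effective_dpis holds one effective resolution per placement of the image.
    """
    highest = max(effective_dpis)
    if highest <= target_dpi:
        # Already at or below the target: leave the image untouched.
        return pixel_width, pixel_height
    # Assumption: the highest-resolution placement is the reference point.
    scale = target_dpi / highest
    return max(1, round(pixel_width * scale)), max(1, round(pixel_height * scale))


# A 1200 x 800 pixel image placed three times, at 300, 150, and 100 DPI.
print(downsampled_size(1200, 800, [300, 150, 100], target_dpi=150))
```

Under this assumption the placement that was at 300 DPI ends up at the 150 DPI target, and the other placements end up below it because they share the same pixel data.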

Downsampling and recompression

Whenever you downsample an image, you need to select a recompression method because the system needs to decompress and then recompress that image.

Note that if the original compression scheme for the image was lossy, such as JPEG or JP2K (JPEG 2000, a newer wavelet-based standard), you may lose some definition within the image through downsampling.

If you set the option PDFOptimizerDownSampleRecompressOnlyIfSmaller to be true, downsampling only happens if the compressed image that results is smaller than the original compressed image (see Recompression below).
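The effect of this option can be sketched as a simple size comparison. The Python fragment below is an illustration, not the Library's implementation. The function name keep_smaller is hypothetical, and zlib (Flate) stands in for whichever compression methods are actually involved.

```python
import zlib


def keep_smaller(original_stream: bytes, candidate_stream: bytes) -> bytes:
    """Return the candidate only if it is strictly smaller than the original."""
    if len(candidate_stream) < len(original_stream):
        return candidate_stream
    return original_stream


raw_pixels = bytes(range(256)) * 64
original = zlib.compress(raw_pixels, 9)   # already well compressed
candidate = zlib.compress(raw_pixels, 0)  # level 0 stores the data without compressing it
result = keep_smaller(original, candidate)
print(len(original), len(candidate), result is original)
```

Here the candidate is larger than the original, so the original stream is kept and the work of producing the candidate is discarded.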

Downsampling and system management

Downsampling a PDF document can take a lot of time and system resources, so the PDF Optimizer is designed to make this process as productive as possible. If the downsampling process starts with a PDF document at 8.4 MB and reduces its size only to 8.2 MB, you might as well not bother to downsample the file at all, especially if this PDF document is only one of dozens or hundreds of PDF files you would seek to downsample as part of a batch process. That’s why the PDF Optimizer uses default limits in image resolution. For example, if a color image is more than 225 DPI, it can be downsampled to 150 DPI by default. If the color image is only 200 DPI, the PDF Optimizer ignores it. With the modest savings in storage space, the time and processing capacity needed to downsample the image is not worth the cost. Also, PDF Optimizer will not downsample any images in a PDF document that are smaller than 10 pixels wide or 10 pixels high.
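The default limits described above can be expressed as a small decision function. This is a sketch in Python under stated assumptions, not the Library's code. The names DownsampleLimits, DEFAULT_LIMITS, and should_downsample are hypothetical. The DPI values are the defaults listed in this section, and the size check follows the statement that images smaller than 10 pixels wide or high are skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownsampleLimits:
    target_dpi: int   # resolution the image is reduced to
    maximum_dpi: int  # images above this resolution are downsampled


DEFAULT_LIMITS = {
    "color": DownsampleLimits(target_dpi=150, maximum_dpi=225),
    "gray": DownsampleLimits(target_dpi=150, maximum_dpi=225),
    "bw": DownsampleLimits(target_dpi=300, maximum_dpi=450),
}
MIN_DIMENSION_PX = 10


def should_downsample(image_class, effective_dpi, width_px, height_px):
    """Return True when an image exceeds the maximum DPI for its class."""
    if width_px < MIN_DIMENSION_PX or height_px < MIN_DIMENSION_PX:
        return False  # tiny images are never downsampled
    return effective_dpi > DEFAULT_LIMITS[image_class].maximum_dpi


print(should_downsample("color", 300, 1200, 800))  # above 225 DPI
print(should_downsample("color", 200, 1200, 800))  # below 225 DPI
```

The gap between the maximum and the target is what keeps the optimizer from spending time on images that would shrink only slightly.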

Downsampling and color images

Downsampling requires that the colors be specified in a colorant with an even gradient, and in practice this works only for gray scale and for the color models RGB (Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/Black). These images are therefore considered color images:

  • Images with three or four color components that are specified in DeviceRGB, CalRGB, DeviceCMYK, Lab, or ICC-based color spaces.
    The ICC Color Profile is a specification developed by the International Color Consortium (ICC), which was formed in 1993. The ICC Profile was intended to provide a standard for color and color management across all operating systems, platforms, and hardware and software systems. Color profiles are based on this standard.
    A Lab color space is a CIE-based ABC color space with two transformation stages. In this type of space, A, B, and C represent the L*, a*, and b* components of a CIE 1976 L*a*b* space. This color space holds all perceivable colors, and is known for device independence.
  • Images with DeviceN color spaces using only process colorants, and having a sample size of 8 bits per sample.
    Adobe Systems introduced DeviceN to allow systems to combine color channels for composite printing, such as drawing colors from the Pantone Hexachrome color system. DeviceN allows for printing with an arbitrary number of color components, and thus it can use a wider range of colors.

For more about the Lab color space see section 8.6.5 of the ISO 32000-1:2008 document, page 144.

Images that are specified in DeviceGray, CalGray, or DeviceN color spaces, with a sample size above 1 bit, are gray images.

PDF Optimizer does not downsample these types of images:

  • Images having a Bits Per Sample other than 1 or 8
  • Images having a color model of Indexed, Pattern, or Separation
  • Images with a color space of DeviceN and a non-process colorant, or more than four process colorants

Default Value  PDFOptimizerDownsampleColor                    ON
Default Value  PDFOptimizerDownsampleGray                     ON
Default Value  PDFOptimizerDownsampleBW                       ON
Default Value  PDFOptimizerDownSampleRecompressOnlyIfSmaller  ON

Image types        Target DPI   Maximum DPI
Color and Gray     150          225
Black and White    300          450
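
The classification rules in this section can be summarized in a short Python sketch. It is an illustration only and does not reflect how the Library is implemented. The function name classify_image and its parameters are hypothetical. Where the rules above leave a case open, the code makes an assumption and says so in a comment: 1-bit gray images are treated as black & white, a DeviceN image with a single process colorant is treated as gray, and combinations that the rules do not mention are left unclassified.

```python
PROCESS_COLORANTS = {"Cyan", "Magenta", "Yellow", "Black"}
MULTI_COMPONENT_SPACES = {"DeviceRGB", "CalRGB", "DeviceCMYK", "Lab", "ICCBased"}
GRAY_SPACES = {"DeviceGray", "CalGray"}
EXCLUDED_SPACES = {"Indexed", "Pattern", "Separation"}


def classify_image(color_space, bits_per_sample, components=1, colorants=()):
    """Return "color", "gray", "bw", or None when the image is not downsampled."""
    if bits_per_sample not in (1, 8) or color_space in EXCLUDED_SPACES:
        return None
    if color_space == "DeviceN":
        process_only = set(colorants) <= PROCESS_COLORANTS
        if not colorants or not process_only or len(colorants) > 4:
            return None
        if bits_per_sample != 8:
            return None  # assumption: 1-bit DeviceN is not covered by the rules
        # Assumption: a single process colorant is treated as gray.
        return "gray" if len(colorants) == 1 else "color"
    if color_space in GRAY_SPACES:
        # Assumption: 1-bit gray images fall into the black & white class.
        return "gray" if bits_per_sample > 1 else "bw"
    if color_space in MULTI_COMPONENT_SPACES and components in (3, 4):
        return "color"
    return None  # combination not covered by the rules in this section


print(classify_image("DeviceRGB", 8, components=3))
print(classify_image("Separation", 8))
print(classify_image("DeviceN", 8, 2, ("Cyan", "PANTONE 185 C")))
```

Returning None for anything the rules exclude keeps the "do not downsample" cases in one place, ahead of any per-class settings.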

Recompression

Datalogics::PDFL::PDFOptimizerRecompressColor
Datalogics::PDFL::PDFOptimizerRecompressGray
Datalogics::PDFL::PDFOptimizerRecompressBW

PDF documents can also be optimized by recompression. In recompression, compressed images in a document are uncompressed, and then compressed again, using a different compression method. This process can save storage space depending on the size of the PDF document and the original compression method used.

The PDF Optimizer will not explicitly recompress a PDF image that has already been downsampled. Downsampling by its nature will tend to implicitly decompress and then recompress images in a document, because any image must be decompressed to be downsampled, and then will be recompressed with the selected recompression method.

Image data in a PDF document is always a simple stream of binary data. Image data can be encoded with any of the PDF compression filters in order to conserve space.

Requesting recompression will cause the image stream to be uncompressed (if needed), and recompressed using the requested filter, such as JPEG or Flate. If the image is already compressed using the specified filter, the Library will not recompress it.

Note that if an image stored in a lossy format is recompressed, the quality of the image might decline.

The Adobe PDF Library will not recompress an image if the target compression type for that image is set to NONE or SAME.
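The conditions under which an image is recompressed can be collected into one predicate. The Python sketch below is illustrative and is not the Library's API. The function name should_recompress is hypothetical, and the filter names are plain strings chosen for the example, with None standing for an uncompressed stream.

```python
def should_recompress(current_filter, target_filter, was_downsampled=False):
    """Return True when an image would be explicitly recompressed."""
    if target_filter in ("NONE", "SAME"):
        return False  # recompression is switched off for this image class
    if was_downsampled:
        return False  # downsampling already recompressed the image
    # Skip images that already use the requested filter.
    return current_filter != target_filter


print(should_recompress("Flate", "JPEG"))
print(should_recompress("JPEG", "JPEG"))
print(should_recompress(None, "Flate"))
print(should_recompress("Flate", "SAME"))
```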

Default Value  PDFOptimizerRecompressColor  ON
Default Value  PDFOptimizerRecompressGray   ON
Default Value  PDFOptimizerRecompressBW     ON

Color and Gray image types: recompression set to JPEG, medium quality
Black and White image types: recompression set to CCITT Group 4

CCITT Group 4 refers to a compression type defined by the International Telegraph and Telephone Consultative Committee (CCITT), now known as ITU-T. The CCITT is a standards organization that developed a series of communications protocols for the facsimile transmission of black-and-white images over telephone lines and data networks. These protocols are known officially as the CCITT T.4 and T.6 standards but are more commonly referred to as CCITT Group 3 and Group 4 compression, respectively. Many fax and document imaging file formats support this form of lossless data compression encoding.

JPEG compression is a lossy compression format commonly used for photographs, which are often stored as JPG files. In PDF it is also known as DCT, for Discrete Cosine Transform.

Image recompression methods and quality levels

Image type       Valid recompression methods   Default
Color            Flate/JPEG/JPEG2000           JPEG
Grayscale        Flate/JPEG/JPEG2000           JPEG
Black & White    Flate/CCITT G4/JBIG2          JBIG2

Method     Available quality levels                        Default
JPEG       Maximum/high/medium/low/minimum                 Medium
JPEG2000   Lossless/maximum/high/medium/low/minimum        Medium
JBIG2      Lossless/maximum/high/medium/low/minimum        Lossless

Convert 16 bit-per-component images to 8 bit-per-component images

Datalogics::PDFL::PDFOptimizerDownConvert16To8BpcImages

When enabled, images that are 16 bits per component will be converted to 8 bits per component. The bit depth of an image is the number of bits used for each color component of each pixel. RGB, for example, has three color components, so a 16 bpc RGB pixel takes 48 bits and an 8 bpc RGB pixel takes 24 bits. By down-converting an image in a PDF file from 16 bpc to 8 bpc, you reduce the color precision of the image, not its pixel dimensions, and you halve the size of its uncompressed sample data. If a PDF document features many 16 bpc images, the final PDF can be significantly smaller.

Default Value PDFOptimizerDownConvert16To8BpcImages ON
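
The size effect of this conversion can be seen in a minimal Python sketch. It is not the Library's implementation. It assumes uncompressed sample data with each 16-bit sample stored high-order byte first, and it simply keeps the high byte of each sample, whereas the Library may round differently. The function name downconvert_16_to_8 is hypothetical.

```python
def downconvert_16_to_8(samples_16bpc: bytes) -> bytes:
    """Keep the high-order byte of every big-endian 16-bit sample."""
    if len(samples_16bpc) % 2 != 0:
        raise ValueError("16 bpc sample data must have an even number of bytes")
    return samples_16bpc[0::2]


# One RGB pixel at 16 bpc: R = 0xFFFF, G = 0x8000, B = 0x0000.
pixel_16 = bytes.fromhex("FFFF80000000")
pixel_8 = downconvert_16_to_8(pixel_16)
print(len(pixel_16), len(pixel_8), pixel_8.hex())
```

The pixel count is unchanged. Only the number of bytes per sample drops from two to one, so the uncompressed data is half its former size.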