Adobe PDF Library

Converting PDF Pages to Images

DocToImages

Use DocToImages to convert a PDF file to a series of image files, one per page. For example, a 30-page PDF file can be rendered as 30 individual JPG or PNG files. You can select the file format, image size, and resolution, convert only specific pages (such as pages 1 and 2-7), and set a variety of other options that determine what your output looks like. You can also write all or some of the pages of a PDF document to a single multi-page TIF image using the “multi” parameter.

DocToImages is a command-line program; you enter the formatting values manually on the command line.

Here is a typical command. Enter the program name, the export file format to use (PNG in this example), and the name of the source PDF file, in this example test.pdf. The source PDF file is also found in the /Sample_Source folder:

java -Djava.library.path=target/lib -cp target/maven-samples-15.8.2.jar com.datalogics.pdfl.samples.Images.DocToImages.DocToImages -format=png -color=rgb test.pdf

To add more parameters, simply line them up in the statement.  You can add as many parameters as you like to a single command.

The version number in the name of the maven-samples JAR file varies with the version of the Adobe PDF Library that you install. The maven-samples JAR file can be found in the Java/Sample_Source/target folder.

The program will generate a series of image files as output.

-format Required. Enter the graphical file format to use: tif, jpg, bmp, png, or gif.
DocToImages -format=jpg
-color Optional. Choose the color format for the images.  The options are gray, RGB (Red/Green/Blue), and CMYK (Cyan/Magenta/Yellow/Key, or black).
RGB is the additive color model, where the absence of all color is black.  CMYK is the subtractive color model, where the absence of all color is white. That means that with CMYK white is the natural color of the paper or background before colors are applied. Defaults to RGB.
DocToImages -color=gray
-grayhalftone Optional.  Y or N.  Y will set the image to halftone.  This is only valid for the TIF format. Defaults to N.
DocToImages -format=tif -grayhalftone=y
-first Optional.  Y or N.  Y will tell the program to only convert the first page of the PDF file, and ignore the rest.  Defaults to N.
DocToImages -first=y
-quality Optional. A number from 1 to 100. This is the JPG Quality Scale, an arbitrary scale created specifically for JPG images to describe print or display quality. In practice the scale controls the compression ratio: the higher the quality number, the lower the compression and the better the image, but the larger the resulting JPG file. The default value is 75.
DocToImages -format=jpg -quality=95
-resolution Optional. Enter a value or set of values to define the resolution for the graphics images, in Dots per Inch (DPI).  The resolution must be from 12 to 1200.
The first number is the horizontal resolution, and the second is the vertical resolution. It is expressed as 480x640, for example. If you enter a single value, the same number will be used for both horizontal and vertical resolution. The higher the resolution, the higher the quality of the image, but graphics files with high resolutions take longer to generate and are considerably larger. Defaults to 300.
DocToImages -resolution=480x640
Or
DocToImages -resolution=600
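
The rules above for the -resolution value (one number or two, each from 12 to 1200) can be sketched as a small Java check. This is only an illustration, not the sample's own parsing code: the class name ResolutionArg and its parse method are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the DocToImages sample): parses a -resolution value. */
final class ResolutionArg {
    static final int MIN_DPI = 12;    // lower limit stated in the text above
    static final int MAX_DPI = 1200;  // upper limit stated in the text above

    /** Returns {horizontal, vertical} DPI; a single value is used for both. */
    static int[] parse(String value) {
        String[] parts = value.trim().toLowerCase(Locale.ROOT).split("x", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("Expected N or NxM: " + value);
        }
        int horizontal = check(Integer.parseInt(parts[0].trim()));
        int vertical = parts.length == 2 ? check(Integer.parseInt(parts[1].trim())) : horizontal;
        return new int[] {horizontal, vertical};
    }

    private static int check(int dpi) {
        if (dpi < MIN_DPI || dpi > MAX_DPI) {
            throw new IllegalArgumentException("Resolution must be from 12 to 1200: " + dpi);
        }
        return dpi;
    }
}
```

With this sketch, ResolutionArg.parse("600") yields 600 for both directions, and a value outside the range is rejected with an IllegalArgumentException.
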
-fontlist Optional. Use this option to define a directory or set of directories where fonts are stored that you want the Viewer to use with your PDF file. You may enter up to 16 directories, each one separated by a semicolon (;). You can use this option to provide a font list to the JavaViewer when you open it.
This parameter would be useful if you created a PDF file with a specialized font or set of fonts, perhaps a font that your firm custom created for use with the firm’s own advertising materials. Datalogics provides a list of fonts in the Resources directory, and by default the program will look there for fonts to use when converting the PDF file. But if the system can’t find a font defined in the PDF, it will use a substitute. To prevent that, define the directory where your custom font is found. This will tell the program to look for the font file in that directory first.
Make sure you surround the entire list of path names for font directories with quotation marks.
DocToImages -fontlist="Z:\ColorManager\Resources\Fonts; C:\ColorWiseFont"
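
As a rough illustration of the -fontlist format described above, the following Java sketch splits a semicolon-separated list and enforces the 16-directory limit. The class FontListArg is hypothetical, and trimming the spaces around each entry is an assumption; the source does not say how the program treats them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the DocToImages sample): splits a -fontlist value. */
final class FontListArg {
    static final int MAX_DIRECTORIES = 16;  // limit stated in the text above

    static List<String> parse(String value) {
        List<String> directories = new ArrayList<>();
        for (String part : value.split(";")) {
            String directory = part.trim();  // assumption: surrounding spaces are ignored
            if (!directory.isEmpty()) {
                directories.add(directory);
            }
        }
        if (directories.size() > MAX_DIRECTORIES) {
            throw new IllegalArgumentException("At most 16 font directories are allowed");
        }
        return directories;
    }
}
```
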
-pixels Optional. Enter the final absolute size of the graphics image in pixels, expressed as width by height. If you don’t enter a pixels value, each of your graphics images will be auto-scaled from the original page in the PDF file. For example, you can use this parameter to generate a PNG file that is exactly 650 pixels wide and 340 pixels high.
DocToImages -pixels=650x340
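
To give a feel for the auto-scaled size you get when -pixels is omitted, here is a minimal Java sketch of the usual page-size arithmetic. It assumes the default PDF unit of 1/72 inch and simple rounding; the source does not state how DocToImages itself rounds, and the class PixelSize is hypothetical.

```java
/** Hypothetical helper: estimates image size in pixels from page size and resolution. */
final class PixelSize {
    static final double POINTS_PER_INCH = 72.0;  // default PDF user-space unit

    /** Page size is given in points; returns {width, height} in pixels. */
    static int[] fromPage(double widthPoints, double heightPoints, int horizontalDpi, int verticalDpi) {
        int width = (int) Math.round(widthPoints / POINTS_PER_INCH * horizontalDpi);
        int height = (int) Math.round(heightPoints / POINTS_PER_INCH * verticalDpi);
        return new int[] {width, height};
    }
}
```

Under these assumptions, a US Letter page (612 by 792 points) at the default 300 DPI comes out at 2550 by 3300 pixels.
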
-compression Optional. Set a compression value for generating graphics files. The default value is no compression (none). Note that none is only valid for the BMP, PNG, or TIF output formats. You must enter a compression value for JPG or GIF images. The other possible values are:

  • Flate, or Deflate compression, an open source standard widely used for creating zip files and with PDF. Only valid for PNG.
  • LZW, Lempel-Ziv-Welch, a universal data compression algorithm, widely used with Unix platforms. Only valid for GIF. This method appears in some old PDF documents but it is rarely used any longer.
  • g3, Group 3 compression, a universal compression standard for fax documents. Only valid for gray images with grayhalftone set to Y.
  • g4, Group 4 compression, a universal compression standard for fax documents. Only valid for gray images with grayhalftone set to Y.
  • dct, Discrete Cosine Transform, a compression format used rendering photographs as JPG images.  Only valid for JPG, and also known as Jpeg compression.

Note that g3 and g4 refer to the CCITT compression formats, which are limited to black and white images. CCITT compression was designed for use with fax documents, but it is also still commonly used for pre-press work with black and white photographs. It can significantly reduce the size of an image without a loss in quality.

DocToImages -format=jpg -compression=dct
. . .
DocToImages -format=tif -grayhalftone=Y -compression=g3
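
The format and compression pairings listed above can be summarized in a short Java check. This is a sketch of the stated rules only, not the program's own validation; the class CompressionRules is hypothetical, and requiring tif for g3 and g4 is inferred from the note that -grayhalftone is only valid for TIF.

```java
import java.util.Locale;

/** Hypothetical helper: encodes the format/compression pairings described above. */
final class CompressionRules {
    static boolean isValid(String format, String compression, boolean grayHalftone) {
        String f = format.toLowerCase(Locale.ROOT);
        switch (compression.toLowerCase(Locale.ROOT)) {
            case "none":
                return f.equals("bmp") || f.equals("png") || f.equals("tif");
            case "flate":
                return f.equals("png");
            case "lzw":
                return f.equals("gif");
            case "g3":
            case "g4":
                // Inferred: grayhalftone is TIF-only, so g3/g4 need tif plus grayhalftone=Y.
                return f.equals("tif") && grayHalftone;
            case "dct":
                return f.equals("jpg");
            default:
                return false;
        }
    }
}
```
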
-region Optional.  Define the part of the PDF page to rasterize as a graphics file, or take a vector graphic in a PDF file and convert it to a bitmap file.  The region parameter defines how the program manages the boundaries of the export graphic file.  The possible values include:

  • Crop. Defines the region for clipping or cropping the graphic for display or print. Crop, unlike the other settings for region, has no default, defined size or geometry; with the crop value it is possible to provide additional information to manually define the margins of the image that are selected and exported.  If these values are not provided, the crop value will match the media value.
  • Media. Defines the boundaries of the actual page where the graphics image will be printed. In this case the media setting may include an extended area around the graphic on the printed page.  This area can be used for printing marks on a proof copy, for example.  The media value provides the widest possible margins around the graphics image.
  • Art. Defines the logical extent of the content of the graphic.  This is the smallest of margins.  With art, only the graphic itself is exported, without any space around the image.
  • Trim. Defines the intended dimensions of the finished graphic after trimming to fit the page.  This will be smaller than the media and bleed settings, but wider than the art setting.
  • Bleed. Defines the region where the contents of the page will be clipped.  It may include an extra area surrounding the graphic to allow for the physical limitations of cutting, folding, and printing equipment. This value is the second widest margin around the graphics image, after media.
  • Bounding. Usually the smallest possible rectangle that can hold all of the content on the page.  It is possible, however, for Bounding to include objects that fall outside the borders of the PDF page, such as a particularly wide Bezier curve.

Defaults to Crop. Note that most modern software products that work with PDF files handle these values automatically.

DocToImages -region=art
-pages Optional.  Enter a list or range of pages to include in the export, with each page number or range of page numbers separated by a comma. If you have a 100 page PDF file but only want to convert the first three pages and pages 8 and 10 to JPG files, you could use this command to create five JPG files for those five pages.  You can also enter “even” or “odd.”
Defaults to all pages in the original PDF file.
DocToImages -pages=2,4,7-10,14
Or
DocToImages -pages=odd
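
To show how a -pages value such as 2,4,7-10,14 expands into individual page numbers, here is a minimal Java sketch. The class PageListArg is hypothetical; page numbers are assumed to start at 1, and sorting and removing duplicates is a choice made for this illustration rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical helper: expands a -pages value into 1-based page numbers. */
final class PageListArg {
    static List<Integer> parse(String value, int pageCount) {
        TreeSet<Integer> pages = new TreeSet<>();  // sorted, no duplicates
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("odd") || v.equals("even")) {
            for (int p = v.equals("odd") ? 1 : 2; p <= pageCount; p += 2) {
                pages.add(p);
            }
            return new ArrayList<>(pages);
        }
        for (String item : v.split(",")) {
            String[] ends = item.trim().split("-", -1);
            if (ends.length > 2) {
                throw new IllegalArgumentException("Bad page range: " + item);
            }
            int first = Integer.parseInt(ends[0].trim());
            int last = ends.length == 2 ? Integer.parseInt(ends[1].trim()) : first;
            if (first < 1 || last < first || last > pageCount) {
                throw new IllegalArgumentException("Bad page range: " + item);
            }
            for (int p = first; p <= last; p++) {
                pages.add(p);
            }
        }
        return new ArrayList<>(pages);
    }
}
```

For a 20-page document, PageListArg.parse("2,4,7-10,14", 20) gives pages 2, 4, 7, 8, 9, 10, and 14, which would produce seven image files.
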
-output Optional. Enter the name you would like to apply to the export file or files. For example, you could use this parameter to export graphics from a PDF file called market.PDF to a series of export files called graphic01.JPG, graphic02.JPG, and so on. By default, the export file name matches the name of the import file, in this example market.jpg.
DocToImages -format=png -output=graphic.png
-smoothing Optional.  Select smoothing for the text in a PDF file (“text”), or for text and images (“all”), or no smoothing at all (“none”).   Defaults to none.
DocToImages -smoothing=text
-reverse Optional.  Y or N.  Reverse black to white and white to black for gray images.  Defaults to N.
DocToImages -color=gray -reverse=Y
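
Conceptually, reversing a gray image means replacing each pixel value with its complement. The Java sketch below shows that idea on a raw array of 8-bit gray samples; it is an illustration only, with a hypothetical class name, and says nothing about how DocToImages implements the option internally.

```java
/** Hypothetical helper: inverts 8-bit gray samples (0 = black, 255 = white). */
final class GrayReverse {
    static byte[] reverse(byte[] gray) {
        byte[] out = new byte[gray.length];
        for (int i = 0; i < gray.length; i++) {
            // Mask to read the byte as unsigned, then take the complement.
            out[i] = (byte) (255 - (gray[i] & 0xFF));
        }
        return out;
    }
}
```
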
-blackisone Optional.  Y or N.  Reverse black to white and white to black for gray scale TIF images only. Defaults to N.
DocToImages -format=tif -grayhalftone=y -blackisone=y
-multi Optional.  Y or N.  Generate a multipage TIF file.  This is only valid for the TIF format. Defaults to N.
DocToImages -format=tif -multi=y
-digits Optional.  Enter a number 0 to 9 to define the size of the fixed digit suffix to add to the file name.  The system will automatically increment this value by one for each new output file generated.  For example, if you set the digits value equal to 2, the system will assign a two-digit suffix to each export file name, and add 1 to the value for each file that it creates, as in test27.jpg, test28.jpg, test29.jpg, and so on.  If you want to export hundreds of images from a single PDF file, this would be a useful way to track the output files.  If you set the digit value to 7, the suffix would be seven digits long, as in test2390224.jpg.
In each case the program will zero fill, so if you set the digits value equal to 4, and only have 12 graphics in the original PDF file, the file names will be test0001.jpg, test0002.jpg, test0003.jpg, and so on, up to test0012.jpg.
DocToImages -digits=4
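
The zero-filled suffix described above can be reproduced with a format string. The following Java sketch is an illustration with a hypothetical class name; it covers digit counts 1 through 9 only, because the source does not describe what a value of 0 produces.

```java
import java.util.Locale;

/** Hypothetical helper: builds zero-filled output file names such as test0012.jpg. */
final class OutputNames {
    static String name(String base, int index, int digits, String extension) {
        if (digits < 1 || digits > 9) {
            throw new IllegalArgumentException("This sketch supports 1 to 9 digits");
        }
        // For digits = 4 the pattern is "%s%04d.%s": the index is padded with zeros.
        return String.format(Locale.ROOT, "%s%0" + digits + "d.%s", base, index, extension);
    }
}
```

For example, OutputNames.name("test", 12, 4, "jpg") returns test0012.jpg.
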
-asprinted Optional. Y or N, for annotations “as printed.” Defaults to N. Set this option to Y if you want the export image files (such as JPG or PNG) to include any printable annotations that were added to the original PDF file. Not all annotations can be sent to a printer. For example, if someone added a text comment to a photograph in a PDF file, you could use this option to show this comment on the photograph when you export it to a JPG file.
DocToImages -asprinted=Y

Close the statement by telling the program where to find the PDF file you want to convert. If you have copied the PDF file to a directory under the Java folder, you can simply give the path to that subfolder:

java -Djava.library.path=target/lib -cp target/maven-samples-15.8.2.jar com.datalogics.pdfl.samples.Images.DocToImages.DocToImages -format=jpg samples\data\test.pdf

or, if the file is in the Java directory:

java -Djava.library.path=target/lib -cp target/maven-samples-15.8.2.jar com.datalogics.pdfl.samples.Images.DocToImages.DocToImages -format=jpg test.pdf

If you have it in a different directory or server drive, you need to include the full path name where that PDF file is found:

java -Djava.library.path=target/lib -cp target/maven-samples-15.8.2.jar com.datalogics.pdfl.samples.Images.DocToImages.DocToImages -format=jpg Z:\Datalogics\test.pdf

Stack up the parameters you want on the command line, but remember that you must always define the format to use, and close the command with the name of the PDF file that you want to draw images from:

java -Djava.library.path=target/lib -cp target/maven-samples-15.8.2.jar com.datalogics.pdfl.samples.Images.DocToImages.DocToImages -format=jpg -fontlist="Z:\ColorManager\Resources\Fonts;C:\ColorWise\Font" -pages=2,4,7 test.PDF
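
If you launch the sample from another Java program rather than typing the command, the same ordering rule (format first, PDF file last) can be captured in a small builder. This is a sketch under stated assumptions: the class DocToImagesCommand and its build method are hypothetical, while the main class name and the java.library.path setting are copied from the commands above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: assembles the argument list for the DocToImages sample. */
final class DocToImagesCommand {
    static final String MAIN_CLASS =
            "com.datalogics.pdfl.samples.Images.DocToImages.DocToImages";

    static List<String> build(String jarPath, String format,
                              Map<String, String> options, String pdfPath) {
        if (format == null || format.isEmpty()) {
            throw new IllegalArgumentException("-format is required");
        }
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Djava.library.path=target/lib");
        command.add("-cp");
        command.add(jarPath);
        command.add(MAIN_CLASS);
        command.add("-format=" + format);
        for (Map.Entry<String, String> option : options.entrySet()) {
            command.add("-" + option.getKey() + "=" + option.getValue());
        }
        command.add(pdfPath);  // the PDF file always closes the command
        return command;
    }
}
```

The resulting list could be passed to new ProcessBuilder(command).inheritIO().start(). Because each argument is passed separately, the quotation marks around the -fontlist value that a shell needs are not required in this case.
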