Adobe PDF Library

Converting PDF Pages to Images

DocToImages


Use DocToImages to convert a PDF file to a series of graphics images, one per page. Suppose you have a 30-page PDF file. You could use DocToImages to render this PDF file as a series of 30 individual JPG or PNG graphics files, one file per page. You can select the file format, image size, or resolution, or convert only specific pages (such as page 1 and pages 2-7), along with a variety of other settings that determine what your output looks like. You can also write all or some of the pages of a PDF document to a single multi-page TIF image using the “multi” parameter.

The DocToImages program requires you to enter the formatting values manually on the command line.

Here is a typical command line.  You need to enter the program name, the export file format to use (such as JPG), and the name and path of the source PDF file:

C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages -format=jpg samples\data\test.PDF

To add more parameters, simply line them up in the command:

C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages -format=png -color=rgb samples\data\test.PDF

You can add as many parameters as you like to a single command.
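If you launch DocToImages from a script rather than typing the command by hand, a small helper can assemble the arguments in the required order: the program, the -format option, any other options, and the PDF file last. The sketch below is a hypothetical Python helper, not part of the Datalogics samples; it assumes the sample path shown above and the -name=value option syntax used throughout this page.

```python
# Hypothetical helper (not part of APDFL): builds a DocToImages argument list.
# The program path is the sample path used on this page; adjust it for your install.
DOC_TO_IMAGES = r"C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages"


def build_command(pdf_path, fmt, **options):
    """Return [program, -format=..., -name=value ..., pdf_path]."""
    args = [DOC_TO_IMAGES, f"-format={fmt}"]  # -format is always required
    for name, value in options.items():       # keyword order is preserved
        args.append(f"-{name}={value}")
    args.append(pdf_path)                      # the PDF file always comes last
    return args


cmd = build_command(r"samples\data\test.PDF", "png", color="rgb", pages="2,4,7")
print(" ".join(cmd))
```

On a machine where the sample has been built, you could pass the resulting list to Python's `subprocess.run(cmd, check=True)` to run the conversion.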

The program generates image files in the format you choose. The available parameters are described below, each followed by an example:

-format Required. Enter the graphics file format to use: tif, jpg, bmp, png, or gif.
DocToImages -format=jpg
-color Optional. Choose the color format for the images. The options are gray, RGB (Red/Green/Blue), and CMYK (Cyan/Magenta/Yellow/Key, or black).
RGB is the additive color model, where the absence of all color is black. CMYK is the subtractive color model, where the absence of all color is white; that is, with CMYK, white is the natural color of the paper or background before any colors are applied. Defaults to RGB.
DocToImages -color=gray
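To see why white means “no ink” in CMYK but “full intensity” in RGB, here is the textbook naive RGB-to-CMYK formula in Python. It only illustrates the relationship between the two color models; it is not how APDFL converts colors, and production color conversion normally uses color management (ICC profiles) instead.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of RGB components (0.0-1.0) to CMYK (0.0-1.0).

    Ignores ICC profiles, so it only illustrates the additive vs
    subtractive relationship described above.
    """
    k = 1.0 - max(r, g, b)          # black = how far the brightest channel is from full
    if k == 1.0:                    # pure black: all ink is K; avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return (c, m, y, k)


print(rgb_to_cmyk(1.0, 1.0, 1.0))   # white: no ink at all -> (0.0, 0.0, 0.0, 0.0)
print(rgb_to_cmyk(0.0, 0.0, 0.0))   # black: full K -> (0.0, 0.0, 0.0, 1.0)
print(rgb_to_cmyk(1.0, 0.0, 0.0))   # red: magenta + yellow -> (0.0, 1.0, 1.0, 0.0)
```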
-grayhalftone Optional. Y or N. Y renders the image as a halftone. This is valid only for the TIF format. Defaults to N.
DocToImages -format=tif -grayhalftone=y
-first Optional. Y or N. Y tells the program to convert only the first page of the PDF file and ignore the rest. Defaults to N.
DocToImages -first=y
-quality Optional. A number from 1 to 100 on the JPG quality scale. This scale is arbitrary and applies only to JPG images; in practice it sets how strongly the image is compressed. The default value is 75. The higher the quality number, the lower the compression and the better the image, but also the larger the resulting JPG file.
DocToImages -format=jpg -quality=95
-resolution Optional. Enter one or two values defining the resolution of the graphics images, in dots per inch (DPI). Each value must be from 12 to 1200.
If you enter two values, the first is the horizontal resolution and the second is the vertical resolution, written as 480x640, for example. If you enter a single value, it is used for both the horizontal and vertical resolution. The higher the resolution, the higher the quality of the image, but high-resolution graphics files take longer to generate and are considerably larger. Defaults to 300.
DocToImages -resolution=480x640
Or
DocToImages -resolution=600
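The -resolution value can be one number or two numbers joined by x. The Python sketch below parses it the way this page describes (a single value applies to both axes, and each value must be 12 to 1200 DPI) and shows the standard arithmetic for the resulting image size: PDF page dimensions are in points, 72 per inch, so pixels = points * DPI / 72. The parsing rules mirror the description above; the rounding is an assumption, and the real program may round differently.

```python
def parse_resolution(value):
    """Parse '600' or '480x640' into (horizontal_dpi, vertical_dpi)."""
    parts = value.lower().split("x")
    if len(parts) == 1:
        horizontal = vertical = float(parts[0])   # one value: same for both axes
    elif len(parts) == 2:
        horizontal, vertical = float(parts[0]), float(parts[1])
    else:
        raise ValueError(f"bad resolution: {value!r}")
    for dpi in (horizontal, vertical):
        if not 12 <= dpi <= 1200:
            raise ValueError(f"{dpi} DPI is outside the 12-1200 range")
    return horizontal, vertical


def pixel_size(page_width_pt, page_height_pt, h_dpi, v_dpi):
    """Image size in pixels for a page measured in PDF points (1/72 inch)."""
    return (round(page_width_pt * h_dpi / 72), round(page_height_pt * v_dpi / 72))


# A US Letter page is 612 x 792 points; at the default 300 DPI:
print(pixel_size(612, 792, *parse_resolution("300")))  # -> (2550, 3300)
```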
-fontlist Optional. Use this option to name a directory, or set of directories, where the fonts you want the Viewer to use with your PDF file are stored. You may enter up to 16 directories, separated by semicolons (;). You can use this option to provide a font list to the DotNETViewer when you open it.
This parameter is useful if you created a PDF file with a specialized font or set of fonts, perhaps a font that your firm custom-created for its own advertising materials. Datalogics provides a set of fonts in the Resources directory, and by default the program looks there for fonts to use when converting the PDF file. But if the system can’t find a font defined in the PDF, it uses a substitute. To prevent that, define the directory where your custom font is found. This tells the program to look for the font file in that directory first.
Make sure you surround the entire list of font directory path names with quotation marks.
DocToImages -fontlist="Z:\ColorManager\Resources\Fonts; C:\ColorWiseFont"
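The -fontlist value is a single string that is split at the semicolons. A minimal Python sketch of that splitting, enforcing the 16-directory limit stated above, follows. Whether DocToImages trims spaces around each entry (the example above has a space after the semicolon) is an assumption of this sketch.

```python
MAX_FONT_DIRS = 16  # limit given in the -fontlist description


def split_font_list(value):
    """Split a -fontlist string into directory paths.

    Assumption: surrounding whitespace is ignored and empty entries are skipped.
    """
    dirs = [d.strip() for d in value.split(";") if d.strip()]
    if len(dirs) > MAX_FONT_DIRS:
        raise ValueError(f"{len(dirs)} font directories given; the maximum is {MAX_FONT_DIRS}")
    return dirs


print(split_font_list(r"Z:\ColorManager\Resources\Fonts; C:\ColorWiseFont"))
```

The quotation marks around the list keep it together as one command-line argument even when it contains spaces.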
-pixels Optional. Enter the final absolute size of the graphics image in pixels, expressed as width by height. If you don’t enter a pixels value, each graphics image is auto-scaled from the original page in the PDF file. For example, you can use this parameter to generate an image that is exactly 650 pixels wide and 340 pixels high.
DocToImages -pixels=650x340
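Fixing the output size in pixels implies a scale for each axis. Because PDF pages are measured in points (72 per inch), you can work out the effective resolution that a given -pixels value corresponds to. The Python sketch below does that arithmetic for a US Letter page (612 x 792 points). This page does not say whether DocToImages stretches the page or preserves its aspect ratio when the pixel box has a different shape, so the sketch only reports the two scales.

```python
def parse_pixels(value):
    """Parse a WIDTHxHEIGHT string such as '650x340'."""
    width, height = (int(v) for v in value.lower().split("x"))
    return width, height


def effective_dpi(page_width_pt, page_height_pt, width_px, height_px):
    """DPI implied on each axis when a page is rendered at an exact pixel size."""
    return (width_px * 72 / page_width_pt, height_px * 72 / page_height_pt)


h_dpi, v_dpi = effective_dpi(612, 792, *parse_pixels("650x340"))
print(f"{h_dpi:.2f} x {v_dpi:.2f} DPI")  # -> 76.47 x 30.91 DPI
```

Unequal values like these mean the pixel box does not have the page's shape; choosing a pixel size with the page's aspect ratio (for example 650x841 for Letter) keeps the two scales roughly equal.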
-compression Optional. Set a compression value for generating graphics files. The default value is no compression (none). Note that none is valid only for BMP, PNG, or TIF output; you must enter a compression value for JPG or GIF images. The values are:

  • Flate (Deflate) compression, an open standard widely used for ZIP files and within PDF. Valid only for PNG.
  • Lzw (Lempel-Ziv-Welch), a universal data compression algorithm widely used on Unix platforms. Valid only for GIF. This method appears in some old PDF documents but is rarely used any longer.
  • g3, Group 3 compression, a universal compression standard for fax documents. Only valid for gray images with grayhalftone set to Y.
  • g4, Group 4 compression, a universal compression standard for fax documents. Only valid for gray images with grayhalftone set to Y.
  • dct (Discrete Cosine Transform), the compression method used for rendering photographs as JPG images, also known as JPEG compression. Valid only for JPG.

Note that g3 and g4 are CCITT compression formats and are limited to black-and-white images. They were designed for fax documents, but they are still commonly used for pre-press work with black-and-white photographs. Because they are lossless, they can significantly reduce the size of an image without any loss in quality.

DocToImages -format=jpg -compression=dct
Or
DocToImages -format=tif -grayhalftone=Y -compression=g3
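The compression rules above are easy to get wrong, so a script that builds DocToImages commands might check them before running the program. The Python sketch below encodes only the rules stated on this page; it is a hypothetical pre-flight check, not part of the sample. Because the g3 example above does not set -color=gray, the sketch assumes that -grayhalftone=Y implies gray output.

```python
# Which output formats each -compression value is valid for, per this page.
VALID_FORMATS = {
    "none": {"bmp", "png", "tif"},
    "flate": {"png"},
    "lzw": {"gif"},
    "g3": {"tif"},   # g3/g4 need -grayhalftone=Y, which is TIF-only
    "g4": {"tif"},
    "dct": {"jpg"},
}


def check_compression(fmt, compression="none", grayhalftone=False):
    """Raise ValueError if the combination is not allowed; return True otherwise."""
    fmt, compression = fmt.lower(), compression.lower()
    if compression not in VALID_FORMATS:
        raise ValueError(f"unknown compression {compression!r}")
    if fmt not in VALID_FORMATS[compression]:
        # Covers JPG and GIF with the default "none", which this page disallows.
        raise ValueError(f"-compression={compression} is not valid for {fmt}")
    if compression in ("g3", "g4") and not grayhalftone:
        raise ValueError("g3/g4 require -grayhalftone=Y")
    return True
```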
-region Optional. Defines which part of the PDF page is rasterized into the graphics file; you can also use it to convert a vector graphic in a PDF file into a bitmap file. The region parameter determines which page boundary the program uses as the edges of the exported image. The possible values are:

  • Crop. Defines the region for clipping or cropping the graphic for display or printing. Unlike the other region settings, crop has no fixed default size or geometry; with the crop value you can supply additional information to manually define the margins of the area that is selected and exported. If these values are not provided, the crop region matches the media region.
  • Media. Defines the boundaries of the actual page where the graphics image will be printed. In this case the media setting may include an extended area around the graphic on the printed page.  This area can be used for printing marks on a proof copy, for example.  The media value provides the widest possible margins around the graphics image.
  • Art. Defines the logical extent of the content of the graphic. This gives the narrowest margins: with art, only the graphic itself is exported, without any space around the image.
  • Trim. Defines the intended dimensions of the finished graphic after trimming to fit the page. This is smaller than the media and bleed regions, but larger than the art region.
  • Bleed. Defines the region where the contents of the page will be clipped.  It may include an extra area surrounding the graphic to allow for the physical limitations of cutting, folding, and printing equipment. This value is the second widest margin around the graphics image, after media.
  • Bounding. Usually the smallest possible rectangle that can hold all of the content on the page.  It is possible, however, for Bounding to include objects that fall outside the borders of the PDF page, such as a particularly wide Bezier curve.

Defaults to Crop. Note that most modern software products that work with PDF files handle these values automatically.

DocToImages -region=art
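Most of the region values correspond to page boundary boxes defined by the PDF specification (MediaBox, CropBox, BleedBox, TrimBox, ArtBox). When a PDF omits a box, the specification supplies a default: CropBox falls back to MediaBox, and BleedBox, TrimBox, and ArtBox fall back to CropBox; boxes that extend past the media box are reduced to their intersection with it. The Python sketch below applies those rules to a page described as a plain dictionary. It illustrates the specification, not DocToImages code, and it leaves out Bounding, which is computed from the page content rather than stored in the file.

```python
def intersect(a, b):
    """Intersection of two rectangles given as (llx, lly, urx, ury) in points."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def resolve_boxes(page):
    """Return the effective page boxes, applying the PDF specification's defaults."""
    media = page["MediaBox"]
    crop = intersect(page.get("CropBox", media), media)   # CropBox defaults to MediaBox
    boxes = {"media": media, "crop": crop}
    for key in ("BleedBox", "TrimBox", "ArtBox"):
        # Bleed, trim, and art default to the crop box, clipped to the media box.
        boxes[key[:-3].lower()] = intersect(page.get(key, crop), media)
    return boxes


# A Letter-size page that only defines a trim box 0.25 inch (18 pt) inside the media box.
page = {"MediaBox": (0, 0, 612, 792), "TrimBox": (18, 18, 594, 774)}
for name, box in resolve_boxes(page).items():
    print(name, box)
```

Note that when a file defines only a MediaBox, every region resolves to the same rectangle, so the size ordering described above (art smallest, media largest) applies only to files that actually define the separate boxes.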
-pages Optional. Enter a list of pages or page ranges to include in the export, separated by commas. For example, if you have a 100-page PDF file but only want to convert the first three pages and pages 8 and 10 to JPG files, you could use this parameter to create five JPG files for those five pages. You can also enter “even” or “odd.”
Defaults to all pages in the original PDF file.
DocToImages -pages=2,4,7-10,14
Or
DocToImages -pages=odd
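A page list such as 2,4,7-10,14 expands to individual page numbers. The Python sketch below shows one way to expand the syntax described above, including even and odd. Page numbers are 1-based, as on this page; silently dropping numbers beyond the last page is an assumption of the sketch, and DocToImages may report an error instead.

```python
def parse_pages(spec, page_count):
    """Return the 1-based page numbers selected by '2,4,7-10,14', 'odd', or 'even'."""
    spec = spec.strip().lower()
    if spec == "odd":
        return list(range(1, page_count + 1, 2))
    if spec == "even":
        return list(range(2, page_count + 1, 2))
    pages = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:                          # a range such as 7-10 (inclusive)
            start, end = (int(x) for x in item.split("-", 1))
            pages.extend(range(start, end + 1))
        else:                                    # a single page number
            pages.append(int(item))
    # Assumption: out-of-range pages are ignored rather than reported.
    return [p for p in pages if 1 <= p <= page_count]


print(parse_pages("1-3,8,10", 100))  # -> [1, 2, 3, 8, 10]
```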
-output Optional. Enter the name you would like to apply to the export file or files. For example, you could use this parameter to export graphics from a PDF file called market.PDF to a series of export files called graphic01.JPG, graphic02.JPG, and so on. By default, the export file is named after the input file, in this example market.jpg.
DocToImages -format=png -output=graphic.png
-smoothing Optional.  Select smoothing for the text in a PDF file (“text”), or for text and images (“all”), or no smoothing at all (“none”).   Defaults to none.
DocToImages -smoothing=text
-reverse Optional.  Y or N.  Reverse black to white and white to black for gray images.  Defaults to N.
DocToImages -color=gray -reverse=Y
-blackisone Optional. Y or N. Reverse black to white and white to black for grayscale TIF images only. Defaults to N.
DocToImages -format=tif -grayhalftone=y -blackisone=y
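Reversing black and white amounts to inverting each pixel value. For 8-bit grayscale, each value v becomes 255 - v; for 1-bit (halftone) data packed eight pixels per byte, every bit is flipped. The Python sketch below shows both on raw bytes. It illustrates the idea only; this page does not describe how DocToImages implements -reverse or -blackisone. For reference, in the TIFF specification's WhiteIsZero photometric interpretation, a bilevel value of 1 is imaged as black.

```python
def reverse_gray(samples: bytes) -> bytes:
    """Invert 8-bit grayscale samples: 0 (black) <-> 255 (white)."""
    return bytes(255 - v for v in samples)


def reverse_bilevel(packed: bytes) -> bytes:
    """Invert 1-bit samples packed 8 per byte by flipping every bit.

    Padding bits at the end of a row are flipped too, which is harmless
    because readers ignore them.
    """
    return bytes(b ^ 0xFF for b in packed)


print(list(reverse_gray(bytes([0, 128, 255]))))     # -> [255, 127, 0]
print(format(reverse_bilevel(b"\xa0")[0], "08b"))   # 10100000 -> 01011111
```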
-multi Optional.  Y or N.  Generate a multipage TIF file.  This is only valid for the TIF format. Defaults to N.
DocToImages -format=tif -multi=y
-digits Optional.  Enter a number 0 to 9 to define the size of the fixed digit suffix to add to the file name.  The system will automatically increment this value by one for each new output file generated.  For example, if you set the digits value equal to 2, the system will assign a two-digit suffix to each export file name, and add 1 to the value for each file that it creates, as in test27.jpg, test28.jpg, test29.jpg, and so on.  If you want to export hundreds of images from a single PDF file, this would be a useful way to track the output files.  If you set the digit value to 7, the suffix would be seven digits long, as in test2390224.jpg.
In each case the program zero-fills the suffix, so if you set the digits value to 4 and the original PDF file has only 12 pages, the file names will be test0001.jpg, test0002.jpg, test0003.jpg, and so on, up to test0012.jpg.
DocToImages -digits=4
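The -digits suffix is ordinary zero padding of a counter. The Python sketch below generates names like those described above. The base name, extension, and starting number are parameters of this hypothetical helper, and treating -digits=0 as “no padding” is an assumption, since this page does not say what 0 does.

```python
def output_names(base, ext, count, digits, start=1):
    """File names with a zero-filled, incrementing numeric suffix."""
    if not 0 <= digits <= 9:
        raise ValueError("digits must be 0 to 9")
    # str.zfill pads on the left with zeros and never truncates longer numbers.
    return [f"{base}{str(n).zfill(digits)}.{ext}" for n in range(start, start + count)]


names = output_names("test", "jpg", count=12, digits=4)
print(names[0], names[-1])   # -> test0001.jpg test0012.jpg
```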
-asprinted Optional. Y or N. Renders annotations “as printed.” Defaults to N. Set this option to Y to include in the exported graphics files any printable annotations from the original PDF file. Not all annotations can be sent to a printer. For example, if someone added a text comment to a photograph in a PDF file, you could use this option to show the comment on the photograph when you export it to a JPG file.
DocToImages -asprinted=Y

End the command with the location of the PDF file you want to convert. If you have copied the PDF file to a directory under the DotNET folder, you can simply give that subfolder:

C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages -format=jpg samples\data\test.PDF

or, if the file is in the DotNET directory:

C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages -format=png test.PDF

If you have it in a different directory or server drive, you need to include the full path name where that PDF file is found:

C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages -format=png Z:\Datalogics\test.PDF

Stack up the parameters you want on the command line, but remember that you must always define the format to use, and close the command with the name of the PDF file that you want to draw images from:

C:\Datalogics\APDFL15.0.4\DotNET\Sample_Source\Images\DocToImages -format=png -fontlist="Z:\ColorManager\Resources\Fonts;C:\ColorWise\Font" -pages=2,4,7 test.PDF

DrawToBitmap


This sample program renders the first page of a PDF file to a series of bitmap image files. It generates the following graphics files:

  • Four DrawtoBitmap files, one each bmp, gif, jpg, and png
  • Four DrawtoGraphics files, one each bmp, gif, jpg, and png
  • DrawtoByteArray.png

These graphics files represent different methods that the program uses to render the images from the original PDF file. You can use this code as a basis for exporting graphics from a PDF, and edit it as needed to export graphics from other pages in the PDF file.

The DrawtoGraphicsorBitmap method can generate either DrawtoBitmap or DrawtoGraphics files. The name of the output file depends on the value of the last parameter of this method: if it is set to “True”, the file name starts with DrawtoGraphics; otherwise, the file is named DrawtoBitmap. The DrawtoByteArray output file is generated from the Main function. You can also use the program’s DrawLayers method to generate a DrawingLayersTo.PNG output file.

The DrawToBitmap sample program uses several approaches to render the content from a PDF page:

  1. System.Drawing.Graphics, using a specified Matrix and export rectangle. Create a Graphics instance and pass it to the DrawContents() method along with a Matrix and a Rect. This is the simplest way to render a page to a bitmap.
  2. System.Drawing.Graphics, using a specified DrawParams object. Create a Graphics instance, fill in the necessary fields of a DrawParams object, and pass both to the DrawContents() method. This approach shows how to render a page to a bitmap using DrawParams with the BlackPointCompensation flag enabled.
  3. System.Drawing.Graphics, using a specified DrawParams object. Create a Graphics instance, fill in the necessary fields of a DrawParams object, and pass both to the DrawContents() method. Unlike approach 2, this approach lets you choose which layers on the page are included during rendering. It also shows how to create an OptionalContentContext object with the layers you select.
  4. Render a page’s content to a byte array, which you can then use however you like. Fill in the DrawParams structure and pass it to the DrawContents() method. The sample program creates a new Bitmap from the resulting byte array and saves it as DrawtoByteArray.png.
  5. System.Drawing.Bitmap, using a specified DrawParams object. Create a Bitmap instance, fill in the necessary fields of a DrawParams object, and pass both to the DrawContents() method. In this approach you can choose which layers on the page to include during rendering; it shows how to select the layers to be rendered and how to create an OptionalContentContext object with those layers.
  6. The final rendering approach is also the simplest. It renders a page’s contents to a bitmap in the same way as the first approach, but it lets you save the image as a BMP file.
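Approach 1 passes a Matrix and a Rect to DrawContents(). The math behind such a matrix is general: PDF user space has its origin at the lower-left corner, with y pointing up and 72 units per inch, while a bitmap has its origin at the top-left, with y pointing down. The Python sketch below builds the affine matrix that maps a page rectangle onto a bitmap at a given DPI, using the PDF [a b c d e f] convention. The Matrix class here is a hypothetical stand-in for illustration; it is not the APDFL Matrix type, and the exact arguments DrawContents() expects are not reproduced.

```python
import math
from typing import NamedTuple


class Matrix(NamedTuple):
    """Hypothetical affine matrix in PDF order: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, x, y):
        return (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)


def page_to_bitmap(rect, dpi):
    """Matrix and bitmap size for rendering rect=(llx, lly, urx, ury) at dpi."""
    llx, lly, urx, ury = rect
    s = dpi / 72.0                                    # points -> pixels
    m = Matrix(s, 0.0, 0.0, -s, -llx * s, ury * s)   # scale, flip y, move top-left to (0, 0)
    size = (math.ceil((urx - llx) * s), math.ceil((ury - lly) * s))
    return m, size


m, size = page_to_bitmap((0, 0, 612, 792), 144)
print(size)                # -> (1224, 1584)
print(m.apply(0, 792))     # top-left of the page -> (0.0, 0.0)
print(m.apply(612, 0))     # bottom-right of the page -> (1224.0, 1584.0)
```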

When you run the program the system will prompt you to enter the name of the PDF file. If the PDF is in the same directory as the program, you can simply enter the file name. Otherwise, you need to provide the full path name. The graphics images will be saved to the same directory.

DrawToBitmap is similar to DocToImages in that both sample programs generate image files from a PDF file. DrawToBitmap, however, makes use of transformation matrix operations.