Adobe PDF Library

C++ Sample Programs

Introduction

All of these samples can be run from a command line. Most of them define a default input file to use, and provide a default name for an output file. Most of them also provide a default input directory for storing the input file. The default input directory is used to allow a sample to find an input file that is specific to that program, and not loaded for general use when the Adobe PDF Library initializes.

If you have your own PDF document you want to use as an input file, you can copy it to the default input directory and enter the name of that file on the command line. You can also enter a file name to use and the path for when the sample saves the results to an output file. And you can edit the default input and output file names named in the program itself.
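The default-plus-override pattern described above can be sketched in a few lines. This is only an illustration: the `SamplePaths` struct, the `resolvePaths` function, and the argument order (input first, output second) are assumptions made for the example and are not taken from the sample sources.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical holder for the two file names a sample works with.
struct SamplePaths {
    std::string input;
    std::string output;
};

// Start from the built-in defaults, then let command-line arguments override
// them: first argument = input file, second = output file (assumed order).
SamplePaths resolvePaths(const std::vector<std::string>& args,
                         const std::string& defaultInput,
                         const std::string& defaultOutput) {
    assert(!defaultInput.empty() && !defaultOutput.empty());
    SamplePaths paths{defaultInput, defaultOutput};
    if (args.size() >= 1) paths.input = args[0];
    if (args.size() >= 2) paths.output = args[1];
    return paths;
}
```

In a real `main`, the argument vector could be built with `std::vector<std::string> args(argv + 1, argv + argc);` before calling `resolvePaths`.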

Note: We recommend using Microsoft Visual Studio 2013 or later for working with the solution (.sln) project files related to these sample programs.

Working with Annotations

CreateAnnotations

View Sample Code

The CreateAnnotations sample program demonstrates how to add annotations to a PDF document, and how to extract text from these annotations and save this text to a separate PDF document.

The sample program finds all of the PDEElements in an input PDF document and creates a text annotation for each of them before saving the file. Then, it reopens this file, extracts the text from these annotations, and saves this text to a new output PDF.

The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing in PDF documents including color spaces, clip and page objects, fonts, form XObjects, and other objects. The PDEElement is a base class for PDE, used to derive a variety of elements within a PDF document, including text and XObjects. For any text elements the program finds, it creates a highlight annotation. For all other elements, the program creates a text annotation for the element and places it on the page of the output PDF document.

For more detail on PDEElement, see the description in the Adobe Acrobat API Reference.

The annotations are listed in the output PDF document in the order that they appear in the input file. The program assigns a font and type size to the text that appears.

FlattenAnnotations

View Sample Code

This sample program demonstrates flattening annotations within a PDF document.

The flattening process, in working with PDF documents, combines layers of content on a PDF page, or a stack of transparent images or colors, or an annotation, and renders the result as a single image, color, or set of text. When a digital signature is flattened, the digital certificate key and related properties are removed from the signature field. The name of the person who signed the document and related information, such as the date and time stamp and the signer’s email address, appear on the page as text, but the signature field is no longer presented as an annotation.

This sample merges the appearance (AP) dictionaries of all annotations on each page into the page’s content stream by converting them into Form XObjects. Then it removes the annotations and saves the updates to a new PDF output document. If an annotation does not have an appearance, it will still be removed from the output PDF document, but the program will not convert it to an XObject, so it will not appear as flattened content in the output PDF.

For more information about annotation appearances, see Section 12.5.5, “Appearance Streams,” in the ISO 32000 Reference, page 387.

Creating Content

AddArt

View Sample Code

The AddArt sample program shows how to draw a graphic image on a PDF page by manipulating the PDEPath objects in the PDF file.

The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing in PDF documents including colorspaces, clip and page objects, fonts, form XObjects, and other objects. The PDEPath is an element that contains a path, or clipping path. A path defines shapes, lines, and boundaries for graphics, and fill areas within graphics. The PDEPath can have stroke and fill attributes, as well as graphics state attributes.

The program creates a new single-page PDF document, called AddArtOut.PDF, featuring two arrows, one filled with yellow color. For the first arrow, the program sets the PDEGraphicsState object, which holds attributes of how the PDEElement will be displayed, and then draws the arrow. The program sets coordinates for beginning and ending positions and then calls PDEPathAddSegment to draw lines between these positions. The program defines the line color used, thickness, and rounded joints where lines meet.

Then, the program uses the same process to draw the second arrow, but applies transformations, namely a fill color, and scaling it to be smaller than the original.

The program closes by releasing resources and saving the PDF output file, AddArtOut.PDF.

For more details on working with graphic objects, particularly working with common transformation types, translations, rotations, scaling, and skews, see Section 8.3, “Coordinate Systems,” in the ISO 32000 Reference, page 114. Specifically see section 8.3.3, Common Transformations, on page 117.
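The common transformations mentioned above all reduce to a six-number matrix [a b c d e f], where a point maps as x' = a·x + c·y + e and y' = b·x + d·y + f. The following stand-alone sketch shows that arithmetic only; the `Matrix` and `Point` types and the helper names are hypothetical and are not the Adobe PDF Library types. Skews are left out for brevity.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// PDF-style transformation matrix [a b c d e f].
struct Matrix { double a, b, c, d, e, f; };

Matrix translation(double tx, double ty) { return {1.0, 0.0, 0.0, 1.0, tx, ty}; }

Matrix scaling(double sx, double sy) {
    assert(sx != 0.0 && sy != 0.0);  // a zero scale would collapse the shape
    return {sx, 0.0, 0.0, sy, 0.0, 0.0};
}

Matrix rotation(double radians) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    return {c, s, -s, c, 0.0, 0.0};
}

// x' = a*x + c*y + e,  y' = b*x + d*y + f
Point apply(const Matrix& m, const Point& p) {
    return {m.a * p.x + m.c * p.y + m.e, m.b * p.x + m.d * p.y + m.f};
}

// Combined matrix that applies `first` and then `second`.
Matrix concat(const Matrix& first, const Matrix& second) {
    return {first.a * second.a + first.b * second.c,
            first.a * second.b + first.b * second.d,
            first.c * second.a + first.d * second.c,
            first.c * second.b + first.d * second.d,
            first.e * second.a + first.f * second.c + second.e,
            first.e * second.b + first.f * second.d + second.f};
}
```

A smaller copy of a shape placed elsewhere on the page, as with the second arrow, corresponds to something like `concat(scaling(0.5, 0.5), translation(100, 50))` applied to each point of the path.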

AddAttachments

View Sample Code

A PDF file can contain other embedded files, called attachments, rather like attachments added to an email message. You can add any type of file to a PDF document, and a PDF can hold any number of attached files. The AddAttachments program demonstrates how to add two attached files to an input PDF document.

The program generates an output file called AddAttachments-out.PDF. The document includes two attached files, a spreadsheet (xlsx) and a Microsoft Word document (docx). One of these files is embedded in an annotation shown on the PDF page, and the other is added to the name tree in the PDF document itself. Both appear under the File Attachments Navigation Pane on the left side of the Adobe Acrobat window:

[Figure: the File Attachments Navigation Pane listing both attached files]

The Word document is also added to an annotation that appears on the page, a thumb tack.

The program creates two PDFFileAttachment objects. The first is embedded in the PDF document’s name tree. The second is added to the File Specification Dictionary of the annotation, and then the program adds the annotation to the PDF page.

AddContent

View Sample Code

The AddContent program opens an input document, a blank PDF file called AddContent, and adds several elements to the blank page, including a line of text and a rectangle. The program saves the result to an output file called AddedContent.PDF. This program is similar to AddArt, except that it adds both text and graphics elements. The content of the text and graphics are held in the PDEContent object on the output page.

The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing in PDF documents including colorspaces, clip and page objects, fonts, form XObjects, and other objects. The PDEPath is an element that contains a path, or clipping path. A path defines shapes, lines, and boundaries for graphics, and fill areas within graphics. The PDEPath can have stroke and fill attributes, as well as graphics state attributes. In this case the PDEPath provides the placement of the rectangle, the rectangle height and width, and line width, and the RGB color values (red/green/blue).

The PDEFont element defines a font to use for text; PDEText holds the text itself. So the program defines the font for the text to be displayed on one element, and adds the text itself.

CreateBookmarks

View Sample Code

A bookmark in a PDF document labels a place within the PDF document to serve as a destination, usually a heading or a graphic. After you add a bookmark, you can create a link elsewhere in the same PDF document and connect it to that bookmark. When the reader clicks on that link the viewer will take the reader to the place in the PDF where that bookmark is found.

This sample demonstrates how to add a bookmark to a PDF document. It creates a parent bookmark in Korean via Unicode (bookmark 1), and then creates a child bookmark under that (bookmark 1.1) and saves the PDF as an output file.

CreateDocument

View Sample Code

This simple program demonstrates how to create and save a new PDF document. The file is set to portrait orientation, 8½ by 11 inches, and the program inserts five blank pages.

CreateLayers

View Sample Code

PDF documents can include Optional Content Groups, known as “layers” in Adobe Acrobat and Reader, that can be used to separate and manage content or graphics on a single page.

Layers are a useful way to present information when opening a PDF file.  For example, you could create a brochure with multiple layers offering the same content but in different languages.  The first layer would be the blank background page.  The resulting PDF file could be set up, with some extra program code, to select the appropriate layer with French or Spanish or English, depending on the language of the reader, and then display that language in the PDF file.

The CreateLayers sample program demonstrates how to programmatically add two layers to a PDF document, one that displays text, and the other, annotations.

The layers appear if you open the PDF document in Acrobat and click the View Layers icon in the upper left side of the Acrobat window:

[Figure: the Layers pane in Acrobat showing the two layers]

In the output file, LayersCreated.PDF, both of the layers are set to be visible by default. Click on the box next to the name of either layer, and the corresponding text or annotation shown on the PDF page will appear or disappear.

CreateTransparency

View Sample Code

PDF files can have objects that are partially or fully transparent, and thus can blend in various ways with objects behind them. Transparent graphics or images can be stacked in a PDF file, with each one contributing to the final result that appears on the page. This is in contrast to opaque objects, where if you have a stack of graphics or images, only the graphic or image on top of the stack will appear. One or more graphics images presented together in a stack is referred to as a transparency group. With a stack of transparent images, the final colors shown are the result of blending the colors of all of the overlapping objects. The way that two transparencies blend together is called a blend mode.

The CreateTransparency sample program demonstrates transparency and blend modes in Adobe PDF Library for the CMYK and RGB color spaces. It creates 13 sets of RGB and CMYK color blending circles, one for each of the 12 standard blending modes available in Adobe PDF Library, and an additional set to demonstrate the absence of blending. So a total of 26 color blending circles are drawn.

The program will add these blending circles to the export file, CreateTransparency.PDF. This output document will have 13 pages with two blending circles each, like this:

[Figure: a page from CreateTransparency.PDF showing two blending circles]

The blend modes are Normal, Multiply, Screen, Overlay, Darken, Lighten, ColorDodge, ColorBurn, HardLight, SoftLight, Difference, and Exclusion. For more details on these 12 standard blend modes, see Section 11.3.5, “Blend Mode,” in the ISO 32000 Reference, page 324.
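To make the idea of a blend mode concrete, here is a minimal sketch of the per-component formulas for nine of the twelve modes, following the separable blend mode definitions in ISO 32000. It is independent of the Adobe PDF Library; all function names are hypothetical. ColorDodge, ColorBurn, and SoftLight are omitted to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// cb = backdrop color component, cs = source color component, both in [0, 1].
double normal(double /*cb*/, double cs) { return cs; }
double multiply(double cb, double cs) { return cb * cs; }
double screen(double cb, double cs) { return cb + cs - cb * cs; }
double darken(double cb, double cs) { return std::min(cb, cs); }
double lighten(double cb, double cs) { return std::max(cb, cs); }
double difference(double cb, double cs) { return std::fabs(cb - cs); }
double exclusion(double cb, double cs) { return cb + cs - 2.0 * cb * cs; }

// HardLight multiplies or screens depending on the source component.
double hardLight(double cb, double cs) {
    return cs <= 0.5 ? multiply(cb, 2.0 * cs) : screen(cb, 2.0 * cs - 1.0);
}

// Overlay is HardLight with the roles of backdrop and source swapped.
double overlay(double cb, double cs) { return hardLight(cs, cb); }

// Checked entry point: applies one blend function to one color component.
double blendChannel(double (*mode)(double, double), double cb, double cs) {
    assert(cb >= 0.0 && cb <= 1.0 && cs >= 0.0 && cs <= 1.0);
    return mode(cb, cs);
}
```

For an RGB or CMYK color, the chosen function would be applied to each component in turn, for example `blendChannel(multiply, backdropRed, sourceRed)`.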

Several other sample programs are available that demonstrate working with transparencies in PDF documents.

Extracting Content from a Document

CopyContent

View Sample Code

This program finds an input document called CopyContent.PDF, copies content from this file to a new PDF document, and then saves it as CopiedContent.PDF. The original PDF document includes the following items:

  1. Page 1, a graphics image, a rubber ducky
  2. Page 2, a set of four text objects, “The Quick Brown Fox Jumped Over the Lazy Dog.” One is bolded, the others feature varying font sizes.
  3. Page 2, a page background, Fonts Sources
  4. Page 3, a text block featuring CJK (Chinese/Japanese/Korean) characters
  5. Page 3, four overlapping graphics items, namely colored boxes
  6. Pages 4 through 7, a set of text pages, featuring Appendix F, Linearized PDF, taken from the PDF Reference Guide.

You can edit the program to specify what kinds of content you want to copy. The content to be copied is determined by the page where that content appears in the source PDF document.

The WILL_COPY_ALL_PAGES setting can be either 1 or 0. If this value is set to 1, the program will copy every page of the input file to the output PDF. If it is set to 0, it will copy the pages listed in the “pagesToCopy” parameter instead. By default, these are pages 0, 1, 3, and 5; page numbers are zero-based, so these are the first, second, fourth, and sixth pages of the input file. That means that by default the CopyContent program will take the graphics image, the page background, and part of the Appendix F material (the first and third pages in this section) from the input PDF file and place them in the output file.
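The selection logic can be pictured with a small stand-alone function. This is a sketch under stated assumptions: `selectPages` is a hypothetical helper, page numbers are treated as zero-based, and silently skipping page numbers that fall outside the document is a choice made for this example rather than documented behavior of the sample.

```cpp
#include <cassert>
#include <vector>

// Returns the zero-based page numbers that would be copied.
// copyAllPages mirrors a WILL_COPY_ALL_PAGES value of 1.
std::vector<int> selectPages(bool copyAllPages, int pageCount,
                             const std::vector<int>& pagesToCopy) {
    assert(pageCount >= 0);
    std::vector<int> selected;
    if (copyAllPages) {
        for (int page = 0; page < pageCount; ++page) selected.push_back(page);
        return selected;
    }
    for (int page : pagesToCopy) {
        // Assumption for this sketch: ignore entries outside the document.
        if (page >= 0 && page < pageCount) selected.push_back(page);
    }
    return selected;
}
```

With a seven-page input and a list of 0, 1, 3, and 5, the function returns those four pages unchanged; with `copyAllPages` set, the list is ignored.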

The program saves the output file and closes both PDF documents.

ExtractAttachments

View Sample Code

A PDF file can contain other embedded files, called attachments, rather like attachments added to an email message. You can add any type of file to a PDF document, and a PDF can hold any number of attached files. The ExtractAttachments program demonstrates how to export and save attached files in a PDF document to a series of external files.

The program opens a PDF document called extractfrom.PDF. This PDF holds five attachments, two MS Word documents, a JPG image, a PNG graphics file, and another PDF file. These five attachments are saved separately in the export directory. Three attached files are embedded in annotations on the PDF page, and two are added to the name tree in the source PDF document itself.

ExtractFonts

View Sample Code

This sample is similar to EmbedFonts. The sample looks for fonts embedded in a PDF document and then extracts those fonts and saves them as font files, storing them in the same directory on the local system where the input PDF document is found.

ExtractFonts does not define a default output file.

For more information about working with fonts in the Adobe PDF Library, see the description of the EmbedFonts sample program.

Editing and Manipulating Content

AddBookmarks

View Sample Code

A bookmark in a PDF document labels a place within the PDF document to serve as a destination, usually a heading or a graphic. After you add a bookmark, you can create a hyperlink elsewhere in the same PDF document and connect it to that bookmark. When the reader clicks on that link the viewer will take the reader to the place in the PDF where that bookmark is found.

This sample program shows how to add bookmarks to a PDF file.

The program starts with an input PDF document that is 11 pages long, holding text drawn from the James Joyce novel Ulysses. The program searches for all of the bolded headings found in the document and automatically attaches bookmarks to these headings.

The program also adds a few sub-bookmarks to the first bookmark. These take the reader to the same place in the document as the bookmark, but change the zoom level of the text that is shown.

The bookmarks added to the output document appear in the Bookmarks Navigation Pane in the upper left corner of the Adobe Acrobat window:

[Figure: the Bookmarks Navigation Pane showing the added bookmarks]

AddDocumentInformation

View Sample Code

This sample program opens a PDF file and then inserts standard document information into that file. The values are found in the Document Information Dictionary, and include items like the file’s Title, Author, Subject, and Creation Date. You can display them in any PDF document if you open that PDF in Adobe Acrobat and select File and Properties, and then click the Description tab. This program adds a Title and Author to the output file.

For more detail on the Document Information Dictionary see section 14.3.3 of the ISO 32000 Reference, page 549.

AddLinks

View Sample Code

Use the AddLinks sample program to automatically add hyperlinks to a PDF document. This program demonstrates adding three kinds of links to a file:

  1. Open a separate Word document
  2. Move to a new location within the PDF document, the third page
  3. Open the Datalogics web page, Datalogics.com

The program starts with an input document called AddLinks.PDF and uses it to generate AddedLinks.PDF. The program creates and places each link one at a time, with each link embedded in a separate annotation on the PDF page. For the external file, a Word document called DOCXLink.docx, the program adds a link to the relative path where this Word Document is stored.

The Word document is found in the apdfl-samples/ Input directory where you installed your APDFL software package.

AddPageNumbers

View Sample Code

This sample program opens a blank, eight-page PDF document and adds labels and page numbers to the pages in that file.

If you open any PDF document and look at the Thumbnails Navigation Pane in the upper left corner of the Adobe Acrobat window, the pages shown will be numbered by default, like this:

[Figure: the Thumbnails Navigation Pane with default page numbers]

But it is possible to provide custom labels to each page, labels that will appear with the Thumbnails Navigation Pane. The completed output document, created by AddPageNumbers, will look like this:

[Figure: the Thumbnails Navigation Pane with custom page labels]

In creating labels for the pages, the program will use a set of style keys, “R” and “r” for upper and lower case Roman numerals (as in “Page I, Page II” or “Page i, Page ii”) and “A” and “a” for upper and lower case letters (for example, “Page A, Page B” or “Page a, Page b”). Arabic numerals are defined with the “D” style key (“Page 1” or “Page 2”).
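As an illustration of what the style keys produce, the sketch below formats a label from a style key, a prefix, and a page number. It assumes the PDF page label convention from ISO 32000, in which the letter styles run A to Z for the first 26 pages and then AA to ZZ. The function names are hypothetical and the code does not call the Adobe PDF Library.

```cpp
#include <cassert>
#include <string>

// Roman numerals for 1..3999, upper or lower case.
std::string toRoman(int number, bool upperCase) {
    assert(number >= 1 && number <= 3999);
    static const int values[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char* const upper[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                        "XL", "X", "IX", "V", "IV", "I"};
    static const char* const lower[] = {"m", "cm", "d", "cd", "c", "xc", "l",
                                        "xl", "x", "ix", "v", "iv", "i"};
    std::string result;
    for (int i = 0; i < 13; ++i) {
        while (number >= values[i]) {
            result += upperCase ? upper[i] : lower[i];
            number -= values[i];
        }
    }
    return result;
}

// Letters: A..Z for pages 1..26, then AA..ZZ for 27..52, and so on.
std::string toLetters(int number, bool upperCase) {
    assert(number >= 1);
    const char letter = static_cast<char>((upperCase ? 'A' : 'a') + (number - 1) % 26);
    const int repeat = (number - 1) / 26 + 1;
    return std::string(static_cast<std::string::size_type>(repeat), letter);
}

// Builds a label such as "Page IV" from a style key, a prefix, and a number.
std::string formatLabel(char styleKey, const std::string& prefix, int number) {
    switch (styleKey) {
        case 'D': return prefix + std::to_string(number);
        case 'R': return prefix + toRoman(number, true);
        case 'r': return prefix + toRoman(number, false);
        case 'A': return prefix + toLetters(number, true);
        case 'a': return prefix + toLetters(number, false);
        default:  return prefix;  // unknown style: prefix only, no numeric part
    }
}
```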

AddWatermark

View Sample Code

The AddWatermark sample program adds two watermarks to the first two pages of an input PDF. The watermark can be either text or a graphic of some sort; the source of the graphic must be a page in another PDF source document. In this case the program takes a graphic from the first page of the input document Watermark.PDF, a picture of a rubber ducky, and applies it as a watermark to the first two pages of the output document (pages 0 and 1).

You can set the range of pages to receive a watermark in the output PDF document using the targetRange parameter, or set it to PDAllPages, and you can define the horizontal and vertical placement on each page. The text to use for the text watermark is also included in the program, and you can define the type style and font size. The variables for the text and graphic watermarks are stored in a pair of parameter structs.

AttachMimeToPDF

View Sample Code

This sample program demonstrates how to add a file attachment to a PDF document, and then save the updated file. If you run this program and then open the PDF output document in Adobe Acrobat, the newly attached file appears in the Attachments pane on the left side of the window.

[Figure: the Attachments pane showing the attached file]

This is a command line program. You can enter the name of the PDF input file and the name of the file to be attached.

MIME refers to the Multipurpose Internet Mail Extensions (MIME) standard. It was developed to define the types of files that can be attached to an electronic mail message using the SMTP format, but it also applies to other protocols like HTTP, and the MIME standard is used to define the types of files that can be attached to a PDF document. MIME is effectively a list of many common file types, including text documents, image, audio or video files, program files, and the like. The attached file used by this sample program is a text (TXT) file.
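A program that attaches arbitrary files usually needs to pick a MIME type for each one. The sketch below maps a file extension to a registered media type and falls back to `application/octet-stream` for anything it does not recognize. The `mimeTypeFor` helper and the short table are illustrative assumptions and are not taken from the sample program.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Looks up a MIME type from the file extension (case-insensitive).
std::string mimeTypeFor(const std::string& fileName) {
    assert(!fileName.empty());
    static const std::map<std::string, std::string> knownTypes = {
        {"txt", "text/plain"},
        {"pdf", "application/pdf"},
        {"png", "image/png"},
        {"jpg", "image/jpeg"},
        {"jpeg", "image/jpeg"},
    };
    const std::string fallback = "application/octet-stream";

    const std::string::size_type dot = fileName.rfind('.');
    if (dot == std::string::npos || dot + 1 == fileName.size()) return fallback;

    std::string extension = fileName.substr(dot + 1);
    std::transform(extension.begin(), extension.end(), extension.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });

    const auto found = knownTypes.find(extension);
    return found != knownTypes.end() ? found->second : fallback;
}
```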

EmbedFonts

View Sample Code

This sample program demonstrates how to scan a PDF document to determine whether the fonts used in that document are embedded. For each font the program finds that is not embedded in the document, the sample looks for characters in that font that are used in the document. It gathers up the characters and then requests that the Adobe PDF Library generate a subset font stream for these characters and embed it in the document. That is, the program subsets the characters from that font into the PDF document.

Font embedding places the entire font within a PDF document. This is a best practice for working with PDF, as by saving a font used by a PDF document within that document the viewing tool does not need to find the font on the local system, or substitute the original font used with another. However, a PDF document with all of the fonts used by that document embedded can be quite large, especially if the document uses an Asian font, with thousands of characters. To help make the PDF document smaller, a subset of the characters in an embedded font can be selected to be added to the PDF Document. This is called subsetting the font. The subset font only includes the characters needed when rendering the pages of that document. Any font that is subset in a PDF document must first be embedded in that document. The font is embedded, and then characters that will not be needed are removed.

The default input PDF document has some fonts that are embedded and some that are not. The sample program finds the fonts in the document that are not embedded, and then looks for an alternative font on the local computer system to use as a substitute. Then, the program subsets the characters of that font that are used in the document into that document, and renames this newly subset font to show that it was embedded and subset.
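The first step of subsetting, gathering the characters that each font actually uses, can be sketched without any PDF machinery. In this illustration the `TextRun` struct and `collectUsedCharacters` function are hypothetical; in the real sample the text runs would come from the document's content, and the resulting character sets would be handed to the library to build the subset font.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one piece of text drawn with one font.
struct TextRun {
    std::string fontName;
    std::u32string text;
};

// For every font, collect the distinct characters that appear in its runs.
std::map<std::string, std::set<char32_t>> collectUsedCharacters(
        const std::vector<TextRun>& runs) {
    std::map<std::string, std::set<char32_t>> usedByFont;
    for (const TextRun& run : runs) {
        assert(!run.fontName.empty());
        std::set<char32_t>& used = usedByFont[run.fontName];
        used.insert(run.text.begin(), run.text.end());
    }
    return usedByFont;
}
```

The size of each set, compared with the thousands of glyphs in a full font, is what makes a subset font so much smaller than a fully embedded one.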

To learn more see Access Font information.

FlattenTransparency

View Sample Code

This sample program uses the PDFlattener plugin for flattening a PDF input document.

PDF files can have objects that are partially or fully transparent, and thus can interact in various ways with objects behind them. Transparent graphics or images can be stacked in a PDF file, with each one contributing to the final result that appears on the page.  This is in contrast to opaque objects, where if you have a stack of graphics or images, only the graphic or image on top of the stack will appear.  When the FlattenTransparency sample program flattens a set of transparencies within a PDF document, it combines them into a single image. The CreateTransparency sample program is similar, in that it merges transparencies within a PDF input file, but CreateTransparency blends the colors of the transparencies together based on a selected blend method. The FlattenTransparency program removes any interactivity from a PDF document.

PDFlattener only flattens pages that include transparent elements.

The program sets the PDFlattener parameters.

Parameter: Description

Color space: You can select the color space for working with transparent objects. For RGB (Red/Green/Blue) the default color space is sRGB IEC61966-2.1. This is a version of the standard RGB color space developed by Microsoft. For CMYK (Cyan/Magenta/Yellow/Black) the default is US Web Coated (SWOP) v2. See color management.

Compression: You can also select a color compression method. The system defaults to Deflate compression. Deflate is commonly used with Zip and PDF files. See the description of the RenderPage sample program.

Raster/vector balance: The default is for no vectors to be used (0). All transparencies will be rasterized or converted to a digital format (made up of pixels).

Tiling mode: By default tiling is not used and we do not recommend it. The tiling process segments a graphics image using a grid. Tiling can be used to manage large graphics images. For example, an image could be tiled to render it as smaller individual sections that can be more easily stored in system memory.

Target tile size: Set to zero points by default. Tiling is not used.

Resolution for atomic regions: An atomic region is a portion of a transparency that results from the process to flatten all of the transparencies in a PDF document. These atomic regions are commonly rectangles and fitted together on the page. Two settings are offered. The first defines the resolution for displaying and printing these atomic regions. The other sets the resolution for the region edges. Both are in dots per inch (DPI).

Clip complex regions: Use clipping for complex regions. A clipping path can edit a graphic design by removing part of the image. This defaults to false. The entire region is preserved.

Stroke areas filled: Convert stroke elements to filled elements. Outlined areas will be filled in with solid color.

Rasterize text: By default the program is designed to convert any text characters it finds to rasterized characters based on pixels.

Overprinting: Preserve overprinting when it appears.

Shading output: Both features are enabled to allow shading of output and to allow level 3 shading. Level 3 shading refers to pattern shading for graphic images. The Shading Pattern facility was added to PDF level 3 or PDF 1.3 (specification 4.6.3).

Maximum size: Maximum size limit while flattening. This defaults to zero (0). This means that there will be no maximum size setting for images embedded in the PDF document as it is flattened.

Adaptive flattening threshold: Disregarded. The program does not set a tiling mode and does not use adaptive tiling.

The FlattenTransparency sample program closes with a callback function to monitor the flattening process.
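Although the sample leaves tiling switched off, the idea of segmenting an image on a grid is easy to picture. The following sketch divides an image of a given pixel size into tiles no larger than a chosen tile size. The `Tile` struct and `tileImage` function are hypothetical and say nothing about how PDFlattener tiles internally.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One rectangular section of the image, in pixels.
struct Tile {
    int x;
    int y;
    int width;
    int height;
};

// Splits an image into a grid of tiles; edge tiles may be smaller.
std::vector<Tile> tileImage(int imageWidth, int imageHeight, int tileSize) {
    assert(imageWidth >= 0 && imageHeight >= 0 && tileSize > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < imageHeight; y += tileSize) {
        for (int x = 0; x < imageWidth; x += tileSize) {
            tiles.push_back({x, y,
                             std::min(tileSize, imageWidth - x),
                             std::min(tileSize, imageHeight - y)});
        }
    }
    return tiles;
}
```

Each tile can then be rendered on its own, which keeps the memory needed at any one time proportional to the tile size rather than to the whole image.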

Several other sample programs are available that demonstrate working with transparencies in PDF documents.

ImportPages

View Sample Code

This program demonstrates how to copy the contents of one page from a PDF input file and place that content into a PDF page in a different document. The program creates a PDEForm to hold the page contents, and scales the page so that the imported content occupies one quarter of the PDF page where it will be placed. The program creates the PDEForm by calling the PDF Library API known as PDEContentAddPage. The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing a variety of objects in PDF documents, including the form XObjects used in this sample.

When importing content from the input document pages, the program must define the region to select on that page. That is, the program needs to define how the boundaries or margins of the import content are determined. There are several standard ways to do this when working with a PDF document. For example, the input content could use a media box, which defines the full boundaries of the actual page, or a crop box, which is limited to the exact boundaries for the print or graphic content shown on that page. So an input page defined with a crop box will usually be smaller than a page defined using a media box.

When selecting and placing content for pages in the output document, the sample program uses the media boxes found on the pages of the input document. But that means that when the output PDF document is generated, the pages in that document might show objects that would normally fall outside of the crop boxes of these input pages. For example, you might see color bars or cut marks that are usually not visible. If you want to make sure these items do not appear in the output file, change the settings in the program to use the crop boxes for the pages of the input documents, not the default setting of media boxes.
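The scaling step can be expressed as a single transformation matrix computed from the chosen source box and the destination page. The sketch below is an illustration only: the `Rect` and `Matrix` types and the function name are hypothetical, and placing the content in the lower-left quarter while preserving its aspect ratio is an assumption about the layout, not a description of the sample's exact behavior.

```cpp
#include <algorithm>
#include <cassert>

struct Rect { double left, bottom, right, top; };

// PDF-style transformation matrix [a b c d e f].
struct Matrix { double a, b, c, d, e, f; };

// Matrix that scales `source` (a media box or crop box) to fit inside the
// lower-left quarter of `page`, keeping the aspect ratio.
Matrix fitIntoLowerLeftQuarter(const Rect& source, const Rect& page) {
    const double sourceWidth = source.right - source.left;
    const double sourceHeight = source.top - source.bottom;
    assert(sourceWidth > 0.0 && sourceHeight > 0.0);

    const double targetWidth = (page.right - page.left) / 2.0;
    const double targetHeight = (page.top - page.bottom) / 2.0;
    const double scale = std::min(targetWidth / sourceWidth,
                                  targetHeight / sourceHeight);

    // Translate so the source box's lower-left corner lands on the page's.
    return {scale, 0.0, 0.0, scale,
            page.left - scale * source.left,
            page.bottom - scale * source.bottom};
}
```

Passing the crop box instead of the media box as `source` changes both the scale and the offset, which is why content outside the crop box, such as color bars, shows up or disappears depending on that choice.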

MergeAcroForms

View Sample Code

This sample program demonstrates how to use the Adobe PDF Library to move all of the AcroForm objects, namely forms fields and digital signatures, from one PDF document to another. AcroForm, or Acrobat Form, is the technology provided by Adobe Systems to build PDF forms documents.

The MergeAcroForms sample program starts by creating, effectively, a facsimile of each page in the input PDF document. These pages are used to build the pages in the output PDF file, with each new page in the output file a rasterized image of the corresponding page found in the input file. Then, the sample fills in these copied pages in the output file by copying into them the AcroForm fields found in the input file. Finally, the sample program copies the standard individual entries of the AcroForm dictionary (such as “NeedAppearances” and “SigFlags”) from the input PDF document to the AcroForm dictionary in the output document.

MergeDocuments

View Sample Code

This is a simple program that takes two PDF input files, Merge1.PDF and Merge2.PDF, and merges them into a single output file, MergePDF.PDF. Each of the input files has a single page with text; Merge2.PDF is added after Merge1.PDF to create a two page output PDF file. You can change the settings in the sample program to add Merge1 after Merge2.

PDFMakeOCGVisible

View Sample Code

This program makes the Optional Content Groups (layers) within a PDF document visible within a viewing application, like Adobe Reader or Adobe Acrobat. The program finds any layers within the input PDF document and writes them to an output file.

Optional Content Groups can be used to separate and manage content or graphics on a single page. Layers are a very useful way to present information when opening a PDF file. For example, you could create a brochure with multiple layers offering the same content but in different languages. The first layer would be the blank background page. The resulting PDF file could be set up, with some extra program code, to select the appropriate layer with French or Spanish or English, depending on the language of the reader, and then display that language in the PDF file.

Several other sample programs are available that demonstrate working with transparencies in PDF documents.

PDFUncompress

View Sample Code

The PDFUncompress sample is a utility that demonstrates how to completely un-compress the elements within a PDF document into a readable form.

Nearly all PDF documents feature compressed elements to make the documents more efficient to use, and most of the time these documents are left in their compressed state even when being opened in a browser or viewing tool. But sometimes it is necessary to completely uncompress a PDF document so that it can be opened and all of its contents viewed in detail in a text editor. This would be useful if you want to find the reason for a problem with a PDF document or set of PDF documents, or with a workflow that generates PDF documents.

For example, this sample also uncompresses font streams that are embedded in the document. The fonts are normally compressed using the Flate compression algorithm, but PDFUncompress can render this font content as ASCII or Hexadecimal characters.
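One way binary stream data can be made readable in a text editor is hexadecimal encoding, as in the ASCIIHexDecode filter defined by ISO 32000: every byte becomes two hexadecimal digits and a `>` marks the end of the data. The sketch below shows that encoding only. Whether PDFUncompress uses exactly this filter is an assumption, and `asciiHexEncode` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encodes raw bytes as hexadecimal text, terminated by '>'.
std::string asciiHexEncode(const std::vector<unsigned char>& data) {
    static const char digits[] = "0123456789ABCDEF";
    std::string encoded;
    encoded.reserve(data.size() * 2 + 1);
    for (unsigned char byte : data) {
        encoded.push_back(digits[byte >> 4]);    // high four bits
        encoded.push_back(digits[byte & 0x0F]);  // low four bits
    }
    encoded.push_back('>');  // end-of-data marker
    assert(encoded.size() == data.size() * 2 + 1);
    return encoded;
}
```

The encoded form is twice the size of the original data, which is one reason documents are normally left compressed.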

SplitPDF

View Sample Code

The SplitPDF sample program is effectively the opposite of MergeDocuments. This sample opens an input file called PDFToBeSplit.PDF, six pages long, and exports each page to a separate PDF document. This program creates six PDF output files, but the length of the input PDF document does not matter. A document of any length can be divided into multiple PDF export files.

The sample program creates a vector called splitDocs to hold a reference to each page in the output PDDoc.  A PDDoc is an Adobe Acrobat object that is used to represent a complete PDF document, in this case, the output PDF file.

Displaying Information

DisplayPDEContent

View Sample Code

This program generates an output text file that lists details regarding the PDE content on every page in an input PDF document. The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing objects in PDF documents, including color spaces, clip and page objects, fonts, form XObjects, and other objects. This program can list the number of pages in the document and the file size, and identify and describe a variety of features within the PDF, such as the page layout (landscape or portrait), annotations, text content, and graphics. Specifically the program describes how the document manages graphic images, such as setting boundaries.

The purpose of this program is to demonstrate gathering information from a PDE content tree in a PDF document. This content tree stores objects in the document. Note that a wide variety of different kinds of information can be drawn from the PDE content tree. This sample provides a report that lists some of the kinds of data available, but you can edit the program to find and display other values that interest you.

PDFViewer

View Sample Code

PDFViewer is an application that can be used to open and view PDF documents in Windows environments. This sample is intended to serve as a model that you can use to build your own viewing tool. As such PDFViewer is a simple utility, though it will quickly open and display PDF documents with standard page and file sizes. The PDFViewer is designed to display one document at a time.

The user interface provided with PDFViewer allows you to set the page orientation and scaling, and to select the page in the document to display. You can also page through the document using arrows, or use the Home key to return to the first page or the End key to go to the last page. And you can use the mouse to position pages or change the page scaling or rotation.

The PDFViewer does not include all of the verification and error handling features that would be required of a more robust viewer. The goal was to clearly illustrate the Windows and Adobe PDF Library interfaces needed to build a viewer. Any code functions not needed to this end were left out.

This sample uses ASDouble throughout to refer to page positions or matrices. This is needed because many older PDF documents do not restrict Matrices or Sizes to the ASFixed limitations of earlier versions of PDF. All versions of Adobe PDF Library after version 10 treat page positions or matrices as ASDouble values.

This sample uses the drawing interface PDPageDrawContentsToWindowWithParams(), found in the DLExtras.h header file. This “draw to a window” interface lets the caller supply the page-to-rendering matrix as floating point values.

The sample renders the entire page of a PDF document to an off-screen Device Context (DC). A DC is an external temporary storage area used to construct an image before loading it into the viewing page. This can be used to prevent a page from flickering while it is built. The PDFViewer then uses window level bitmap commands to display portions of an off-screen rendering on screen as needed. This allows the viewer to respond quickly to scrolling and rotation. It also allows a “placeholder” image to be displayed when the page is scaled, while the new rendering is being prepared.

To that end, renderings are prepared on a separate thread. This rendering thread repeats. That is, it opens and closes the Adobe PDF Library and the document only one time per document, but it can render any number of pages (and in any number of PagetoRender matrices) without restarting. The PDFViewer will create a new rendering only if the scale of the page changes.

This approach breaks down if a page is too large, or scaled too large. You can only use an off-screen DC that is up to 2 GB. But for pages of normal sizes, and normal scaling amounts, the PDFViewer renders pages very quickly. PDFViewer is designed to handle bitmap images up to 1.8 GB in size.
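
The size limit above can be estimated before rendering. The following is a minimal sketch, not code from PDFViewer: it assumes a 32-bit bitmap (4 bytes per pixel), a scale of 1.0 meaning one pixel per point, rows padded to a multiple of 4 bytes, and 1 GB taken as 2^30 bytes. The names bitmapBytes, fitsOffscreen, and kMaxBitmapBytes are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Estimated size in bytes of an off-screen bitmap holding one rendered page.
// widthPoints/heightPoints: page size in points (1/72 inch)
// scale:         zoom factor; 1.0 is assumed to mean one pixel per point
// bytesPerPixel: 4 for a 32-bit bitmap (an assumption; the sample may differ)
inline std::uint64_t bitmapBytes(double widthPoints, double heightPoints,
                                 double scale, std::uint32_t bytesPerPixel = 4) {
    assert(widthPoints > 0.0 && heightPoints > 0.0 && scale > 0.0);
    const auto widthPx  = static_cast<std::uint64_t>(std::ceil(widthPoints * scale));
    const auto heightPx = static_cast<std::uint64_t>(std::ceil(heightPoints * scale));
    // Each row is padded up to a multiple of 4 bytes (assumed bitmap layout).
    const std::uint64_t stride = (widthPx * bytesPerPixel + 3) / 4 * 4;
    return stride * heightPx;
}

// The 1.8 GB ceiling quoted in the text, taking 1 GB as 2^30 bytes (assumption).
constexpr std::uint64_t kMaxBitmapBytes =
    static_cast<std::uint64_t>(1.8 * 1024.0 * 1024.0 * 1024.0);

// True if the estimated bitmap stays within the assumed ceiling.
inline bool fitsOffscreen(double widthPoints, double heightPoints, double scale) {
    return bitmapBytes(widthPoints, heightPoints, scale) <= kMaxBitmapBytes;
}
```

Under these assumptions a US Letter page (612 by 792 points) at scale 1.0 needs 1,938,816 bytes, while the same page at scale 40 needs about 3.1 billion bytes and would not fit.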

Converting Document Formats

ConvertPDFtoEPS

View Sample Code

The ConvertPDFtoEPS sample is a simple program that converts the contents of an input PDF file into a set of separate Encapsulated PostScript files, one EPS file for each page in the original PDF.

Color separation is part of high volume offset printing processing. The original digital content is color separated to create a set of plates for printing, generally one plate per page for each of the primary colors, cyan, magenta, yellow, and black. During printing each color layer is printed separately, one on top of the other, blended together to create the depth and variety of color in the final images.

To create the color-separated plates as a part of the pre-press process, usually the original digital file to be printed is separated into a series of Encapsulated PostScript (EPS) files. EPS is a standard graphics file format for working with text, images, drawings, and layouts that can be dismantled into their component parts and then be combined into a final completed document. EPS files can present both bitmap and vector data, and they can be scaled up or down without distortion. An EPS file is really part of a collection of several image files; hence the name “encapsulated.”

This sample program is intended to demonstrate a key process for working with PDF documents, converting a PDF to EPS for printing, and to allow a user to understand and complete that process quickly and easily. ConvertPDFtoEPS is based on the more sophisticated sample PrintPDF, which is included with the original samples in the core Adobe PDF Library.

ConvertPDFtoPostScript

View Sample Code

The ConvertPDFtoPostScript sample is a simple program that opens a PDF input file and converts the content into PostScript. The sample saves that content as a single new PostScript file that holds all of the pages of the original PDF file. It generates the PostScript using the “print to file” mechanism in PDFLPrintDoc.

This sample is intended to demonstrate a key process for working with PDF documents, converting a PDF to PostScript so that it can be sent to a PostScript printer. It is intended to show a user how to complete that process quickly and easily. ConvertPDFtoPostScript is based on the more sophisticated sample PrintPDF, which is included with the original samples in the core Adobe PDF Library.

XPStoPDF

View Sample Code

XML Paper Specification (XPS) is a standard document format that Microsoft created in 2006 as an alternative to the PDF format. Any document can be saved as an XPS file by printing to the Microsoft XPS Document Writer printer driver, and then opened in the XPS Viewer, provided with Windows 7.

This sample program demonstrates how to convert an XPS file into PDF. It takes an XPS input file, XPStoPDF.xps, and uses it to create a PDF output file.

A similar sample program, CreateDocFromXPS, is provided with the Adobe PDF Library for both .NET and Java.

ConvertToPDFA

View Sample Code

This program converts a PDF file that you provide into the PDF/A format, and generates a PDF output file. You can also define the color space to use, RGB or CMYK. If you open the PDF output file, it will look like your original, but PDF/A has some important differences.

PDF/A is an ISO-standard version of the PDF format. Adobe introduced PDF in 1993, and the International Organization for Standardization (ISO) took management of PDF as an open standard in 2008. ISO released PDF/A as a lighter version of the original PDF format in 2005. It is designed to be used with PDF files that need to be archived and stored for long periods. For example, the fonts in a PDF/A file are embedded in the file itself, rather than accessed through a link to a font directory on a local server.

ConvertToPDFX

View Sample Code

This sample program converts a PDF document into a PDF/X compliant document.

PDF/X is used for the graphic arts and printing community, where colors must be completely accurate. The format places restrictions on PDF files so that PDF/X versions of those files can be transmitted reliably through a graphic arts workflow and printed with the colors and appearance that the person creating the PDF/X file is looking for. PDF/X has largely replaced the use of TIFF and other raster-based formats and EPS files in many graphics arts workflows where color is critical, such as in producing catalogs and magazines.

The conversion sample program defaults to PDFX1a2001. This format flattens any transparencies found in the original PDF document and converts colors to the CMYK color space. You can change this to the PDF/X 3-2003 format if you prefer.

Document Optimization

PDFOptimizer

View Sample Code

Use the PDFOptimizer sample program to experiment with the PDF Optimization feature in the Adobe PDF Library. PDF Optimization allows you to compress a PDF document to make it smaller. Besides being easier to manage, an optimized PDF document tends to load faster when opened in a web browser. The PDF Optimizer in the Library works much like the feature in Adobe Acrobat that optimizes a PDF document when you select Save As.

Note that the Adobe PDF Library offers the PDF Optimizer for both the C Language Interface and the Java and .NET interface, and a sample program is provided for the feature for each interface as well:

  • PDFOptimizer.cpp for the core Library
  • PDFOptimizerSample.java and PDFOptimizerSample.cs for Java & .NET

See Working with the PDF Optimizer.

The program uses a sample input file called Uylysses.pdf, a selection of pages from the James Joyce novel, and generates an output file called out.pdf.

WebOptimizedPDF

View Sample Code

The WebOptimizedPDF sample program opens an input document called NonLinearized.PDF, optimizes that document, and saves it as WebOptimized.PDF.

The process known as web optimization, or byte serving, creates a linearized PDF document. A linearized PDF is restructured in a way that allows the first page of the file to appear on a user’s web browser while the rest of the file is being downloaded. This type of PDF document can thus display more quickly on a web page; the user does not need to wait for the entire document to appear before he or she can start reading it.

You can change a setting in Adobe Acrobat to save PDF documents as Web Optimized by default. Click Edit and select Preferences, and then click “Documents” under the list of Categories on the upper left side of the Preferences window. Then, check the option “Save As optimizes for Fast Web View” under Save Settings.

This sample program demonstrates how to create a program that will perform the same task but without needing access to Acrobat.
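
A linearized file can usually be recognized from its first object: the PDF specification requires the linearization parameter dictionary, which carries a /Linearized key, to sit within the first 1024 bytes of the file. The sketch below is a heuristic check built on that rule. It is not code from the WebOptimizedPDF sample and does not use the Adobe PDF Library; the function names are hypothetical, and a file that was linearized and later updated incrementally may still pass the check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Returns true if the first 1024 bytes of `head` contain the /Linearized key.
inline bool looksLinearized(const std::string& head) {
    const std::size_t window = std::min<std::size_t>(head.size(), 1024);
    return head.substr(0, window).find("/Linearized") != std::string::npos;
}

// Reads up to 1024 bytes from the start of a file and applies the check.
// Returns false if the file cannot be opened.
inline bool fileLooksLinearized(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string head(1024, '\0');
    in.read(&head[0], static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(in.gcount()));
    assert(head.size() <= 1024);
    return looksLinearized(head);
}
```

Calling fileLooksLinearized on the sample's input and output files would be one way to confirm that the optimization step had an effect.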

Alternate File System

AlternateFileSystem

View Sample Code

This sample program shows how to implement an ASFileSys structure in an Adobe PDF Library application. It also demonstrates adding a simplified “in memory” file system for use in an app.

The Alternate File System structure (ASFileSys) is a series of routines within the Adobe PDF Library that allows a developer to implement file system services in an APDFL application. ASFileSys allows an application to open and delete files, read data from a file, and write data to a file. Adobe Acrobat and the Adobe PDF Library both offer a built-in Alternate File System that serves as the platform’s native file system, but developers working with the Adobe PDF Library can create additional ASFileSys objects to serve other file systems.

ASFileSys can be used to read data from and write data to PDF files stored in memory. If, for example, a PDF document is opened in a browser window and thus downloaded from an online source, how would the application read the data from that file? Simply providing the URL to the web site where the file is stored won’t be enough; ASFileSys can find the PDF file in local memory, draw content as needed, and then save the output to a file on a local workstation or server directory.

AlternateFileSystem does not define a default input or output file, and it does not define an input directory. The sample does not demonstrate all of the calls available for use with ASFileSys, but it implements enough to illustrate the most common uses. When the sample closes it saves the output file, but before that ASFileSys searches the local disk to make sure that the system does not already have a file of the same name. This makes sure that the original file is not confused with the new output file, or overwritten.
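
The idea of a simplified “in memory” file system can be illustrated without the library. The class below is a hypothetical stand-in, not the ASFileSys callback interface and not code from the sample: it keeps each file as a byte vector keyed by name, and its create call refuses to replace an existing file, mirroring the overwrite check described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory file store; NOT the ASFileSys interface.
class MemoryFileStore {
public:
    // Creates an empty file. Returns false if the name is already taken,
    // so an existing file is never silently overwritten.
    bool create(const std::string& name) {
        return files_.emplace(name, std::vector<std::uint8_t>{}).second;
    }

    bool exists(const std::string& name) const { return files_.count(name) != 0; }

    // Deletes a file. Returns false if it did not exist.
    bool remove(const std::string& name) { return files_.erase(name) != 0; }

    // Writes `count` bytes at `offset`, growing the file if needed.
    bool write(const std::string& name, std::size_t offset,
               const std::uint8_t* data, std::size_t count) {
        assert(data != nullptr || count == 0);
        auto it = files_.find(name);
        if (it == files_.end()) return false;
        auto& bytes = it->second;
        if (bytes.size() < offset + count) bytes.resize(offset + count);
        std::copy_n(data, count, bytes.begin() + static_cast<std::ptrdiff_t>(offset));
        return true;
    }

    // Reads up to `count` bytes starting at `offset`; returns the bytes read.
    std::size_t read(const std::string& name, std::size_t offset,
                     std::uint8_t* out, std::size_t count) const {
        auto it = files_.find(name);
        if (it == files_.end() || offset >= it->second.size()) return 0;
        const auto& bytes = it->second;
        const std::size_t n = std::min(count, bytes.size() - offset);
        std::copy_n(bytes.begin() + static_cast<std::ptrdiff_t>(offset), n, out);
        return n;
    }

private:
    std::map<std::string, std::vector<std::uint8_t>> files_;
};
```

A real ASFileSys implementation supplies callbacks with the signatures the library defines; this sketch only shows the bookkeeping such callbacks might delegate to.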

Working with Images

AddThumbnailsToPDF

View Sample Code

For PDF use, a thumbnail is a small graphic image of a page in a PDF document. Thumbnails appear in a panel on the left side of the Adobe Acrobat window and aid in navigating through a document, as a user can scroll through a series of thumbnails quickly to find a page. This sample program demonstrates how to create thumbnails for a PDF document, one for each page. The program saves the thumbnail images in a PDF output file, using an indexed RGB color table with 256 colors.

Most modern PDF viewing tools like Adobe Acrobat and Adobe Reader generate thumbnails automatically when opening a PDF document, and then discard them when the PDF document is closed. The resulting PDF document will be smaller if the thumbnails are not saved in the PDF file itself.

But the Adobe PDF Library includes an interface that can create thumbnails for PDF pages and add them to the file. If you wanted to create your own PDF viewing tool, with the ability to add thumbnails to any PDF document opened in that viewer, you could use the APDFL interface with your viewer, and use the sample program AddThumbnailsToPDF as a reference.

CalcImageDPI

View Sample Code

This sample program demonstrates how to calculate the resolution for the images found in a PDF document. The sample scans the PDF input file and processes the images on each page one by one, rotating them as needed before calculating the Dots per Inch (DPI) for each image. Then, it lists the results in an output text file.

You could approach this sample as a tool to show how to draw general information about images from a PDF document. But CalcImageDPI also represents an important part of the process for optimizing a PDF document. The optimizing process is intended to reduce the size of a PDF document so that it can be more easily distributed online. Optimizing a PDF document usually includes downsampling the images in that document, or reducing the size of those images. To optimize a PDF document you need to know the effective resolution of the images in that PDF document.

The output file lists the number of images found in the input document, and then it describes each image, including the page number and the sequence on the page, and coordinates for the position and horizontal and vertical rotation. The output file also offers the height and width of the graphic in points, and the resolution in Dots per Inch.

The program calculates the horizontal and vertical DPI for each image in the document, and the output file lists these two values for each image found.

For Horizontal, the resolution is the number of pixels across the bottom of the image divided by the width in points. This yields the Dots per Point, and that value is multiplied by 72 to provide the Dots per Inch, as there are 72 points in an inch.

For Vertical, the program divides the pixels in the depth (height) of the image by the length of the vertical side, in points, and then multiplies the result by 72.

In each case the final value is rounded up to a whole number.
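
As a worked illustration of the arithmetic, the sketch below computes the effective resolution along one axis as pixels divided by the side length in inches, where the length in inches is the length in points divided by 72. It is a standalone example rather than code from CalcImageDPI, and the names effectiveDpi and kPointsPerInch are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Points per inch in PDF user space (1 point = 1/72 inch).
constexpr double kPointsPerInch = 72.0;

// Effective resolution along one axis of an image placed on a page.
// pixels:     number of image samples along that axis
// sizePoints: length of that side of the image on the page, in points
// Returns dots per inch, rounded up to a whole number.
inline int effectiveDpi(int pixels, double sizePoints) {
    assert(pixels > 0 && sizePoints > 0.0);
    // pixels / (sizePoints / 72), written to multiply before dividing.
    const double dpi = (pixels * kPointsPerInch) / sizePoints;
    return static_cast<int>(std::ceil(dpi));
}
```

For example, an image 600 pixels wide that is placed 144 points (2 inches) wide on the page has a horizontal resolution of 300 DPI. Calling the function once with the width values and once with the height values gives the two figures reported for each image.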

CreateImageWithTransparency

View Sample Code

This sample shows how to create a transparency within a PDF document, in the form of a graphic image with an art graphic layered on top. PDF files can have objects that are partially or fully transparent, and thus can blend in various ways with objects behind them. Transparent graphics or images can be stacked in a PDF file, with each one contributing to the final result that appears on the page. This is in contrast to opaque objects, where if you have a stack of graphics or images, only the graphic or image on top of the stack will appear. With a stack of transparent images, the final colors shown are the result of blending the colors of all of the overlapping objects. In this sample the text behind the image is partially visible.

The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing objects in PDF documents, including images. The program creates a PDEImage object, using a JPG image with an adjustable compression level. The program also provides for creating a soft mask with the transparent object. A SoftMask object in the PDF format allows you to place an image on a PDF page and control the level of transparency of that image. You can provide settings to determine how much of the background color or text on the page shows through the SoftMask image appearing in the foreground.

The program allows you to set the height and width of a JPG image, in pixels, and to enter a factor to define the quality and resolution of the image. The factor determines the relationship between compression and image quality, defaulting to 1.3 for “fair.” Then it defines a 9 by 4 inch yellow rectangle to set on top of the graphic image, and a 7 by 2 inch hole in the middle of this rectangle, with a color gradation for the center of the hole, gradually changing from white (0xFF) to yellow. Finally, it creates the complete image object and adds a mask to the image, as well as a text object to the top of the page.

The output PDF document features a page with a transparency that looks like this:

Transparency, with color gradation

CreateImageWithTransparency does not define a default input file, and it does not define an input directory.

Several other sample programs are available that demonstrate working with transparencies in PDF documents:

CreateSeparations

View Sample Code

This sample demonstrates how to create color separations for spot color images.

Color separation is part of high volume offset printing processing. The original digital content is color separated to create a set of plates for printing, generally one plate per page for each of the primary colors—Cyan, Magenta, Yellow, and Black (CMYK). During printing each color layer is printed separately, one on top of the other, blended together to create the depth and variety of color in the final images.

In offset printing, a spot color is a color generated using a single run or a series of plate runs distinct from the initial separated plate run. Generally a spot color is a base color; a spot color is not created by blending several layers of color from four separate color plates in offset printing, but is added on top of the page or image. A common example would be a print job with four separate color plates (CMYK) run first to create the print image, and then a spot color (black) added on top of the image, in the form of text. But a spot color could also be blended with two or more colors, each color using its own print run.

FindImageResolutions

View Sample Code

This sample creates a list of all of the images found within a PDF document. It also describes where these images are found and the resolution for each one. The program will rotate any images it finds, as needed, to orient the image properly for calculating the resolution.

The resolution is calculated for both vertical and horizontal, in Dots per Inch (DPI). You can enter an input file name on a command line, or use the default input file named in the program. FindImageResolutions does not define an input directory.

The program generates an output file with a variety of images, and then performs analytics on these images.

Commonly when an image appears in a PDF document, a reference for that image will also be included in the PDF document front matter. The reference is a data stream; it allows multiple pages within a PDF document to reference the same image, though a reference can also point to a single example of an image.

Sometimes an image appears in a PDF document without a matching reference. This sample program will not find images that appear in the document but are not referenced. So the output file does not include images present in the document but not referenced, or images that are soft masks for use with transparencies.

For Horizontal, the resolution is the number of pixels across the bottom of the image divided by the width in points. This yields the Dots per Point, and that value is multiplied by 72 to provide the Dots per Inch, as there are 72 points in an inch.

For Vertical, the program divides the pixels in the depth (height) of the image by the length of the vertical side, in points, and then multiplies the result by 72.

In each case the final value is rounded up to a whole number.

The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing objects in PDF documents, including color spaces, clip and page objects, fonts, form XObjects, images, and other objects. These objects are stored in the PDE Content Tree.

Masking is a means to edit a photo or drawing to change or remove a feature, or change the background. Masking allows you to select a section of a photo or image so that you can edit or remove that part of the image, while leaving other parts of the image unchanged. The SoftMask object in the PDF format allows you to place an image on a PDF page and control the level of transparency of that image. You can provide settings to determine how much of the background color or text on the page shows through the SoftMask image appearing in the foreground.

FindImageResolutions ignores images that are used in pattern color spaces. A pattern color space would render graphics images with patterns rather than as solid colors, such as plaid coloring, or a tartan background, or a stencil. Patterns are rendered using tiling, where a page background is presented as a series of boxes, one after another, left to right.

OutputPreview

View Sample Code

This sample is based on the CreateSeparations sample described above. OutputPreview demonstrates how to create color separations for spot color images, but it takes the process a step farther. This sample allows for the combination of these plates into a single DeviceN color image, which displays the “true color” representation of the page. Multiple DeviceN color spaces may be used with the same image data to present the page with colorants present or removed. The sample creates a single image file, showing the document in CMYK and the document as a DeviceN image with all colorants present. Depending on the colorants defined for the page, the sample will also generate a series of images with one or more colorants removed. Finally, there will be a set of pages, one per colorant, showing only that colorant, and the percentage of the page covered in it.

Adobe Systems introduced DeviceN to allow systems to combine color channels for composite printing, such as drawing colors from the Pantone Hexachrome color system. The DeviceN allows for printing with an arbitrary number of color components, and thus it can use a wider range of colors.

See CreateSeparations.

RenderPage

View Sample Code

This sample program shows how to render a PDF document page to memory. The program uses the PDPageDrawContentsToMemory method, and then creates an output PDF document with a bitmap image rendered on the page. The program illustrates setting the graphic resolution and color space for the graphic, and setting the coordinates for the crop box for where the image will be presented on the page.

You can adjust the settings in the program to define the export resolution in Dots per Inch, 72, 150, 200, 300, or 600. You can also use a color space:

  • DeviceRGB, Red/Green/Blue
  • DeviceCMYK, Cyan/Magenta/Yellow/Black
  • DeviceGray, for gray scale images

And select a compression algorithm to use:

  • FlateDecode, or Deflate compression, an open source standard used for creating Zip files, PDF, and PNG graphic files.
  • ASCIIHexDecode, an encoding that represents binary data as ASCII hexadecimal characters; it does not compress the data.
  • LZWDecode, Lempel-Ziv-Welch, a universal data compression algorithm used with Unix platforms and GIF graphic files.
  • DCTDecode, Discrete Cosine Transform, a format used for rendering photographs as JPG images.
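
Of the filters listed, ASCIIHexDecode is the simplest to show in full. It reverses a plain hexadecimal text encoding in which each byte becomes two hexadecimal digits and the data ends with a “>” marker, so the encoded stream is about twice the size of the original. The sketch below shows the encoding side under that definition; it is a standalone illustration, not code from RenderPage, and asciiHexEncode is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encodes bytes as ASCII hexadecimal text, the form that the ASCIIHexDecode
// filter reads back. '>' is the end-of-data marker.
inline std::string asciiHexEncode(const std::vector<std::uint8_t>& data) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(data.size() * 2 + 1);
    for (std::uint8_t b : data) {
        out.push_back(kDigits[b >> 4]);    // high nibble
        out.push_back(kDigits[b & 0x0F]);  // low nibble
    }
    out.push_back('>');
    // Two characters per input byte plus the marker: larger, not smaller.
    assert(out.size() == data.size() * 2 + 1);
    return out;
}
```

The four bytes of the text “%PDF” (0x25, 0x50, 0x44, 0x46) encode to 25504446>, which is why this filter is normally chosen for text-safe transport rather than for size.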

See color management.

Extracting Metadata from a Document

CountColorsInDoc

View Sample Code

This sample reviews a PDF document to determine the distinct colors found, and then generates an output text file listing those colors. The program identifies colors from either RGB or CMYK color spaces, as well as gray scale shading. The program identifies the colors in the PDF document by referring to the list of colors defined in the color profile stored within the PDF document itself. This sample demonstrates how to find information in a PDF document, and how to access an object within a PDF.

High-quality digital hardware can detect millions of shades of colors. To manage the broad range of colors for producing graphics images in digital content, imaging professionals have developed models to define these colors, called color spaces. Some of these color spaces are saved with hardware devices, and so define what a camera can detect, or a printer print, or a monitor display. Others are based on software and thus can be used across many different kinds of devices, such as the Adobe RGB color space. A color space must be defined for any device or software product in order to make sure that coloring patterns remain the same from one device or system to another.

The Standard RGB color space, sRGB, was developed by Microsoft and Hewlett Packard to describe colors available on most monitors and other displays. This color space is also commonly used for web graphics. Adobe Systems’ own Adobe RGB (Red/Green/Blue) color space is designed to hold all of the colors that are likely to be available on any color CMYK (Cyan/Magenta/Yellow/Black) printer. The first color space was defined by the International Color Consortium, and serves as the basis for all other color spaces; color spaces are expressed in files called color profiles, generally with an .icc file name extension. This sample uses a color profile to define the colors found in the PDF document, either a profile you provide, or the ICC color profile file stored in the PDF document itself.

Adobe Systems introduced DeviceN to allow systems to combine color channels for composite printing, such as drawing colors from the Pantone Hexachrome color system. The DeviceN allows for printing with an arbitrary number of color components, and thus it can use a wider range of colors.

For each page in the PDF input document, the output text file generated by the CountColorsInDoc sample lists whether the following color types are present (true or false):

  • DeviceGray Color
  • DeviceRGB Color
  • DeviceCMYK Color
  • CalGray Color
  • CalRGB Color
  • Lab Color
  • ICC Color
  • DeviceN Color
  • Separation Color
  • Indexed Color
  • Pattern Color
  • Shading Color

ExtractDocumentInfo

View Sample Code

This sample program opens a PDF file, extracts the standard PDF document information from that file, and saves it in a separate PDF document called DocumentInfo.PDF. The document information values are found in the Document Information Dictionary, and include items like the file’s Title, Author, Subject, and Creation Date.

You can display them in any PDF document if you open that PDF in Adobe Acrobat and select File and Properties, and then click the Description tab. In this sample program, however, the program takes the values and posts them on the first page of the output file:

ExtractDocumentInformation

For more detail on the Document Information Dictionary see section 14.3.3 of the ISO 32000 Reference, page 549.

Printing

PDFPrintDefault

View Sample Code

The sample PDFPrintDefault works for Windows, Mac and Unix, and will send the sample document pdfprint.pdf to the default printer on any platform. A user may enter a file name on the command line and that file will print instead.

PDFPrintGUI

View Sample Code

The sample PDFPrintGUI allows a user to send an input PDF document to a printer using the platform’s Print Interface GUI. The sample can work with the Windows print interface or the Cocoa Print Panel for Mac.

In Windows, this print interface appears.

Printer interface

Select the printer, the number of copies to print (and whether those copies will be collated), and the page range, as you would when printing any other document on a Windows machine. The program will print the document coded in the sample program, by default a file called printpdf.pdf.

This simple program is intended to demonstrate a standard process for working with PDF documents, sending a PDF document to a printer. The sample shows how to build a user interface quickly and easily. PDFPrintGUI is based on the more sophisticated sample PrintPDF, which is included with the original samples in the core Adobe PDF Library.

PostScriptInjection

View Sample Code

This sample illustrates adding PostScript comments and commands into a printable output document, generated using the Adobe PDF Library print API, PDFLPrintDoc. PDFLPrintDoc allows for either manual or automatic merging of printer capabilities when the print stream is generated, and it creates the stream most appropriate for the printer. The output is generated as a PostScript (.ps) file; if a local printer is defined and available, the program will send the output to the printer, to create a paper copy.

To list the printer drivers installed on your Windows machine, and the ports, click the Start button and Devices and Printers.

This opens the Devices and Printers dialog in the Control Panel. Click on a printer to select it, and then click Print Server Properties on the top of the screen. From this window you can select the Drivers tab or the Ports tab.

Security

AddPassword

View Sample Code

One advantage of the PDF format is security and stability. PDF documents can be used as legal documents, in the place of printed copies, because:

  • the text and graphics content of a PDF cannot be easily altered after the file is created
  • a PDF document can be signed with a digital signature that can be guaranteed, verifying the document and locking it against any further changes
  • a password can be added to a PDF document to prevent any unauthorized person from opening and reading the content

This program takes a standard PDF input file and adds a password, and then saves the file with the name AddPassword_Out.PDF. To open this output PDF file, you must enter the password “Datalogics.” Note that this program also encrypts the PDF output document, like the EncryptDocument sample program.

AddRedaction

View Sample Code

Sometimes documents containing private, sensitive or classified information must be edited before they are published or distributed in such a way where the original form of the content remains intact but some words or text are deliberately blacked out, or redacted. Use the AddRedaction sample program to search through a PDF document and find and obscure words that need to be kept hidden.

The Adobe Acrobat PDWordFinder object can identify all of the words in a PDF document and create a list or table of those words, including the pages where each word appears, and the place of these words on each page. The AddRedaction sample program uses PDWordFinder to find words in a PDF document that are to be redacted, or removed, from that document, and then saves the location of these words in a vector of ASFixedQuads. Then, this sample program creates and applies the redactions. Note that AddRedaction removes only single words, not text strings.

Each redaction removes the word from the document and covers its location with a black box.
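
The collection step, in which the locations of matching words are gathered before any redaction is applied, can be sketched with plain data structures. Everything below is hypothetical: Point, Quad, FoundWord, RedactionMark, and collectRedactions are stand-ins invented for illustration, not the library's word finder or ASFixedQuad types, and the match is a simple exact comparison.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the library's point, quad, and word types.
struct Point { double x, y; };
struct Quad { Point bottomLeft, bottomRight, topLeft, topRight; };
struct FoundWord { std::string text; int page; Quad quad; };
struct RedactionMark { int page; Quad quad; };

// Collects the page and quad of every word whose text is in `targets`.
// Only whole single words are matched, not multi-word strings.
inline std::vector<RedactionMark> collectRedactions(
        const std::vector<FoundWord>& words,
        const std::set<std::string>& targets) {
    std::vector<RedactionMark> marks;
    for (const auto& w : words) {
        if (targets.count(w.text) != 0) {
            marks.push_back({w.page, w.quad});
        }
    }
    assert(marks.size() <= words.size());
    return marks;
}
```

Keeping the marks in a separate list means they can be reviewed or written out before a later step turns them into redactions.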

By default, the sample program opens a PDF input document called AddRedaction.PDF:

AddRedaction1

AddRedaction searches for the words “screen” and “navigation” and redacts them from this one-page document, and saves the result to an output file called RedactedDoc.PDF:

AddRedaction2

If the program does not call PDDocApplyRedactions, it will mark the words for redaction but not remove them. In other words, you can configure this program to identify and create a list of words you plan to redact from a PDF document, without actually removing them from the PDF file. This feature would allow you to create a list of all of the locations of words in a document that you would like to redact before you actually create the redacted PDF output document.

AddRedaction is related to the TextSearch program, in that both use the PDWordFinder to search through a PDF document to identify text strings. AddRedaction removes the words found, while TextSearch highlights them. Also, in AddRedaction the Word Finder configuration settings are all set to default, whereas in TextSearch these 20 parameter settings are provided and you can change them if you like.

AddTriangularRedaction

View Sample Code

This sample is similar to the AddRedaction sample program described above.

AddTriangularRedaction is a little different, however, in that the program does not search for specific words in the document to redact. Rather, the redacted area of the PDF document is made up of two triangles, colored in gray, and placed close to each other on every page but not touching. The effect is that a narrow diagonal line of text is left showing on the page. The two triangles are created by setting coordinates for QuadPoints, or “quads,” for an annotation to add to a page. Each quad has a set of four coordinates used to draw a space on a page: bottom-left, bottom-right, top-left, and top-right. Commonly the four quad coordinates would draw a rectangle or polygon on a PDF page. This sample program, however, uses the same set of coordinates for top left (x3, y3) and top right (x2, y2), creating a triangle.

The type of redaction demonstrated using the AddTriangularRedaction sample program is designed to help satisfy a mandate set by the United States Copyright Office. If an agency or company wants to copyright a published test, meant to serve as a secure test to administer to students or graduates, the Copyright Office requires that a sample page of the test be redacted so that part of the text still shows on the page. Hence the pair of triangles; the page of the test is redacted, but a diagonal strip of text still appears. These tests could be used for a variety of educational purposes:

  • Verify that a student has met the high school equivalency requirement (GED)
  • Test for eligibility for a scholarship or professional certification
  • Determine credit for undergraduate course work
  • Exams for admission to a college or graduate school program

The original page looks like this:

AddTriangularRedaction_in

And the redacted output looks like this:

AddTriangularRedaction_out

AESEncryption

View Sample Code

This sample program demonstrates how to add an encryption key and a password to a PDF document. The program uses the Advanced Encryption Standard (AES) algorithm, introduced by the United States National Institute of Standards and Technology in 2001. The program uses version 3 of AES, which uses a three or four byte random initialization vector.

EncryptDocument

View Sample Code

The EncryptDocument sample program encrypts a PDF document to secure it, and adds a password to an existing PDF document, like the AddPassword sample program. With EncryptDocument, a password is required to open the output PDF document.

By default the program uses the RC4 encryption algorithm. RC4, or Rivest Cipher 4, was developed by Ronald Rivest of RSA in 1987, and it remains a very popular software cipher for streaming data. It is also heavily used with Internet protocols such as Transport Layer Security (TLS). RC4 is known for its speed and simplicity. In recent years, however, RC4 has been criticized as vulnerable to attack, and Microsoft recommends disabling it if possible.

For this reason, the EncryptDocument program can also use the Advanced Encryption Standard (AES) algorithm, Version 1, 2, or 3, first introduced by the United States National Institute of Standards and Technology in 2001 and made a standard of the federal government in 2002. Version 1 has a zero initialization vector, version 2 a 16-byte random initialization vector, and version 3 a 4-byte random initialization vector.

The program generates an output file called EncryptDocument.PDF.

LockDocument

View Sample Code

The LockDocument sample program takes an input document called LockDocument.PDF and makes it read only. All editing permissions are removed. A password is not required to open the document, but the sample program updates the PDF so that the reader must enter a password to change the editing permissions. Then the PDF is encrypted, as with the EncryptDocument sample program (see EncryptDocument).

By default the program uses the RC4 encryption algorithm. RC4, or Rivest Cipher 4, was developed by Ronald Rivest of RSA in 1987, and it remains a very popular software cipher for streaming data. It is also heavily used with Internet protocols such as Transport Layer Security (TLS). RC4 is known for its speed and simplicity. In recent years, however, RC4 has been criticized as vulnerable to attack, and Microsoft recommends disabling it if possible.

The program can also use the Advanced Encryption Standard (AES) algorithm, Version 1, 2, or 3, first introduced by the United States National Institute of Standards and Technology in 2001 and made a standard of the federal government in 2002. Version 1 has a zero initialization vector, version 2 a 16-byte random initialization vector, and version 3 a 4-byte random initialization vector.

OpenEncrypted

View Sample Code

The OpenEncrypted sample program is the opposite of EncryptDocument, in that it removes security and encryption from a PDF file. The program opens the source document, OpenEncrypted.PDF, with a password coded into the program itself. Then it calls PDDocSetNewCryptHandler with ASAtomNull, which removes the security from the document. The output file is saved as unencrypted.PDF, and the program tests the process by opening the output file to make sure that a password is no longer needed. A series of error messages is provided in the code in case the output file does not open properly.

SetUniquePermissions

View Sample Code

The SetUniquePermissions sample program demonstrates how to assign security permissions to a PDF document. The program adds the permissions you select into the PDF and then saves it as an export file. You can also add a password to the document so that no one can change any of these settings without first entering that password, and you can encrypt the document. The program defaults to the RC4 encryption algorithm (see EncryptDocument).

You can decide to grant All Permissions to a PDF document, or choose among 15 different security settings. Most of the settings found in the SetUniquePermissions program correspond to the Document Restrictions Summary values found in Adobe Acrobat.

From the Tools menu, click Protection and then More Protection. The list of Document Restrictions appears on the Security tab:

SetUniquePermissions

We list the security settings found in the program below, along with the corresponding Document Restrictions Summary value found in Adobe Acrobat if relevant.

| Security Setting | Description | Corresponds to Document Restriction |
|---|---|---|
| PermAll | Assign all permissions | |
| PermUser | Assign standard permissions to fill in form fields and signatures and make edits | |
| PermSettable | Set these four permissions for the document: PermEdit, PermEditNotes, PermCopy, PermPrint | |
| PermEdit | Allows a user broad access to edit a PDF, including completing form fields and signing a document as well as document assembly and spawning a template page | Changing the Document |
| PermEditNotes | Add, change, or delete text notes in a PDF | Commenting |
| PrivPermDocAssembly | Insert pages, delete pages, create bookmarks, rotate pages | Document Assembly |
| PrivPermFillandSign | Fill in form fields or complete a digital signature and spawn a template page | Filling of Form Fields and Signing |
| PrivPermFormSubmit | Submit a form outside of a browser window | |
| PermCopy | Copy content from a PDF to a clipboard so that it can be pasted elsewhere | Content Copying and Page Extraction |
| PrivPermAccessible | Save a PDF with Adobe Acrobat's accessibility features to serve people with disabilities | |
| PermPrint | Ability to print PDF | Printing |
| PrivPermHighPrint | Print high quality output from a PDF. This is supplemental to the Print option. If PermPrint is true and PrivPermHighPrint is false, only low quality printing is available. | |
| PermOpen | Open and decrypt the document. This will have no effect if a user password is not set for the document. | |
| PermSaveAs | Enables a Save As function on a PDF. If PermEdit and PermEditNotes are disabled, Save is also disabled. But Save As will still work. | |
| PermSecure | Change security settings. This will have no effect unless an owner password is set. | |

ValidateSignature

View Sample Code

PDF documents can allow readers to add digital signatures. In many cases someone who signs a PDF document is simply showing that he or she has read the document and approves of the content. But it is possible for a digital signature or digital signatures to be added to a PDF document so that the document becomes binding, and can be presented as a legal document or business contract just as valid as if the document were printed and signed by hand. Recent changes in Federal and State law make this possible. But for a PDF file to be used in this setting, it must be possible to lock the PDF document after a digital signature is added, to prevent any further changes from being applied. More important, it must also be possible to validate the signature on that document, to demonstrate that the PDF document was not altered or tampered with after a digital signature was added.

ValidateSignature does not define a default output file.

This sample analyzes an input PDF document, lists the number of signatures found in that document, and determines if those signatures are valid. This represents a programmatic way to validate a PDF document. Adobe Acrobat also automatically verifies the signatures in any PDF document opened in Acrobat, but it is necessary to manually open the PDF in Acrobat, and only one PDF at a time.

Note that the default input PDF document named for use by this sample only has one digital signature. But this program is designed to work with PDF documents with multiple signatures.

Digital signatures are annotations, embedded in PDF documents as Acroform, or Acrobat Form, fields. If the sample scans the PDF document and does not find any Acroform fields, the program ends because that means the document does not have any digital signatures. If the sample finds signatures, ValidateSignature uses the VerifySig function to check each signature to determine whether it is valid. The VerifySig function runs in a loop, checking each signature one at a time, until the program has reviewed all of the Acroform fields found in the PDF document.

When a digital signature is added to a PDF document, the MD5 algorithm is used to create a digest of that document. This digest is a 128-bit hash value that represents all of the content found in the PDF document at the time it was signed, everything but the signature itself. To verify the digital signature in the PDF document, the ValidateSignature sample program reviews the PDF document and calculates a new MD5 digest value for that document. If the new value does not exactly match the original digest value, it indicates that someone attempted to change the PDF document, and as a result, the signature is not considered valid.

If a document has more than one digital signature, each signature can be assigned its own MD5 digest value. The digest value represents all of the content in the PDF document up to that signature, including any digital signatures found previously in that same document. So the digest for signature #3 would represent all of the previous content in the document including signatures #1 and #2. That way, every signature in the document will have a unique digest value that the sample program can review, and verify.

Working with Text

AddText

View Sample Code

The AddText sample program creates a single-page PDF document and adds text to that page.

AddText is similar to the AddArt program, in that both AddText and AddArt set at least one PDEGraphicState object, used to define any objects that appear on the page. Commonly PDEGraphicState is used to set information about colors, color space, line width, and other graphic values, as it does with AddArt. But while AddArt uses PDEGraphicState to define the attributes for drawing an arrow, AddText sets PDEGraphicState to default, and then uses the ASDoubleMatrix text matrix to define where the text will be placed on the page.

The program uses the PDEFont element to define a font to use for text, and draws the text itself from PDEText. The graphics state is also added to PDEText. Then, the content for this text and graphic state are stored within the PDEContent object on the document page.

The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing in PDF documents including colorspaces, clip and page objects, fonts, form XObjects, text, and other objects.

ExtractText

View Sample Code

This program extracts text found in a source PDF document and exports it to two output files, a separate PDF document and a text file. The program completes two steps. The program first demonstrates extracting ASCII characters from a PDF called ExtractText.PDF, and saves the content to ExtractedText.PDF. Then, it extracts Unicode characters from a file called ExtractUnicode.PDF, and saves these Unicode characters to a text file, ExtractedUnicodeText.txt. The program allows you to define the typeface and font size for the export documents, and the text placement on the new page.

ExtractText, like AddRedaction and TextSearch, uses PDWordFinder to search for text strings within a PDF document. As with AddRedaction, the WordFinder configuration settings are all set to default. You can view and change these parameters in the TextSearch sample program.

HelloJapan

View Sample Code

This simple sample program is effectively a version of Hello World, except that when run it generates a PDF document with the text “Hello Japan,” using Japanese Kanji characters. Two default fonts are provided in the sample program, though the sample does not define a default input file, and it does not define an input directory. The program generates an output PDF with two pages, using each of the two fonts provided, one on each page. If you run the program from the command line you can define the name of the output file and the fonts to use.

The files for the fonts used for this sample program are shipped with the Adobe PDF Library, stored in the Resource directory under APDFL. When the Adobe PDF Library initializes, it loads the files found in the Resource directory.

| Parameter | Description |
|---|---|
| CMAPs | Some fonts in PDF files use predefined mappings between character encodings and specific predefined character identifier sets. These mappings are stored in files called Character Maps (CMaps). Commonly CMAP files are used to map Unicode characters to Chinese/Japanese/Korean (CJK) characters found in PDF documents. |
| Unicode | Unicode was introduced in 1991. It is an international standard for coding and representing text digitally. It contains all of the characters for most of the alphabets and writing systems in the world. This includes more than one hundred thousand characters used in 129 modern and historic alphabets and scripts. Besides western alphabets, Unicode contains characters for multi-byte character sets. This includes Asian writing systems like Japanese. The Unicode strings used in this program do not have a Byte Order Mark (BOM) and are big endian. A BOM appears at the beginning of Unicode text and indicates whether the byte sequence that follows is big endian or little endian. Big endian means that the most significant byte of a word is stored at the smallest memory address and that the least significant byte is at the largest memory address. |
| Type 0 and CID fonts | The Type 0, or Original Composite Font (OCF), format is a composite font format designed to support a character set with a large number of glyphs. This is particularly useful with Asian languages like Korean, Japanese, and Mandarin. Adobe Systems developed the Character Identifier Font (CID) to improve the performance of the OCF format. |
| PDE | The PDFEdit Layer (PDE) of the Adobe Acrobat API contains classes that provide for editing objects in PDF documents. This includes a variety of objects stored in the PDE Content Tree: color spaces, clip and page objects, fonts, form XObjects, and images. |

To learn more, see “Working with Unicode” and “Accessing Font Information.”

InsertHeadFoot

View Sample Code

This sample reads an input PDF document and inserts text for a header and for a footer on each page. The program adjusts the contents of each page if necessary to make sure that the header and footer will fit. It also provides default text for the header and footer, and then saves the PDF as an output file. InsertHeadFoot does not work with rotated pages.

The font used for the text in the header and footer is defined in DEF_FONT and DEF_CHARSET, and the point size for the text for the header and footer is also set. You can name any font you like, but keep in mind that the program must be able to find this font stored on the system where it is run. The program queries the local environment to see if the default font named in the program is available there.

The program is also designed to encrypt the output file so that the header and footer cannot be removed or edited without a password.

To learn more about the matrix values associated with ASFixedMatrix, visit the Datalogics KnowledgeBase and search on “ASFixedMatrix.”

TextSearch

View Sample Code

Use TextSearch to find and highlight every example of a specific word in an input PDF document. The program searches through a document called TextSearch.PDF, which holds text from “A Pirate’s Pocket Book” by Dion Clayton Calthrop (1907), and identifies every place where the word “pirate” appears. Then, the program highlights each instance and saves the output as a file called Out.PDF in the working directory. TextSearch is related to the AddRedaction program, in that both use the PDWordFinder to search through a PDF document to identify text strings. AddRedaction removes the words found, however, while TextSearch highlights them.

The Adobe Acrobat PDWordFinder object can identify all of the words in a PDF document and create a list or table of those words, including the pages where each word appears, and the place of these words on each page. The TextSearch sample program uses PDWordFinder to find words in a PDF document that are to be highlighted.

Step 1: Configuring the WordFinder

In the first step, the program configures the word finder, defining, for example, how to handle spaces between words, hyphens, line breaks, different font styles, and other variables related to searching for target words.

For more information on these parameters in the PDWordFinderConfig structure see the Adobe Systems online Acrobat and PDF Library API Reference for PDWordFinder at help.adobe.com.

You can adjust the settings under Step 1 in this sample program.

| Parameter | Description |
|---|---|
| Recsize | The record size must be set to match the size of the object PDWordFinderConfigRec. |
| disableTaggedPDF | Set to true to treat this as a non-tagged PDF document. A tagged PDF document contains metadata to describe instructions related to headers and other content on a page. Tagging is generally used with a PDF document to meet accessibility requirements. For example, tags in a PDF document might be placed so that text, headings, footnotes, and other content in the document can be interpreted by a screen reading software tool. The tool could use this information to read text in the document out loud for a blind person, or respond to voice commands for a reader who can’t easily use a mouse or keyboard. |
| noXYSort | Set to true by default; the WordFinder will not sort the words found in the input document by the x/y coordinates of their locations in the document. Instead the program sorts the words by the place they are found in the content stream. The xy sort method sorts words found top to bottom and left to right as they appear on the page. But this does not necessarily match how the words appear in the content stream for the same document. If the document was created to present the content in reading order instead (in columns), the words will not appear the same way in the content stream as they do on the page. Sometimes it is better to preserve the original order of the words in the content stream, because text might be written in text blocks in an order that has a specific purpose. |
| preserveSpaces | If this parameter is set to true, the WordFinder preserves spaces between words. In TextSearch the value is set to false, which means that spaces are removed from the output text. This allows the program to discard extra space characters in the document that are not needed. |
| noLigatureExp | Controls the expansion of ligatures in a PDF document, which replaces them with regular characters. Setting this parameter to true turns expansion off. Ligatures are combinations of two or more characters or glyphs in print documents. In ancient manuscript writing, letters were sometimes run together to make the writing faster. Later the practice was adapted for typesetting so that text could be set more efficiently. The ampersand character (&) is a ligature of the Latin letters “e” and “t.” In PDF, ligatures are pairs of character glyphs that are converted into a single glyph to make a character pair smaller. A variety of ligatures appear. The most common ligatures in PDF would be ff, fl, and ffl. |
| noEncodingGuess | When encountering fonts in an input PDF document that it cannot immediately recognize (because of unknown or custom font encoding), the WordFinder will “guess” at an appropriate substitute font to use. This can lead to the WordFinder converting text to a font that is unsuitable, which leads to export text that is unreadable. For TextSearch this value is set to true to turn off the ability of the WordFinder to guess at fonts. Instead the WordFinder tries to provide the original characters without any encoding conversion. |
| unknownToStdEnc | Set to false. The WordFinder will not assume that all fonts found in the document are standard Times Roman. If this parameter is turned on, it will override the noEncodingGuess option above. |
| ignoreCharGaps | Set to true. The WordFinder will not convert large gaps between characters into blank spaces. The WordFinder will only report a character space for the export document when a blank space character appears in the source PDF document. |
| ignoreLineGaps | Unlike ignoreCharGaps, this parameter is turned off. If ignoreLineGaps is set to true, the WordFinder will determine line breaks in text only when it finds a line break character in the source PDF document. But by default this parameter is set to false, meaning that the WordFinder will interpret any vertical movements in text as a line break and place a line break in the export PDF document. |
| noAnnots | The WordFinder will not extract text embedded in annotations found in the source PDF document. So if a text note is added to a PDF document, the WordFinder will ignore it. |
| noHyphenDetection | The WordFinder will not make a distinction between hard hyphens and soft hyphens used to break words between syllables in a PDF source document. A soft hyphen (or optional hyphen) is an invisible marker in the text where the editing tool used (such as MS Word) could insert a hyphen to break a word from the end of one line to the beginning of the next. A soft hyphen shows where a hyphenated break should occur if needed. A hard hyphen would be placed manually by an editor to force a break in a word. |
| trustNBSpace | This value defaults to false. The parameter to trust non-breaking spaces is disabled, so the WordFinder will not try to tell breaking and non-breaking spaces apart. If set to true, the WordFinder will seek to preserve a space without breaking a word. A non-breaking space will prevent an automatic line break (line wrap) at the position it is found on a page. |
| noExtCharOffset | Defaults to false. The WordFinder will generate extended character offset information when searching for words. The parameter is sometimes disabled to make the process faster and to use system memory more efficiently. If the WordFinder is selecting words to highlight character by character, the offset information for each character is necessary to select words accurately. The character offset describes the position of the character on the page both horizontally and vertically. A positive horizontal offset value moves a character to the right and a positive vertical offset moves it up. |
| noStyleInfo | By default this feature is set to false, so the WordFinder will use character style information when searching for words. Sometimes this feature is disabled to make the word search faster and use system memory more efficiently. Here character style refers to formatting attributes stored in the PDStyle object. This would be fonts, font sizes, and colors. |
| decomposeTbl | Defaults to nullptr (null pointer). This refers to a 16-bit Unicode Transformation Format (UTF-16) decomposition table. This table is used to expand Unicode ligatures not found in the default ligature list by converting a ligature to a string of characters. Every record in the table includes a UTF-16 character value and a UTF-16 string to use to replace the original ligature character found. For example, the table might be used to convert the ligature “ll” to a set of two separate “l” characters. By default the decomposeTbl table is not used. |
| decomposeTblSize | The decomposeTbl is not used, so the size of the table is set equal to zero (0) bytes. |
| charTypeTbl | Defaults to nullptr (null pointer). This is a custom table used to improve the ability of the WordFinder to complete word breaks. Each character type record in the table includes a region start and end value to make it easier to identify where a character starts and ends. Character type here refers to the semantic type of character, such as digit, upper case character, or punctuation mark (see the Adobe Systems header file PDExpT.h lines 3376-3456). |
| charTypeTblSize | The charTypeTbl is not used by default, so the size of the table is set equal to zero (0) bytes. |
| preserveRedundantChars | This feature is disabled. By default the WordFinder will remove redundant characters in a PDF document. Sometimes a PDF document might feature the same text presented multiple times in the same place to create special effects (such as a shadow). These extra instances of the same text are not needed when seeking to highlight words in a document. Therefore the feature is turned off in the TextSearch sample program. |
| disableCharReordering | Set to false. The program will use character reordering. The WordFinder orders characters found on a single line on a page in order by their relative locations horizontally. This usually improves the ability of the WordFinder to identify and extract words. But if a page has a number of overlapping bounding boxes, the result can be uncertain. So character reordering may be needed. Character reordering is used to make sure that the words found on a page are ordered in a way that they are separate and distinct even if they overlap. The bounding box in this context is a quadrilateral that indicates the position of a part of the word on the page. It is based on coordinates associated with that bounding box. A hyphenated word might have two bounding boxes (one for each part of the word). |

In TextSearch, the Decompose Table (decomposeTbl) and the Character Type Table (charTypeTbl) are not used. Hence both are set to nullptr, and the table size is set to zero bytes. But we include these parameters in the sample so that you can see that they are available, and you can set your own table names and sizes.

Step 2: Highlight color

Then, TextSearch defines the color to use for highlighting the text it finds in the PDF input document, using RGB color space coordinates.

Step 3: Word search

With those settings in place, the program checks each page for words that match the search value. By default the program looks for any text strings in the input document that contain the search text. So the program looks for the word “pirate” and highlights, in the output file, the strings “pirates,” “pirate–,” and “pirate.”

Step 4: Highlight the words and create the output file

Finally, the program adds a highlight annotation to each word found and saves the annotations to the output document. Each annotation appears as a rectangle, and the word fills in the rectangle with the highlight color.

TextSearch

UnicodeText

View Sample Code

This sample program demonstrates how Adobe PDF Library works with Unicode text. Unicode is an international standard for coding, handling, and representing text in writing systems around the world. Every letter, numeral, punctuation mark, or symbol found in a group that includes most of the world’s alphabets and writing systems is represented in Unicode by a unique value, called its code point. Unicode code points are standard across many different computer systems and platforms. Unicode features more than 120,000 characters used in 129 modern and historic alphabets and scripts.

The UnicodeText program converts sequences of byte values represented in hexadecimal, embedded in the program code (starting at line 46), into Unicode characters. Then it saves these characters in a PDF output document called Unicode.PDF. The samples feature English, Japanese, French, Korean, and Russian versions of the phrase “Universal Declaration of Human Rights,” shown both horizontally and vertically.

The program defines the fonts to use for each block of text, and whether the text will be horizontal or vertical. UnicodeText also verifies that the fonts selected are compatible.

The Unicode strings used in this program do not have a Byte Order Mark (BOM) and are big endian. A BOM appears at the beginning of Unicode text and indicates whether the byte sequence that follows is big endian or little endian. Big endian means that the most significant byte of a word is stored at the smallest memory address, and that the least significant byte is at the largest memory address.