Adobe® PDF Library

Redacting Text from a PDF Document

Redactions

Documents that contain private, sensitive, or classified information must sometimes be edited before they are published or distributed. The edit must leave the rest of the content intact while deliberately blacking out, or redacting, specific words or text. Use the Redactions sample program to search a PDF document and obscure the words that need to be kept hidden.

This sample opens an input PDF, searches for specific words using the Adobe PDF Library PDFWordFinder, and then removes these words from the text. The Adobe Acrobat PDWordFinder object can identify every word in a PDF document and build a list or table of those words, including the pages where each word appears and the position of each word on its page. The PDFWordFinder reports the location of each word as a Quad, a four-sided region on the page defined by its four corner points.
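To make the idea of a quad concrete, the following is a minimal Python sketch of a four-corner region and the upright rectangle that encloses it. The Quad class, its field names, and the coordinates are assumptions made for illustration only; they are not the Adobe PDF Library types or API.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in page units


@dataclass(frozen=True)
class Quad:
    """Hypothetical stand-in for a word quad: four corner points on a page."""
    top_left: Point
    top_right: Point
    bottom_left: Point
    bottom_right: Point

    def corners(self) -> Tuple[Point, Point, Point, Point]:
        return (self.top_left, self.top_right, self.bottom_left, self.bottom_right)

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (left, bottom, right, top) of the smallest upright
        rectangle that encloses all four corners."""
        xs = [x for x, _ in self.corners()]
        ys = [y for _, y in self.corners()]
        return (min(xs), min(ys), max(xs), max(ys))


# Illustrative coordinates for one word on a page (made-up values).
word_quad = Quad(
    top_left=(72.0, 712.0),
    top_right=(108.0, 712.0),
    bottom_left=(72.0, 700.0),
    bottom_right=(108.0, 700.0),
)
print(word_quad.bounding_box())
```

Keeping all four corners, rather than only a rectangle, means that rotated or skewed text can still be described; the bounding box is only a convenience for drawing an upright cover.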

The sample removes the words “rain” and “cloudy” from the document. For the word “cloudy,” the sample also shows how to change the display details of the redaction, such as changing the color of the redaction box from the default black to red.
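The selection step described above can be sketched in plain Python. This sketch works on an already extracted word list and only plans which boxes to cover and in which color; it does not call the Adobe PDF Library. The FoundWord and PlannedRedaction types, the case-insensitive comparison, and the punctuation stripping are all assumptions for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple

Color = Tuple[float, float, float]           # RGB components in the range 0..1
Box = Tuple[float, float, float, float]      # (left, bottom, right, top)

BLACK: Color = (0.0, 0.0, 0.0)
RED: Color = (1.0, 0.0, 0.0)

# Target words and the color used to cover each one:
# "rain" keeps the default black, "cloudy" is shown in red.
TARGETS: Dict[str, Color] = {"rain": BLACK, "cloudy": RED}


class FoundWord(NamedTuple):
    """Hypothetical record for one word reported by a word finder."""
    text: str
    page: int
    box: Box


class PlannedRedaction(NamedTuple):
    """Hypothetical record describing one redaction to create."""
    page: int
    box: Box
    color: Color


def plan_redactions(words: List[FoundWord]) -> List[PlannedRedaction]:
    """Pick out the target words and pair each with its cover color."""
    planned = []
    for word in words:
        # Assumption: compare case-insensitively and ignore edge punctuation.
        key = word.text.lower().strip('.,;:!?"\'')
        if key in TARGETS:
            planned.append(PlannedRedaction(word.page, word.box, TARGETS[key]))
    return planned


sample_words = [
    FoundWord("Rain", 0, (72.0, 700.0, 98.0, 712.0)),
    FoundWord("is", 0, (102.0, 700.0, 112.0, 712.0)),
    FoundWord("cloudy,", 1, (72.0, 650.0, 110.0, 662.0)),
]
for item in plan_redactions(sample_words):
    print(item)
```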

The sample defines three optional PDF documents: one input and two output. The program redacts the text in the input document and then saves the result. One output document is saved with the redactions applied, and the other is saved without the redactions applied.

AddRegexRedaction

Documents that contain private, sensitive, or classified information must sometimes be edited before they are published or distributed, so that the original content remains intact but some words or text are deliberately blacked out, or redacted. Use this sample to search for and redact phrases or text patterns in a PDF input document. You can search the entire PDF document or only a page range that you provide.

Enter a regular expression (Regex) search string to define the text or pattern to locate and redact. You can also enter the name of the input file you plan to use and the name of the output file. The sample uses PDDocTextFinder to find instances of the phrase or pattern in the PDF input document.
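As a rough illustration of a pattern search limited to a page range, the sketch below uses Python's standard re module on a list of plain-text page strings. It does not use PDDocTextFinder; the find_matches function, the Match record, the zero-based page indexes, and the sample text are assumptions made for this example.

```python
import re
from typing import List, NamedTuple, Optional, Sequence


class Match(NamedTuple):
    """Hypothetical record for one pattern match."""
    page: int    # zero-based page index
    start: int   # character offset of the match within the page text
    end: int     # offset one past the last matched character
    text: str


def find_matches(
    pages: Sequence[str],
    pattern: str,
    first_page: int = 0,
    last_page: Optional[int] = None,
) -> List[Match]:
    """Search pages[first_page..last_page] (inclusive) for a regex pattern.

    With the default arguments the whole document is searched.
    """
    regex = re.compile(pattern)
    final = len(pages) - 1 if last_page is None else min(last_page, len(pages) - 1)
    results = []
    for page_index in range(first_page, final + 1):
        for m in regex.finditer(pages[page_index]):
            results.append(Match(page_index, m.start(), m.end(), m.group(0)))
    return results


# Made-up page text and a pattern for seven-digit phone numbers.
pages = [
    "Call 555-0100 today.",
    "No numbers here.",
    "Fax: 555-0199 or 555-0142.",
]
phone = r"\b\d{3}-\d{4}\b"

for match in find_matches(pages, phone, first_page=1, last_page=2):
    print(match)
```

Each match carries its page and character offsets so that a later step can map it back to a region on the page to highlight or cover.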

AddRegexRedaction generates two PDF output documents: one shows the search text highlighted, and the other shows the text redacted. In the redacted document, each match that was removed appears covered with a rectangle.

The sample normally highlights a match with a single box that surrounds the entire phrase. If the matched text spans multiple lines, or if the font changes within the phrase, the match appears in multiple boxes. For the same reasons, a redaction may also appear as multiple rectangles.
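One way to picture why a single match can produce several boxes is to merge the word boxes of a match into one rectangle per run of words that share a line and a font. The sketch below shows that idea in plain Python. It is an illustration under stated assumptions, not the library's actual algorithm; the WordBox record, the line numbers, and the font names are hypothetical.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top)


class WordBox(NamedTuple):
    """Hypothetical record for one word of a matched phrase."""
    text: str
    line: int   # index of the text line the word sits on
    font: str   # name of the font used to draw the word
    box: Box


def union(boxes: List[Box]) -> Box:
    """Smallest upright rectangle that encloses every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def boxes_for_match(words: List[WordBox]) -> List[Box]:
    """Merge consecutive words into one box per run of equal (line, font)."""
    merged = []
    for _, run in groupby(words, key=lambda w: (w.line, w.font)):
        merged.append(union([w.box for w in run]))
    return merged


# A phrase that wraps onto a second line yields two boxes.
wrapped = [
    WordBox("partly", 3, "Helvetica", (400.0, 700.0, 440.0, 712.0)),
    WordBox("cloudy", 3, "Helvetica", (444.0, 700.0, 486.0, 712.0)),
    WordBox("skies", 4, "Helvetica", (72.0, 686.0, 104.0, 698.0)),
]
print(boxes_for_match(wrapped))
```

A phrase on one line in one font collapses to a single box; a line break or a font change starts a new run and therefore a new box.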

See also the descriptions of the RegexTextSearch and RegexExtractText sample programs.