Adobe® PDF Library

Searching for Content

RegexTextSearch

This sample searches a PDF input document for phrases or text patterns. It supplies sample regular expressions for finding phone numbers, email addresses, and URLs; you can use these or write your own. You can search the entire document or restrict the search to a page range. The program generates an output PDF document that matches the input file, except that each match is highlighted. You can specify the names of the input file and the output file. The sample uses PDDocTextFinder to find instances of a phrase or pattern in the input document.
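To illustrate the kind of patterns and page-range filtering described here, the following is a minimal sketch in plain Python using only the standard `re` module. It does not call the Adobe PDF Library or PDDocTextFinder. The patterns are illustrative assumptions and may differ from the ones the sample ships with, and `PATTERNS`, `search_pages`, and the `pages` list (one string of already-extracted text per page) are hypothetical names introduced for this example.

```python
import re

# Illustrative patterns only (assumption): the sample's own defaults may differ.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"\bhttps?://[^\s<>\"]+"),
}


def search_pages(pages, pattern, first_page=1, last_page=None):
    """Return (page_number, matched_text) pairs.

    `pages` holds the text of each page; the page range is 1-based and
    inclusive. Omitting `last_page` searches through the final page.
    """
    if last_page is None:
        last_page = len(pages)
    hits = []
    for number in range(first_page, last_page + 1):
        for match in pattern.finditer(pages[number - 1]):
            hits.append((number, match.group()))
    return hits


# Hypothetical page text standing in for content extracted from a PDF.
pages = [
    "Call 312-555-0142 or write to info@example.com.",
    "See https://www.example.com/docs for details.",
    "Fax: 312.555.0199",
]

whole_document = search_pages(pages, PATTERNS["phone"])
first_two_pages = search_pages(pages, PATTERNS["phone"], 1, 2)
```

Searching the whole document finds the phone numbers on pages 1 and 3, while restricting the range to pages 1 through 2 leaves only the first.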

The sample normally highlights each match with a single box that surrounds the entire phrase. If the matched text spans multiple lines, or if the font changes within the phrase, the highlight appears as multiple boxes.
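One way to picture why a single match can produce several boxes is sketched below in plain Python. This is not how the Adobe PDF Library is implemented; it is an assumed model in which a match arrives as a sequence of text fragments, each with a line index, a font name, and a rectangle. The `highlight_boxes` function and the fragment field names are hypothetical.

```python
from itertools import groupby


def highlight_boxes(fragments):
    """Merge consecutive fragments that share a line and a font.

    Each run becomes one rectangle (x0, y0, x1, y1); a line break or a
    font change starts a new rectangle.
    """
    boxes = []
    for _, run in groupby(fragments, key=lambda f: (f["line"], f["font"])):
        run = list(run)
        boxes.append((
            min(f["x0"] for f in run),
            min(f["y0"] for f in run),
            max(f["x1"] for f in run),
            max(f["y1"] for f in run),
        ))
    return boxes


# Hypothetical fragments of one matched phrase.
first = {"line": 0, "font": "Helvetica", "x0": 72, "y0": 700, "x1": 110, "y1": 712}
second = {"line": 0, "font": "Helvetica", "x0": 114, "y0": 700, "x1": 160, "y1": 712}
wrapped = {"line": 1, "font": "Helvetica", "x0": 72, "y0": 686, "x1": 120, "y1": 698}
bold = {"line": 0, "font": "Helvetica-Bold", "x0": 114, "y0": 700, "x1": 165, "y1": 712}

one_box = highlight_boxes([first, second])      # same line, same font
two_lines = highlight_boxes([first, wrapped])   # phrase wraps to the next line
two_fonts = highlight_boxes([first, bold])      # font changes mid-phrase
```

Under this model, the first case yields one rectangle and the other two yield two rectangles each, which mirrors the behavior described above.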

RegexExtractText

The RegexExtractText sample searches a PDF input document for phrases or text patterns, either across the entire document or within a page range you provide. The program finds every phrase that matches the supplied regular expression, extracts the matching text along with details about the quad or quads for each match, and saves the results to a JSON output file. A quad is a bounding box that describes the position of a phrase on the PDF page, defined by four corner coordinates: top left, top right, bottom left, and bottom right.
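The shape of such an output can be sketched with Python's standard `json` module. The actual JSON schema written by RegexExtractText is not shown here, so every field name below (`text`, `page`, `quads`, and the corner keys) is an assumption, and `rect_to_quad` and `match_record` are hypothetical helpers. The sketch also assumes axis-aligned rectangles in PDF user space, where y increases upward.

```python
import json


def rect_to_quad(left, bottom, right, top):
    """Expand an axis-aligned rectangle into its four corner points."""
    return {
        "top-left": [left, top],
        "top-right": [right, top],
        "bottom-left": [left, bottom],
        "bottom-right": [right, bottom],
    }


def match_record(text, page, rects):
    """Describe one match: its text, its page, and one quad per rectangle."""
    return {
        "text": text,
        "page": page,
        "quads": [rect_to_quad(*rect) for rect in rects],
    }


# Hypothetical matches; the second one wraps across two lines, so it has two quads.
records = [
    match_record("info@example.com", 1, [(72, 700, 160, 712)]),
    match_record("https://www.example.com/docs", 2,
                 [(300, 500, 540, 512), (72, 486, 130, 498)]),
]

output = json.dumps(records, indent=2)
```

Writing `output` to a file would give one JSON array with an entry per match, each carrying as many quads as the match needed.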

This sample supplies some default regular expressions to use when searching for phone numbers, email addresses, or URLs. You can use them or create your own.

You can also specify the names of the input file and the output file. The sample uses the PDDocTextFinder API to find instances of a phrase or pattern in the input document.