Adobe PDF Library

Extracting text from PDF files

TextExtract

This sample program extracts the text from a PDF file and writes it to a plain text (TXT) file. It opens a PDF file named Constitution.PDF and creates an output file named TextExtract-untagged-out.txt. The output file includes page number references, and the text is produced using standard Times Roman encoding. The program also includes a provision for working with tagged documents: it determines whether the input PDF file is tagged or untagged. Tagging makes PDF files accessible to people who are blind or have low vision.
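
The export step described above can be sketched roughly as follows. This is a minimal Python illustration, not the sample's actual code, and it does not call the Adobe PDF Library: it assumes the page text has already been extracted into a list of strings and that the tagged/untagged check has already produced a boolean. The function names, the page marker format, and the "tagged" output file name are all assumptions made for illustration; only TextExtract-untagged-out.txt comes from the description.

```python
import os
from typing import List


def output_name(is_tagged: bool) -> str:
    """Pick the output file name based on whether the PDF is tagged.

    "TextExtract-untagged-out.txt" is the name given in the description;
    the tagged variant is an assumed name that follows the same pattern.
    """
    kind = "tagged" if is_tagged else "untagged"
    return f"TextExtract-{kind}-out.txt"


def format_pages(pages: List[str]) -> str:
    """Join per-page text, prefixing each page with a page number reference.

    The "<page N>" marker is a hypothetical format; the real sample may
    write its page references differently.
    """
    chunks = []
    for number, text in enumerate(pages, start=1):
        chunks.append(f"<page {number}>\n{text.strip()}\n")
    return "\n".join(chunks)


def export_text(pages: List[str], is_tagged: bool, directory: str = ".") -> str:
    """Write the formatted text to the output file and return its path."""
    path = os.path.join(directory, output_name(is_tagged))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_pages(pages))
    return path
```

For example, export_text(["We the People...", "Article I..."], is_tagged=False) would write both pages, each preceded by its page marker, to TextExtract-untagged-out.txt in the current directory.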