Adobe® PDF Library

Finding and Editing File Content

In addition to samples that export text or graphics from a PDF file, the Adobe PDF Library offers several Java and C# sample programs that can find, present, or edit information about a PDF file.

Automatically applying keywords to a group of PDF files

A global manufacturing concern has a group of subsidiaries that build electronic tools and instruments for three markets: health care; academic, governmental, and scientific laboratories; and regulators that monitor the construction trade. These products are highly specialized, so they sell in small volumes and are hard to market. You don’t create TV commercials or place ads in Vogue for a device that measures the soundness of 15-year-old concrete in a bridge deck, or turn to Facebook to sell a background radiation detector intended for use in hospitals. At the same time, the firm faces stiff competition from several other manufacturers in Asia, and its customers are generally small institutions distributed around the world. Besides relying on sales reps and trade shows, the firm needs to help potential customers find its products through Internet search tools.

Part of this effort involves effectively placing PDF files so that potential customers can find them.  The firm provides a great deal of technical information about its products online, in the form of product descriptions, technical marketing content, specifications, and performance reports.  All of these items are updated at least once every six months.  Some are generated as frequently as once a month, and the firm has nearly 1000 of these documents available to support over one hundred individual products.  All of these materials are provided as PDF files, as PDF is better suited than HTML for presenting long documents with diagrams and technical content.   Further, the firm has automated tools in place to generate and post these updated files.

But PDF files do not lend themselves as well to online searching.  PDF files lack the tag structure found in HTML files that the Google search engine favors.  Rather, metadata needs to be added to the Properties page for each PDF file, including the firm’s name, the name of each product, and a set of general keywords related to the firm’s products.

When the firm generates a new PDF file, the product name is included in the file name, and this name also appears in the Title field in the Properties page. But the firm wants to create a utility that will automatically add the firm’s name to the Author field and a series of keywords to the Keywords field. The keywords need to be specific to the type of product, so the documents would first need to be sorted into three categories: health care, laboratory, and construction. A set of relevant keywords would then be applied separately to each group. All of the health care products would receive the same set of keywords, added to the Keywords field in the Properties page, and the same is true for the laboratory and construction instruments. Further, the firm wants to be able to apply new keywords to existing PDF files, rather than having to regenerate each PDF file every time the keywords change. Updating the existing files takes much less time and system overhead, and allows the marketing group to refresh the files whenever new keywords are introduced.

  1. Generate the PDF files for each instrument, with the name of the product included in the file name for each file.
  2. Sort the files by product type, health care, laboratory equipment, and construction.
  3. Copy the PDF files to the appropriate server directories.
  4. Run a program that will apply keywords for each type of product. The program will draw keywords from an external text file.
  5. Run the utility for each batch of files to add the keywords to the PDF files.
  6. Save the PDF files back to the original directories and then post them to the appropriate web pages.

The ListInfo program prompts a user to enter the name of a PDF file and then displays the Title, Subject, Author, Keywords, Creator, and Producer. It then invites the user to enter new values for these items. For this firm’s purposes, the user interface is not needed. A developer could use the ListInfo code as the basis for a program that automatically selects a PDF file and adds metadata values to its Properties page before saving that file. The program would need to access the PDF files from a server directory, as described above, and loop through them one by one. Instead of having a user enter keywords at a command line prompt, it would read them from an external text file and apply them as a group to the Keywords field in the Properties page. And it would need to automatically save each PDF file back to its original directory, overwriting the original version of the file.
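The directory loop described above might be sketched as follows. This is a minimal outline that uses only the Java standard library; the class name PdfBatch and the PdfUpdater hook are hypothetical names introduced for illustration, and the step that opens the file with the PDF library, sets the metadata, and saves it is deliberately left to the hook.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical batch driver: finds the PDF files in one directory and hands each to an updater. */
final class PdfBatch {

    /** Hypothetical hook; a real updater would open the file, set its metadata, and save it. */
    interface PdfUpdater {
        void update(Path pdfFile) throws IOException;
    }

    /** Returns the PDF files directly inside dir, sorted by path so that runs are repeatable. */
    static List<Path> listPdfFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> p.getFileName().toString()
                                 .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                   .forEach(result::add);
        }
        Collections.sort(result);
        return result;
    }

    /** Applies the updater to every PDF file in dir and returns the number of files processed. */
    static int processDirectory(Path dir, PdfUpdater updater) throws IOException {
        int count = 0;
        for (Path pdf : listPdfFiles(dir)) {
            updater.update(pdf);
            count++;
        }
        return count;
    }
}
```

A caller would invoke processDirectory once for each of the three product directories, passing an updater that applies that category's keywords.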

The ListInfo program displays the current metadata value in a selected file and then prompts the user to enter a new value to replace it, shown here in C#:

Console.WriteLine("Document Subject " + doc.Subject);
Console.WriteLine("Change document subject to: ");
string newsubject = Console.ReadLine();
Console.WriteLine("Document Author " + doc.Author);
Console.WriteLine("Change document author to: ");
string newauthor = Console.ReadLine();

The equivalent Java code also assigns each new value as soon as the user enters it:

System.out.println("Document Subject " + doc.getSubject());
System.out.println("Change document subject to: ");
String subject = stdin.readLine();
doc.setSubject(subject);
System.out.println("Document Author " + doc.getAuthor());
System.out.println("Change document author to: ");
String author = stdin.readLine();
doc.setAuthor(author);

After the user provides the value, the program replaces the original value with the new value provided by the user, shown here in C#:

doc.Subject = newsubject;
doc.Author = newauthor;

In Java, the matching lines of code:

doc.setSubject(subject);
doc.setAuthor(author);

appear above, immediately after each prompt.

The program then saves the values to a new PDF file, using the Document.Save method. The application described above would need to replace the user input for the keywords with keyword values read from an external text file. That file could be updated at will as long as its name and location remain the same. The application would also need to select each PDF file, update the keywords in that file, and then save the file with the new values.
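Reading the keywords from the external text file could look something like the sketch below. It rests on assumptions that the source does not specify: the file is taken to hold one keyword or phrase per line in UTF-8, and the keywords are joined with a comma and a space into a single string. The class name KeywordFile is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that turns a keyword text file into one Keywords string. */
final class KeywordFile {

    /**
     * Reads one keyword per line (assumed layout), skips blank lines and
     * duplicates, and joins the rest in file order, separated by ", ".
     */
    static String loadKeywords(Path keywordFile) throws IOException {
        Set<String> keywords = new LinkedHashSet<>();
        for (String line : Files.readAllLines(keywordFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                keywords.add(trimmed);
            }
        }
        return String.join(", ", keywords);
    }
}
```

The resulting string would take the place of the value that ListInfo reads from the console, and would be assigned to the document's keywords in the same way that the excerpts above assign the subject and author.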

Adding and editing text in a PDF file

The Adobe PDF Library provides several sample programs that you can use to add text to a PDF file, in the form of glyphs or Unicode characters.

Suppose your firm has a computer system that regularly generates PDF files that you post to a series of web pages. These are marketing documents for your firm’s products, and periodically you want to add a note to the top of one of these PDF files announcing special offers, sales, or news items. These notes are commonly 15 to 20 words long; your customers have learned to expect them and look for them. To make the messages distinctive, you use a special font that your firm created for that purpose. That means you need to render the text on each PDF file as a series of glyphs. In a PDF file, each character displayed in a font is rendered as a glyph, the specific shape that the font defines for that character.

The process would include these steps to update a PDF file with a special marketing message:

  1. A user types a brief statement in a user interface, or at a command prompt.
  2. The interface or program saves the message to a text file and then translates it into a series of glyph characters.
  3. Another program adds the glyphs to the upper left corner of the appropriate PDF file.

This program would run once for each PDF file to be updated.

You could consider the AddGlyphs sample program in APDFL.  This program creates a new PDF file and adds a series of glyphs to that file:

Document doc = new Document();

Rect pageRect = new Rect(0, 0, 612, 792);
Page docpage = doc.CreatePage(Document.BeforeFirstPage, pageRect);
Console.WriteLine("Created page.");

Font font = new Font("Arial");

List<Char> glyphIDs = new List<Char>();
glyphIDs.Add('\u002b');
glyphIDs.Add('\u0028');
glyphIDs.Add('\u002f');
glyphIDs.Add('\u002f');
glyphIDs.Add('\u0032');

Then, it names and saves this PDF file.

The same code appears in our Java sample code:

Document doc = new Document();
Rect pageRect = new Rect(0, 0, 612, 792);
Page docpage = doc.createPage(Document.BEFORE_FIRST_PAGE, pageRect);
System.out.println("Created page.");
Font font = new Font("Arial");
List<Character> glyphIDs = new ArrayList<Character>();
glyphIDs.add(new Character('\u002b'));
glyphIDs.add(new Character('\u0028'));
glyphIDs.add(new Character('\u002f'));
glyphIDs.add(new Character('\u002f'));
glyphIDs.add(new Character('\u0032'));

To give a user a simple way to enter text, you could reuse the command prompt interface provided with several of our other sample programs, like this one found in ListInfo, shown here in C#:

Console.WriteLine("Opened a document.");
Console.WriteLine("Document Title " + doc.Title);
Console.WriteLine("Change document title to: ");
string newtitle = Console.ReadLine();

and in Java:

System.out.println("Opened a document.");

BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));

System.out.println("Document Title " + doc.getTitle());
System.out.println("Change document title to: ");
String title = stdin.readLine();
doc.setTitle(title);

To convert the text provided by the user into glyphs, you would need to set up a translation table that would map upper and lower case letters and standard punctuation characters for a typical font, like Times Roman or Arial, into the corresponding glyphs for your unique font.  The table would need to map about 60 characters to glyphs.
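A translation table of this kind might be sketched as a simple map from characters to glyph IDs. The class name GlyphTable and both method names are hypothetical, and the table entries are placeholders only: they reuse the five glyph ID values from the AddGlyphs excerpt, and their pairing with particular letters is an assumption rather than data from a real font. A full table would hold an entry for each of the roughly 60 characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical character-to-glyph translation table. */
final class GlyphTable {

    /** Builds a tiny demo table; the glyph IDs are placeholders, not values from a real font. */
    static Map<Character, Character> demoTable() {
        Map<Character, Character> table = new HashMap<>();
        table.put('H', '\u002b');
        table.put('E', '\u0028');
        table.put('L', '\u002f');
        table.put('O', '\u0032');
        return table;
    }

    /** Translates each character of the message, failing loudly on an unmapped character. */
    static List<Character> toGlyphIds(String message, Map<Character, Character> table) {
        List<Character> glyphIds = new ArrayList<>(message.length());
        for (char c : message.toCharArray()) {
            Character glyph = table.get(c);
            if (glyph == null) {
                throw new IllegalArgumentException("No glyph mapped for character: " + c);
            }
            glyphIds.add(glyph);
        }
        return glyphIds;
    }
}
```

Rejecting unmapped characters, rather than skipping them, keeps a typo in the message from silently producing a note with missing letters.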

Then, the program would need to add them to the PDF file, one glyph at a time.

You could do without the conversion table if you wanted to post a stock message to a PDF file.  If you don’t need to change the message, you can simply hard code the glyph characters into the program, as appears in the AddGlyphs sample program in APDFL.

Adding a watermark to a PDF file

Your firm distributes PDF files to your customers in the form of monthly financial statements. One page of each financial statement comes from your outside auditor, verifying that the information in these reports has been audited by an objective third party. You add this auditor’s page to the back of each monthly statement. The auditor’s statement varies depending on the financial product offered, and you have 18 different product lines, so you receive 18 different PDF files from your auditor once a month, one of which is added to each customer statement.

The auditor, of course, is a separate company that provides the monthly report on its own letterhead and using its own format.  But you want to include your company logo on this statement in the upper left corner.  And you want to automate this process.  You want to be able to:

  1. Generate the PDF files for your 1400 customers and sort them by product line into a set of server directories, ordered by account number within each directory.
  2. Accept the 18 different auditor statements each month and store them in a server directory.
  3. Apply the watermark to each of these 18 PDF files automatically.
  4. Copy the 18 auditor statements to the appropriate server directories where the customer statements are stored for each product.
  5. Add the auditor statement page for each product to the end of every other PDF file in the directory, and save those files.
  6. Delete the auditor statement from each product directory.

Then, you can send the PDF files out as email attachments, or print and mail them by US Mail.
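For step 5, a program first has to decide which files in a product directory are customer statements and in what order to process them. The sketch below shows one way to do that, under a naming convention that is purely an assumption: each customer statement file name is taken to begin with the numeric account number, and the auditor statement has a known file name. The class name StatementSelector is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that picks the customer statements out of a directory listing. */
final class StatementSelector {

    /** Returns every PDF name except the auditor statement, ordered by account number. */
    static List<String> customerStatements(List<String> fileNames, String auditorFileName) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            boolean isPdf = name.toLowerCase(Locale.ROOT).endsWith(".pdf");
            if (isPdf && !name.equalsIgnoreCase(auditorFileName)) {
                result.add(name);
            }
        }
        // Sort numerically so that account 87 comes before account 1002.
        result.sort(Comparator.comparingLong(StatementSelector::accountNumber));
        return result;
    }

    /** Parses the leading digits of a file name as the account number (assumed convention). */
    static long accountNumber(String fileName) {
        int end = 0;
        while (end < fileName.length() && Character.isDigit(fileName.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No account number in file name: " + fileName);
        }
        return Long.parseLong(fileName.substring(0, end));
    }
}
```

Sorting on the parsed number rather than on the raw file name avoids the lexicographic ordering in which "1002" would sort ahead of "87".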

To do this, consider the APDFL Watermark sample program, which applies a graphic to a PDF file.

The Watermark sample program prompts a user to enter the name of the PDF file to which the watermark will be applied, and then the name of the file holding the watermark graphic.

The program defines the opacity, rotation, and scale of the watermark:

WatermarkParams watermarkParams = new WatermarkParams();
watermarkParams.Opacity = 0.8f;
watermarkParams.Rotation = 45.3f;
watermarkParams.Scale = 0.5f;

It defines the pages where the watermark will appear, in this case, even pages only:

watermarkParams.TargetRange.PageSpec = PageSpec.EvenPagesOnly;

The code is similar in Java:

WatermarkParams watermarkParams = new WatermarkParams();
watermarkParams.setOpacity(0.8f);
watermarkParams.setRotation(45.3f);
watermarkParams.setScale(0.5f);

watermarkParams.getTargetRange().setPageSpec(PageSpec.EVEN_PAGES_ONLY);

The program then applies the graphic watermark and switches the target range to odd pages, where a different watermark will appear:

doc.Watermark(watermarkDoc.GetPage(0), watermarkParams);

watermarkParams.TargetRange.PageSpec = PageSpec.OddPagesOnly;

For this watermark, the program uses text rather than a graphic, with the text hard-coded into the program itself as "Multiline\nWatermark", where \n breaks the text onto a second line. The program defines the color for the text, the font, and the text alignment:

WatermarkTextParams watermarkTextParams = new WatermarkTextParams();
Color color = new Color(109.0f/255.0f, 15.0f/255.0f, 161.0f/255.0f);
watermarkTextParams.Color = color;

watermarkTextParams.Text = "Multiline\nWatermark";

Datalogics.PDFL.Font f = new Datalogics.PDFL.Font("Courier", FontCreateFlags.Embedded | FontCreateFlags.Subset);
watermarkTextParams.Font = f;
watermarkTextParams.TextAlign = HorizontalAlignment.Center;

doc.Watermark(watermarkTextParams, watermarkParams);

doc.EmbedFonts();

You might decide to use text rather than a graphic image as your watermark.  The sample program provides both options.

This section of code in Java looks like this:

doc.watermark(watermarkDoc.getPage(0), watermarkParams);

watermarkParams.getTargetRange().setPageSpec(PageSpec.ODD_PAGES_ONLY);

WatermarkTextParams watermarkTextParams = new WatermarkTextParams();
watermarkTextParams.setText("Multiline\nWatermark");

Font f = new Font("Courier", EnumSet.of(FontCreateFlags.EMBEDDED, FontCreateFlags.SUBSET));
watermarkTextParams.setFont(f);
watermarkTextParams.setTextAlign(HorizontalAlignment.CENTER);
Color c = new Color(109.0f/255.0f, 15.0f/255.0f, 161.0f/255.0f);
watermarkTextParams.setColor(c);

doc.watermark(watermarkTextParams, watermarkParams);

doc.embedFonts();