Adobe® PDF Library

Merging Multiple PDF Files into a Single Document

Your employer provides investment counseling and wealth management for a variety of foundations, colleges and universities, museums, and non-profit institutions in the United States and Europe. One of the services you offer is a custom monthly statement for each customer. You generate this as a PDF file; some customers prefer it printed and mailed, but for most of them you send it as an email attachment or offer a secured download from your FTP site. The information comes from a variety of sources, including government regulators from several countries, financial data and charts from banks, exchanges, and underwriters, and market analysis and prediction reports from independent economists and financial analysts. You also add a cover letter, specific to each customer, and each customer report receives a cover page with the current date.

All of the content is in English, but you typically end up with seven or eight different PDF files for each of your 150 customers, not including the cover page, and the formatting varies from one file to another. Some documents are in A4, others in 8½ x 11 inch, and some are in landscape layout. First you need to sort the PDF files into the right order, identify the pages in the files that you don’t need, and delete those pages. Then you need to adjust the format of the individual PDF files for each customer so that they are consistent, and merge the pages together in the right order.

What you need is a program that will format the PDF pages and then merge them into a new PDF file for each customer, one by one. The program will need to define a standard page size and layout for the final PDF output file. Then it needs to analyze the set of PDF files one by one and identify those where the page sizes must be adjusted. After the page sizes are corrected, the program must merge these pages in the right order, creating a final PDF output file that you can print or deliver to the customer directly. The vendors and institutions that provide your original source PDF files create them each month on a pre-determined schedule and use the same format, layout, and page order each month, so a program can automatically identify and delete the pages that aren’t needed before the PDF files are merged into a single output file.

To complete these steps you would rely on the code from the sample program MergePDF. This program lets a user enter the names of two PDF files and joins them together in a new PDF output file. In this case you would adapt the sample code to take a series of PDF files and merge them into a single output file.

But merging is the end of the process. Before you can create the final merged files, you need to create a program or set of scripts that will complete the following steps:

  1. Collect all of the source PDF files from vendors and suppliers into a single server directory. These will be delivered to you from various sources, and over several days.
  2. Create a new monthly server directory for each customer. Each one will serve as a staging area for assembling the final reports to these customers.
  3. Identify those PDF files that are specific to individual customers or groups of customers as they are received. Rename each file to add the customer account number to the file name.
  4. Move these customer-specific PDF files to the appropriate customer subdirectories.
  5. Make a copy of the generic PDF files and put one copy in each customer subdirectory. For example, if you have a general financial forecast that you want to add to each customer monthly statement, and you are creating 150 monthly statements, you would make 150 copies of this financial forecast PDF and place one copy in each of the customer subdirectories.
  6. Now you have all of the content you need for each customer’s monthly statement. Eliminate the pages that you don’t need in each of the PDF files. Select each PDF file one by one, based on its name prefix, and use the DeletePage method of the C# Document class.

To delete a single page in a PDF file:

public void DeletePage(
    int pageNumber
)

Or, for a series of pages:

public void DeletePages(
   int firstPageNumber,
   int lastPageNumber,
   string destinationFileName
)
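
Because each vendor uses the same layout every month, the pages to drop can be kept in a small lookup table keyed by file-name prefix. The following Java sketch shows one way to do that; the prefixes, the page ranges, and the class name PageRemovalPlan are all hypothetical, page numbers are assumed to be zero-based, and the sketch only decides what to delete, leaving the actual deletion to the library methods shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lookup of the page ranges to drop from each vendor's monthly file.
class PageRemovalPlan {

    // Assumed prefixes and inclusive {first, last} page ranges (zero-based).
    // Replace these with the real vendor layouts.
    static final Map<String, List<int[]>> RANGES_BY_PREFIX = Map.of(
        "forecast_", List.of(new int[] {0, 1}, new int[] {7, 9}),
        "regulator_", List.of(new int[] {3, 3}));

    /** Returns the ranges to delete for a file, highest page numbers first. */
    static List<int[]> rangesFor(String fileName) {
        for (Map.Entry<String, List<int[]>> entry : RANGES_BY_PREFIX.entrySet()) {
            if (fileName.startsWith(entry.getKey())) {
                List<int[]> ranges = new ArrayList<>(entry.getValue());
                // Work from the back of the file so earlier page numbers stay valid.
                ranges.sort((a, b) -> Integer.compare(b[0], a[0]));
                return ranges;
            }
        }
        return List.of(); // unknown prefix: delete nothing
    }

    public static void main(String[] args) {
        for (int[] range : rangesFor("forecast_2024-05.pdf")) {
            // The library's page-deletion call would go here.
            System.out.println("delete pages " + range[0] + " to " + range[1]);
        }
    }
}
```

Returning the ranges from the highest page number down means that deleting one range does not shift the page numbers of the ranges still to be processed.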

  7. Process the PDF files in each subdirectory so that the page sizes are all adjusted to the 8½ x 11 inch standard. Landscape diagrams can be left in the landscape layout.

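The European A4 standard (210 x 297 mm) is close to the American 8½ x 11 inch page size, so you might want to simply change the page size by setting the page dimensions through the Page.MediaBox property. This changes the page boundaries without scaling the content, so anything close to the edge of the slightly taller A4 page may be cut off.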
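
To see how close the two sizes really are, it helps to convert both to PDF points (1/72 inch). The Java sketch below is plain arithmetic and uses no PDF library; the class name PageFit and its method names are made up for this illustration. It computes the largest uniform scale at which an A4 page fits inside an 8½ x 11 inch page.

```java
// Plain arithmetic sketch: compares A4 and 8.5 x 11 inch pages in PDF points.
class PageFit {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Converts millimeters to PDF points. */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Largest uniform scale at which the source page fits inside the target page. */
    static double fitScale(double srcWidth, double srcHeight,
                           double dstWidth, double dstHeight) {
        return Math.min(dstWidth / srcWidth, dstHeight / srcHeight);
    }

    public static void main(String[] args) {
        double a4Width = mmToPoints(210);
        double a4Height = mmToPoints(297);
        double letterWidth = 8.5 * POINTS_PER_INCH;   // 612 points
        double letterHeight = 11.0 * POINTS_PER_INCH; // 792 points

        double scale = fitScale(a4Width, a4Height, letterWidth, letterHeight);
        System.out.printf("A4: %.1f x %.1f pt%n", a4Width, a4Height);
        System.out.printf("8.5 x 11 inch: %.1f x %.1f pt%n", letterWidth, letterHeight);
        System.out.printf("Scale to fit A4 on 8.5 x 11 inch: %.4f%n", scale);
    }
}
```

A4 is narrower but taller than 8½ x 11 inch, so the height is the limiting dimension and the fit scale comes out at roughly 0.94. Whether to shrink the content by that amount or accept the smaller page boundary depends on how close the source content runs to the top and bottom edges.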

If you need to scale the contents of a page, especially if one page is considerably larger than the standard, you could import the content of the larger page into a Form object.  Then, you could insert that form into a new destination page where you have set the dimensions to the size you want.  For C# the code might look like this:

    Form pageForm;
    using (Page page = originalDoc.GetPage(i))
    {
        // Make a form that has the page's content
        pageForm = new Form(page.Content);
    }

    // Create the next page in newDocument, using the target page dimensions
    Page newPage = newDocument.CreatePage(mergeAfterIndex, newPageMediaBox);

    // Put a copy of that form into the new page
    newPage.Content.AddElement(pageForm.Clone());

    // Update the content of the page that was just created
    newPage.UpdateContent();

  8. When you have the final set of PDF files, with the unneeded pages removed and the page formats set to match the 8½ x 11 inch standard, run code from the MergePDF program to create a single output PDF file for each customer.

The MergePDF program opens each original PDF file as a Document object. In C#:

       Document doc1 = new Document(filename);

And Java:

       Document doc1 = new Document(filename);

It then inserts the pages of the second document at the end of the first. In C#:

doc1.InsertPages(Document.LastPage, doc2, 0, Document.AllPages, PageInsertFlags.All);

or for Java:

doc1.insertPages(Document.LAST_PAGE, doc2, 0, Document.ALL_PAGES, EnumSet.of(PageInsertFlags.ALL));
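
MergePDF joins two files, so merging seven or eight means repeating the insert call once per additional file, in the order the sections should appear. One way to fix that order is to rank the files by name prefix before merging. The Java sketch below does only the sorting; the prefixes and the class name MergeOrder are hypothetical, and the merge itself is left as a comment that refers to the insertPages call shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: put a customer's PDF files into the order they should be merged.
class MergeOrder {

    // Assumed file-name prefixes, listed in the order the sections should appear.
    static final List<String> SECTION_ORDER =
        List.of("cover_", "letter_", "statement_", "forecast_");

    /** Position of the file's prefix in SECTION_ORDER; unknown prefixes sort last. */
    static int rank(String fileName) {
        for (int i = 0; i < SECTION_ORDER.size(); i++) {
            if (fileName.startsWith(SECTION_ORDER.get(i))) {
                return i;
            }
        }
        return SECTION_ORDER.size();
    }

    /** Returns a sorted copy: by section rank first, then alphabetically. */
    static List<String> inMergeOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort((a, b) -> {
            int byRank = Integer.compare(rank(a), rank(b));
            return byRank != 0 ? byRank : a.compareTo(b);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ordered = inMergeOrder(List.of(
            "forecast_may.pdf", "cover_1001.pdf", "statement_1001.pdf", "letter_1001.pdf"));
        // Open the first file as the destination document, then append each
        // remaining file to it with the insertPages call shown above.
        for (String name : ordered) {
            System.out.println(name);
        }
    }
}
```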

When this process is complete, each customer's server subdirectory contains the merged monthly report along with the individual PDF files that were used to build it.

Delete all of the unneeded files in each subdirectory, leaving only the final report file for each customer. Then send the PDF file to the customer as an email attachment, or send it to a printer.
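
The cleanup can be scripted as well. The Java sketch below uses only the standard java.nio.file API and assumes that each customer directory holds the source PDF files next to a single report file whose name is known; the class name StagingCleanup and the file names in main are hypothetical. Deletion is permanent, so a script like this is worth trying on a copy of the staging area first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: remove the source PDF files from a customer directory, keeping the report.
class StagingCleanup {

    /** Deletes every *.pdf file except the named report; returns the number removed. */
    static int removeSourceFiles(Path customerDir, String reportFileName) throws IOException {
        List<Path> toDelete = new ArrayList<>();
        // Collect first, then delete, so the directory is not changed while it is listed.
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(customerDir, "*.pdf")) {
            for (Path pdf : pdfs) {
                if (!pdf.getFileName().toString().equals(reportFileName)) {
                    toDelete.add(pdf);
                }
            }
        }
        for (Path pdf : toDelete) {
            Files.delete(pdf);
        }
        return toDelete.size();
    }

    public static void main(String[] args) throws IOException {
        // Demonstration on a throwaway temporary directory.
        Path dir = Files.createTempDirectory("customer-1001");
        Files.createFile(dir.resolve("forecast_may.pdf"));
        Files.createFile(dir.resolve("letter_1001.pdf"));
        Files.createFile(dir.resolve("report_1001.pdf"));
        int removed = removeSourceFiles(dir, "report_1001.pdf");
        System.out.println(removed + " source file(s) removed");
    }
}
```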