Adobe PDF Library

Memory Management

Overview

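When you create an application based on the Adobe PDF Library Java and .NET Interface, it is important to follow good memory management practices, because most of the memory the application uses is held by native code that the Java or .NET runtime does not track.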

Systems written in Java or C# use garbage collection to clear system memory automatically, deleting objects from memory when they are no longer needed. Objects are added to sections of memory referred to as “heaps,” and when a heap fills with objects, the garbage collection process runs automatically to clear that memory space.

In general, managed objects, in the .NET or Java garbage collected heap, appear very small to the runtime environment.  They are not much more than handles to the underlying unmanaged, native resources in the Adobe PDF Library itself. When the system collects garbage, and the Java or .NET object is collected and finalized, the system releases the corresponding PDFL resources.

The garbage collector uses groups called generations to improve performance. Most objects become unreachable soon after they are created, but the longer an object remains in memory, the longer it is likely to survive there. So the garbage collector sorts objects into generations by age, that is, by how long they have been stored in memory.

The small Java and .NET Interface objects exert little memory pressure on the runtime, so it may take a long time for the garbage collection process to run. The underlying PDF Library objects do consume system memory and resources, but most of this is hidden from the Java and .NET runtime because the PDF Library is mostly native code. This means that by the time the garbage collector runs, a large amount of memory may be consumed by objects that are no longer in use. You are therefore encouraged to dispose of objects explicitly, as described below.
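The effect described above can be illustrated with a small, self-contained model. This is only a sketch: NativeHeapModel, NativeBackedObject, and MemoryPressureDemo are hypothetical names invented for this illustration, not part of the Adobe PDF Library API. The model has no finalizers, so it represents the window of time before the garbage collector has run.

```java
// Hypothetical model, not the PDF Library API: it only counts "native" bytes.
class NativeHeapModel {
    private long bytesInUse = 0;

    void allocate(long bytes) { bytesInUse += bytes; }
    void free(long bytes) { bytesInUse -= bytes; }
    long bytesInUse() { return bytesInUse; }
}

// A tiny managed wrapper that owns a large native allocation.
class NativeBackedObject {
    private final NativeHeapModel heap;
    private long nativeBytes;

    NativeBackedObject(NativeHeapModel heap, long nativeBytes) {
        this.heap = heap;
        this.nativeBytes = nativeBytes;
        heap.allocate(nativeBytes);
    }

    // Plays the role of delete() in Java or Dispose() in .NET; safe to call twice.
    void delete() {
        if (nativeBytes != 0) {
            heap.free(nativeBytes);
            nativeBytes = 0;
        }
    }
}

class MemoryPressureDemo {
    // Drops every wrapper without disposing it; returns native bytes still held.
    static long withoutDisposal(NativeHeapModel heap, int count, long bytesEach) {
        for (int i = 0; i < count; i++) {
            new NativeBackedObject(heap, bytesEach); // reference dropped immediately
        }
        return heap.bytesInUse();
    }

    // Disposes each wrapper as soon as the work on it is done.
    static long withDisposal(NativeHeapModel heap, int count, long bytesEach) {
        for (int i = 0; i < count; i++) {
            NativeBackedObject obj = new NativeBackedObject(heap, bytesEach);
            try {
                // work with obj here
            } finally {
                obj.delete();
            }
        }
        return heap.bytesInUse();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(withoutDisposal(new NativeHeapModel(), 100, 10 * mb) / mb + " MB held");
        System.out.println(withDisposal(new NativeHeapModel(), 100, 10 * mb) / mb + " MB held");
    }
}
```

In this model, one hundred wrappers of 10 MB each leave 1000 MB of native memory held when nothing is disposed, and 0 MB when each wrapper is disposed right after use. The managed wrappers are only a few bytes each in both cases, which is why the runtime sees no reason to collect early.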

We provide a pair of sample programs called MemoryFileSystem, one for Microsoft .NET and the other for Java, that demonstrate how to use RAM instead of the hard disk for saving temporary files.

How objects get deleted

The garbage collection process starts automatically and runs as needed. Any unreferenced objects, including Java and .NET Interface objects, will be considered for garbage collection.

But the Java and .NET Interface objects have finalizers, which ultimately release the underlying PDFL resources. So on the first garbage collection run, these objects are not reclaimed; they are placed in a finalization queue instead.

Later, the garbage collector finalizes the objects in the finalization queue, releasing the PDFL resources.

After that, the Java and .NET objects themselves are garbage collected on the next pass, finally freeing the last memory space in use.

That means that it may take two garbage collection runs before the PDFL resources are released. If the Java and .NET objects are in a list or other container, it may take more passes to finally release the PDFL resources. Objects that have survived into generation 1 or generation 2 can take longer still, because older generations are collected less often.

As a result, some PDFL usage patterns may result in large numbers of PDFL resources remaining for a long time in memory while the program is running.

Implicit disposal

Implicit disposal happens when the garbage collector finalizes a Java or .NET object. With implicit disposal, the object releases its hold on PDFL resources. Other objects depending on the same internal PDFL resources may still cause the PDFL resources to be held, if required.

For instance, consider the case where a program creates a Document object, and subsequently obtains a Page object from the document. If there are no more references to the Document, the garbage collector may release the Document object. However, the Page object internally holds a PDPage, and for the PDPage to exist, the PDDoc containing it must still exist. So even though the Document object has been garbage collected, the PDDoc is still open, the corresponding PDF file may still be locked on Windows, and so forth. Only after the Page object is garbage collected and finalized will the PDDoc be closed.
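The Document and Page case can be sketched as a simple holder count. All class names here (NativeDocModel, HandleModel, DocumentModel, PageModel) are hypothetical and exist only for this illustration; the way the library tracks these dependencies internally is not described in this document and may differ. The simulateFinalize() method stands in for the finalizer that the garbage collector would run.

```java
// Hypothetical model, not the PDF Library API: a native document that stays
// open for as long as at least one managed wrapper still holds it.
class NativeDocModel {
    private int holders = 0;

    void acquire() { holders++; }
    void release() { holders--; }
    boolean isOpen() { return holders > 0; }
}

// Common behavior of the managed wrappers in this model.
class HandleModel {
    protected final NativeDocModel nativeDoc;
    private boolean finalized = false;

    HandleModel(NativeDocModel nativeDoc) {
        this.nativeDoc = nativeDoc;
        nativeDoc.acquire();
    }

    // Stands in for the finalizer that the garbage collector would run.
    void simulateFinalize() {
        if (!finalized) {
            finalized = true;
            nativeDoc.release();
        }
    }
}

class DocumentModel extends HandleModel {
    DocumentModel(NativeDocModel nativeDoc) { super(nativeDoc); }

    // The page wrapper takes its own hold on the same native document.
    PageModel getPage() { return new PageModel(nativeDoc); }
}

class PageModel extends HandleModel {
    PageModel(NativeDocModel nativeDoc) { super(nativeDoc); }
}

class ImplicitDisposalDemo {
    public static void main(String[] args) {
        NativeDocModel nativeDoc = new NativeDocModel();
        DocumentModel document = new DocumentModel(nativeDoc);
        PageModel page = document.getPage();

        document.simulateFinalize();
        System.out.println("after Document finalized, open: " + nativeDoc.isOpen());

        page.simulateFinalize();
        System.out.println("after Page finalized, open: " + nativeDoc.isOpen());
    }
}
```

In the model, the native document is still open after the Document wrapper is finalized and closes only once the Page wrapper is finalized as well.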

Explicit disposal

Explicit disposal happens when a user program calls Dispose() (.NET) or delete() (Java) on a .NET or Java object. During this call, the PDFL resources are released immediately, including any resources required by dependent objects. This release of the dependent objects' resources is what distinguishes explicit disposal from implicit disposal.

For example, consider the case where a program creates a Document object, and subsequently obtains a Page object from the document. If the program then calls Dispose on the Document, the document’s internal PDFL resources are released, but all dependent resources are released first. Thus, the Page object is no longer valid and attempting to use the Page object will result in an error: “Object is no longer valid (perhaps a parent object was already destroyed).”
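The following sketch models that behavior with stand-in classes. ParentDocument and DependentPage are hypothetical names, and the use of IllegalStateException is an assumption made for the illustration; the exception type that the library actually throws is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the PDF Library API.
class DependentPage {
    private final int index;
    private boolean valid = true;

    DependentPage(int index) { this.index = index; }

    void invalidate() { valid = false; }

    // Any use after the parent was disposed fails. The exception type is an assumption.
    int getIndex() {
        if (!valid) {
            throw new IllegalStateException(
                "Object is no longer valid (perhaps a parent object was already destroyed).");
        }
        return index;
    }
}

class ParentDocument {
    private final List<DependentPage> dependents = new ArrayList<>();
    private boolean open = true;

    DependentPage getPage(int index) {
        if (!open) throw new IllegalStateException("Document was already disposed.");
        DependentPage page = new DependentPage(index);
        dependents.add(page);
        return page;
    }

    // Explicit disposal: release dependents first, then the document itself.
    void delete() {
        for (DependentPage page : dependents) {
            page.invalidate();
        }
        dependents.clear();
        open = false;
    }

    boolean isOpen() { return open; }
}

class ExplicitDisposalDemo {
    public static void main(String[] args) {
        ParentDocument doc = new ParentDocument();
        DependentPage page = doc.getPage(0);
        System.out.println("page index: " + page.getIndex());
        doc.delete();
        try {
            page.getIndex();
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

The practical consequence is an ordering rule: finish all work with dependent objects before disposing of their parent.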

Best practices

It is important, then, to release the unmanaged resources in PDFL as soon as an object is no longer in use. To do this, call the Dispose method in .NET, or the delete method in Java.

In C#, a using statement calls Dispose() automatically:

using (Page page = doc.GetPage(0)) {
    // code here
} // page.Dispose() automatically called here, or if an exception is thrown

In Java, use a try..finally block to ensure that the object is disposed at the end of the block or if an exception is thrown:

Page page = doc.getPage(0);
try {
    // code here
}
finally {
    page.delete();
}
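When several objects must be disposed, nested try..finally blocks become verbose. One possible alternative is a small adapter that exposes delete() through AutoCloseable, so that Java's try-with-resources statement can be used. This is a sketch only: Disposer and StubResource are hypothetical names, and StubResource merely stands in for a library object with a delete() method. Check the API reference for your version of the Java interface first, since the adapter is unnecessary if the library classes already implement AutoCloseable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical adapter: turns any cleanup call, such as page::delete,
// into an AutoCloseable so that it can be used in try-with-resources.
class Disposer implements AutoCloseable {
    private final Runnable deleteAction;

    Disposer(Runnable deleteAction) { this.deleteAction = deleteAction; }

    @Override
    public void close() { deleteAction.run(); }
}

// Stand-in for a library object that exposes delete().
class StubResource {
    private final String name;
    private final List<String> log;

    StubResource(String name, List<String> log) { this.name = name; this.log = log; }

    void delete() { log.add("delete " + name); }
}

class TryWithResourcesDemo {
    // Records the order of events so that the disposal order is visible.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        StubResource doc = new StubResource("document", log);
        StubResource page = new StubResource("page", log);
        try (Disposer closeDoc = new Disposer(doc::delete);
             Disposer closePage = new Disposer(page::delete)) {
            log.add("work");
            if (fail) throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Resources in a try-with-resources statement are closed in the reverse order of their declaration, so the page is deleted before the document, both on normal completion and when the body throws.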

Special cases

Library.  Disposing of the Library object ends the use of the Adobe PDF Library, and thus releases all PDF Library data resources allocated to the Library on the thread. This is an important part of cleaning up after the use of the Library. Note that multiple Library objects can be created on the same thread; the release of resources happens when all of the Library objects are disposed.

Document. Disposing of the Document also closes the corresponding PDF file. Therefore, it is good practice to dispose of the Document as soon as the system has finished processing that document. It is also a natural consequence that calling Dispose on the Document automatically cleans up any resources allocated to work on that document.

Page. Disposing of the Page may be useful if many objects were allocated during work on that page, including work done on the Content of the Page.

Word. The PDF Library holds a table of all words on a PDF page (and information about these words) to be used when extracting text from a page.  This table remains as long as a Word object refers to it.  Please be sure to dispose of Word objects when they are no longer needed, to avoid retaining unwanted references in memory.