Adobe® PDF Library

Memory Management

Overview

When you create an application based on the Adobe PDF Library and .NET Core code, it is best to use good memory management practices.

Systems written in C# use garbage collection to clear system memory automatically, deleting objects from memory when they are no longer needed. Objects are allocated in sections of memory referred to as “heaps,” and when a heap fills up with objects, the garbage collection process runs automatically to reclaim that memory space.

In general, the managed objects in the .NET garbage-collected heap appear very small to the runtime environment. They are not much more than handles to the underlying unmanaged, native resources in the Adobe PDF Library itself. When the system collects garbage and a .NET Core object is collected and finalized, the system releases the corresponding Library resources.

The garbage collector uses groups called generations to improve performance. Most objects added to memory tend to be removed quickly, but the longer an object remains in memory, the longer it is likely to survive there. So the garbage collector sorts objects into generations (0, 1, and 2) by age, that is, by how long they have survived in memory.

The small .NET Core objects exert little memory pressure on the runtime, so it may take a long time for the garbage collection process to run. The underlying PDF Library objects do consume system memory and resources, but most of this is hidden from the .NET Core runtime because the PDF Library is mostly native code. This means that by the time garbage collection runs, the system may have a large amount of memory consumed by objects that are no longer in use. You are therefore encouraged to dispose of objects explicitly, as described below.
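The effect can be illustrated with a minimal sketch. NativeBuffer and MemoryPressureDemo are hypothetical stand-ins, not Adobe PDF Library types; the sketch only assumes that a small managed wrapper owns a large unmanaged allocation, which the managed heap statistics do not reflect.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a Library wrapper: a tiny managed object
// that owns a large block of unmanaged memory.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _handle;

    public NativeBuffer(int bytes)
    {
        // Unmanaged allocation: not counted as part of the managed heap.
        _handle = Marshal.AllocHGlobal(bytes);
    }

    public void Dispose()
    {
        Free();
        GC.SuppressFinalize(this); // nothing left for the finalizer to do
    }

    ~NativeBuffer()
    {
        // Fallback path: runs only when the GC eventually finalizes the object.
        Free();
    }

    private void Free()
    {
        if (_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_handle);
            _handle = IntPtr.Zero;
        }
    }
}

public static class MemoryPressureDemo
{
    // Returns how much the managed heap grew while 64 MB of native memory was held.
    public static long ManagedGrowthWhileHoldingNative()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        using (var buffer = new NativeBuffer(64 * 1024 * 1024))
        {
            long after = GC.GetTotalMemory(forceFullCollection: false);
            return after - before;
        }
    }
}
```

The value returned by ManagedGrowthWhileHoldingNative() should be far smaller than the 64 MB actually held, which is why the garbage collector sees little reason to run.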

We provide a sample program called MemoryFileSystem for Microsoft .NET Core that demonstrates how to use RAM instead of the hard disk for saving temporary files.

How objects get deleted

The garbage collection process starts automatically and runs as needed. Any unreferenced .NET objects will be considered for garbage collection.

But the .NET objects for the Adobe PDF Library have finalizers, which ultimately release the underlying Library resources. So on the first garbage collection run, these objects are not reclaimed; instead, they are placed in a finalization queue.

Later, the garbage collector finalizes the objects in the finalization queue, releasing Library resources.

After that, the .NET objects themselves are garbage collected on the next pass, finally freeing the last memory space in use.

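That means that it may take two garbage collection runs before everything associated with an object is released: the Adobe PDF Library resources are freed only when the finalizer runs after the first pass, and the .NET object itself is reclaimed on a later pass. If the .NET objects are in a list or other container, it may take more passes to finally release these resources. Objects that have survived into generation 1 or generation 2 can wait longer still, because those generations are collected less often.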
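The sequence above can be observed with a small sketch. FinalizableHandle and FinalizationDemo are hypothetical stand-ins, not Library types, and the explicit GC.Collect() calls are there only to make the passes visible; they are not a recommendation for production code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for a wrapper object that has a finalizer.
public sealed class FinalizableHandle
{
    public static int FinalizedCount;

    ~FinalizableHandle()
    {
        // In a real wrapper, this is where native resources would be released.
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizationDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in Run() keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateUnreferenced()
    {
        // trackResurrection: true keeps the weak reference usable until the
        // object's memory is actually reclaimed, not just until it is finalized.
        return new WeakReference(new FinalizableHandle(), trackResurrection: true);
    }

    public static void Run()
    {
        WeakReference weak = AllocateUnreferenced();

        GC.Collect();                  // pass 1: the unreachable object is queued for finalization
        GC.WaitForPendingFinalizers(); // the finalizer thread runs ~FinalizableHandle()
        Console.WriteLine($"Finalized: {FinalizableHandle.FinalizedCount}, memory still held: {weak.IsAlive}");

        GC.Collect();                  // pass 2: the finalized object's memory can now be reclaimed
        Console.WriteLine($"Memory still held after second pass: {weak.IsAlive}");
    }
}
```

Exact timing depends on the runtime and build configuration, so the printed values should be treated as indicative rather than guaranteed.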

As a result, some usage patterns can leave large amounts of Library resources in memory for a long time while the system is running.

Implicit disposal

Implicit disposal happens when the garbage collector finalizes a .NET object. With implicit disposal, the object releases its hold on Adobe PDF Library resources. Other objects depending on the same internal resources may still cause these Library resources to be held, if required.

For instance, consider the case where a program creates a Document object, and subsequently obtains a Page object from the document. If there are no more references to the Document, the Document object may be garbage collected. However, the Page object internally holds a PDPage, and for the PDPage to exist, the PDDoc containing it must still exist. So even though the Document object has been garbage collected, the PDDoc is still open, the corresponding PDF file may still be locked on Windows, and so forth. Only after the Page object is garbage collected and finalized will the PDDoc be closed.
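The Document and Page scenario can be modeled with a minimal sketch. All types here (NativeDocStandIn, DocumentStandIn, PageStandIn) are hypothetical, and the holder count is only an assumed mechanism for illustration; the source does not describe how the Library tracks these dependencies internally. SimulateFinalize() stands in for the work a finalizer would perform.

```csharp
using System;

// Hypothetical stand-in for a native PDDoc. It stays open while any wrapper holds it.
public sealed class NativeDocStandIn
{
    private int _holders;
    public bool IsOpen => _holders > 0;
    public void Acquire() => _holders++;
    public void Release() => _holders--;
}

// Hypothetical stand-in for the managed Document wrapper.
public sealed class DocumentStandIn
{
    private readonly NativeDocStandIn _native;

    public DocumentStandIn(NativeDocStandIn native)
    {
        _native = native;
        _native.Acquire();
    }

    public PageStandIn GetPage() => new PageStandIn(_native);

    // Simulates implicit disposal: only this wrapper's own hold is dropped.
    public void SimulateFinalize() => _native.Release();
}

// Hypothetical stand-in for the managed Page wrapper.
public sealed class PageStandIn
{
    private readonly NativeDocStandIn _native;

    internal PageStandIn(NativeDocStandIn native)
    {
        _native = native;
        _native.Acquire(); // a page needs its document to stay open
    }

    public void SimulateFinalize() => _native.Release();
}
```

In this model, calling SimulateFinalize() on the document leaves IsOpen true for as long as a page still holds the native document, and IsOpen becomes false only after the page is finalized as well.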

Explicit disposal

Explicit disposal happens when a user program calls Dispose() on a .NET object. During this call, the Adobe PDF Library resources are released immediately, including any resources required by dependent objects. Releasing the resources of dependent objects distinguishes explicit disposal from implicit disposal.

For example, consider the case where a program creates a Document object, and subsequently obtains a Page object from the document. If the program then calls Dispose on the document, the document's internal Library resources are released, but all dependent resources are released first. Thus, the Page object is no longer valid and attempting to use the Page object will result in an error: "Object is no longer valid (perhaps a parent object was already destroyed)."
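A minimal sketch of this parent-first invalidation follows. ParentDocument and ChildPage are hypothetical stand-ins, and ObjectDisposedException is an assumption made for the sketch; the source states only the error message, not the exception type the Library raises.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Page obtained from a document.
public sealed class ChildPage : IDisposable
{
    private bool _valid = true;
    private readonly int _index;

    internal ChildPage(int index) { _index = index; }

    public int ReadIndex()
    {
        // Any use after invalidation fails, as described above.
        if (!_valid) throw new ObjectDisposedException(nameof(ChildPage));
        return _index;
    }

    public void Dispose() => _valid = false;
}

// Hypothetical stand-in for a Document that tracks its dependents.
public sealed class ParentDocument : IDisposable
{
    private readonly List<ChildPage> _dependents = new List<ChildPage>();
    private bool _disposed;

    public ChildPage GetPage(int index)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(ParentDocument));
        var page = new ChildPage(index);
        _dependents.Add(page);
        return page;
    }

    public void Dispose()
    {
        if (_disposed) return;
        // Dependent objects are released first, then the parent itself.
        foreach (var page in _dependents) page.Dispose();
        _dependents.Clear();
        _disposed = true;
    }
}
```

With these stand-ins, a page obtained from GetPage(0) works until Dispose() is called on the parent; after that, ReadIndex() throws instead of touching released resources.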

Best practices

It is important, then, to release the unmanaged resources in Adobe PDF Library as soon as an object is no longer in use. To do this, call the Dispose method in .NET.

In C#, a using statement calls Dispose() automatically:

using (Page page = doc.GetPage(0)) {
    // code here
} // page.Dispose() is called automatically here, even if an exception is thrown
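The guarantee that Dispose() runs even when an exception is thrown can be checked with a short, self-contained sketch. TrackedResource and UsingDemo are hypothetical names; the resource only records the order of events and does not touch the Library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable that records when it is opened and disposed.
public sealed class TrackedResource : IDisposable
{
    private readonly List<string> _log;

    public TrackedResource(List<string> log)
    {
        _log = log;
        _log.Add("open");
    }

    public void Dispose() => _log.Add("dispose");
}

public static class UsingDemo
{
    // Returns the order of events when the body of a using statement throws.
    public static List<string> RunWithException()
    {
        var log = new List<string>();
        try
        {
            using (var resource = new TrackedResource(log))
            {
                log.Add("work");
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // By the time the exception reaches this handler,
            // the using statement has already called Dispose().
            log.Add("caught");
        }
        return log;
    }
}
```

RunWithException() returns the events in the order open, work, dispose, caught, because the using statement behaves like a try/finally block around its body.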

Special cases

Library.  Disposing of the Library object ends the use of the Adobe PDF Library, and thus releases all Adobe PDF Library data resources allocated to the Library on the thread. This is an important part of cleaning up after the use of the Library. Note that multiple Library objects can be created on the same thread; the release of resources happens when all of the Library objects are disposed.
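The rule that resources are released only when the last Library object on a thread is disposed can be modeled with a per-thread counter. LibraryScope is a hypothetical stand-in, and the counter is an assumption made for illustration, not a description of how the Library is implemented.

```csharp
using System;

// Hypothetical stand-in for the Library object.
public sealed class LibraryScope : IDisposable
{
    // One counter per thread, mirroring the per-thread behavior described above.
    [ThreadStatic] private static int _activeOnThread;

    private bool _disposed;

    public LibraryScope()
    {
        _activeOnThread++;
    }

    // True while at least one scope on the current thread is undisposed.
    public static bool IsInitializedOnThisThread => _activeOnThread > 0;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        // The thread's resources would be released when this reaches zero.
        _activeOnThread--;
    }
}
```

With two nested scopes on the same thread, disposing the inner one leaves IsInitializedOnThisThread true; it becomes false only after the outer scope is disposed too.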

Document. Disposing of the Document also closes the corresponding PDF file. Therefore, it is good practice to dispose of the Document after the system has finished processing that document. As a natural consequence, calling Dispose on the Document also cleans up any resources allocated to work on that document.

Page. Disposing of the Page may be useful if many objects were allocated during work on that page, including work done on the Content of the Page.

Word. The PDF Library holds a table of all words on a PDF page (and information about these words) to be used when extracting text from a page.  This table remains as long as a Word object refers to it.  Please be sure to dispose of Word objects when they are no longer needed, to avoid retaining unwanted references in memory.