PDF CHECKER

Description of JSON profile parameters

 

general: unable-to-open This is not a valid PDF document, or it has been corrupted to the point that it cannot be displayed in a browser or viewing tool.
general: password-protected The file cannot be opened without a password. To open the file, provide the password in a command line argument.
general: contains-owner-password When you create a PDF document you can restrict the ability of others to work with that document, and add a PDF Owner Password to the file to secure those settings. Other users will be able to open and read your PDF document, but without this password they can’t change your restrictions. With the Owner Password you can allow others to open and read your document but stop them from printing the file, copying content, adding changes or comments, adding or extracting pages or graphics, signing the document, or making other changes.
general: xfa-type XFA, or XML Forms Architecture, is a set of proprietary XML specifications for use with web forms. XFA forms are saved internally in PDF files, and the standard is owned by Adobe Systems. The original PDF forms technology, Acrobat Forms or Acroforms, was created by Adobe Systems in 1998. Most PDF documents use Acroforms rather than XFA, because Acroforms are compatible with a much wider range of software applications, as well as with Acrobat itself.
general: pdf-v2 The PDF format was published as an open file format by the International Organization for Standardization (ISO) in 2008. Version 2.0 of PDF was released in July of 2017. PDF CHECKER can identify whether a PDF document was saved as a PDF 2.0 document.
general: contains-signature A PDF document can contain one or more digital signatures, and these signatures can be verified by a vendor known as a Certificate Authority. If a PDF document has a certified digital signature it can be used as a legal document. It is also possible to lock a signed PDF document against any further changes.
general: claims-pdfa-conformance A PDF/A, or PDF Archive document, is a type of PDF file that is designed to be stored so that it can be accessed for many years to come. PDF/A documents must be able to be opened and read using viewing tools available in the future, so they are designed to be self-contained. For example, all of the fonts used in a PDF/A document must be embedded within the PDF document itself for the file to be considered PDF/A compliant. PDF CHECKER can identify whether a PDF document claims to be PDF/A compliant. Note that the software does not verify that the PDF is compliant, but rather that it has been saved as such. It is possible for a PDF document to be labeled as compliant when in fact the file has been later altered, making the PDF/A compliance no longer valid.
cleanup: suboptimal-compression A data stream in a PDF document contains text, an image, or an object, with instructions on how the content will be rendered on the page. These data streams can be compressed in a document to make the PDF smaller and more portable. This check looks for data streams in the input document that are not compressed, or that are using a simple algorithm that is not as efficient in compression, such as ASCII, or Run Length, or LZW.

The ASCII filters (ASCIIHex and ASCII85) represent binary data as printable ASCII characters. This encoding makes a data stream larger rather than smaller, so a stream that uses only an ASCII filter is not efficiently compressed.

Run Length Encoding (RLE) is a simple method for compressing values that appear in the form of runs of data. A data run features a sequence of characters or binary digits where the same value appears many times, and often in long strings. A long string, or run, of the same character can be replaced with a shorthand description of those characters, thus saving storage space and making the resulting PDF document smaller.

Think of an image on a white background with a black square in the middle. Instead of representing a row in the image with 600 white pixels followed by 200 black pixels and then 600 more white pixels, this row of 1400 binary digits could be represented by the statement “600W200B600W” instead. As a result, 1400 characters are replaced with 12.
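The row described above can be worked through in a few lines of Python. This is a minimal sketch of the textual shorthand used in the example, not the byte-oriented run-length format that PDF streams actually store; the function name `rle_encode` is made up for illustration.

```python
from itertools import groupby


def rle_encode(row: str) -> str:
    """Collapse each run of identical characters into '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(row))


# One image row: 600 white pixels, 200 black pixels, 600 white pixels.
row = "W" * 600 + "B" * 200 + "W" * 600
encoded = rle_encode(row)

print(len(row))      # 1400
print(encoded)       # 600W200B600W
print(len(encoded))  # 12
```

Run-length encoding only pays off when the data has long runs. A row that alternates colors at every pixel would grow, because each single pixel would be written as a count plus a character.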

LZW, or Lempel-Ziv-Welch, is a universal data compression algorithm, once widely used with Unix platforms. This method appears in some old PDF documents but is rarely used now.

fonts: uses-fonts-not-embedded

fonts: uses-base14fonts-not-embedded

fonts: uses-fonts-fully-embedded

It is a best practice when working with PDF documents to embed, or save, every font used in a PDF document in the document itself. That way, a viewing tool (like Acrobat) does not have to look for a font stored on the local system or choose a substitute font. Use this setting to find out if a PDF is using fonts that are not embedded. Removing an embedded font file from a PDF document can make the document smaller, but this practice can also make the PDF load more slowly. It might also change the appearance of the file if the viewing software can find neither the font it needs on the local machine nor a suitable substitute.

PDF CHECKER looks in the /FontDescriptor dictionary, or the font dictionary, within the PDF document, to identify the fonts that are in use in that document. Then, it looks to see if those same font files are embedded in that document, or if the viewer will need to access a font from the host machine.

Fonts that are not embedded in the PDF document, either Base 14 fonts or otherwise, and fonts that are embedded, can be listed in the results.
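The distinction between embedded fonts, Base 14 fonts that are not embedded, and other fonts that are not embedded can be sketched as follows. This is an illustration only, not PDF CHECKER's implementation. It assumes each font has already been parsed into a plain Python dict, and it relies on the PDF specification's convention that an embedded font program is referenced from the font descriptor by a FontFile, FontFile2, or FontFile3 entry. The function name and the returned labels are hypothetical.

```python
# The 14 standard fonts that viewers are expected to supply themselves.
BASE_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# Descriptor keys that point at an embedded font program.
EMBED_KEYS = ("FontFile", "FontFile2", "FontFile3")


def classify_font(font: dict) -> str:
    """Return a hypothetical label describing how the font is supplied."""
    descriptor = font.get("FontDescriptor", {})
    if any(key in descriptor for key in EMBED_KEYS):
        return "embedded"
    if font.get("BaseFont", "") in BASE_14:
        return "base14-not-embedded"
    return "not-embedded"


# Stand-in font dictionaries (assumed shapes, for illustration only).
fonts = [
    {"BaseFont": "Helvetica"},
    {"BaseFont": "MyCorporateSans", "FontDescriptor": {"FontName": "MyCorporateSans"}},
    {"BaseFont": "ABCDEF+Garamond", "FontDescriptor": {"FontFile2": "<stream>"}},
]
for font in fonts:
    print(font["BaseFont"], "->", classify_font(font))
```

The sketch does not distinguish a fully embedded font from a subset, which would need a further check.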

fonts: fontdescriptor-missing-fields

fonts: fontdescriptor-missing-capheight

A font descriptor describes the characteristics of a font, as opposed to the widths and characteristics of individual glyphs (characters) within that font set. Font descriptor values include the name of the font, the angle in degrees used for creating italic characters, the maximum height above the baseline that a glyph can reach in this font (ascent) and the maximum depth below that it can reach (descent), and a variety of other values.

If one of the required font descriptor settings is not included in the font descriptor dictionary for a PDF document, PDF CHECKER can determine that and list the missing values. If a font descriptor value is not provided the PDF document may not work properly in some applications.

The CapHeight is the coordinate showing the placement of the tops of flat capital letters, such as T or R, as measured from the baseline. It is required for all fonts that have Latin characters except for Type 3 fonts (that use PostScript). But as it is hard to determine if a given font has Latin characters, it is possible with a standard font descriptor search for the CapHeight value to be overlooked. To avoid that, PDF CHECKER searches for CapHeight separately.
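A check for missing font descriptor entries, with CapHeight handled separately, might look like the sketch below. The list of required keys is an assumption taken from the PDF specification's font descriptor table for fonts other than Type 3; the exact set that PDF CHECKER tests is not stated here. The descriptor is represented as a plain dict, and the function names are hypothetical.

```python
from typing import List

# Assumed required entries for a non-Type 3 font descriptor.
REQUIRED_FIELDS = (
    "FontName", "Flags", "FontBBox", "ItalicAngle",
    "Ascent", "Descent", "StemV",
)


def missing_descriptor_fields(descriptor: dict) -> List[str]:
    """List the assumed-required keys that the descriptor does not define."""
    return [key for key in REQUIRED_FIELDS if key not in descriptor]


def missing_capheight(descriptor: dict) -> bool:
    """Report CapHeight separately, since it is only conditionally required."""
    return "CapHeight" not in descriptor


# Stand-in descriptor with Descent, StemV, and CapHeight left out.
descriptor = {
    "FontName": "ExampleSerif",
    "Flags": 34,
    "FontBBox": [-168, -218, 1000, 898],
    "ItalicAngle": 0,
    "Ascent": 891,
}
print(missing_descriptor_fields(descriptor))  # ['Descent', 'StemV']
print(missing_capheight(descriptor))          # True
```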

objects: contains-javascript-actions This PDF document contains blocks of JavaScript code that perform actions that may alter the appearance of the document. Common JavaScript actions in PDF documents include submitting a form (a Submit button), accessing a web site from a web address, or sending the document to a printer.
objects: contains-thumbnails Thumbnail images are used to preview pages in a PDF document and appear in a panel on the left side of the viewer window. A user could scroll through a series of thumbnails to find a page he or she is looking for.
userdata: contains-annots Annotations are changes you can add to a PDF document, such as notes, highlighted text, file attachments, crossed out text, and text callout boxes.
userdata: contains-annots-not-for-viewing Annotations can be added to a PDF document and hidden, so that they do not appear when the document is opened in a viewing tool.
userdata: contains-annots-not-for-printing Annotations can be added to a PDF document so that they appear on the page in a viewing tool but are not included when the document is sent to a printer.
userdata: contains-annots-without-normal-appearances Every annotation included in a PDF document features an optional entry that describes what the annotation will look like when the document is rendered in a viewer. Generally, this value is not provided, so if the PDF is opened in Adobe Acrobat, Acrobat will fill in the appearance, based on what the value should be. When you open a PDF document in Adobe Acrobat or Adobe Reader and this viewer fills in the appearance value, you will be prompted to save the file when you close it.  If you do save the file, the annotation appearance is made a part of the updated PDF document.  This is the normal annotation appearance.
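The three annotation checks above can be illustrated with the annotation flags (the F entry) and appearance dictionary (the AP entry) defined in the PDF specification. This is a sketch under assumptions: the annotation is a plain dict, the rule that Hidden or NoView means "not for viewing" is one reasonable reading rather than PDF CHECKER's documented logic, and the function name and result keys are hypothetical.

```python
# Annotation flag bits from the PDF specification (bit positions 2, 3, and 6).
HIDDEN = 1 << 1   # not displayed and not printed
PRINT = 1 << 2    # printed only when this bit is set
NO_VIEW = 1 << 5  # not displayed on screen, but may still print


def annotation_report(annot: dict) -> dict:
    """Summarize the visibility and appearance state of one annotation."""
    flags = annot.get("F", 0)
    appearance = annot.get("AP", {})
    return {
        "not_for_viewing": bool(flags & (HIDDEN | NO_VIEW)),
        "not_for_printing": bool(flags & HIDDEN) or not (flags & PRINT),
        "missing_normal_appearance": "N" not in appearance,
    }


# A printable note with a normal appearance stream.
print(annotation_report({"F": PRINT, "AP": {"N": "<stream>"}}))
# A print-only watermark annotation with no appearance saved yet.
print(annotation_report({"F": PRINT | NO_VIEW}))
```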
userdata: contains-optional-content A PDF document can contain features such as attached files, hyperlinks to web pages, and data in form fields.
userdata: contains-transparency It is possible to stack objects, such as graphics, images, text boxes, and form fields, on top of each other on a PDF document. These objects can be partially or fully transparent, and thus can interact in various ways with objects behind them. If a set of transparencies are stacked in a PDF file, each one contributes to the final result that appears on the page, such as the colors blending together into a final color that appears.
userdata: contains-private-data Some applications, like Adobe Illustrator, add their own unique values to a PDF document when generating that document. These values are useful to the original software product if the PDF is opened and edited in that product again.
userdata: contains-metadata Most PDF documents store information that describes that document, such as the author, creation date, and the software used to generate the file.
userdata: contains-embedded-files PDF documents can hold other files that are embedded or attached in that document, including other PDF documents, email messages, spreadsheets, graphics files, and the like.
images:color resolution-too-low Number of low-resolution color images present in the document. PDF CHECKER determines the resolution of each image in the PDF document, and any color image with a resolution below this Trigger DPI value is counted as a low-resolution image. This Trigger value parameter defaults to 150 DPI for color images.
resolution-too-high Number of high-resolution color images present in the document. Any color image PDF CHECKER finds with a resolution greater than this Trigger DPI value is counted as a high-resolution image. The Trigger value parameter defaults to 600 DPI for color images.
uses-jpeg2000-compression Number of color images in the document using JPEG2000 compression. JPEG2000 is a wavelet-based compression format for photographic images. It is distinct from the older JPEG format, which is based on the Discrete Cosine Transform (DCT).
image-depth PDF CHECKER lists the number of 16-bit color images found in the PDF document. Image depth refers to the number of bits needed to store color for each pixel in a graphic. Color graphics are often 8-bit, but higher quality images are 16-bit or more. A color graphic with 16-bit image depth will usually render better on a screen or when printing, but the image is also a lot larger, making the PDF document a lot larger as well.
images:grayscale resolution-too-low Number of low-resolution grayscale images present in the document. PDF CHECKER determines the resolution of each image in the PDF document, and any grayscale image with a resolution below this Trigger DPI value is counted as a low-resolution image. This Trigger value parameter defaults to 150 DPI for grayscale images.
resolution-too-high Number of high-resolution grayscale images present in the document. Any grayscale image PDF CHECKER finds with a resolution greater than this Trigger DPI value is counted as a high-resolution image. This Trigger value parameter defaults to 600 DPI for grayscale images.
uses-jpeg2000-compression Number of grayscale images in the document using JPEG2000 compression. JPEG2000 is a wavelet-based compression format for photographic images. It is distinct from the older JPEG format, which is based on the Discrete Cosine Transform (DCT).
images:monochrome resolution-too-low Number of low-resolution monochrome images present in the document. PDF CHECKER determines the resolution of each image in the PDF document, and any monochrome image with a resolution below this Trigger DPI value is counted as a low-resolution image. This Trigger value parameter defaults to 200 DPI for monochrome images.
resolution-too-high Number of high-resolution monochrome images present in the document. Any monochrome image PDF CHECKER finds with a resolution greater than this Trigger DPI value is counted as a high-resolution image. This Trigger value parameter defaults to 1200 DPI for monochrome images.
uses-jbig2-compression Number of monochrome images in the document using JBIG2 compression. JBIG2 is a compression algorithm designed for binary images, or images where each pixel can only have one of two possible colors. PDF CHECKER reports JBIG2 use for black and white images.
images: alternate-images A PDF document can be set up to specify alternate images, or multiple versions of one image within the same document. These images can be used to meet different needs. For example, a PDF could present one image with a lower resolution for display on a monitor, and an alternate image with a higher resolution to use when the PDF document is sent to a printer. These images can make a PDF document very large. Today alternate images are rarely used.
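The resolution triggers described for color, grayscale, and monochrome images can be illustrated with a short sketch. The default trigger values come from the descriptions above. The way resolution is measured is an assumption: the sketch divides the pixel width by the placed width in inches, using the default PDF unit of 72 points per inch, and looks only at the horizontal axis. The names `TRIGGERS`, `effective_dpi`, and `classify_image` are hypothetical.

```python
# Default (low, high) trigger DPI values taken from the parameter descriptions.
TRIGGERS = {
    "color": (150, 600),
    "grayscale": (150, 600),
    "monochrome": (200, 1200),
}

POINTS_PER_INCH = 72.0  # default PDF user space unit


def effective_dpi(pixels: int, placed_points: float) -> float:
    """Pixels per inch for an image drawn placed_points wide on the page."""
    return pixels / (placed_points / POINTS_PER_INCH)


def classify_image(kind: str, pixels_wide: int, placed_width_points: float) -> str:
    """Compare the image's effective DPI with the triggers for its kind."""
    low, high = TRIGGERS[kind]
    dpi = effective_dpi(pixels_wide, placed_width_points)
    if dpi < low:
        return "resolution-too-low"
    if dpi > high:
        return "resolution-too-high"
    return "ok"


# A 300-pixel-wide color image placed 3 inches (216 points) wide is 100 DPI.
print(classify_image("color", 300, 216))        # resolution-too-low
# The same placement at 3000 pixels is 1000 DPI.
print(classify_image("color", 3000, 216))       # resolution-too-high
print(classify_image("monochrome", 3000, 216))  # ok
```

Because the result depends on the placed size, the same image file can count as low resolution on one page and acceptable on another where it is drawn smaller.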