Font & Unicode

Adobe PDF Library .NET

Font Object from a Name

When creating a Font object from a name, if the found font is a Type0 font, an Identity encoding is used. Otherwise, it makes a font with the default encoding (MacRomanEncoding on Mac, WinAnsiEncoding elsewhere, or for Symbol fonts, their own custom encoding).  

When adding Text to a Page, the Library checks to see if the Text is representable in the font's encoding. If not, then it makes a Type0 font with an Identity-H encoding. For non-Type0 fonts, this may result in two versions of a font in the output document. 

The EmbedFonts() method of the Document class should be called before saving as a best practice. 

When saving, the Library enumerates all the fonts in the document and makes sure that any required Unicode and Width information is created to ensure proper text extraction and display. 

PDFString Objects

PDFString objects offer two properties for accessing their contents: 

These are stored within a PDF file encoded either in UTF-16BE or in PDFDocEncoding (an 8-bit encoding scheme that can encode all of ISO Latin-1).  

They can be created with an input string in either encoding. Most PDFString constructors will attempt to convert the data string to PDFDocEncoding if it can be converted without a loss of information; otherwise they will save it in UTF-16BE. If you want the string to be encoded in UTF-16BE even if it could be converted to PDFDocEncoding, use a PDFString constructor that takes the storedAsUTF16 argument, with that flag set to true

PDFString Objects Properties

  1. The Value property will convert the string from its internal encoding to a Unicode string object appropriate to the platform (encoded as UTF-16). This is useful for cases where the object represents human-readable text. 
  1. The Bytes property returns the contents of the PDFString object as they are stored in the file, without respect to an encoding. This is useful for cases where the object contains binary data.