Optical Character Recognition (OCR) technology

OCR has proven to be an enormously useful tool in business, in academic research, and in government applications.

  • With OCR, it is not necessary to retype print documents to preserve their content.
  • You can also use OCR to create a list of keywords for each digital document, which makes the documents much easier to find. If you have 20,000 invoices stored as PDF files, you can use OCR to build an index entry for each one, then search the files quickly by customer name or account number to find the one you need. The text that OCR extracts from a PDF document can be saved within the PDF itself, exported to a separate text file or Word document, or both.
  • In business and the legal profession, OCR has led to large savings in the search and discovery process. Finding information buried in thousands of pages of contracts, invoices, letters, and other files once required weeks of effort from a team of lawyers and paralegals. With searchable text, the same search can be completed in seconds.
  • Businesses and law firms also save money by scanning print documents, storing them as PDF files, and discarding the paper originals, so they no longer pay for physical storage. OCR makes this practical because the content of the digital documents is easy to find with electronic search tools.
  • With OCR, print content can be made much more accessible. Paper documents can be converted into digital files that can be read out loud for the visually impaired.
  • OCR can also be used to protect the content of print documents. Unlike rare or unique books, magazines, and manuscripts, digital documents can easily be kept safe from flood and fire because they can be copied quickly to multiple locations.
  • Print content becomes much easier to distribute as well. A scholar or researcher no longer needs to travel to the library that holds a specific book, journal, or manuscript. Once the printed content has been scanned and indexed using OCR, anyone who wants to read it can find it online from anywhere in the world.
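
The invoice search described in the list above can be illustrated with a small keyword index. The sketch below is only one possible approach and assumes the OCR step has already run: the file names and text in ocr_text are made-up sample data, and the helper names tokenize, build_index, and search are hypothetical, not part of any OCR product. It maps each word to the set of files that contain it, so a lookup by customer name or account number does not need to reread every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of file names whose text contains it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query):
    """Return the sorted file names that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the files matching the first token, then narrow the set.
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return sorted(matches)


# Hypothetical OCR output: file name -> extracted text (sample data only).
ocr_text = {
    "invoice_0001.pdf": "Invoice for Acme Corp, account number 48213, total due 1,250.00",
    "invoice_0002.pdf": "Invoice for Globex Ltd, account number 77190, total due 310.40",
    "invoice_0003.pdf": "Credit note for Acme Corp, account number 48213",
}

index = build_index(ocr_text)
print(search(index, "Acme Corp"))  # files mentioning both "acme" and "corp"
print(search(index, "77190"))      # lookup by account number
```

With the sample data, the first query returns the two Acme Corp files and the second returns the single Globex invoice. A real collection would likely also need to tolerate OCR misreads (for example, "0" read as "O"), which this sketch does not attempt.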