Convert unstructured text into searchable data in an instant.

Intelligent Data Extraction (OCR)

In today’s world, where instant data access, business intelligence, security, and efficiency are critical to success, many companies are realizing that valuable data is trapped in their documents. These documents could be paper, email or standard electronic office documents. Without automation, the data contained in these documents must be manually read, tracked, routed, processed and reported upon. In fact, more than 80% of information is trapped in unstructured content. This means that less than 20% of data is structured and can be easily searched and retrieved from relational databases.


Document capture technology is not new, but the industry has advanced with innovative tools and functionality that allow businesses to do much more than simply scan documents. Now, automation enables businesses to classify, learn from and extract meaning from their documents. Through automation, we can leverage and organize all data, both structured and unstructured.


The bottom line is that smart document capture technology is important not only for gaining efficiencies and reducing operating costs; through classification and data extraction, it can also lead to better business processes.

Smart Capture Workflow Overview:


The workflow has six main steps, only two of which require human touchpoints:


1. Ingestion:


There are multiple ways to capture data: scanners, multi-function peripherals (MFPs), UNC folders (network folders), fax, email, content services or document repositories, mobile devices, or an outsourced business process outsourcing (BPO) provider.


2. Image Processing:


Documents and images are normalized, cleaned up and rotated in preparation for classification. The system applies despeckle and deskew filters to improve image quality. The resulting document can then be identified, and the data can be easily extracted.
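
To make the despeckle step concrete, here is a minimal sketch of one common approach, a 3x3 median filter, applied to a tiny grayscale image held as a list of rows. This is an illustrative assumption, not a description of the filters Shamrock's product actually uses; the function name `despeckle` and the sample page are hypothetical.

```python
from statistics import median_high


def despeckle(image):
    """Return a copy of a grayscale image with a 3x3 median filter applied.

    `image` is a list of rows, each row a list of pixel values (0-255).
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            # Gather the pixel's neighborhood, clipped at the image border.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # An isolated speck is outvoted by its neighbors.
            cleaned[y][x] = median_high(window)
    return cleaned


# A white page (255) with one isolated black speck (0) in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned_page = despeckle(page)
```

A median filter removes isolated specks while leaving larger shapes such as printed characters mostly intact, which is why it is a typical choice before recognition. Deskewing (rotating the page so text lines are horizontal) would be a separate step and is not shown here.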


3. Classification:


In this step, the system uses Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) and/or Optical Mark Recognition (OMR) to determine what type of document it ingested: for example, an invoice, patient record, loan file, or tax record. An advanced document capture system needs only one or two samples to “learn” to classify the documents; Shamrock Solutions accomplishes this via patented, supervised machine learning algorithms. The system uses a variety of technologies to classify the data: search content, images, bar codes and one document merging. If the system has low confidence in any document it attempts to classify, the process can call upon a human operator for confirmation.
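
The low-confidence handoff described above can be sketched as follows. This is a toy keyword-overlap classifier, used only as a stand-in to show the routing logic; it is not the patented supervised machine learning approach mentioned in the text. The keyword sets, the `REVIEW_THRESHOLD` value, and the function name `classify` are all assumptions made for illustration.

```python
# Hypothetical keyword sets per document type (illustrative only).
KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "total"},
    "tax record": {"tax", "deduction", "filing", "return"},
}

# Assumed cutoff: anything below this is sent to a human operator.
REVIEW_THRESHOLD = 0.6


def classify(text):
    """Return (document_type, confidence, needs_human_review)."""
    words = set(text.lower().split())
    # Score each type by how many of its keywords appear in the text.
    scores = {doc_type: len(words & kw) for doc_type, kw in KEYWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        # No evidence at all: always ask a human.
        return None, 0.0, True
    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    return best, confidence, confidence < REVIEW_THRESHOLD


clear_case = classify("Invoice total amount due in 30 days")
mixed_case = classify("tax return invoice total")
```

Here `clear_case` is classified as an invoice with full confidence, while `mixed_case` matches both types equally and is flagged for human confirmation.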


4. Extraction:


This is the process of identifying metadata within the documents. Metadata is a set of data that describes and gives information about other data. In the case of documents, metadata can be used to organize, find and/or feed documents into another type of business system. The system is configured to extract the data a company needs according to its business rules, using database lookups and fuzzy logic.
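
As a rough illustration of combining a database lookup with fuzzy matching, the sketch below matches a vendor name read by OCR (with a misrecognized character) against a small lookup table using Python's standard `difflib` module. The vendor table, IDs, the `cutoff` value, and the function name `match_vendor` are hypothetical; a production system would query a real database and likely use a more sophisticated matcher.

```python
import difflib

# Hypothetical vendor master data: name -> vendor ID.
VENDORS = {
    "Acme Corporation": "V-001",
    "Globex Industries": "V-002",
}


def match_vendor(ocr_text, cutoff=0.8):
    """Return (vendor_name, vendor_id) for the closest match, or None."""
    matches = difflib.get_close_matches(
        ocr_text, list(VENDORS), n=1, cutoff=cutoff
    )
    if not matches:
        # Nothing similar enough: leave the field for manual entry.
        return None
    name = matches[0]
    return name, VENDORS[name]


# OCR read "i" as "l", but the name is still close enough to match.
vendor = match_vendor("Acme Corporatlon")
unknown = match_vendor("Zzzz")
```

The fuzzy match lets a slightly misread value resolve to a known record, so the extracted metadata can be fed into a downstream business system under its canonical name and ID.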


5. Validation: 


If there are any documents that fall below pre-set tolerance levels, they are highlighted for human review. For example, this can happen when there are smudges, spills, blurry characters or possibly missing fields. The system alerts you to these documents for manual verification and correction.
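
The tolerance check described above might look like the following sketch. It assumes each extracted field carries a recognition confidence between 0 and 1; the field names, the `TOLERANCE` value, and the function name `fields_needing_review` are illustrative assumptions rather than the product's actual configuration.

```python
# Assumed pre-set tolerance: fields below this confidence need review.
TOLERANCE = 0.90


def fields_needing_review(fields, tolerance=TOLERANCE):
    """Return the names of fields that are missing or below tolerance.

    `fields` maps a field name to a (value, confidence) pair.
    """
    flagged = []
    for name, (value, confidence) in fields.items():
        if value is None or value == "" or confidence < tolerance:
            flagged.append(name)
    return flagged


# Hypothetical extraction result for one invoice.
extracted = {
    "invoice_number": ("INV-1001", 0.99),
    "total": ("1,2?4.00", 0.62),  # smudged digit, low confidence
    "po_number": ("", 0.0),       # field missing from the document
}
to_review = fields_needing_review(extracted)
```

Only the flagged fields are shown to the operator, which keeps the human touchpoint limited to the values the system is unsure about.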


6. Export & Deliver:


Once all documents have been validated, the documents and data are moved to a repository or other line-of-business system. The exported documents and data can be stored on a local server or in cloud-based storage or systems such as Alfresco, Box or SAP.



If your company faces any of the following hurdles due to heavy document volumes requiring classification, contact us:


  • Slow, manual processes
  • High costs of ink and paper for printing
  • Need for physical storage space for paper files
  • Time-consuming efforts to manually sort and file documents
  • Manual barcode application, document preparation and other labor-intensive processes
  • Files that can be scanned but not searched for specific data
  • Invoice data that can’t be easily matched to purchase orders or receiving documents
  • Difficulty predicting trends or analyzing customer data
  • Processes prone to human error



Download our guides to learn more:



Download our Intelligent Data Extraction (OCR) Solution Overview

Download our Shamrock Smart Capture Ebook

Download our Mail Room Automation Solution Guide

Download our Multitenant Architecture Guide

Download our Accounting Solution Guide

Our team creates automated digital processes that go beyond the status quo to redefine what’s possible for the workability of your business.