Technology Services Group

Capture 2.0 – Disrupting Legacy Capture Solutions with Machine Learning

December 12, 2019

TSG is predicting upcoming disruptions to content capture within the ECM industry. We have been working hard this quarter to improve metadata extraction capabilities within the OpenContent Management Suite with machine learning. For this post, we want to discuss and demonstrate the interface that controls capture templates as well as how users interact with the capture process and “teach” the system.

Moving from Capture 1.0 to 2.0

Legacy Capture tools that have been around for a long time (Capture 1.0) rely on two primary approaches to automatically capture data from a document as it’s processed:

  • Target a Specific Location or Zone – using this approach, the administrator defines a zone on the document to denote where a piece of data resides.  For example, the tool could be told to look in a given box in the top right corner of the header to pull the “Report Number” value.  This approach only works well when the positional data is known and very consistent across all documents. This was common with many early image scanning and capture vendors.
  • Look for a Key/Value Pair – using this approach, instead of defining the zonal position of the data, the tool is told to look for a given key, for example “Invoice Number”, and then looks at the surrounding text to pull the value – for example, preferring text to the left or underneath the key.  This approach works well when the target data may be anywhere within the document, but runs into problems when the key text is inconsistent.  Using our invoice example, some vendors may display Invoice Number as Invoice Num, Invoice Nbr, Invoice #, etc.  Existing Capture tools have approaches for minimizing this problem, but it is still an issue for many clients.
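To make the two approaches concrete, here is a minimal Python sketch. It is not how any particular Capture product works: the `Word` class, the function names, the page coordinates, and the hand-maintained list of key spellings are all assumptions made for illustration. It also simplifies the key/value lookup to "take the token that follows the key".

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """A hypothetical OCR result: a word and the position of its top-left corner."""
    text: str
    x: float
    y: float


def extract_by_zone(words, left, top, right, bottom):
    """Zone approach: join the words whose top-left corner falls inside the box."""
    hits = [w.text for w in words if left <= w.x <= right and top <= w.y <= bottom]
    return " ".join(hits) or None


# Hypothetical key spellings that an administrator would maintain by hand.
INVOICE_NUMBER_KEYS = ["Invoice Number", "Invoice Num", "Invoice Nbr", "Invoice #"]


def extract_by_key(text, keys):
    """Key/value approach: return the token that follows a known key, or None."""
    # Try longer keys first so "Invoice Number" wins over "Invoice Num".
    for key in sorted(keys, key=len, reverse=True):
        # Allow an optional colon and whitespace between the key and the value.
        pattern = re.escape(key) + r"\s*:?\s*(\S+)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(1)
    return None


page = [Word("Report", 400, 20), Word("R-2019-12", 460, 20), Word("Total", 50, 700)]
print(extract_by_zone(page, 450, 0, 600, 50))                # R-2019-12
print(extract_by_key("Invoice Nbr 55", INVOICE_NUMBER_KEYS))  # 55
print(extract_by_key("Inv. No. 9", INVOICE_NUMBER_KEYS))      # None
```

The last call shows the weakness described above: a key spelling that is not on the list yields nothing until someone adds it.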

To date, while the above approaches can be successful, clients have struggled when documents change over time. To use invoices as an example, if a new vendor sends in an invoice that has “Invoice Number” listed with an unexpected key, the system will not correctly pick up the value. When the user corrects the system in the indexing screen, the exact same issue will arise for the next invoice that comes in from this vendor until an administrator updates the template. While this may not sound like a big deal, some of our clients have invoices coming in from over 30,000 vendors. This can become a maintenance nightmare as these templates do not automatically improve over time.

And that’s exactly what Capture 2.0 tools will do – learn over time. When the user corrects the Invoice Number value, the tool should use that correction to get it right the next time.
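The feedback loop can be sketched in a few lines of Python. This is a deliberately simple stand-in, not the OCMS implementation and not a trained model: the hypothetical `SuggestionEngine` just remembers, per vendor, the label that preceded a value the user corrected, and reuses that label on the next document from the same vendor. All names are assumptions for illustration.

```python
import re
from collections import defaultdict


class SuggestionEngine:
    """Hypothetical engine that remembers key labels from user corrections."""

    def __init__(self):
        # vendor -> field -> list of key labels learned from corrections
        self.learned_keys = defaultdict(lambda: defaultdict(list))

    def suggest(self, vendor, field, text):
        """Return the token following a learned key for this vendor, or None."""
        for key in self.learned_keys[vendor][field]:
            match = re.search(re.escape(key) + r"\s*(\S+)", text)
            if match:
                return match.group(1)
        return None

    def learn(self, vendor, field, text, corrected_value):
        """Record the label that precedes the corrected value on its line."""
        for line in text.splitlines():
            idx = line.find(corrected_value)
            if idx > 0:
                key = line[:idx].strip()
                if key and key not in self.learned_keys[vendor][field]:
                    self.learned_keys[vendor][field].append(key)
                return


engine = SuggestionEngine()
first = "ACME Corp\nInv Ref: A-100\nTotal: 50.00"
print(engine.suggest("acme", "invoice_number", first))   # None (never seen)

# The user types the right value in the indexing screen; the engine learns from it.
engine.learn("acme", "invoice_number", first, "A-100")

second = "ACME Corp\nInv Ref: A-101\nTotal: 75.00"
print(engine.suggest("acme", "invoice_number", second))  # A-101
```

The point of the sketch is the shape of the loop: the correction itself updates what the system knows, so no administrator has to edit a template before the next invoice arrives.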

Capture 2.0 Machine Learning

Previous Capture 2.0 posts on this blog have referred to a diagram of the overall Capture 2.0 process.

Those posts have more detailed information on all of the steps; in this post we are going to look at step 1 as well as steps 5-7.

Create the Template

In the first step, we need to create a template to set a baseline of what we would like to capture. For example, for invoices we may say that we want to capture Invoice Number, Amount, Due Date, etc. based on the vendor “fingerprint”. Check out the following video to see how this is done.
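As a rough illustration of what such a baseline might hold, the Python sketch below models a template as a document type plus a list of fields, and models a vendor “fingerprint” as a hash of the letters in the first few lines of the document. Both are assumptions made for this example: the post does not describe how OCMS stores templates or computes fingerprints, and `CaptureTemplate` and `vendor_fingerprint` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CaptureTemplate:
    """Hypothetical template: which fields to capture for a document type."""
    document_type: str
    fields: list = field(default_factory=list)


def vendor_fingerprint(text, header_lines=3):
    """Hash the letters in the first few non-empty lines (an assumed heuristic).

    Digits, punctuation, and spacing are ignored so that values which change
    from invoice to invoice do not change the fingerprint.
    """
    lines = [ln.strip().lower() for ln in text.splitlines() if ln.strip()]
    header = " ".join(lines[:header_lines])
    normalized = "".join(ch for ch in header if ch.isalpha())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


invoice_template = CaptureTemplate(
    document_type="Invoice",
    fields=["Invoice Number", "Amount", "Due Date"],
)

a = "ACME Corp\n12 Main St\nInvoice\nInvoice #: 1"
b = "ACME Corp\n12 Main St\nInvoice\nInvoice #: 2"
c = "Globex Inc\n9 Oak Ave\nInvoice\nInvoice #: 1"
print(vendor_fingerprint(a) == vendor_fingerprint(b))  # True
print(vendor_fingerprint(a) == vendor_fingerprint(c))  # False
```

Keying learned behavior on a fingerprint like this is what lets the system treat a familiar vendor differently from one it has never seen.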

Index Documents and Teach the System

Once we have a template in place, it’s ready to use in the OCMS indexer. The following video shows an example of two vendors: one that has been seen many times in the past, for which the suggestion engine has already been trained, and a second vendor that is brand new.

As you can see, the ability of the OCMS Indexer to “learn” from the user’s interaction with the vendor invoice and improve over time without a template update by the administrator is the key to a Capture 2.0 system.
