Content Capture 2.0 – The Upcoming Disruptions – Thoughts and Analysis


July 24, 2019

Similar to the disruption happening within the ECM marketplace, most analysts (and TSG) predict that the capture component of ECM is ripe for disruption as well. The capture marketplace closely resembles the ECM marketplace, with legacy products that haven’t evolved as new requirements and technologies have been introduced. One of the strategic initiatives TSG is pursuing this quarter is to continue updating the capture capabilities of our own offerings. This post is the first in a series on our products and how they are evolving to meet the requirements of modern systems.

Capture 1.0 – Born in the mailroom scanning paper

Most current capture vendors got their start with customer efforts to capture paper electronically. Like the ECM market, the capture marketplace grew up in the ’80s and ’90s around digital capture/scanning of paper documents. As companies pursued paperless initiatives, capture solutions evolved around scanning documents and capturing metadata values through automated OCR processing and manual indexing. Many capture products still have this basic focus, with additional logic for handwriting capture and other automation capabilities.

Focused on automating paper capture in the mailroom, Capture 1.0 vendors concentrate on the following processing:

  • Scanning and Recognition – Components unique to mailroom scanning include scanning batches of documents, separator pages, bar-code reading, Optical Character Recognition (OCR), and handwriting recognition.
  • Indexing – Screens for indexing documents. Built for paper, indexing revolves around recognizing characters and fields and leveraging confidence levels for manual keying of data.
  • Bulk Ingestion into ECM and Data Platforms – Because capture grew up as a separate infrastructure, vendors built specific adapters for the repositories content would flow to after indexing. For example, Captiva offered adapters for Documentum, ApplicationXtender, and many others before being acquired by Documentum.

Capture 2.0 solutions need to do all of the above and more. Like the legacy ECM vendors, Capture 2.0 solutions will embrace the affordability and accessibility of nearly limitless computing power with technologies like machine learning/artificial intelligence as well as cloud capabilities.

Rather than patching additional capabilities onto scanning solutions, new vendors will emerge that are built from the ground up on a disruptive technology and pricing model. Like the ECM disruption currently occurring, we predict Capture 2.0 solutions won’t immediately replace Capture 1.0 solutions but will nibble away at the documents that used to flow through paper and the mailroom.

Capture 2.0 – Making intelligent capture smarter with machine learning

Capture 1.0 vendors have branded their current methods of extracting content from documents as “intelligent capture.” These capture tools generally rely on two approaches to data capture:

  • Location Template Approach – A template defines where data is located in a given document, with a zone denoting where each piece of data resides. For example, the tool could be told to look in a given box in the top right corner of the header to pull the “Report Number” value. This approach works well only when the positional data is known and very consistent across all documents, and templates need to be created for every type of captured document.
  • Key/Value Pair Template Approach – A second approach is to provide a key/value pair template. Instead of defining the zonal position of the data, the tool is told to look for a given key, for example “Invoice Number,” and then examines the surrounding text to pull the value – for example, preferring text to the right of or underneath the key. This approach works well when the target data may be anywhere within the document, but runs into problems when the key text is inconsistent. Using our invoice example, some vendors may display Invoice Number as Invoice Num, Invoice Nbr, Invoice #, etc. Existing capture tools have approaches for minimizing this problem, but it remains an issue for many clients.

Both approaches are typically augmented with additional processing to look up and verify extracted values against other systems (for example, PO numbers or account numbers). This processing can involve both configuration and customization depending on requirements.
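To make the key/value pair approach concrete, below is a minimal Python sketch that scans raw OCR text for known key variants and takes the text to the right of the key. The key list and sample text are hypothetical; real capture tools layer fuzzy matching, positional analysis, and confidence scoring on top of this basic idea.

```python
import re

# Hypothetical key variants a template might register for one logical field.
INVOICE_KEYS = ("Invoice Number", "Invoice Num", "Invoice Nbr", "Invoice #")

def extract_invoice_number(ocr_text):
    """Find a known key and return the token to its right, mimicking the
    key/value pair template approach described above."""
    for key in INVOICE_KEYS:
        match = re.search(re.escape(key) + r"\s*[:#]?\s*(\S+)", ocr_text)
        if match:
            return match.group(1)
    return None  # no key variant matched; fall back to manual indexing

sample = "ACME Corp\nInvoice Nbr: INV-1023\nAmount Due: $540.00"
print(extract_invoice_number(sample))  # prints INV-1023
```

The hard part in practice is exactly what the key list hints at: every new key variant a vendor invents requires a template update, which is the gap machine learning aims to close.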

Capture 2.0 will combine the above approaches while adding machine learning to handle incorrect data extraction that the user corrects during indexing. With Capture 1.0 tools, an error that is manually corrected on one document will recur on the next, similar document unless the algorithm or template is changed. Capture 2.0 approaches will provide the infrastructure to gradually reduce the indexing effort for subsequent documents.

In typical capture scenarios, templates based on location or key/value pairs are needed for a number of reasons. Templates can be used to classify documents into certain types (e.g., invoice vs. purchase order vs. billing report). The key for templates in the Capture 2.0 future will be machine learning that evolves extraction and identification on the fly. If a document matches a given template but incorrect data is extracted from it, the user’s act of correcting the mistake will feed machine learning algorithms that improve metadata extraction accuracy for subsequent documents. Current capture tools instead require a manual administrative update to the template, or an entirely new template. In practice, this means templates aren’t updated for most corrected extraction mistakes, leading to user frustration.
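Below is a minimal, in-memory Python sketch of that feedback loop: a failed extraction that the user corrects teaches the system a new key alias, so the next similar document is extracted automatically. The field names and documents are hypothetical, and this is only the general shape of the technique, not TSG’s implementation.

```python
from collections import defaultdict

# Learned key aliases per logical field.  In a real system this state would
# live in the repository or a database; here it is held in memory.
learned_aliases = defaultdict(set)
learned_aliases["invoice_number"].add("Invoice Number")

def extract_field(field, key_value_pairs):
    """Return the value whose key matches a learned alias for the field."""
    for key, value in key_value_pairs.items():
        if key in learned_aliases[field]:
            return value
    return None  # extraction failed; the user will index manually

def record_correction(field, key_value_pairs, corrected_value):
    """Learn which key carried the value the user keyed, so the next
    document with that layout is extracted automatically."""
    for key, value in key_value_pairs.items():
        if value == corrected_value:
            learned_aliases[field].add(key)

# Document 1 uses an unrecognized key; extraction fails, the user corrects
# it, and the correction teaches the system the "Invoice Nbr" alias.
doc1 = {"Invoice Nbr": "INV-1023", "Date": "07/24/2019"}
assert extract_field("invoice_number", doc1) is None
record_correction("invoice_number", doc1, "INV-1023")

# Document 2, with the same layout, now indexes without manual effort.
doc2 = {"Invoice Nbr": "INV-1024", "Date": "07/25/2019"}
assert extract_field("invoice_number", doc2) == "INV-1024"
```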

Focusing on modern technologies, Capture 2.0 tools will do more than just intelligently extract content: they will apply machine learning so the indexing components learn over time and achieve better results. See our previous post on how TSG’s indexing can recognize and improve based on user input on real-world data. Look for future posts on how this approach is evolving to address multiple indexing scenarios.

Capture 2.0 – Addressing digitally born content

Capture 2.0 has to address born-digital content from external and internal sources. While some scanning and OCR will always exist for certain scenarios, more and more of the content being captured is created digitally, and its data can be captured directly from the values in the content itself. Rather than arriving through a mailroom function, born-digital content reaches a company from many different and distributed sources. Capture 2.0 solutions need to provide for batch digital ingestion from sources like internal computer output as well as external vendors via EDI or email. Both large batch jobs and individual documents need to be easy to ingest consistently, with automated indexing as appropriate.
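As a simple illustration of why born-digital capture avoids the cost of scanning, consider a hypothetical invoice arriving as XML via EDI or email: the metadata is read directly from the content, with no OCR, zones, or confidence thresholds involved.

```python
import xml.etree.ElementTree as ET

# A hypothetical invoice received via EDI or email as XML rather than paper.
xml_invoice = """\
<invoice>
  <number>INV-1023</number>
  <vendor>ACME Corp</vendor>
  <total currency="USD">540.00</total>
</invoice>"""

# Born-digital capture: pull metadata values straight from the content.
root = ET.fromstring(xml_invoice)
metadata = {
    "invoice_number": root.findtext("number"),
    "vendor": root.findtext("vendor"),
    "total": root.findtext("total"),
}
print(metadata)  # {'invoice_number': 'INV-1023', 'vendor': 'ACME Corp', 'total': '540.00'}
```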

As it relates to the TSG product roadmap, look for upcoming posts on how our tools can capture both scanned images and a variety of born-digital content, including forms, computer output, and externally generated source content.

Capture 2.0 – Repository-based capture in bulk or individually

All of the current capture tools were created as standalone point solutions: batches of scanned documents were initially stored within the scanning solution and later exported to their final repository location. In this manner, capture vendors could support multiple repositories with small adapters rather than relying on large integration efforts against each different repository.

TSG solutions have always been repository-based, as they evolved out of adding capture capabilities to our repository indexing tools. Benefits of this approach, illustrated by the sketch after this list, include:

  • Indexing – One indexing process for bulk or individual document ingestion. The logic for indexing captured content can be the same as for general document import, rather than duplicating indexing logic in multiple places.
  • Infrastructure – Less infrastructure to procure and maintain.
  • Speed – Documents are immediately available and can be processed in the repository rather than waiting for a batch process to run.
  • Business Process rather than Point Solution – By embedding the solution in the repository, the business process, including workflow and ECM capabilities, can move from the Capture 1.0 point solution to the full business process of Capture 2.0.
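Below is a rough Python sketch of that single, shared ingestion path. The repository interface is a hypothetical in-memory stand-in for a real ECM client; the point is that bulk and individual capture funnel through one indexing function.

```python
import itertools

class InMemoryRepository:
    """Stand-in for a real ECM repository client (hypothetical interface)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.docs = {}

    def store(self, content):
        doc_id = next(self._ids)
        self.docs[doc_id] = {"content": content, "metadata": {}}
        return doc_id

    def set_metadata(self, doc_id, metadata):
        self.docs[doc_id]["metadata"].update(metadata)

def index_document(repo, content, metadata):
    # One indexing path: the document lands in the repository immediately,
    # where workflow, search, and review can pick it up.
    doc_id = repo.store(content)
    repo.set_metadata(doc_id, metadata)
    return doc_id

def ingest_batch(repo, documents):
    # Bulk ingestion reuses the exact same logic as single-document capture.
    return [index_document(repo, content, meta) for content, meta in documents]

repo = InMemoryRepository()
ingest_batch(repo, [(b"scan-1", {"type": "invoice"}), (b"form-2", {"type": "claim"})])
print(repo.docs)
```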

Look for future posts on how repository-based capture provides benefits over typical standalone, point-solution capture tools.

Capture 2.0 – Cloud-Friendly Solutions

As IT departments evolve within organizations, we are seeing more and more clients move away from on-premise data centers to cloud Infrastructure as a Service (IaaS) providers such as Amazon AWS and Microsoft Azure. Legacy capture solutions were primarily on-premise installations that fed into one or more on-premise document repositories. Capture 2.0 solutions can employ a cloud-first architecture that removes the need for an on-premise installation. TSG’s repository-based capture approach outlined above allows our tools to be deployed easily in IaaS environments as well as on premise.

Cloud-based tools have a couple of major benefits over on-premise architectures:

  • Scaling – Typical ingestion processes still occur in bulk, requiring large infrastructure for processing batch components, with that infrastructure sitting idle most of the time. Cloud-based pricing models can flex to address both surge and idle requirements.
  • Cloud options – Cloud processes also enable different business models in which external parties index documents as part of an extranet. This is one piece that typical Capture 1.0 vendors struggle to understand and price.

Another benefit of cloud-based capture is that related cloud services can be easily integrated. Using AWS services as examples:

  • Textract – Provides modern, cloud-based OCR and form recognition. See our post on how Textract compares to OpenText on-premise solutions and how we are adding Textract capabilities to our current products.
  • Rekognition – For image and video analysis.
  • Comprehend – To analyze text for items like key phrases, sentiment (positive or negative), topics, people, and more.
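As an example of how directly these services slot into a capture pipeline, below is a minimal boto3 sketch that runs Textract form recognition on a document in S3 and walks the returned key/value blocks. The bucket and object names are hypothetical, and AWS credentials and region are assumed to be configured.

```python
import boto3

textract = boto3.client("textract")

# Hypothetical S3 location of a captured document image.
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "capture-inbox", "Name": "invoice-001.png"}},
    FeatureTypes=["FORMS"],
)

blocks = {b["Id"]: b for b in response["Blocks"]}

def text_of(block):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            words += [blocks[i]["Text"] for i in rel["Ids"]
                      if blocks[i]["BlockType"] == "WORD"]
    return " ".join(words)

# Print every key/value pair Textract recognized on the form.
for block in response["Blocks"]:
    if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
        values = [text_of(blocks[i])
                  for rel in block.get("Relationships", [])
                  if rel["Type"] == "VALUE"
                  for i in rel["Ids"]]
        print(f"{text_of(block)}: {' '.join(values)}")
```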

Look for posts in the future on how cloud-based approaches can provide additional capabilities over typical on-premise solutions.

Capture 2.0 – Mining data and documents with Big Data

Looking beyond the machine learning aspects of Capture 2.0, organizations should also look to big data tools for mining and analyzing both document content and the capture process itself. During both automated and manual indexing, data can be fed into a Hadoop or DynamoDB instance or another big data solution. This data can then be mined and analyzed to answer questions such as the following (see the sketch after this list):

  • What fields are users correcting most often?
  • Which fields have the highest automated extraction success rate?
  • Are there areas of the capture process that are inefficient for the organization?
  • If so, are there tweaks to the process or ways to manipulate the data to improve efficiency?
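As one possible shape for this, below is a minimal boto3 sketch that records a correction event in a hypothetical DynamoDB table and then tallies which fields users fix most often. A real deployment would use paginated scans or a dedicated analytics pipeline rather than a full table scan.

```python
import boto3
from collections import Counter

# Hypothetical table keyed on (doc_id, field); credentials/region assumed.
table = boto3.resource("dynamodb").Table("capture-corrections")

# Record an event each time a user corrects an extracted field.
table.put_item(Item={
    "doc_id": "invoice-001",
    "field": "invoice_number",
    "extracted": "INV-1O23",   # what automated extraction produced
    "corrected": "INV-1023",   # what the user actually keyed
})

# Mine the events: which fields are users correcting most often?
corrections = Counter(item["field"] for item in table.scan()["Items"])
print(corrections.most_common(5))
```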

Look for additional posts in the future on how Capture 2.0 is feeding big data. Also, see our related white paper, A Big Data Approach to ECM.

Summary

Capture 2.0 represents the evolution (and disruption) of Capture 1.0, a point solution associated with capturing and digitizing paper in a typical mailroom function. As Capture 2.0 solutions evolve, they will disrupt Capture 1.0 vendors by:

  • Better addressing both scanned content and the growing bulk of born-digital content.
  • Leveraging machine learning over typical template approaches to improve capabilities over time.
  • Leveraging the repository to move from a point solution to a full business process.
  • Offering cloud-native or cloud-friendly solutions rather than on-premise-only installations.
  • Better addressing data mining and other big data requirements.

TSG is excited to explore these areas with our clients and incorporate Capture 2.0 features into our OpenContent Management Suite.

“Capture 2.0 in leveraging the power of AI and Machine Learning will require a complete rethink, often a rebuild from the ground up,” said Alan Pelz-Sharpe, long-time ECM analyst at Deep Analysis. “TSG’s new roadmap for Capture 2.0 appears to be building from the ground up focused on AI & Machine Learning and takes in the lessons learned from decades of deploying Capture solutions.”

Are there other areas you believe will be addressed in the next generation of capture tools? Let us know your thoughts below.

Filed Under: Content Capture, ECM Landscape, ECM Solutions

