Technology Services Group

Capture 2.0 – Metadata Extraction with Machine Learning Upon Ingestion


April 7, 2020

TSG is predicting future disruptions to content capture within the ECM industry. In the fourth quarter of 2019, we focused on improving the OpenContent Management Suite (OCMS) by disrupting legacy capture solutions with machine learning. Because we expect even more customers to move to all-digital documents given the pandemic, our Capture 2.0 efforts for the first quarter of 2020 focused on integrating OpenMigrate into the ecosystem so that metadata is extracted as each document is ingested into the system. With this approach, the user then verifies the index metadata in OCMS before finalizing the document in the repository.

Capture 2.0 Machine Learning

Previous Capture 2.0 posts have referred to the following diagram:

  1. Create and Train – Capture administrators will be able to create initial templates with extraction rules (e.g., zonal or key/value pair). These templates will be fed into the suggestion engine.
  2. Bulk Ingestion – As documents enter the system, OpenMigrate can call the suggestion engine to classify documents and extract metadata.
  3. Store Completed Docs – After receiving the extracted data, if required fields are all filled with a high enough confidence level, the document is filed in the repository in the correct location.
  4. Queue Incomplete Docs – If all required fields cannot be completed with high enough confidence, the document is placed into the repository and queued for indexing in OCMS.
    • Note that in both cases above (steps 3 and 4), the document is always ingested into the repository.
  5. Extract Metadata – During OCMS indexing, the suggestion engine can be called to return metadata suggestions for documents that have not yet been processed through the suggestion engine. This can happen, for example, for documents that were queued for indexing by a process other than OpenMigrate.
  6. Finalize Document – The user works through the queue of documents to index, verifying the metadata suggestions extracted from each document and saving the final metadata values.
  7. Extraction Error Corrections – During the previous step, the indexing module of OCMS keeps track of any corrections the user makes. For example, if the user dismisses one of the original suggestions and selects a different value on the document, that correction is fed back into the suggestion engine so that the same mistake is not repeated the next time a similar document is processed.
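As a rough illustration of the feedback loop in step 7, the sketch below models a template as a set of zones, one per field, and updates a zone when the user picks a value from a different place on the page. This is not how the Capture 2.0 suggestion engine is implemented; the `Template` class, the `(x, y, width, height)` zone format, and `apply_correction` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical zone format: (x, y, width, height) in page coordinates.
Zone = Tuple[int, int, int, int]


@dataclass
class Template:
    """Hypothetical extraction template: one zone per metadata field."""
    template_id: str
    zones: Dict[str, Zone] = field(default_factory=dict)


def apply_correction(template: Template, field_name: str, selected_zone: Zone) -> bool:
    """Record where the user actually found the value.

    Returns True when the template changed, False when the user simply
    confirmed the existing suggestion.
    """
    if template.zones.get(field_name) == selected_zone:
        return False
    template.zones[field_name] = selected_zone
    return True


# The engine suggested the invoice number from the wrong area of the page;
# the user selects the right area and the template is updated for next time.
vendor_template = Template("acme-invoice", {"invoice_number": (50, 40, 120, 20)})
changed = apply_correction(vendor_template, "invoice_number", (400, 40, 120, 20))
```

Under these assumptions, the next document matched to the same template would be read from the corrected zone, which is the behavior step 7 describes.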

In our prior posts, we focused on step 1 (creating and training) as well as steps 5-7 (indexing and the feedback loop). In this post, we focus on steps 2-4 by integrating OpenMigrate into the Capture 2.0 ecosystem.
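The routing decision in steps 3 and 4 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not OpenMigrate's actual logic: the `Suggestion` class, the `route_document` function, the field names, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Suggestion:
    """Hypothetical result returned by a suggestion engine for one field."""
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_document(
    suggestions: Dict[str, Suggestion],
    required_fields: Iterable[str],
    threshold: float = 0.9,  # assumed cutoff; a real system would make this configurable
) -> str:
    """Return "store" when every required field is confidently filled, else "queue"."""
    for name in required_fields:
        suggestion = suggestions.get(name)
        if suggestion is None or not suggestion.value or suggestion.confidence < threshold:
            return "queue"  # step 4: a person indexes the document
    return "store"  # step 3: file the document directly


required = ["invoice_number", "total"]
known_vendor = {
    "invoice_number": Suggestion("INV-1001", 0.97),
    "total": Suggestion("250.00", 0.95),
}
new_vendor = {"invoice_number": Suggestion("INV-2002", 0.55)}

known_result = route_document(known_vendor, required)
new_result = route_document(new_vendor, required)
```

In this example `known_result` is `"store"` and `new_result` is `"queue"`. In both cases the document itself would already be in the repository; only the need for manual indexing differs.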

Scenario Overview

Many customers accept invoices and other documents from vendors and third parties via email or other electronic ingestion methods. Since OpenMigrate can easily monitor an email inbox and ingest attachments, it can call the Capture 2.0 suggestion engine as part of the OpenMigrate process. After receiving metadata values from the suggestion engine, OpenMigrate then queues the document for review within OCMS.
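To make the email scenario concrete, here is a small sketch of pulling attachments out of a raw email message using Python's standard `email` package. It only illustrates the general idea of attachment ingestion and does not reflect how OpenMigrate is built or configured; `extract_attachments`, `build_sample_email`, and the example addresses are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Tuple


def extract_attachments(raw_email: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, content) pairs for every attachment in the message."""
    message = message_from_bytes(raw_email, policy=policy.default)
    attachments = []
    for part in message.iter_attachments():
        # get_payload(decode=True) undoes the transfer encoding (e.g. base64).
        attachments.append((part.get_filename(), part.get_payload(decode=True)))
    return attachments


def build_sample_email() -> bytes:
    """Build a small vendor email with one PDF-like attachment for the demo."""
    message = EmailMessage()
    message["From"] = "vendor@example.com"
    message["To"] = "invoices@example.com"
    message["Subject"] = "Invoice 1001"
    message.set_content("Please find the invoice attached.")
    message.add_attachment(
        b"%PDF-1.4 sample",
        maintype="application",
        subtype="pdf",
        filename="invoice-1001.pdf",
    )
    return message.as_bytes()


attachments = extract_attachments(build_sample_email())
```

Each extracted attachment would then be handed to the suggestion engine and queued for review, as described above.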

The video below gives an overview of this process. It shows how the user can simply verify the indexing information for invoices the system has already learned, and teach the system where the indexing information appears on invoices from new vendors that it has not yet learned.

Let us know your thoughts below:

Filed Under: Content Capture, Machine Learning, OpenContent Management Suite, OpenMigrate

Reader Interactions

Comments

  1. SCOTT H BUBLITZ says

    April 8, 2020 at 10:00 am

    Looks good, George! You guys have done a nice job on Capture 2.0. What is the pricing model? Thanks

