Technology Services Group

Capture 2.0 – Document Classification with Machine Learning

August 3, 2020

We recently extended the machine learning capabilities of Capture 2.0 with a new Document Classification Engine. This Capture component automatically categorizes unstructured data entering Alfresco from a variety of sources according to our clients’ object models. This post explains how the Classification Engine uses machine learning to automatically collect the data necessary for 21 CFR-compliant batch records.

Document Classification Supports SuggestR Intelligent Indexing

We previously demonstrated a machine learning approach to extracting metadata from AP invoices with Capture 2.0. In that example, the invoices entered Alfresco already classified by vendor, based on the email address they were sent from. Capture 2.0 used the classified vendor’s primary key to look up the locational data needed to extract metadata from that vendor’s invoice type.

More challenging is a batch records scenario, in which different types of relevant batch documents are received from different sources, requiring manual intervention to determine the object type (primary key) before Capture can index the metadata.

With the Capture 2.0 Classification Engine, which is based on SuggestR, documents from scenarios that offer no easy way to tell them apart can be categorized automatically, allowing the user to focus on validation and efficiently process incoming documents.

Leveraging Naïve Bayes for Machine Learning

Like SuggestR, the Classification Engine is built to learn and scale in a real-time production system. The engine uses the Naïve Bayes probabilistic classification technique, in which a document is represented as a “bag of words” (no location data) and the classifying features of the document are the frequencies with which each word appears. Naïve Bayes estimates the probability that an incoming document is of a particular type, t, by comparing its word counts with those of previously classified documents of type t.
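To make the bag-of-words idea concrete, here is a minimal Python sketch of multinomial Naïve Bayes scoring. It is an illustration under stated assumptions, not the Classification Engine's actual code: the function names, the two document types, the sample text and the add-one smoothing are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count each word; word positions are discarded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def log_score(doc_counts, type_counts, type_prior, vocab_size):
    """Log of P(type) times the product of P(word | type)^count.

    Add-one smoothing (an assumption of this sketch) keeps unseen words
    from driving the probability to zero.
    """
    total = sum(type_counts.values())
    score = math.log(type_prior)
    for word, n in doc_counts.items():
        p = (type_counts.get(word, 0) + 1) / (total + vocab_size)
        score += n * math.log(p)
    return score

# Hypothetical, hand-made word counts standing in for previously classified documents.
counts_by_type = {
    "batch_record": bag_of_words("batch lot number yield batch operator signature"),
    "certificate_of_analysis": bag_of_words("assay result specification purity result"),
}
vocab = set().union(*counts_by_type.values())
prior = 1 / len(counts_by_type)  # equal priors, assumed for simplicity

incoming = bag_of_words("Lot number and operator signature for batch 42")
scores = {t: log_score(incoming, c, prior, len(vocab)) for t, c in counts_by_type.items()}
best = max(scores, key=scores.get)
print(best)
```

The incoming document shares several words with the sample batch record text and none with the certificate of analysis text, so the batch record type receives the higher score.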

Leveraging a Naïve Bayes classifier offers us the following benefits:

  • Learning based on incoming data – The word frequency features of a classified document, once validated by the user, join the dataset and are evaluated in subsequent classifications.
  • Scaling – The Naïve Bayes classifier scales based on the number of categories, not the number of processed documents, so it is highly efficient, even in a large document repository.
  • Minimal training required – A very small training set is needed for the algorithm to begin meaningful evaluation of features.
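The learning and scaling benefits above can be sketched with a small incremental classifier. This is a hypothetical illustration, not the engine's implementation: the class name, method names, categories and sample words are invented for the example. It stores only aggregated word counts per category, so its state grows with the number of categories and the vocabulary rather than with the number of documents processed.

```python
import math
from collections import Counter, defaultdict

class IncrementalNaiveBayes:
    """Hypothetical sketch: keeps running totals per category, not the documents."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> validated documents
        self.vocab = set()

    def learn(self, words, category):
        """Fold one user-validated document into the running counts."""
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, words):
        """Return the category with the highest smoothed log-probability."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best, best_score = category, score
        return best

clf = IncrementalNaiveBayes()
clf.learn("batch lot yield operator".split(), "batch_record")
clf.learn("assay purity result specification".split(), "certificate_of_analysis")
first_guess = clf.classify("deviation report lot".split())

# One validated example of a new type is enough to influence the next classification.
clf.learn("deviation report investigation root cause".split(), "deviation_report")
second_guess = clf.classify("deviation report lot".split())
print(first_guess, second_guess)
```

In this toy run the same incoming words are first assigned to the closest known type and then, after a single validated example of the new type is learned, to that new type.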

Here is a demo of this Document Classification Engine for Batch Records:

Keep an eye on our blog for the next steps in our ongoing Capture 2.0 work:

  • Multi-strategy capabilities (run strategies in addition to Naïve Bayes to improve confidence)
  • Additional engines, such as image classification
  • Leveraging classification to reduce setup overhead for large-scale migrations (e.g., filestore to Alfresco)

Filed Under: Alfresco, Content Capture, Demo, ECM Solutions, Machine Learning, OpenCapture
