TSG just finished our quarterly manager meeting, where one of the focal points was updating our product roadmaps. Thanks to some great brainstorming among our team, as well as coaching from Alan Pelz-Sharpe, much of our more interesting work this year will focus on learning systems and automation. This post presents our thoughts and a rough roadmap for our products.
Learning Systems for ECM – Why Now?
Given that Artificial Intelligence (AI) has been around for quite some time, one obvious question is "why is AI getting so much attention now?" Whether it is self-driving cars, Siri, or Alexa, machines are getting smarter and able to do more. Both open source efforts and cloud vendors like Amazon now deliver learning-system capabilities at commodity prices. While clear winning technologies and approaches haven't emerged yet, we always strive to keep our software and services focused on the future of the industry and our clients' needs. As a vendor heavily committed to open source, as well as an Amazon partner, we will probably start our efforts there, with coaching from a variety of advisors.
Learning Systems for ECM – What to do?
When introducing any new technology to our clients, we try to cut through the hype and focus on goals with tangible results. Rather than starting from scratch, we get our best results by taking things we already do successfully and adapting them to the new technology. Some scenarios that could easily be added to our existing products and client solutions include:
- AI extracting information from a document – because so many of our clients are transactional and their documents have very similar content, the need to summarize a document is limited. Extracting the key data and differentiators between documents, however, is very relevant.
- Robotic Process Automation (RPA) and Machine Learning (ML) to automate mundane document tasks – many of the labor-intensive user tasks surrounding document processing can be automated with ML and RPA.
- Analytic Support – providing indexes and insights into the documents contained in the repository.
Given this focus, the remainder of this post discusses our goals and thoughts moving forward.
Intelligent Redaction for ECM
TSG has been redacting documents for our clients with OpenRedact, part of OpenAnnotate, for years. Our current approach is to offer a redacted copy for distribution while keeping the original in place for backup and evidentiary rules. Key requirements are driven by the protection of personally identifiable information (PII) and by GDPR and HIPAA compliance. Leveraging learning systems, TSG roadmap goals include:
- Automated Pattern Redaction – based on certain patterns (social security number, phone number), OpenRedact could redact automatically or suggest redactions. We already provide this capability in a limited form, but we think it could be significantly enhanced with AI.
- Specific Field Redaction – in many of our scenarios, we already know the data that needs to be redacted. In an insurance claims scenario, we might know the injured party's name, phone number, address and other relevant fields. Assisted by AI, we could find and redact these data points in existing documents and in new documents that enter the claim.
- Intelligent Suggestion Redaction – based on previously redacted documents, the system would suggest redactions for new documents. Corrections from user review would be fed back into the learning system to improve suggestions for subsequent documents.
- Intelligent Automated Redaction – once the success rate is reliably better than human review, redaction could move from suggested to automatic. Any exceptions found during processing could be corrected and fed back into the learning engine.
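The first two goals (pattern and specific field redaction) can be sketched with plain rules before any learning is involved. The Python below is a minimal sketch, not OpenRedact's implementation: the pattern set, the `suggest_redactions`/`apply_redactions` names, and the masking character are hypothetical. It returns suggestions rather than redacting outright, which is where a trained model and reviewer corrections could later plug in.

```python
import re
from dataclasses import dataclass

# Hypothetical, US-centric pattern set; real PII rules would be broader.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


@dataclass
class Suggestion:
    start: int   # character offset where the redaction begins
    end: int     # character offset just past the redaction
    reason: str  # why it was suggested (pattern name or known field)


def suggest_redactions(text, known_values=()):
    """Suggest spans to redact from regex patterns and known field values."""
    suggestions = []
    # Automated pattern redaction: SSNs, phone numbers, etc.
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            suggestions.append(Suggestion(m.start(), m.end(), f"pattern:{name}"))
    # Specific field redaction: values already known from the claim record.
    for value in known_values:
        if not value:
            continue
        for m in re.finditer(re.escape(value), text, flags=re.IGNORECASE):
            suggestions.append(Suggestion(m.start(), m.end(), "field"))
    return sorted(suggestions, key=lambda s: (s.start, -s.end))


def apply_redactions(text, suggestions, mask="█"):
    """Return a redacted copy; the original string is left untouched."""
    chars = list(text)
    for s in suggestions:
        for i in range(s.start, s.end):
            chars[i] = mask
    return "".join(chars)
```

For example, `suggest_redactions("Claimant Jane Doe, SSN 123-45-6789", known_values=["Jane Doe"])` flags both the name and the SSN, and a reviewer could accept or reject each suggestion before `apply_redactions` produces the distribution copy.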
Intelligent Data Extraction and Tagging for ECM
Similar to the redaction process described above, the learning engine can be trained on a sample set to both extract data from and tag documents as part of Robotic Process Automation. One key success factor for any learning engine is a large sample set, something our ECM customers possess. As with redaction, TSG already has an indexing application that could be enhanced with AI. Intelligent data extraction and tagging scenarios include:
- Identifying what Case/Folder a document should be placed into based on data contained within the document or container.
- Identifying key classifications/document types based on values contained in the document
- Identifying key meta-data fields based on values contained in the document
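As a rough illustration of training on a sample set, the sketch below uses a multinomial naive Bayes classifier for the document-type scenario and a simple lookup for the case/folder scenario. Naive Bayes is only a stand-in here, not a committed choice of learning engine; the class and function names, the sample data, and the `CLM-` claim-number format are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocTypeClassifier:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per type
        self.word_counts = defaultdict(Counter)  # token counts per type
        self.vocab = set()

    def train(self, samples):
        """samples: iterable of (text, doc_type) pairs from the repository."""
        for text, doc_type in samples:
            tokens = tokenize(text)
            self.doc_counts[doc_type] += 1
            self.word_counts[doc_type].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        """Return the document type with the highest log-probability."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_type, best_score = None, -math.inf
        for doc_type, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # class prior
            total_tokens = sum(self.word_counts[doc_type].values())
            for token in tokenize(text):
                count = self.word_counts[doc_type][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_type, best_score = doc_type, score
        return best_type


# Hypothetical claim-number format used to route a document to its case.
CLAIM_NUMBER = re.compile(r"\bCLM-\d{6}\b")


def find_case(text, open_cases):
    """Return the first referenced claim number that is an open case, else None."""
    for match in CLAIM_NUMBER.findall(text):
        if match in open_cases:
            return match
    return None
```

In practice the training pairs would come from documents users have already classified, and user corrections on new documents would be appended to the sample set before retraining.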
Analytic Support for ECM
While most of our clients are more "heads down" and doing transactional document management, we are seeing an increase in other areas or groups asking for access to their systems to support analytics requirements. Just like redaction and indexing, TSG has already been implementing many of the underlying requirements for analytics with our Solr and Elasticsearch add-ons for ECM. Scenarios in the roadmap include:
- OpenContent Solr Services – as we presented two weeks ago, by leveraging our skills with OpenMigrate and Solr, we aim to provide a variety of Solr indexes built from the ECM repository as well as other sources. Analytics teams could use these indexes to gain key insights into the documents in the repository without necessarily getting access to the repository itself. Adding redaction before the index is created would provide even better PII compliance.
- OpenContent Elasticsearch Services – similar to Solr, we want to leverage our capabilities with the ELK stack (Elasticsearch, Logstash, Kibana) to provide additional indexes.
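To make the "redact before indexing" idea concrete, here is a minimal sketch that turns repository documents into a JSON payload in the shape Solr's JSON update handler accepts (an array of documents). The repository record shape, the field names `doc_type_s` and `content_txt`, the collection URL, and the two regex patterns are assumptions for illustration, not OpenContent Solr Services code.

```python
import json
import re
import urllib.request

# Minimal stand-in for the redaction step (hypothetical US-centric patterns).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN
    re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # phone number
]


def redact(text):
    """Replace every PII match with a placeholder before indexing."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def to_solr_doc(repo_doc):
    """Map a repository document (hypothetical dict shape) to a Solr document."""
    return {
        "id": repo_doc["object_id"],
        "doc_type_s": repo_doc["doc_type"],
        "content_txt": redact(repo_doc["text"]),  # only redacted text is indexed
    }


def build_update_payload(repo_docs):
    """Serialize documents as a JSON array for Solr's /update handler."""
    return json.dumps([to_solr_doc(d) for d in repo_docs])


def post_to_solr(payload, update_url):
    """POST the payload; update_url is an assumption, e.g.
    http://localhost:8983/solr/<collection>/update?commit=true"""
    request = urllib.request.Request(
        update_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because redaction happens in `to_solr_doc`, the raw PII never reaches the index, so analytics users querying Solr only ever see the redacted text. The same mapping could feed an Elasticsearch bulk request instead.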
It is an interesting time for learning systems, and we are excited to see how valuable and practical the solutions prove to be for our ECM customers. Look for future posts as we start testing these scenarios with sample data over the next couple of months.
Let us know your thoughts (or any other areas that merit consideration) below: