Redaction for AWS, Alfresco, Documentum and Hadoop – Bulk Redaction upon Ingestion or Migration

As we presented in our Redaction Roadmap earlier this year, one of our product development additions to OpenMigrate this quarter is the ability to bulk redact incoming documents as part of an ingestion or migration into Alfresco, Documentum, AWS, or Hadoop. As detailed in that roadmap, both OpenMigrate and the OpenContent Management Suite will have capabilities for redacting specific values. This post demonstrates how OpenMigrate can be used to redact content during ingestion or migration, focusing on a case management scenario.

OpenMigrate Redaction Capabilities

OpenMigrate is one of the most successful enterprise migration tools for Documentum and Alfresco. OpenMigrate uses a high-throughput, multi-threaded, configurable approach to migrate content to, from, or within a variety of repositories (e.g. FileNet, CMOD, OpenText, and others), as well as cloud platforms such as Azure and Amazon Web Services S3. With the new capabilities added to OpenMigrate, the following redaction scenario is supported:

The document is extracted from either an ECM (Alfresco, Documentum, FileNet, etc.) or a file system
If the document is PDF Text or PDF Image with Text, the redaction processing can occur immediately
If the document is not PDF Text or PDF Image with Text (e.g. TIF, PDF Image, or Microsoft Word), a text-searchable PDF is created leveraging Adlib, Nuance, or another vendor-specific transformation tool
The text-searchable PDF is analyzed and redacted for any configured patterns requiring redaction. This could include credit card numbers, social security numbers, or phone numbers
The text-searchable PDF is also analyzed and redacted for specific values configured for that particular document, based on the metadata defined for the document. This could include case file names, addresses, or other metadata associated with the PDF document
The redacted document is stored in the repository either as a redacted copy or as the primary document. The original document can also be stored in the repository to support evidence rules as required
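
The pattern-based part of the steps above can be sketched in a few lines of Python. This is only an illustration operating on already-extracted text, not OpenMigrate's implementation or configuration format: the format labels, the regular expressions, the mask string, and the function names are all assumptions made for the example, and a real deployment would redact regions of the PDF itself rather than a plain string.

```python
import re

# Hypothetical labels for formats that already carry a text layer.
TEXT_SEARCHABLE_FORMATS = {"pdf_text", "pdf_image_with_text"}

# Illustrative patterns only; real patterns would need tuning and validation.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b"),
    "phone": re.compile(r"\(\d{3}\) ?\d{3}-\d{4}|\b\d{3}-\d{3}-\d{4}\b"),
}


def needs_conversion(doc_format):
    """Return True when a text-searchable PDF must be created first."""
    return doc_format not in TEXT_SEARCHABLE_FORMATS


def redact_patterns(text, patterns=PATTERNS, mask="[REDACTED]"):
    """Replace every match of each configured pattern with the mask.

    Returns the redacted text and a per-pattern count of replacements.
    """
    counts = {}
    for name, pattern in patterns.items():
        text, replaced = pattern.subn(mask, text)
        counts[name] = replaced
    return text, counts


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111, call (312) 555-0100."
    redacted, counts = redact_patterns(sample)
    print(redacted)
    print(counts)
```

Returning the per-pattern counts alongside the text is a design choice for the sketch: it gives a migration run something to log or audit for each document.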

For our demonstration scenario, we will migrate documents for a medical case folder. In this scenario, we are automatically redacting social security numbers based on a pattern, and we’re redacting other personally identifiable information (PII) for the patient based on metadata that’s defined for the document, such as the patient’s name. As part of the ingestion process, OpenMigrate is importing the medical case files from an Excel file that contains the documents’ metadata, including the patient name and patient ID. To support privacy rules, the patient ID will be the only property stored in the target ECM repository, and the patient name and other PII will be automatically redacted from the documents upon ingestion with OpenMigrate.
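
To make the metadata-driven half of this scenario concrete, here is a minimal Python sketch. It assumes each spreadsheet row has already been read into a dictionary, and the field names (`patient_name`, `patient_id`), the mask, and both functions are hypothetical names chosen for the example rather than anything defined by OpenMigrate. It redacts known values from extracted text and keeps only the patient ID as a stored property.

```python
import re

# Hypothetical field names; the real column names come from the spreadsheet.
PII_FIELDS = ("patient_name",)
STORED_FIELDS = ("patient_id",)


def redact_known_values(text, metadata, pii_fields=PII_FIELDS, mask="[REDACTED]"):
    """Mask every occurrence of the document's own PII values in the text."""
    for field in pii_fields:
        words = str(metadata.get(field) or "").split()
        if not words:
            continue  # nothing to redact for this field
        # Match case-insensitively and tolerate variable whitespace between
        # words. The \b anchors assume the value starts and ends with a
        # letter or digit.
        pattern = re.compile(
            r"\b" + r"\s+".join(re.escape(word) for word in words) + r"\b",
            re.IGNORECASE,
        )
        text = pattern.sub(mask, text)
    return text


def stored_properties(metadata, stored_fields=STORED_FIELDS):
    """Keep only the properties allowed to reach the target repository."""
    return {key: metadata[key] for key in stored_fields if key in metadata}


if __name__ == "__main__":
    row = {"patient_name": "Jane Doe", "patient_id": "P-1001"}
    page = "Patient JANE  DOE was admitted. Jane Doe's chart is attached."
    print(redact_known_values(page, row))
    print(stored_properties(row))
```

Exact-value matching like this misses variants such as "Doe, Jane" or initials, so a production configuration would likely need additional rules for name formats.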

Summary

Redacting documents as part of an ongoing ingestion or migration is a common request. OpenMigrate can now both redact by pattern for common fields such as social security numbers and redact specific fields for known values (e.g. patient name). Look here for future posts as TSG continues to add capabilities, including redaction for values (e.g. dates older than 18 years, such as birthdates) as well as analysis of documents for additional values (e.g. incident date) that could be extracted from the documents.

See our previous posts for how documents can be redacted once already in the system with either manual redaction leveraging OpenAnnotate or Case Field Redaction leveraging the OpenContent Management Suite.
