As more business processes for storing and working with documents in ECM repositories are automated, many companies are finding that they need to redact sensitive information in those documents. In response to client questions about redaction, TSG released OpenRedact nearly three years ago as a manual way of redacting documents stored in Documentum, Alfresco, or Hadoop ECM repositories. Recently, we have been talking with existing and new OpenRedact clients about their need to redact information in a more automated fashion. This post outlines how OpenRedact supports automated and semi-automated redaction for clients who want to streamline how they redact information from documents in their ECM repositories.
We recently spoke with a client who needed to redact three types of Personally Identifiable Information (PII) from documents they were importing into their ECM repository. They needed to redact:
- Account and routing numbers at the bottom of checks
- Social Security Numbers, which may appear in any document type
- Mailing addresses, which may appear in the content of any document type
OpenAnnotate can be leveraged for these use cases in three ways:
Manual Redaction
This is how existing clients have used OpenAnnotate so far: the user manually draws boxes over the sensitive information and saves the redactions. The obvious downside is that users must find the sensitive information in the document themselves and draw a box around each piece they wish to redact. The advantage is that this approach handles ad-hoc redactions of information that isn’t always sensitive, or that doesn’t follow a pattern clear enough for automated redaction. Addresses are a good example of this type of redaction, since their formats vary and a client may not want to redact EVERY address in the document (just specific client addresses).
Semi-Automated Redaction
The second approach runs OpenRedact in its “Indexing Mode”, which automatically searches the document for predefined patterns of data. When documents are uploaded into the repository in Indexing Mode, the user is shown every instance of those patterns across the entire document as a candidate for redaction. The user can then “auto-redact” the information that was located and flagged as PII. This semi-automated approach keeps a human in the loop to confirm that the flagged information really is sensitive and should be redacted, rather than blindly removing every instance of a pattern without review. We have seen clients struggle with fully automated redaction of items such as Social Security Numbers, because the xxx-xx-xxxx pattern may also be used for account numbers, invoice numbers, or other useful (and important) information that should stay in the document unredacted.
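To make the "flag, review, then redact" idea concrete, here is a minimal Python sketch of the pattern-matching step, assuming the document's text has already been extracted as a plain string. The names `SSN_PATTERN`, `Candidate`, `find_candidates`, and `apply_redactions` are hypothetical and are not OpenRedact's API; a real redaction tool would also map each match back to page coordinates, which this sketch does not attempt. Nothing is masked until a reviewer sets `approved` on a candidate.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical SSN-style pattern: ddd-dd-dddd bounded by non-word characters.
# It also matches invoice or account numbers with the same shape, which is
# exactly why each hit is only a candidate until a human approves it.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Candidate:
    start: int              # character offset where the match begins
    end: int                # character offset just past the match
    text: str               # the matched text, shown to the reviewer
    approved: bool = False  # set to True by a human reviewer


def find_candidates(text: str) -> list[Candidate]:
    """Flag every substring that matches the pattern for human review."""
    return [Candidate(m.start(), m.end(), m.group()) for m in SSN_PATTERN.finditer(text)]


def apply_redactions(text: str, candidates: list[Candidate], mask: str = "X") -> str:
    """Mask only the approved candidates; unapproved matches are left intact.

    Each masked character replaces exactly one original character, so the
    offsets of every other candidate remain valid.
    """
    chars = list(text)
    for c in candidates:
        if c.approved:
            for i in range(c.start, c.end):
                chars[i] = mask
    return "".join(chars)
```

For the text `"SSN: 123-45-6789. Invoice 987-65-4321."`, `find_candidates` flags both numbers; if the reviewer approves only the first, `apply_redactions` masks the Social Security Number and leaves the invoice number readable, which is the distinction a blanket pattern-based redaction cannot make.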
Automated Redaction
For documents that follow a predefined template and always share the same layout, the third option is fully automated redaction, which uses predefined coordinates to black out the same areas of the document whenever it is imported. The example we see most often from our clients is a check, which always carries the account number and routing number in the lower left corner and can therefore be redacted by predefined rules on import.
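The coordinate-based approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenRedact's configuration format: the template stores each region as fractions of page width and height with the origin at the top-left, so one template fits scans of any resolution, and the `Box`, `CHECK_TEMPLATE`, `redaction_boxes`, and `burn_in` names, along with the region values, are hypothetical. The page is modeled as a grayscale image in a list of rows, and the redacted pixels are overwritten rather than covered by an overlay.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Fractions of page width/height; origin at the top-left (assumption).
    left: float
    top: float
    right: float
    bottom: float


# Hypothetical check template: a band across the lower-left of the check
# where the routing and account numbers are assumed to be printed.
CHECK_TEMPLATE: dict[str, Box] = {
    "routing_and_account": Box(left=0.0, top=0.85, right=0.6, bottom=1.0),
}


def redaction_boxes(template: dict[str, Box], width: int, height: int) -> list[tuple[int, int, int, int]]:
    """Scale each template region to pixel coordinates (left, top, right, bottom)."""
    return [
        (round(b.left * width), round(b.top * height),
         round(b.right * width), round(b.bottom * height))
        for b in template.values()
    ]


def burn_in(image: list[list[int]], boxes: list[tuple[int, int, int, int]]) -> None:
    """Overwrite every pixel inside each box with black (0), in place."""
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                image[y][x] = 0
```

On a 1200 x 500 pixel scan, `redaction_boxes(CHECK_TEMPLATE, 1200, 500)` yields one box, `(0, 425, 720, 500)`, which `burn_in` then blacks out. Because the same template is applied to every check on import, no human review is needed, which is only safe when the document layout is truly fixed.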
We would love to hear your thoughts on how OpenRedact might work for your business process. Let us know in the comments below!