Hadoop – OpenContent/HPI Product Plans – Technology Services Group

The first step in supporting all of the TSG products on Hadoop is building our OpenContent REST Web Services layer to access Hadoop in the same manner we access Documentum, Alfresco and other content management systems. This post will present our plans and timelines for OpenContent along with associated TSG solutions.

OpenContent for Hadoop – Phase 1 – Minimally Viable Product

We have focused Phase 1 on providing a minimum viable product that supports content migration by OpenMigrate as well as basic access by HPI. As a first vertical solution, we are targeting all of the functionality required for our insurance solution, which includes:

Add Documents and Metadata – index full-text and metadata with Solr
Retrieval of Documents
Delete Documents
Update Documents (no versioning yet)
Annotate with OpenAnnotate
Rendition to PDF
Searching (using Solr, as mentioned in our previous blog post)

To enable these capabilities, we are supporting the following web service calls:

createDocument
readDocument
updateDocument
deleteDocument
getProperties
addRelation
addRendition
removeRendition
getContentFormats
search
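
To make the shape of these calls concrete, here is a minimal in-memory sketch in Python. It is not the OpenContent API: the class name, method signatures, ID format, and the exact-match search are all assumptions made for illustration, and a real implementation would sit behind REST endpoints, store content in Hadoop, and delegate search to Solr.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Document:
    content: bytes
    properties: dict
    renditions: dict = field(default_factory=dict)  # format name -> bytes
    relations: list = field(default_factory=list)   # (relation type, target id)


class InMemoryContentStore:
    """Hypothetical stand-in for the service layer; not the OpenContent API."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def create_document(self, content, properties):
        doc_id = f"doc-{next(self._ids)}"
        self._docs[doc_id] = Document(content, dict(properties))
        return doc_id

    def read_document(self, doc_id):
        return self._docs[doc_id].content

    def update_document(self, doc_id, content=None, properties=None):
        # Phase 1 has no versioning, so an update overwrites in place.
        doc = self._docs[doc_id]
        if content is not None:
            doc.content = content
        if properties:
            doc.properties.update(properties)

    def delete_document(self, doc_id):
        del self._docs[doc_id]

    def get_properties(self, doc_id):
        return dict(self._docs[doc_id].properties)

    def add_relation(self, doc_id, relation_type, target_id):
        self._docs[doc_id].relations.append((relation_type, target_id))

    def add_rendition(self, doc_id, fmt, content):
        self._docs[doc_id].renditions[fmt] = content

    def remove_rendition(self, doc_id, fmt):
        self._docs[doc_id].renditions.pop(fmt, None)

    def get_content_formats(self, doc_id):
        # Assumption: only rendition formats are listed here.
        return sorted(self._docs[doc_id].renditions)

    def search(self, **criteria):
        # Exact-match metadata search as a placeholder for a Solr query.
        return [doc_id for doc_id, doc in self._docs.items()
                if all(doc.properties.get(k) == v for k, v in criteria.items())]
```

Under these assumptions, an insurance workflow would create a claim document with its metadata, attach a PDF rendition with `add_rendition`, and later locate it with `search(type="claim")`.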

As of today (01/28/2015), we are feature complete for search, retrieval, add, and delete. Plans for the upcoming weeks include:

Addition of a Transformation Server – we typically rely on the vendor for this, so for Hadoop we are building our own. We are using open source libraries to perform the transformations and might make the server available to other users.
Addition of PDF Annotations with OpenAnnotate

OpenContent for Hadoop – Phase 2 – Versioning, LifeCycle and Security

The second phase of OpenContent will be to add versioning, lifecycles, and security. Some of the unique options we are planning for Hadoop will include:

Different properties supported on different versions.
Version Tree/Numbering consistent with Compliance Solution
ACLs, LDAP groups, as well as some content specific security

We are planning for this phase to be complete by March ’15. Additional OpenContent Services required will include:

checkout
checkin
cancelCheckout
getLockOwner
getDocumentVersion
getAllVersions
setPermissions
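
As a rough illustration of how the locking and versioning calls might interact, here is a small Python sketch. Everything in it is hypothetical: the class, the method signatures, and especially the numbering scheme (each checkin creates a new major version), since the actual numbering is meant to follow the Compliance Solution. Permissions are left out entirely.

```python
class VersionedDocument:
    """Hypothetical model of a lockable, versioned document."""

    def __init__(self, content, properties):
        # Each version keeps its own property dict, so property sets may differ.
        self._versions = [("1.0", content, dict(properties))]
        self._lock_owner = None

    def checkout(self, user):
        if self._lock_owner is not None:
            raise PermissionError(f"already checked out by {self._lock_owner}")
        self._lock_owner = user

    def cancel_checkout(self, user):
        self._require_lock(user)
        self._lock_owner = None

    def checkin(self, user, content, properties):
        self._require_lock(user)
        # Assumed numbering: bump the major number on every checkin.
        major = int(self._versions[-1][0].split(".")[0]) + 1
        self._versions.append((f"{major}.0", content, dict(properties)))
        self._lock_owner = None
        return f"{major}.0"

    def get_lock_owner(self):
        return self._lock_owner

    def get_document_version(self, label):
        for version in self._versions:
            if version[0] == label:
                return version
        raise KeyError(label)

    def get_all_versions(self):
        return [label for label, _, _ in self._versions]

    def _require_lock(self, user):
        if self._lock_owner != user:
            raise PermissionError(f"{user} does not hold the lock")
```

Storing a separate property dictionary per version is one simple way to allow different properties on different versions, which is one of the options listed for this phase.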

OpenContent for Hadoop – Future Phases

Future plans will include full support for all our solutions:

Let us know your thoughts in the comments below.

Comments

Rob Lancaster says

January 29, 2015 at 5:25 pm

Hadoop as a document store is a natural progression for those early adopter companies that are starting to standardize on HDFS as the file system within an enterprise data hub. Increasingly, Hadoop users are recognizing the power of the big data platform as a “single source of the truth” for enterprise information, not just data, and not just things like log files. That said, I believe this is an opportunity for the ECM vendors that are agile enough to understand that they have a ton of value to add on top of document storage. It's early days for sure, but the Hadoop ecosystem is evolving so quickly it will be fun to watch…
