The first step in supporting all of the TSG products on Hadoop is building our OpenContent REST Web Services layer to access Hadoop in the same manner we access Documentum, Alfresco and other content management systems. This post will present our plans and timelines for OpenContent along with associated TSG solutions.
OpenContent for Hadoop – Phase 1 – Minimally Viable Product
We have focused phase 1 on providing a minimally viable product that allows for content migration by OpenMigrate as well as basic access by HPI. As a first vertical solution, we are targeting all of the functionality required for our insurance solution, which includes:
- Add Documents and Metadata – index full text and metadata with Solr
- Retrieval of Documents
- Delete Documents
- Update Documents (no versioning yet)
- Annotate with OpenAnnotate
- Rendition to PDF
- Searching (using Solr as we mentioned in our previous blog)
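To make the phase 1 capabilities concrete, the sketch below shows how a client might build requests against a REST layer like this one. The base URL, endpoint paths, and parameter names are illustrative assumptions, not the actual OpenContent API; the Solr parameters (`q`, `fl`, `wt`) are standard Solr select-handler parameters.

```python
import json
from urllib.parse import urlencode

# Hypothetical OpenContent-style REST base URL -- an assumption for
# illustration, not the real service endpoint.
BASE_URL = "http://localhost:8080/opencontent/rest"

def add_document_request(doc_id, metadata):
    """Build the URL and JSON body for a hypothetical 'add document' call."""
    url = f"{BASE_URL}/documents"
    body = json.dumps({"id": doc_id, "properties": metadata})
    return url, body

def search_url(query, fields=("id", "title")):
    """Build a Solr-backed search URL for full-text and metadata queries."""
    params = urlencode({"q": query, "fl": ",".join(fields), "wt": "json"})
    return f"{BASE_URL}/search?{params}"

url, body = add_document_request("claim-001", {"title": "Auto Claim", "state": "IL"})
print(url)
print(search_url("policy_number:12345"))
```

Retrieval, update, and delete calls would follow the same pattern against the per-document URL, with the HTTP verb (GET/PUT/DELETE) selecting the operation.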
To enable these capabilities, we are supporting the following Web Services calls:
As of today (01/28/2015), we are feature complete for search/retrieval and add/delete. Plans for the upcoming weeks include:
- Addition of a Transformation Server – we typically rely on the content management vendor for renditions, so here we are building our own, using open source libraries to perform the transformations. We might make it available for other users as well.
- Addition of PDF Annotations with OpenAnnotate
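A transformation server built on open source libraries often reduces to a dispatcher that routes each source format to a converter. The sketch below shows that routing layer only; the converter bodies are stubs, and the library choices named in the comments (LibreOffice, Pillow) are examples of what such a server might shell out to, not a confirmed design.

```python
# Minimal sketch of a transformation-server dispatcher: map each source
# MIME type to a PDF converter. Converters here are stubs standing in
# for open source tools.

def _office_to_pdf(content: bytes) -> bytes:
    # A real converter might shell out to LibreOffice in headless mode.
    return b"%PDF- (rendered from office document)"

def _image_to_pdf(content: bytes) -> bytes:
    # A real converter might use an imaging library such as Pillow.
    return b"%PDF- (rendered from image)"

CONVERTERS = {
    "application/msword": _office_to_pdf,
    "image/tiff": _image_to_pdf,
}

def render_to_pdf(content: bytes, mime_type: str) -> bytes:
    """Route content to the converter registered for its MIME type."""
    try:
        return CONVERTERS[mime_type](content)
    except KeyError:
        raise ValueError(f"no PDF converter registered for {mime_type}")

print(render_to_pdf(b"fake doc bytes", "application/msword")[:5])  # b'%PDF-'
```

New formats are supported by registering another converter, which keeps the REST layer unchanged as transformation coverage grows.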
OpenContent for Hadoop – Phase 2 – Versioning, LifeCycle and Security
The second phase of OpenContent will be to add versioning, lifecycles, and security. Some of the unique options we are planning for Hadoop will include:
- Different properties supported on different versions.
- Version Tree/Numbering consistent with Compliance Solution
- ACLs, LDAP groups, as well as some content-specific security
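To illustrate the versioning plan, here is a small sketch of a major.minor numbering scheme with per-version properties. This is one plausible interpretation of version-tree numbering, offered only as an assumption; the actual scheme will follow the Compliance Solution's conventions.

```python
# Hypothetical version numbering sketch (major.minor). Each version can
# carry its own properties, matching the plan to support different
# properties on different versions.

def next_version(current: str, major: bool = False) -> str:
    """Return the next version label: 1.0 -> 1.1 (minor) or 2.0 (major)."""
    maj, minr = (int(part) for part in current.split("."))
    return f"{maj + 1}.0" if major else f"{maj}.{minr + 1}"

# Version tree: label -> properties for that specific version.
versions = {"1.0": {"status": "draft"}}
v = next_version("1.0")                    # minor revision -> "1.1"
versions[v] = {"status": "approved", "approver": "jsmith"}

print(v)                                   # 1.1
print(next_version("1.2", major=True))     # 2.0
```

ACL checks would then be evaluated per version label before retrieval, so security and versioning compose rather than being bolted on separately.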
We are planning for this phase to be complete by March ’15. Additional OpenContent Services required will include:
OpenContent for Hadoop – Future Phases
- Future plans include full support for all of our solutions.
Let us know your thoughts in the comments below.
Rob Lancaster says
Hadoop as a document store is a natural progression for those early-adopter companies that are starting to standardize on HDFS as the file system within an enterprise data hub. Increasingly, Hadoop users are recognizing the power of the big data platform as a “single source of the truth” for enterprise information, not just data, and not just things like log files. That said, I believe this is an opportunity for the ECM vendors that are agile enough to understand that they have a ton of value to add on top of document storage. It’s early days for sure, but the Hadoop ecosystem is evolving so quickly that it will be fun to watch…