Recently, TSG has seen increased interest in our Elasticsearch capabilities, particularly as it relates to our 11 Billion Document Benchmark on AWS. As part of our product roadmap this quarter, TSG is announcing the formalization of our OpenContent Elastic Services as well as some new features that we are adding to our work with the Elastic Stack. We have been working with Lucene and later Solr for over 14 years but have recently discovered several client scenarios and opportunities that are driving the expansion of our Elastic capabilities for Documentum, Alfresco and our other NoSQL repositories. This post will detail the different scenarios as well as our initial OpenContent Elastic Services roadmap.
OpenContent Elastic Services – Scenarios
Our initial work with Lucene and later Solr came as a direct result of our Alfresco work, where Solr is bundled with the repository. For Documentum clients, we have spent over 12 years implementing a publishing approach that pushes content out of Documentum, with full-text and meta-data pushed to a Solr repository for consumer access. As stated here many times, the publishing approach for consumers provides many benefits with regard to fault tolerance, performance improvement and enhanced user experience. Building on those experiences, we have begun building our solutions to integrate with Elastic to address additional scenarios including:
- One to Many Indices – As clients look for maximum performance and security for data separation, one giant index for all content in the repository can cause performance and security concerns. OpenContent Elastic Services will provide the ability to create and manage separate indices, small or large and potentially sharded, tailored to the client’s specific scenario, with access controlled at the level of the indices themselves. Document access can be to a published copy, directly to the object store, or via the ECM itself, depending on security needs. We are planning for OpenContent Elastic Services to be available for Alfresco, Documentum and NoSQL alternatives.
- Documentum Clients – Documentum clients have struggled with xPlore, an old xDB/Lucene mishmash of products that hasn’t been updated for quite some time. OpenContent Elastic Services will add to our Elastic services to provide an alternative to xPlore, giving Documentum clients modern tools for efficient searching, provided they leverage the OpenContent Management Suite interface. Documentum clients can pick a single index as a direct replacement for xPlore as well as multiple indices for performance, security, and other scenarios.
- Multi-Tenant Clients – Whether Documentum, Alfresco, NoSQL or other repositories, clients have always struggled to open up a large repository while implementing the security that lets each party search efficiently against only its own content. OpenContent Elastic Services will allow indices to be created to fit client requirements, offering maximum performance while protecting documents from other parties.
- Consumer Access – OpenContent Web Services will include the ability to publish the meta-data and full-text data to an Elastic index and either the content itself (typically PDF) or a link to an object store for consumers. For an energy client, TSG is currently publishing 12 plant subsets of documents from a large Documentum instance to 12 individual plants for performance and fault tolerance. While TSG has been providing this service for years, new features added to OpenAdmin will allow customers to better monitor and manage the separate indices.
- Multiple Repositories – OpenContent Elastic Services will provide the ability to publish content to a single index from more than one repository with either access to published content or back to the source repository. For clients struggling with multiple authoring applications, OpenContent Elastic Services will provide a consistent, secure search interface with different indices created for different client scenarios.
- Data Scientists – OpenContent Elastic Services would provide access to the Elastic Admin tool or any other tool that supports Elastic. Data Scientists can set up any indices for their analysis “out of the box”.
- Administration – Leveraging OpenAdmin, administrators will have the ability to create and maintain the many indices as well as create new indices when necessary.
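To make the one-to-many and multi-tenant scenarios above more concrete, here is a minimal sketch of how a search could be restricted to the indices a given party is allowed to see. It only builds the request path and body for Elasticsearch's `_search` endpoint, which accepts a comma-separated list of index names. The tenant names, index names, the `content` field and the `build_search` helper are all hypothetical and are not part of OpenContent.

```python
import json

# Hypothetical mapping of tenants to the indices they may search.
# These names are illustrative only; they are not an OpenContent API.
TENANT_INDICES = {
    "plant-a": ["docs-plant-a"],
    "plant-b": ["docs-plant-b"],
    "corporate": ["docs-plant-a", "docs-plant-b"],
}


def build_search(tenant: str, text: str) -> tuple[str, dict]:
    """Return the _search path and query body for one tenant."""
    indices = TENANT_INDICES.get(tenant)
    if not indices:
        # Access is decided at the index level: no index, no search.
        raise PermissionError(f"no index configured for tenant {tenant!r}")
    path = "/" + ",".join(indices) + "/_search"
    # "content" is an assumed name for the full-text field.
    body = {"query": {"match": {"content": text}}}
    return path, body


if __name__ == "__main__":
    path, body = build_search("plant-a", "valve inspection")
    print("POST", path)
    print(json.dumps(body))
```

Because the tenant never names an index directly, separation is enforced by which indices exist for that tenant rather than by a filter added to every query.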
OpenContent Elastic Services – Offering Components
OpenContent Elastic Services are made up of several of our existing products. The architecture could look something similar to the following:
Components of the solution and their capabilities include:
- OpenMigrate – would be used to populate historical data in the Elasticsearch index. OpenMigrate already provides an enterprise-level, high-performance, multi-threaded migration platform to move content between repositories. OpenMigrate can be configured to build the initial indices as well as monitor the repository to update the indices on an interval basis when new content is added.
- OpenContent Web Services – OpenContent Web Services is already used to query and update Documentum, Alfresco, NoSQL and Elastic for add/delete capabilities. Additional capabilities would update the 1 to many Elastic indices on add/update/delete in Documentum, Alfresco or NoSQL repositories to keep the Elastic indices up to date.
- OpenContent Management Suite – Would provide configurable access to the Elastic indices as well as ECM repositories or cached content.
- OpenContent Admin – would provide the ability to specify the 1 to many indices as well as the properties that govern how each index is accessed. Configurations could include whether the content is cached (published), accessed directly from the object store or accessed via the ECM repository.
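As an illustration of the per-index access configuration described above, the following sketch models the three access options (cached/published, object store, ECM repository) and picks where to fetch a document from based on the index it was found in. The class names, field names and the `content_location` helper are assumptions made for this example; the actual OpenContent Admin configuration may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    PUBLISHED = "published"        # cached copy published with the index
    OBJECT_STORE = "object_store"  # direct link to the object store
    ECM = "ecm"                    # retrieve through the source repository


@dataclass(frozen=True)
class IndexConfig:
    """Hypothetical per-index settings an administrator might maintain."""
    name: str
    access_mode: AccessMode


def content_location(cfg: IndexConfig, hit: dict) -> str:
    """Pick the field of a search hit that says where to fetch the content.

    The field names are assumed for illustration only.
    """
    field_by_mode = {
        AccessMode.PUBLISHED: "published_path",
        AccessMode.OBJECT_STORE: "object_store_url",
        AccessMode.ECM: "repository_id",
    }
    return hit[field_by_mode[cfg.access_mode]]


if __name__ == "__main__":
    cfg = IndexConfig("docs-plant-a", AccessMode.PUBLISHED)
    hit = {
        "published_path": "/cache/plant-a/1.pdf",
        "object_store_url": "s3://bucket/1.pdf",
        "repository_id": "0900001",
    }
    print(content_location(cfg, hit))
```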
OpenContent Elastic Services Roadmap
TSG is implementing OpenContent Elastic Services in a phased approach.
Phase 1 – Elasticsearch on Alfresco – Greenfield Content
- Manually set up index in existing cluster
- OpenContent events push insert/update metadata to Elasticsearch as a transaction
- OpenContent Search Alfresco implementation uses Elasticsearch index
- OpenContent configuration to use either internal (Solr) or external (Elasticsearch) index
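One possible shape for the Phase 1 metadata push is sketched below. It assumes the insert/update events from a single repository transaction are collected and sent together in one Elasticsearch `_bulk` request. Note that Elasticsearch does not provide multi-document transactions, so each action in a bulk request succeeds or fails on its own and the caller would need to check the response for per-item errors. The event structure and the `bulk_body` helper are hypothetical.

```python
import json


def bulk_body(index: str, events: list[dict]) -> str:
    """Build the newline-delimited JSON body for POST /_bulk.

    Each event is assumed to look like {"id": ..., "metadata": {...}}.
    The "index" action creates the document or replaces an existing one,
    so it covers both inserts and updates.
    """
    if not events:
        return ""
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_id": event["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event["metadata"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    events = [
        {"id": "doc-1", "metadata": {"title": "Pump manual", "version": "1.0"}},
        {"id": "doc-2", "metadata": {"title": "Valve spec", "version": "2.1"}},
    ]
    print(bulk_body("docs-alfresco", events), end="")
```

The body would be sent with the `Content-Type: application/x-ndjson` header.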
Phase 2 – Elasticsearch on Alfresco – On-demand index
- OpenContent endpoint to create index on-demand in existing cluster
- OCMS configuration to create index, specifying which object type(s) and attributes to index
- OpenContent configuration to allow search implementation to query internal and external index
- Historical Content – when a new index is created, all existing content will be pushed to that index.
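The on-demand index creation in Phase 2 could translate a list of attributes into an Elasticsearch create-index request along the following lines. This is a sketch only: it assumes Elasticsearch 7 or later (typeless mappings), and the `create_index_request` helper, the attribute names and the extra `object_type` field are illustrative rather than the actual OpenContent endpoint.

```python
import json


def create_index_request(
    index_name: str,
    attributes: dict[str, str],
    shards: int = 1,
    replicas: int = 1,
) -> tuple[str, str, dict]:
    """Return (method, path, body) for creating an index.

    `attributes` maps attribute names to Elasticsearch field types,
    for example {"title": "text", "effective_date": "date"}.
    """
    properties = {name: {"type": es_type} for name, es_type in attributes.items()}
    # Assumed extra field so one index can hold several object types.
    properties["object_type"] = {"type": "keyword"}
    body = {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {"properties": properties},
    }
    return "PUT", f"/{index_name}", body


if __name__ == "__main__":
    method, path, body = create_index_request(
        "docs-contracts",
        {"title": "text", "effective_date": "date", "author": "keyword"},
        shards=3,
    )
    print(method, path)
    print(json.dumps(body, indent=2))
```

Once the index exists, the historical content push would follow as a separate step that loads existing documents of the selected object types into it.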
Maintaining one “do all” index for a repository can result in security, performance and maintenance concerns. OpenContent Elastic Services adds administration, creation and maintenance of 1 to many Elastic indices to address a variety of different user and performance scenarios for Alfresco, Documentum or NoSQL (HBase/DynamoDB) alternatives. Look for more posts here as this solution matures over the coming year.