As part of our campaign to make Documentum better, TSG is announcing the formalization of our OpenContent Solr Services, along with some new features we are adding to our work with Solr. We have been working with Lucene, and later Solr, for over 12 years, but we have recently encountered several client scenarios and opportunities that are driving us to expand our Solr capabilities for Documentum, Alfresco, Hadoop and other repositories. This post details those scenarios as well as our initial OpenContent Solr Services roadmap.
OpenContent Solr Services – Scenarios
Our initial work with Lucene, and later Solr, came from our work with Alfresco, where Solr is bundled into the Alfresco repository. For Documentum clients, we have spent over 10 years implementing a publishing approach that pushes content out of Documentum, with full-text and meta-data sent to a Solr index for consumer access. As we have stated here many times, publishing for consumers provides benefits in fault tolerance, performance and user experience. Building on that experience, we are now extending our solutions to address additional scenarios, including:
- One-to-Many Indices – As clients look for maximum performance and secure data separation, a single giant Solr or full-text index for all content in the repository can raise performance and security concerns. OpenContent Solr Services will provide the ability to create and manage separate indices, small or large and potentially sharded, tailored to each client’s specific scenario, with access controlled at the level of the index itself. Depending on security needs, documents can be accessed as a published copy, directly from the object store, or through the ECM repository itself.
- Documentum Clients – Documentum clients have struggled with xPlore, an aging xDB/Lucene mishmash of products that hasn’t been updated to Solr. OpenContent Solr Services will provide an alternative to xPlore, giving Documentum clients modern tools for efficient searching, provided they use the OpenContent Management Suite interface. Documentum clients can choose either a single index as a direct replacement for xPlore or multiple indices for performance, security and other scenarios.
- Multi-Tenant Clients – Whether on Documentum, Alfresco, Hadoop or other repositories, organizations have long struggled to provide access to a large shared repository while implementing the security that lets each client search efficiently against only its own content. OpenContent Solr Services will allow indices to be created according to client requirements, offering maximum performance while protecting documents from other parties.
- Consumer Access – OpenContent Web Services will include the ability to publish the meta-data and full-text to a Solr index, along with either the content itself (typically PDF) or a link to an object store, for consumers. For an energy client, TSG currently publishes 12 plant-specific subsets of documents from a large Documentum instance to 12 individual plants for performance and fault tolerance. While TSG has provided this service for years, new features in OpenAdmin will allow customers to better monitor and manage the separate indices.
- Multiple Repositories – OpenContent Solr Services will provide the ability to publish content from more than one repository to a single index, with access either to the published content or back to the source repository. For clients struggling with multiple authoring applications, OpenContent Solr Services will provide a consistent, secure search interface, with different indices created for different client scenarios.
- Data Scientists – OpenContent Solr Services will provide access to the Solr Admin tool or any other tool that supports Solr, so data scientists can set up whatever indices they need for their analysis “out of the box”.
- Administration – Leveraging OpenAdmin, administrators will be able to maintain the many existing indices and create new ones when necessary.
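The one-to-many and multi-tenant scenarios above can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenContent Solr Services code: it assumes SolrCloud, one collection per tenant (client, plant or business unit), and a hypothetical `docs_<tenant>` naming convention. `SOLR_BASE`, the helper functions and the document fields are all hypothetical. The sketch only builds requests for two standard Solr HTTP endpoints, the Collections API `CREATE` action and a collection's JSON `/update` handler, without sending them.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr base URL; replace with the address of your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def collection_for(tenant: str, prefix: str = "docs") -> str:
    """Map a tenant to its own collection name.

    The '<prefix>_<tenant>' convention is an assumption for illustration.
    Non-alphanumeric characters are replaced so the name stays Solr-safe.
    """
    safe = "".join(c if c.isalnum() else "_" for c in tenant.lower())
    return f"{prefix}_{safe}"


def create_collection_url(name: str, num_shards: int = 1,
                          replication_factor: int = 1) -> str:
    """Build a Collections API CREATE request for a (possibly sharded) index."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def route_documents(docs):
    """Group documents by tenant so each batch goes only to that tenant's index."""
    batches = {}
    for doc in docs:
        batches.setdefault(collection_for(doc["tenant"]), []).append(doc)
    return batches


def update_request(collection: str, docs):
    """Return (url, body) for posting a JSON array of documents to /update."""
    url = f"{SOLR_BASE}/{collection}/update?commit=true"
    return url, json.dumps(docs)


if __name__ == "__main__":
    # Hypothetical documents tagged with the tenant (plant) they belong to.
    sample = [
        {"id": "0900001", "tenant": "Plant A", "title": "Pump manual"},
        {"id": "0900002", "tenant": "Plant B", "title": "Valve spec"},
        {"id": "0900003", "tenant": "Plant A", "title": "Safety plan"},
    ]
    for coll, batch in route_documents(sample).items():
        print(create_collection_url(coll, num_shards=2, replication_factor=2))
        print(update_request(coll, batch)[0])
```

Keeping each tenant's documents in a separate collection means a query against that collection can only return that tenant's content. Controlling who may query which collection would still need to be enforced separately, for example through Solr's authentication and authorization plugins or in the application tier; that part is not shown.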
OpenContent Solr Services – Offering components
OpenContent Solr Services are made up of several of our existing products. The architecture could look similar to the following:
Components of the solution and their capabilities include:
- OpenMigrate – would be used to create and maintain the Solr indices. OpenMigrate already provides an enterprise-level, high-performance, multi-threaded migration platform for moving content between repositories. OpenMigrate can be configured to build the initial indices and then monitor the repository, updating the indices at regular intervals as new content is added.
- OpenContent Web Services – OpenContent Web Services is already used to query Documentum, Alfresco, Hadoop and Solr and to add and delete content in them. Additional capabilities would propagate adds, updates and deletes in the Documentum, Alfresco or Hadoop repository to the one-to-many Solr indices, keeping those indices up to date.
- OpenContent Management Suite – would provide configurable access to the Solr indices as well as to ECM repositories or cached content.
- OpenAdmin – would provide the ability to specify the one-to-many indices as well as the properties governing how each index is accessed (security). Configuration options could include whether the content is cached (published), accessed directly from the object store (via a link) or accessed through the ECM repository.
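The interval-based index maintenance described for OpenMigrate, combined with the access options described for OpenAdmin, can be sketched as below. This is not OpenMigrate's or OpenAdmin's actual configuration or API; it is a minimal illustration assuming a hypothetical per-index configuration with an access mode of cached, object store or ECM, and repository objects exposing an `id`, a `title` and a `modified` timestamp. Each run selects only objects changed since the previous run and builds Solr documents whose hypothetical `content_url` field points consumers at the published copy, the object store, or the ECM repository. All URL patterns are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class AccessMode(Enum):
    CACHED = "cached"              # published copy served alongside the index
    OBJECT_STORE = "object_store"  # direct link into the object store
    ECM = "ecm"                    # retrieval through the ECM repository itself


@dataclass
class IndexConfig:
    name: str
    access_mode: AccessMode
    base_url: str  # hypothetical base URL for the chosen access path


@dataclass
class RepoObject:
    id: str
    title: str
    modified: datetime


def content_url(cfg: IndexConfig, obj: RepoObject) -> str:
    """Resolve where a consumer fetches the document, per the index's access mode.

    The URL patterns are placeholders, not real OpenContent endpoints.
    """
    if cfg.access_mode is AccessMode.CACHED:
        return f"{cfg.base_url}/published/{obj.id}.pdf"
    if cfg.access_mode is AccessMode.OBJECT_STORE:
        return f"{cfg.base_url}/objects/{obj.id}"
    return f"{cfg.base_url}/ecm/content?id={obj.id}"


def incremental_batch(cfg: IndexConfig, objects, last_run: datetime):
    """Select objects modified after last_run and map them to Solr documents.

    Returns the documents plus the new high-water mark to use as last_run
    on the next interval.
    """
    changed = [o for o in objects if o.modified > last_run]
    docs = [
        {"id": o.id, "title": o.title, "content_url": content_url(cfg, o)}
        for o in changed
    ]
    high_water = max((o.modified for o in changed), default=last_run)
    return docs, high_water
```

A modification-date filter like this only picks up new and changed objects; deletions in the source repository would need a separate mechanism, such as the add/update/delete propagation described for OpenContent Web Services or a periodic reconciliation pass.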
Maintaining one “do all” index for a repository can result in security, performance and maintenance concerns. OpenContent Solr Services adds the creation, administration and maintenance of one-to-many Solr indices to address a variety of user and performance scenarios. Components of the solution have been in production for years at different clients. Look for more posts here as this solution matures over the coming year.