Way back at Momentum 2001 in Chicago, I remember having an in-depth conversation with a Documentum architect about integrating Autonomy into the Documentum platform. TSG was implementing Autonomy at the time and Documentum was looking to build a pluggable architecture into Documentum in which any search engine could be integrated. The 5.3 platform helped usher in that pluggable architecture with the replacement of Verity (now owned by Autonomy) with FAST. Nine years later at EMC World 2010, Documentum is getting closer to releasing Documentum Search Services (DSS), which is essentially an integration between Lucene and xDB.
Ed Buche and Aamir Farooq both presented at EMC World, providing a good technical overview of DSS and lessons learned from how FAST currently interacts with the Content Server. I’ve always looked forward to Ed Buche’s presentations, and I am glad he has been very involved in the architecture of DSS. A couple of items to highlight:
Overview
Using an XML database like xDB in conjunction with Lucene makes a lot of sense with regard to performance and scalability. All metadata for content is converted to an XML file and stored within xDB. This is very similar to how FAST ingests metadata today. However, with DSS, an XML representation of the ACL will also be created and stored in xDB, allowing security to be evaluated by the search engine rather than at the Documentum level. Replication of ACLs from the Content Server to DSS will be asynchronous, not necessarily transaction-based.
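To make the idea of evaluating security inside the search engine a little more concrete, here is a minimal sketch in Python. It is not the DSS or xDB implementation: the XML element names, the document ids, and the `search` function are all invented for illustration. It only shows the general pattern of storing an ACL alongside the metadata and filtering hits against the user's principals at query time.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML records: the element names and ids below are invented
# for this sketch and are NOT the actual DSS/xDB schema.
INDEX = [
    ET.fromstring(
        '<doc id="0900a1"><title>Budget plan</title>'
        '<acl><allow>finance</allow><allow>alice</allow></acl></doc>'
    ),
    ET.fromstring(
        '<doc id="0900a2"><title>Budget FAQ</title>'
        '<acl><allow>everyone</allow></acl></doc>'
    ),
    ET.fromstring(
        '<doc id="0900a3"><title>Hiring plan</title>'
        '<acl><allow>hr</allow></acl></doc>'
    ),
]


def search(term, principals):
    """Return ids of documents whose title contains `term` and whose
    stored ACL admits at least one of the caller's principals."""
    wanted = set(principals)
    hits = []
    for doc in INDEX:
        title = doc.findtext("title", default="")
        allowed = {node.text for node in doc.findall("acl/allow")}
        # The security check happens here, in the search layer, so no
        # second round trip to the repository is needed per hit.
        if term.lower() in title.lower() and allowed & wanted:
            hits.append(doc.get("id"))
    return hits


print(search("budget", ["bob", "everyone"]))
print(search("budget", ["alice", "everyone"]))
```

Because the ACL copy in the index is replicated asynchronously, a sketch like this would briefly return results based on stale permissions after an ACL change, which is the trade-off noted above.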
A new full text admin interface will also be available, providing much more detailed reports on indexing status, errors, graphs, etc.
Performance and Scalability
Queries that may have taken minutes in FAST will take seconds in DSS. Documentum has taken a number of lessons learned from the FAST integration and has addressed a number of performance issues that have caused angst in the past. Querying inside folders with a large number of subfolders has been optimized. Additionally, users with limited privileges, who have access to only a small subset of content but search across a wide range of it, should see a significant increase in performance. This is a specific issue we’ve run into with our clients, and we are looking forward to comparing the performance difference.
Facets
Facets provide the ability to display your search results and drill down further by a set of pre-defined categories. If you have a large result set, you can drill down by date, format, etc. to refine your search. CenterStage will support this out of the box. I will be curious to see whether and how this will be integrated into Webtop search results, and how custom search applications will be able to make use of the capability.
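For readers who have not worked with facets, the mechanics can be sketched in a few lines of Python. This is only an illustration of the concept, not how DSS or CenterStage computes facets: the result records, field names, and helper functions below are all hypothetical.

```python
from collections import Counter

# Hypothetical search results; the field names and values are
# illustrative only and do not come from DSS.
RESULTS = [
    {"name": "proposal.pdf", "format": "pdf", "year": 2009},
    {"name": "contract.pdf", "format": "pdf", "year": 2010},
    {"name": "notes.doc", "format": "msw8", "year": 2010},
    {"name": "budget.xls", "format": "excel8book", "year": 2010},
]


def facet_counts(results, field):
    """Count how many results fall under each value of `field`."""
    return Counter(r[field] for r in results)


def drill_down(results, field, value):
    """Keep only the results whose `field` equals the chosen facet value."""
    return [r for r in results if r[field] == value]


# Show the facet values a UI could offer, then refine twice.
print(facet_counts(RESULTS, "format"))
in_2010 = drill_down(RESULTS, "year", 2010)
print([r["name"] for r in drill_down(in_2010, "format", "pdf")])
```

A real search engine would compute these counts inside the index rather than over a materialized result list, but the user-facing behavior (counts per category, then successive narrowing) is the same.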
Cost / Upgrading to DSS
DSS will remain part of the Content Server and will not be licensed separately.
Microsoft/FAST and Documentum have agreed on extended support for customers until the end of 2011. Therefore, customers making use of full text indexing must upgrade to at least 6.5 SP2 and migrate to DSS by then. DSS will become standard starting with the D6.7 release. This may be a key driver for customers to start planning their upgrades based on the 2011 date.
Customers who are currently deployed on 6.5 SP2 or later will be able to upgrade to DSS. To evaluate and test DSS compared to FAST, a new docbase may be created using DSS. Both FAST and DSS can therefore be running at the same time and provide a seamless transition from one search platform to another.
When Documentum replaced Verity with FAST, it did this out of desperation. There were a lot of nice things about Verity, such as not requiring the external search servers that FAST introduced. Also, HA sucked with FAST. But Documentum never listened to their customers and went ahead with the decision anyway. We asked them for the Lucene search engine that was built into their OEM edition, but they wouldn’t give it to their non-OEM customers. They told us FAST is their strategic search platform.
Guess all that changed when Microsoft bought FAST!!!
Is the Documentum search engine really pluggable? Have you done anything with making the SharePoint/FAST index the target of the Documentum search action rather than DSS or Documentum FAST?
Tim,
The pluggable architecture mentioned in this article is referring to the vision that Documentum had for search services back at the Chicago Momentum conference in 2001. EMC does allow you to configure the search engine in the server.ini through the ft_engine_to_use property, but we don’t know of any clients that have implemented search engines other than xPlore, Documentum FAST, or Verity.