
Documentum Search – Lucene, FAST, Verity, Google and upcoming DSS


October 27, 2009

Since the new Documentum Search Services beta program just started last week, we thought we would share some of TSG’s thoughts on full-text search and our plans to add Lucene capabilities to our open source offerings.

Documentum Search Services (DSS) was tentatively called Enterprise Search Services (ESS) early in product development.  DSS promises to be “the next generation of search in EMC and will be built upon xDB with Apache Lucene as the underlying indices”.  Specific highlights from EMC World included:

  • Relevance Sorting
  • Advanced Query Processing
    • Parallel, Native Facet computation, XQuery for structured and unstructured search
    • Lower Hardware and Storage Costs
  • Native VMware, NAS, SAN support and Advanced Data Placement

At the present time, DSS is targeted for heavy testing through the end of 2009 with a release in 2010.

TSG Thoughts on DSS

At the present time, we are very encouraged by the progress and direction of DSS.  We have been using Lucene for a couple of clients and can safely say that the tool will address many of the shortcomings of FAST, including index rebuilds, overall performance and server requirements.  That being said, the scope of DSS needs to encompass all of the Documentum API-level functionality that FAST and Verity have addressed in the past.  As the beta progresses, the “devil is in the details” of how DSS evolves, so we will withhold our final thoughts until the beta is complete.

Other Tools (Autonomy, Google Appliance, SearchBlox, Vivisimo….)

As an integrator, we do get asked to integrate different search tools.  We began working with Autonomy for EMC on an internal Documentum project (pre-Documentum purchase) back in the late 90s.  Overall, most search tools meet full-text needs but are typically built as “crawlers” focused on the web.  As a crawler, the tool needs to scan a directory or website for changes and then update the full-text index.  We have found this approach difficult when Documentum clients want to do true “Documentum searches” that combine attributes, security and full text.  For example, one client wanted to search secured documents by plant (attribute), create date (attribute) and a specific part number in the content (full-text).

Also, a couple of clients have had concerns about the latency between when a document is stored in Documentum and when it is indexed (after the crawler runs) in the full-text search engine.  One client complained that with FAST, the latency was sometimes 2 minutes and other times 2 hours.

Our last concern with the crawler approach is how to get the attribute data and security into the index.  Without them, the query has to run against Documentum (plant, create date, security) and against the full-text index (part number), and only the results that appear on both lists can be displayed.
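
To make that cost concrete, here is a minimal Python sketch of the two-pass pattern. The repository rows, index contents and function names are all hypothetical stand-ins for illustration; no real Documentum or search engine API is called, and the create date attribute is left out for brevity.

```python
# Hypothetical data: what the repository knows (attributes and who may read each document).
REPOSITORY = {
    "doc1": {"plant": "Chicago", "readers": {"alice", "bob"}},
    "doc2": {"plant": "Chicago", "readers": {"bob"}},
    "doc3": {"plant": "Detroit", "readers": {"alice"}},
}

# Hypothetical data: what a crawler-built full-text index knows (content only).
FULL_TEXT = {
    "doc1": "replacement part PN-1234 installed",
    "doc2": "part PN-1234 on back order",
    "doc3": "part PN-9999 shipped",
}


def repository_query(plant, user):
    """Pass 1: attribute and security filtering, answered by the repository."""
    return {
        doc_id
        for doc_id, row in REPOSITORY.items()
        if row["plant"] == plant and user in row["readers"]
    }


def full_text_query(term):
    """Pass 2: content matching, answered by the separate full-text index."""
    return {doc_id for doc_id, text in FULL_TEXT.items() if term in text.split()}


def two_pass_search(plant, term, user):
    """Only documents present on both lists can be shown to the user."""
    return sorted(repository_query(plant, user) & full_text_query(term))
```

Both queries run in full before anything can be displayed, and either list may be far larger than the final intersection, which is the overhead this concern is about.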

Native Lucene with Documentum?

One scenario we are building out for clients is a Documentum 5.3 or 6.5 application that indexes documents into Lucene from either Documentum or a cached copy (whitepaper here).  Unlike DSS, our approach won’t support inline DQL; it is a pure web services approach.

In the diagram below, both OpenMigrate and HPI use OpenContent web services to communicate with Lucene.  OpenMigrate is used to keep the Lucene index up to date, and HPI is used to query the index for full text searches and optionally metadata searches as well:

[Figure: full-text architecture diagram (full_text_arch)]

A couple of key factors:

  • 5.3 Support – we are focused on supporting 5.3, 6.0, 6.5 and future releases.  Many of our clients have chosen to delay their upgrades for a variety of reasons.  By implementing Lucene now, clients can remove FAST from their current environment and from an eventual D6.5 upgrade.
  • Attributes – we are focused on storing the content, attributes and security in Lucene to avoid having to search both the Documentum attributes and the Lucene full-text index.
  • Indexing – we are leveraging OpenMigrate to index/delete content and metadata in Lucene on a real-time, multi-threaded push basis to avoid a crawler approach.  We think the push approach can better control updates to the index, reduce server load on the full-text index and improve audit control to ensure everything is indexed.
  • Security – One issue we addressed was how to balance security against high-performance search.  Verifying through a Documentum lookup that the user has access to browse each document retrieved from the search is expensive and would hurt performance, as identified in the crawler discussion above.  One approach was to cache document ACL information with each document in Lucene and update it as ACLs are updated.  Since Documentum ACLs don’t change often, we would leverage one lookup to retrieve the user’s ACL access and add that information to the Lucene query.
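
The attribute, indexing and security points above can be sketched together. What follows is a toy in-memory index in Python, not Lucene or OpenMigrate code: the class, method, field and ACL names are hypothetical, and the user-to-ACL dictionary stands in for the single Documentum lookup of a user's ACL access.

```python
class PushIndex:
    """Toy stand-in for an index that stores content, attributes and the ACL name together."""

    def __init__(self):
        self._docs = {}

    def push(self, doc_id, text, attributes, acl_name):
        """Called when a document is saved or its ACL changes (push, not crawl)."""
        self._docs[doc_id] = {
            "tokens": set(text.lower().split()),
            "attributes": dict(attributes),
            "acl": acl_name,
        }

    def delete(self, doc_id):
        """Called when a document is removed from the repository."""
        self._docs.pop(doc_id, None)

    def search(self, term, attributes, allowed_acls):
        """One query: full-text term AND attribute match AND ACL filter."""
        hits = []
        for doc_id, doc in self._docs.items():
            if term.lower() not in doc["tokens"]:
                continue
            if any(doc["attributes"].get(k) != v for k, v in attributes.items()):
                continue
            if doc["acl"] not in allowed_acls:
                continue
            hits.append(doc_id)
        return sorted(hits)


# Hypothetical result of the one-time lookup of the ACLs each user can read.
USER_ACLS = {
    "alice": {"acl_plant_public"},
    "bob": {"acl_plant_public", "acl_engineering"},
}


def build_demo_index():
    """Populate the toy index with three hypothetical documents."""
    index = PushIndex()
    index.push("doc1", "Replacement part PN-1234 installed", {"plant": "Chicago"}, "acl_plant_public")
    index.push("doc2", "Part PN-1234 on back order", {"plant": "Chicago"}, "acl_engineering")
    index.push("doc3", "Part PN-1234 shipped", {"plant": "Detroit"}, "acl_plant_public")
    return index
```

Calling `build_demo_index().search("PN-1234", {"plant": "Chicago"}, USER_ACLS["alice"])` answers the plant, part-number and security question in one pass, with no per-document check against the repository. In a real Lucene index the ACL restriction would presumably become an extra required clause on a stored ACL field, but that mapping is an assumption here rather than a description of the actual implementation.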

So far our results have been favorable.  Please contact us if you are interested in this type of solution as we are looking for additional case studies.

Filed Under: D6, D6.5, Documentum, ECM Landscape, Lucene, OpenContent Management Suite, OpenMigrate, Search, Upgrades, xPlore
