Technology Services Group

Documentum Full Text Search with Lucene – Honoring ACL Security

March 30, 2010

The last post discussed the results of an HPI Lucene search test compared to a Webtop FAST search, run as part of a proof of concept for a client looking to provide a consumer interface. As we have often mentioned on this forum, we continually see clients looking for a better search interface than Webtop, as well as for some content cached outside of Documentum for business continuity, performance, and licensing reasons.

One fair comment raised in response to that post was that our comparison of HPI/Lucene against Webtop/FAST wasn’t really apples to apples: the Webtop search ran against Documentum with security enforced, while the Lucene search did not. While the client’s goal was to show the benefits of the cached repository and Lucene compared to Documentum, many Documentum users would like to know how Lucene would perform directly against a Documentum repository (as with the upcoming DSS).

For this post, we will discuss TSG’s strategy and initial proof of concept results in leveraging Lucene for a Documentum full text search engine.

Security

In our typical consumer portal, clients often choose either to push only “World View” documents or to implement some type of light application security (e.g., only these users have access to these types of documents). For integration with Documentum, the search should leverage the existing ACL security layer already in place in the repository. The main security issue to address is that users without at least “browse” access to a document (the level at which a user can see the document’s metadata but can’t open it) shouldn’t see that document in the Lucene search results. Keep in mind that the Documentum API would still be used to view the document (either from HPI or Webtop) and would check the ACL for “read” access, so unauthorized viewing of a document is not an issue.

Lucene Integration to Documentum

The goal of the Lucene integration is to continue to return results quickly, avoiding Documentum API calls where possible, while still honoring the ACL requirements. One strategy would be to check each Lucene search result against Documentum for “browse” access. Although this approach performs the same way regardless of the complexity of the repository security model, we quickly determined that a database hit for every search result would slow performance. An alternative method that we feel would be faster is described below:

One approach is to look up the Documentum user’s ACL rights before a search, determining which ACLs give the user at least “browse” access in the repository. By indexing documents in Lucene with both content and metadata (including ACL information), we were able to leverage a single Lucene search (as with the cached approach) without having to check ACLs on documents individually. Our thinking was that document ACLs only change after certain events, such as a lifecycle state change, and that we could capture these changes and re-index the affected documents in Lucene. With this approach, the Lucene query ends up looking like this:

document_type:sop AND text:"Change Request" AND acl_name:("Global Read ACL" OR "SOP Effective ACL" OR "SOP Approved ACL" OR "Drawing Read ACL")
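
As a minimal sketch of how a query like the one above might be assembled, the Python below appends an ACL clause to whatever the user searched for. The field name acl_name comes from the example query; the function names, and the assumption that the user's "browse" ACL names have already been looked up in Documentum, are ours for illustration only and are not part of HPI or OpenMigrate.

```python
from typing import Optional


def quote_phrase(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_secured_query(user_query: str, browse_acls: list[str]) -> Optional[str]:
    """Restrict a Lucene query string to documents whose ACL the user can browse.

    browse_acls is assumed to be the list of ACL names looked up for the user
    before the search. Returns None when the user has no browsable ACLs, in
    which case the caller should return an empty result without searching.
    """
    if not browse_acls:
        return None
    acl_clause = " OR ".join(quote_phrase(name) for name in browse_acls)
    # Parenthesize the user's query so its own AND/OR operators cannot
    # escape the ACL restriction.
    return f"({user_query}) AND acl_name:({acl_clause})"


if __name__ == "__main__":
    print(build_secured_query(
        'document_type:sop AND text:"Change Request"',
        ["Global Read ACL", "SOP Effective ACL"],
    ))
```

Each ACL name becomes one clause in the query. Lucene's BooleanQuery enforces a maximum clause count (1024 by default in many releases), so a user with a very long ACL list could hit that limit.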

This approach works well for systems with a small and finite number of ACLs. Because many systems have a large number of ACLs that grant dm_world “read” access, the Lucene query could become very large for systems with complex security models. An alternative, more hybrid approach would be to continue to look up the user’s “browse” ACLs before running the Lucene query but, rather than adding ACL clauses to the query, to check each search result against the list of the user’s “browse” ACLs. Because this list can easily be stored in memory, the check eliminates the need for costly Documentum API calls or DQL queries for each search result.
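
The hybrid check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HPI implementation: SearchHit is a hypothetical stand-in for a Lucene result whose ACL name was stored in the index, and the user's "browse" ACL names are assumed to have been looked up once before the search.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical shape of one Lucene result; acl_name is a stored index field."""
    object_id: str
    acl_name: str


def filter_by_acl(hits: Iterable[SearchHit],
                  browse_acls: Iterable[str]) -> Iterator[SearchHit]:
    """Yield only the hits whose ACL is in the user's in-memory browse list."""
    allowed = frozenset(browse_acls)  # constant-time membership test per hit
    for hit in hits:
        if hit.acl_name in allowed:
            yield hit


if __name__ == "__main__":
    hits = [
        SearchHit("doc-1", "Global Read ACL"),
        SearchHit("doc-2", "SOP Draft ACL"),
        SearchHit("doc-3", "SOP Effective ACL"),
    ]
    for hit in filter_by_acl(hits, ["Global Read ACL", "SOP Effective ACL"]):
        print(hit.object_id)
```

One trade-off with filtering after the search is that the hit count reported by Lucene includes documents the user cannot see, so totals and paging have to be computed from the filtered list.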

Because the Documentum/Lucene integration utilizes OpenMigrate to perform the incremental publish to the Lucene index, a lot of flexibility is automatically built in. The publishing job is DQL query based, so it can easily be configured to index only the desired searchable documents while ignoring others, including job logs and other repository documents that users should never see. This flexibility, combined with the options for security integration described above, provides a more tunable full text indexing solution designed with performance in mind.

Based on our preliminary results, we believe that integrating Documentum ACL security with Lucene search has minimal performance impact on the system.  Additional test results will be posted here as they become available.  Please comment below if you have any thoughts or questions.
