• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
TSB Alfresco Cobrand White tagline

Technology Services Group

  • Home
  • Products
    • Alfresco Enterprise Viewer
    • OpenContent Search
    • OpenContent Case
    • OpenContent Forms
    • OpenMigrate
    • OpenContent Web Services
    • OpenCapture
    • OpenOverlay
  • Solutions
    • Alfresco Content Accelerator for Claims Management
      • Claims Demo Series
    • Alfresco Content Accelerator for Policy & Procedure Management
      • Compliance Demo Series
    • OpenContent Accounts Payable
    • OpenContent Contract Management
    • OpenContent Batch Records
    • OpenContent Government
    • OpenContent Corporate Forms
    • OpenContent Construction Management
    • OpenContent Digital Archive
    • OpenContent Human Resources
    • OpenContent Patient Records
  • Platforms
    • Alfresco Consulting
      • Alfresco Case Study – Canadian Museum of Human Rights
      • Alfresco Case Study – New York Philharmonic
      • Alfresco Case Study – New York Property Insurance Underwriting Association
      • Alfresco Case Study – American Society for Clinical Pathology
      • Alfresco Case Study – American Association of Insurance Services
      • Alfresco Case Study – United Cerebral Palsy
    • HBase
    • DynamoDB
    • OpenText & Documentum Consulting
      • Upgrades – A Well Documented Approach
      • Life Science Solutions
        • Life Sciences Project Sampling
    • Veeva Consulting
    • Ephesoft
    • Workshare
  • Case Studies
    • White Papers
    • 11 Billion Document Migration
    • Learning Zone
    • Digital Asset Collection – Canadian Museum of Human Rights
    • Digital Archive and Retrieval – ASCP
    • Digital Archives – New York Philharmonic
    • Insurance Claim Processing – New York Property Insurance
    • Policy Forms Management with Machine Learning – AAIS
    • Liferay and Alfresco Portal – United Cerebral Palsy of Greater Chicago
  • About
    • Contact Us
  • Blog

Alfresco Performance – Making Property Queries Faster

You are here: Home / Alfresco / Alfresco Performance – Making Property Queries Faster

May 7, 2018

Recently TSG has assisted multiple clients with unique scenarios to make Alfresco queries more efficient.  Like a good auto mechanic that has learned tips and tricks for getting better performance from a car without sacrificing quality, TSG has recently developed performance improvements for Alfresco in OpenContent to address clients’ needs.  This post will summarize the approach and updates to OpenContent Web Services.

Alfresco Queries – How do they work?

For those unfamiliar with ECM, most ECM systems including Alfresco are built on a database to store meta-data, security, as well as pointers to the file locations.  In TSG’s opinion, Alfresco has the most flexible database schema in the marketplace.  By isolating property names and values, Alfresco can best support aspects to allow for hundreds of different document types and the ability to easily search across documents.

Alfresco has both Solr and direct database query options.  Many of TSG’s clients that have large volumes of transactional documents have chosen the database query (Alfresco Transactional Metadata Query – TMDQ) access method for a variety of reasons including:

  • No requirement for full-text search – as all searching relies on properties.
  • Large volumes of documents – (ex: over 200 million) make setting up a sharded Solr environment costly.
  • Performance – tests comparing Solr to TMDQ for metadata queries have resulted in TMDQ being comparable for certain queries.

See a more detailed discussion of the differences between Solr and TMDQ in a previous post – Alfresco Solr Issues – when does just leveraging the database make sense?

Performance issues for both TMDQ and Solr occur when queries return large amounts of metadata along with the documents.  Users typically will require name of the document, tags, author, create date or a variety of other values returned about the document.  Examples of client large queries have included:

  • Administration Queries – For large volume clients that need to see a large list of documents. Examples could include how many documents were migrated today or last week.  Another example is which documents haven’t been renditioned yet.
  • Large User Queries – Searches that will return large number of results in a user scenario. For example a listing of all the documents (and metadata) in a folder when the folder contains 60,000 documents.

For some of our clients, OOTB the Alfresco queries above might run as long as 30 minutes and potentially time out.  In working with alternatives for our clients, TSG was able to confirm that it wasn’t so much the issue with how long the query ran but more the formatting of results once queries returned.  Because a query against the database would quickly return a list of objects that satisfied the query, performance improvements have focused on how to more quickly retrieve the values of the properties for the list of objects.

Alfresco Properties and Security – Understanding how they work

Typical SQL database queries for Alfresco with the API look something like:

SELECT node.id as id  FROM alf_node node, etc…
WHERE node.type_qname_id <> 149
AND node.store_id = 6
AND node.id IN ( SELECT PROP.node_id FROM alf_node_properties PROP
WHERE (26 = PROP.qname_id) AND PROP.string_value = ‘123456’)

The above query would return the list of node ids that satisfied the value of the name being equal to “123456”.  Once Alfresco has a list of node-ids, the API will then run security logic on the entire set to pare down the results to what the user executing the query can access. The API will then return the pared down set of node-ids.  Regardless of whether the query is run by Solr or TMDQ, further API calls for each one of the returned node-ids would need to look up any properties that might be needed for display in the application.  That API would look something like:

Map<QName, Serializable> props = serviceRegistry.getNodeService().getProperties(node);

For Alfresco and particularly large volume customers, two unique situations can occur based on this approach:

  • For a large list returned values, the number of calls for the properties is very large. (1 call for each node)
  • For large volume customers, the alf_node_properties table can get very large. For one client with 33 properties per document and 380 million documents, we estimated that the table would include 12.5 billion rows making it difficult to cache/tune the large table.

For a test benchmark, TSG built a 250,000 Alfresco repository with Solr enabled.  To make sure we had a clean test, security was not applied to any of the queries.  TSG had the following very similar results for Solr and TMDQ.

  • 100 document results
    • TMDQ (1.967 seconds) – Solr (1.814 seconds)
  • 1,000 document results
    • TMDQ (8.536 seconds) – Solr (5.471 seconds)
  • 10,000 document results
    • TMDQ (1 minute 12.427 seconds) – Solr (1 minute 10.555 seconds)
  • 50,000 document results
    • TMDQ (6 minutes 2.336 seconds) – Solr (5 minutes 8.269 seconds)
  • 100,000 document results
    • TMDQ (11 minutes 14.938 seconds) – Solr (10 minutes 33.680 seconds)

In running our tests, we were surprised just how close the timings were between Solr and TMDQ.  TMDQ would execute the query quickly (4% of the total time) but most of the time was retrieving the properties (96% of the total time).  Solr would take longer to execute the query (95% of the total time) but the property fetch (5% of the total time) was quicker due to Solr caching.

OpenContent Database Query Updates

OpenContent Web Services has typically just provided a front-end to either the Alfresco API or CMIS implementation doing the normal database or Solr Query.  OpenContent Web Services executes the Alfresco query and then calls the property API and passes the results back.  Back in January, for our large clients, TSG started executing database calls within our OpenAdmin tool to provide counts and other administrative access to the Alfresco database.  See related post – Alfresco – Adding the ability to query on very large data sets.

To improve query performance and extending on the experience from January, the new capability built into OpenContent Web Services leverages the power of the database to put all of the property retrieval in one query avoiding the separate calls to retrieve the necessary property values.  The resulting query looks something like the below:

SELECT node.id                  AS nodeID,
        props.boolean_value      AS boolean_value,
        props.long_value         AS long_value,
        props.float_value        AS float_value,
        props.double_value       AS double_value,
        props.string_value       AS string_value,
        node.audit_creator       AS creator,
        node.audit_created       AS created,
        node.audit_modifier      AS modifier,
        node.audit_modified      AS modified, etc…
FROM   alf_node_properties props
       JOIN alf_node node
       ON node.id = props.node_id, etc…
WHERE node.type_qname_id <> 149
AND node.store_id = 6
AND node.id IN (
SELECT PROP.node_id
FROM alf_node_properties PROP
WHERE (26 = PROP.qname_id AND PROP.string_value = ‘123456’)
AND (29 = PROP.qname_id AND PROP.string_value = ‘Medical’)

Some of the reasons for the performance improvement include:

  • Direct Database Query – Leveraging a read-only database user, OpenContent formats a more efficient database query and parsing of results.
  • No Security – to support admin users as well as large volume clients that leverage a different security model passed to the query in regards to the search itself, the DB query does not have to check Alfresco security.
  • Returning properties in query results – by restructuring the query, OpenContent is able to return properties from the initial query itself, negating the need for a second API call per document to retrieve properties for display in the application.

From our benchmark test against the same repository:

  • 100 document results
    • OpenContent (0.169 seconds) – TMDQ (1.967 seconds) – Solr (1.814 seconds)
  • 1,000 document results
    • OpenContent (1.361 seconds) – TMDQ (8.536 seconds) – Solr (5.471 seconds)
  • 10,000 document results
    • OpenContent (5.725 seconds) – TMDQ (1 minute 12.427 seconds) – Solr (1 minute 10.555 seconds)
  • 50,000 document results
    • OpenContent (25.209 seconds) – TMDQ (6 minutes 02.336 seconds) – Solr (5 minutes 08.269 seconds)
  • 100,000 document results
    • OpenContent (30.216 seconds) – TMDQ (11 minutes 14.938 seconds) – Solr (10 minutes 33.680 seconds)

From the numbers above, OpenContent was roughly 4 to 20 times faster depending on query size results.

Summary

OpenContent can now provide performance improvements for property queries for many client scenarios by making a more efficient database call against Alfresco regardless of the size of the repository or query result set.  This performance is particularly important for queries that will have large result sets but would also increase performance for normal queries as well.  For other scenarios or even great performance improvements, TSG would recommend looking at our Solr Services.

Let us know your thoughts below.

Filed Under: Alfresco, OpenContent Web Services, Tech Tip

Reader Interactions

Trackbacks

  1. DynamoDB – ECM / Content Process Services Roadmap says:
    November 27, 2018 at 8:04 am

    […] the speed of certain Alfresco queries for large repositories, OpenContent Web Services now includes specific DB tuned queries not available through the Alfresco API to improve performance […]

    Reply

Leave a Reply Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Primary Sidebar

Search

Related Posts

  • Alfresco No Code – Do More with OpenContent
  • Elastic Services for ECM – TSG OpenContent Roadmap
  • Alfresco – ADF and OpenContent Comparison- REST Web Services
  • Alfresco – Adding the ability to query on very large data sets
  • Alfresco and Angular 2.0: Technical First Look
  • Avoiding Alfresco Performance Problems from Day 1 – Keeping My ECM Healthy
  • Alfresco’s Transactional Metadata Query System – Mystery Results
  • Upgrading to Alfresco 5 – Keys to Success
  • Harnessing the Power of Alfresco Data Lists for Cascading Value Assistance
  • Auto-Numbering Content in Alfresco

Recent Posts

  • Alfresco Content Accelerator and Alfresco Enterprise Viewer – Improving User Collaboration Efficiency
  • Alfresco Content Accelerator – Document Notification Distribution Lists
  • Alfresco Webinar – Productivity Anywhere: How modern claim and policy document processing can help the new work-from-home normal succeed
  • Alfresco – Viewing Annotations on Versions
  • Alfresco Content Accelerator – Collaboration Enhancements
stacks-of-paper

11 BILLION DOCUMENT
BENCHMARK
OVERVIEW

Learn how TSG was able to leverage DynamoDB, S3, ElasticSearch & AWS to successfully migrate 11 Billion documents.

Download White Paper

Footer

Search

Contact

22 West Washington St
5th Floor
Chicago, IL 60602

inquiry@tsgrp.com

312.372.7777

Copyright © 2023 · Technology Services Group, Inc. · Log in

This website uses cookies to improve your experience. Please accept this site's cookies, but you can opt-out if you wish. Privacy Policy ACCEPT | Cookie settings
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT