One of the big improvements coming with Alfresco 5.2 has been an update of the Solr integration to Solr 6. Some of our clients leveraging the old version of Solr with 5.1 or 5.0 have sometimes experienced Solr inconsistency issues where documents in Alfresco haven’t yet been indexed in Solr for a variety of reasons. Recently, a couple of our large clients have leveraged Alfresco’s Transactional Metadata Query System to rely on the database rather than Solr for certain searches. This approach both improved performance and reduced Solr issues in pre-5.2 instances.
Alfresco Solr Integration versus the database
One of the big benefits of Alfresco versus other ECM tools is the integration of Lucene and later Solr as part of the infrastructure from the beginning. TSG has been leveraging Lucene and Solr for a variety of clients for years as part of our consumer caching approach for many applications including critical business continuity solutions. TSG has always been very impressed not only with the cost of Solr (open source so no cost) but also Solr’s performance and capabilities compared to other search tools. Many vendors, like Documentum, have moved from proprietary tools and now support Solr/Lucene.
Just like legacy ECM tools, Alfresco has a relational database for transactional storage of the meta-data and file pointers. Unlike legacy tools, Alfresco leverages Solr as the default for all searches, with all content being automatically indexed in Solr. Some advantages of this approach include:
- Reduced database maintenance – it doesn’t happen as much anymore, but for many of the legacy ECM tools a DBA was often required to “tune” the repository by adding indexes to key attributes to improve performance. With Solr maintaining its own indexes, DBA support can be minimal.
- Full-Text and Meta-Data search – initially, for legacy ECM tools, adding full-text support meant purchasing and integrating a separate full-text search engine (Verity and FAST are a couple of examples). All of the meta-data and the full-text needed to be indexed within the tool. With Alfresco, those abilities come “out of the box” at no charge.
In the pre-Alfresco 5.2 releases, clients would sometimes experience Solr inconsistencies where content might not have been indexed. See our post from back in November about Alfresco’s Transactional Metadata Query System. From a review with clients, there are multiple factors that might have resulted in the Solr issues, depending on each client’s individual infrastructure. It is worth noting that TSG has seen these issues with other legacy ECM vendors’ Solr implementations as well.
Alfresco Clients – Memory Cost and leveraging the database
A more recent major influence on both Solr and database usage has been the change in the cost of memory. During a discussion with John Newton (Documentum co-founder, Alfresco co-founder) at the Alfresco Sales Kickoff Meeting two weeks ago, we both remembered how databases used to be tuned based on where things were written on magnetic disc to reduce disc head seek time. With memory now cheap, some of the index and particularly disc storage requirements are moot.
Neither of these two clients uses Alfresco Share for searching. Both have leveraged our High Performance Interface (HPI) for searching as well as document viewing and manipulation. HPI provides a way to configure a simplified search based on user departments and roles, and can offer both a simple search and an advanced search. For our clients:
- An insurance client forecasting 1 billion objects in Alfresco did not need cross-claim search or full-text search. Search was simply limited to the meta-data on a document within a claim folder (a claim folder could hold up to 65,000 documents). Leveraging the database not only improved performance but also simplified their large infrastructure.
- A cloud service provider client with a multi-tenant repository realized that the bulk of the searches were only on meta-data. While Solr was still required for full-text and certain types of searches, the database could be leveraged in their pre-5.2 environment to improve performance as well as reduce Solr inconsistency errors.
Leveraging the Database with Alfresco – How to do it
With the release of version 4.2, Alfresco began supporting a system called Transactional Metadata Query. This system allows particular CMIS and FTS language queries to be run directly against database indexes instead of the Solr index. Because the database is queried directly, content can be retrieved as soon as it is committed, rather than only after Solr has indexed it. By default, the Transactional Metadata Query system parses each CMIS or FTS query to determine whether every part is supported by the database query engine. If the entire query is supported, it is run directly against the database. If any part of the query is not supported by the database query engine, the query is run against Solr instead.
Meta-Data searches that can rely on the database include:
- Date
- Datetime
- Text
- Integer
- Long
- Equals
- Not Equals
- Like
- All
Meta-Data searches that currently cannot rely on the database include:
- Repeating
- Boolean
- Double
- Any
- Full-text
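The all-or-nothing routing rule described above can be illustrated with a short sketch. This is only a model of the decision, not Alfresco's actual implementation: the `Clause` class, the `route` function, and the lowercase feature labels are hypothetical names invented for illustration, and the supported set is simply copied from the first list above.

```python
from dataclasses import dataclass

# Feature labels copied from the "can rely on the database" list above.
# Anything not in this set (repeating, boolean, double, any, full-text,
# or an unknown label) is treated as requiring Solr.
DB_SUPPORTED = {
    "date", "datetime", "text", "integer", "long",
    "equals", "not equals", "like", "all",
}


@dataclass(frozen=True)
class Clause:
    """One part of a parsed query (hypothetical, simplified model)."""
    data_type: str  # e.g. "text", "double"
    operator: str   # e.g. "equals", "like", "full-text"


def clause_supported(clause: Clause) -> bool:
    """A clause qualifies only if both its type and its operator are supported."""
    features = {clause.data_type.lower(), clause.operator.lower()}
    return features <= DB_SUPPORTED


def route(clauses: list[Clause]) -> str:
    """Return "database" only when every clause is supported, else "solr"."""
    return "database" if all(clause_supported(c) for c in clauses) else "solr"


# A pure meta-data query stays on the database ...
metadata_only = [Clause("text", "equals"), Clause("date", "like")]
# ... while a single unsupported clause sends the whole query to Solr.
with_full_text = [Clause("text", "equals"), Clause("text", "full-text")]

print(route(metadata_only))
print(route(with_full_text))
```

The point of the sketch is that the decision is made per query, not per clause: one full-text or double-typed condition is enough to move an otherwise database-friendly search back to Solr.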
More detail on specific queries is available in our previous post and on the Alfresco support site for Alfresco 4.2.7 or 5.0.
For both our clients, the ability to run the bulk of their queries against the database has improved performance and reduced errors. With new releases, we have noticed that Alfresco has been adding more and more database support for FTS.
Summary
Solr integration with Alfresco is one of the strengths of the repository that differentiates Alfresco from some of the legacy ECM vendors. Clients on pre-5.2 releases have sometimes struggled with Solr inconsistency issues. Where possible, clients can leverage the database query provided by the Transactional Metadata Query System rather than Solr to improve performance and reduce Solr inconsistency errors.