Over the last couple of years, some ECM vendors have been touting a Federated Search model to “cure” the issue of access to document content contained in legacy ECM systems. Whether it comes from an ECM vendor like Nuxeo or a supporting software vendor like Simflofy, the marketing message of getting access to multiple repositories from one search hearkens back to the promises of Enterprise Search vendors like Autonomy, promises that were never really fulfilled. This post will discuss the problem Federated Search attempts to solve and present some of the reasons we typically recommend alternative solutions, including a publishing approach.
Federated Search – Easier than migration?
Whether tied to digital transformation or new ECM development efforts, federated search usually arrives as a solution when content is contained in a legacy ECM system that isn’t going to be migrated or replaced by the new development efforts. As we have talked about here before, migration isn’t easy. If the new system just needs access to the content and there is no desire to change what the legacy system is doing, compelling reasons for not migrating include the challenges of:
- Finding a reason to move everybody and everything
- Moving legacy users
- Migrating legacy content
- Moving legacy integrations
- Accessing legacy resources
To better visualize the issue, imagine a typical Accounts Payable scenario. The team that manages invoice payment and owns the capture of invoices would like its payment analysts to have access to the signed contracts contained in the legacy contract system. The invoice payment team is on Alfresco and the contracts are contained within the legal group’s iManage system. Federated Search would give the invoice payment team the ability to show both the invoices and the contracts within one interface by connecting the Alfresco system to the iManage system.
Federated Search – Is it really that simple?
The marketing message for Federated Search typically boils down to “why move when you can just access the legacy content”, but is it really that simple? As pointed out in an excellent article from Accenture: “The primary advantage of this approach is ease of implementation because no additional indexing of content is necessary. The query federation system simply taps into existing systems and extracts results, which are then merged…” The cons, however, include:
- Performance issues can occur if the federator waits for the slowest remote search engine to respond
- The merging of search results into a sensible hit list is difficult if based on relevancy, as each search engine called will score relevancy in a different way.
- Search engines provide varying levels of query sophistication. Federation at query time usually implies a “dumbing down” to suit the least capable search engine.
- Document-level security is a potential cause of performance issues, but this depends on the complexity of the security environment
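The first two cons can be made concrete with a small Python sketch. It is not taken from any vendor's product: the two backend functions, their delays and their scores are invented stand-ins for an Alfresco search and an iManage search. It shows that a federator that waits for every backend is only as fast as the slowest one, and that sorting merged hits by raw score is misleading when each engine scores on its own scale.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two repositories; delays and scores are invented.
def search_alfresco(query):
    time.sleep(0.05)  # fast, modern backend
    return [{"repo": "alfresco", "title": "Invoice 1001", "score": 0.92}]

def search_imanage(query):
    time.sleep(0.30)  # slow legacy backend
    return [{"repo": "imanage", "title": "Signed contract A", "score": 14.7}]

def federated_search(query, backends):
    """Query every backend in parallel and merge the hits by raw score."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = [pool.submit(backend, query) for backend in backends]
        # result() blocks, so the merged list is only complete once the
        # slowest backend has answered.
        hits = [hit for future in futures for hit in future.result()]
    # Naive merge: the scores come from different engines (0-1 versus an
    # unbounded scale here), so this ordering says little about relevance.
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)

start = time.perf_counter()
results = federated_search("acme", [search_alfresco, search_imanage])
elapsed = time.perf_counter() - start

print(f"{len(results)} hits in {elapsed:.2f}s")
for hit in results:
    print(hit["repo"], hit["title"], hit["score"])
```

Even though the queries run in parallel, the total time tracks the 0.30 second backend, and the contract outranks the invoice purely because its engine uses larger numbers.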
In addition to the points raised above, we have seen clients that attempted federated search struggle with other issues, including:
- Security Logistics – In our AP example, legal would have some concerns about allowing access to their system, particularly to DRAFT contracts. Making sure that invoice payment only has access to certain documents would require updates to iManage, something legal would not necessarily want to do and support.
- System Logistics – Legal might be concerned about the load the new access will place on their legacy system as federated searches are not always the best performing.
- Licensing Logistics – Users would require license access to both systems. In our AP example, all invoice payment analysts would need both Alfresco and iManage licenses.
- System fault tolerance – Relying on both systems being available increases the risk that, if one becomes unavailable or struggles with performance issues for any reason, the end user experience will suffer. Adding more repositories increases this risk.
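The fault tolerance point is simple arithmetic. As a rough illustration, assume each repository fails independently and is up 99% of the time (both the independence assumption and the 99% figure are ours, chosen only for the example). A search that needs every repository to be up is then available only as often as the product of the individual availabilities:

```python
def combined_availability(availabilities):
    """Availability of a search that needs every repository to be up,
    assuming the repositories fail independently of each other."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result

# Illustrative figures only: each repository is assumed to be up 99% of the time.
two_repos = combined_availability([0.99, 0.99])
three_repos = combined_availability([0.99, 0.99, 0.99])

print(f"{two_repos:.4f}")    # 0.9801
print(f"{three_repos:.4f}")  # 0.9703
```

Each repository added to the federation lowers the combined figure a little further.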
Data Warehousing Lessons Learned – A Publishing Approach
When it comes to federated search or enterprise search, TSG sees parallels with the data warehouse approach. In data warehousing, clients wanted access to data contained in other systems but did not want to replace those systems. Rather than federating queries, the data warehouse focuses on publishing content from the legacy system to the data warehouse. With storage getting steadily cheaper, the same idea works for documents. As we recommended back in 2015 when Enterprise Search was being discussed, TSG will typically recommend a publishing approach rather than a crawler or federated search.
In this publishing approach, a job is set up to monitor the business system, looking for documents of a given type that have reached a stage at which they can be pushed to the separate repository. With this push, the new repository will have all the meta-data as well as a copy of the document itself. Typically we see clients publish just a PDF of the document since it will only be used for read access. The publishing job might also push a light version of security in the form of meta-data if required.
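A single pass of such a job might look like the Python sketch below. This is an illustration under our own assumptions, not OpenMigrate code or any repository's API: the `SourceDocument` fields, the `SIGNED` status, the `render_pdf` placeholder and the dictionary standing in for the target repository are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SourceDocument:
    """Hypothetical view of a document in the legacy system."""
    doc_id: str
    doc_type: str
    status: str
    modified: int            # simplified timestamp, e.g. epoch seconds
    metadata: dict
    content: bytes
    allowed_groups: list = field(default_factory=list)

def render_pdf(content: bytes) -> bytes:
    """Placeholder for a real PDF transformation step."""
    return b"%PDF-placeholder " + content

def publish_once(source_docs, target, last_run,
                 doc_type="contract", ready_status="SIGNED"):
    """Push documents of one type that reached the ready status since last_run."""
    published = []
    for doc in source_docs:
        if doc.doc_type != doc_type or doc.status != ready_status:
            continue  # wrong type, or not yet at a publishable stage
        if doc.modified <= last_run:
            continue  # already handled by an earlier run
        target[doc.doc_id] = {
            "metadata": dict(doc.metadata),               # all meta-data
            "pdf": render_pdf(doc.content),               # read-only copy
            "allowed_groups": list(doc.allowed_groups),   # light security as meta-data
        }
        published.append(doc.doc_id)
    return published

legacy = [
    SourceDocument("c-1", "contract", "SIGNED", 200, {"vendor": "Acme"},
                   b"signed text", ["ap-analysts"]),
    SourceDocument("c-2", "contract", "DRAFT", 210, {"vendor": "Acme"},
                   b"draft text"),
    SourceDocument("c-3", "contract", "SIGNED", 50, {"vendor": "Globex"},
                   b"old text", ["ap-analysts"]),
]
target_repo = {}
published_ids = publish_once(legacy, target_repo, last_run=100)
print(published_ids)  # ['c-1']
```

The draft contract never leaves the legacy system, and the contract signed before the last run is skipped because an earlier pass would already have published it. A scheduler would call `publish_once` on whatever interval the business needs.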
In this manner, the legal department can ensure that access to their own system is still controlled, and documents that need to be shared can be pushed to the invoice payment system as required. Advantages of this approach over a Federated approach include:
- Integration – Rather than having to write real-time integration to the departmental repository, integration is only required in the publishing job. The Search Interface could be written for just the new repository (Alfresco) and take advantage of all the capabilities of that repository.
- Performance – Search performance is not limited by the system with the slowest response time.
- Content Format – As part of the publishing job, content could be transformed (typically to PDF) and could also include additional items (headers, footers, etc.) to provide consistency between systems.
- Administration – Each user would only need to be defined and maintained in the overall search repository, not in the departmental system.
TSG has implemented the publishing approach for multiple clients with OpenMigrate. Features include:
- Ability to pull from a wide variety of ECM repositories including Documentum, FileNet, Alfresco and SharePoint, as well as database-driven systems (for example, custom Oracle/SAP systems).
- Ability to “poll” a repository and push content on a set interval (for example, every 5 minutes or once a day).
- Ability to transform content from a variety of formats into PDF.
- Ability to store and index into a variety of repositories including Alfresco and Documentum, as well as Lucene/Solr and Hadoop.
- Ability to delete outdated or superseded documents from the target repository.
Federated Search, like Enterprise Search before it, has an appealing marketing message but also has real downsides.
Quoting Alan Pelz-Sharpe from Deep Analysis:
“Federated search has been around a long while, but in my experience it’s never been easy to implement and in many cases simply not worth the effort.”
As with Data Warehouse efforts, TSG typically recommends a publishing approach based on licensing, fault tolerance and overall user acceptance.