Documentum and Alfresco – Performance Thoughts for a Hybrid/Rolling Migration

We recently discussed the concept of a rolling migration versus a more traditional big-bang migration approach. See our initial thoughts on development of a rolling migration approach for the eventual retirement of a client’s old system over time. After working through some of the challenges, we now have a system that can quickly move content into the new repository on demand. This post will discuss our results.

What is a rolling migration?

Our current client is an insurance company that developed a custom system for storing claim-related documents. Like many older custom systems, it consists of a database that holds all of the document metadata, with database pointers specifying the location of the content, which is stored on an FTP server. Given the large quantity of documents and the large storage needs, the client opted for a “rolling migration” approach. A rolling migration moves content to the new repository only once it becomes “active”, essentially when a user needs to view it. For this client, users work in Alfresco through TSG’s HPI interface.

The next time the content needs to be accessed, the system logic checks if the content has already been moved to the new repository and, if so, redirects the user directly to HPI. Benefits of this approach include:

Gradual build-up of content, avoiding the complications of a “big bang” migration.
Gradual build-up of users.
Gradual movement of automatic feeds from the old system to the new system.
Gradual build-up of repository size and infrastructure.

All of these benefits come from an intermediary web layer that sits between the client’s source system and HPI/OpenContent. This web layer contains the functions that pull the documents into the new repository and set the corresponding metadata, along with a simple UI that displays the progress of the migration. Once the migration is complete, the user is routed into HPI inside the claim folder that was just migrated, as depicted above. Until the old system is completely retired, the service always checks the old system for new updates and migrates new documents to Alfresco as required. Separating this logic into its own web layer is very beneficial: once the client is ready to retire the old system, the web layer can be removed and users can be brought directly into HPI.
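The routing logic described above can be sketched in a few lines. This is only an illustration, not the actual web layer: the `RollingMigrator` class, the in-memory dictionaries standing in for the old system and the new repository, and the `/hpi/claims/...` redirect path are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RollingMigrator:
    """Toy stand-in for the web layer; both stores are plain in-memory dicts."""

    # claim id -> {document name: content}; hypothetical shape for the old system
    legacy: dict[str, dict[str, bytes]]
    # same shape for the new repository, empty until claims are opened
    repository: dict[str, dict[str, bytes]] = field(default_factory=dict)

    def pending(self, claim_id: str) -> list[str]:
        """Names of old-system documents not yet present in the new repository."""
        migrated = self.repository.get(claim_id, {})
        return [name for name in self.legacy.get(claim_id, {}) if name not in migrated]

    def open_claim(self, claim_id: str) -> tuple[str, int]:
        """Migrate whatever is missing, then return (redirect target, documents moved)."""
        to_move = self.pending(claim_id)
        folder = self.repository.setdefault(claim_id, {})
        for name in to_move:
            folder[name] = self.legacy[claim_id][name]
        # The redirect path is a made-up placeholder, not a real HPI URL.
        return f"/hpi/claims/{claim_id}", len(to_move)


migrator = RollingMigrator(legacy={"C-100": {"a.pdf": b"1", "b.tif": b"2"}})
first_visit = migrator.open_claim("C-100")    # moves both documents
second_visit = migrator.open_claim("C-100")   # nothing left to move
migrator.legacy["C-100"]["c.png"] = b"3"      # a new document arrives in the old system
third_visit = migrator.open_claim("C-100")    # only the new document is moved
```

Because every visit recomputes what is still pending, the same code path handles the first visit, repeat visits, and documents added to the old system later.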

Rolling “On-Demand” Migration Performance?

The new web layer uses multi-threading to shorten migration times and provides a simple UI that shows the user the progress of the migration. As shown in the table below, multi-threading cut the first visit to a claim folder with 12 documents from 31.39 seconds to 6.86 seconds, roughly 4.6 times faster. On the second visit, when the web layer only checks whether any new content needs to be migrated, the time drops to about 2.5 seconds, most of which is a programmed delay that gives the user feedback on what the web layer is doing.

Threading mode	Time for first visit (seconds)	Time for second visit (seconds)
Single-Threaded	31.39	N/A
Multi-Threaded	6.86	2.49
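To illustrate why threading helps with this kind of I/O-bound work, here is a minimal sketch using Python's standard `concurrent.futures` thread pool. It is not the web layer's code: `migrate_document`, the 0.05-second simulated transfer, and the worker count of 6 are assumptions chosen for the example, and the timings it prints are not comparable to the measurements in the table.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def migrate_document(name: str) -> str:
    """Placeholder for one transfer: fetch from the old store, write to the new one."""
    time.sleep(0.05)  # simulated I/O wait; an arbitrary value, not a measured figure
    return name


def migrate_folder(names: list[str], workers: int = 1) -> tuple[list[str], float]:
    """Migrate every document and return (migrated names in input order, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(migrate_document, names))  # map preserves input order
    return done, time.perf_counter() - start


documents = [f"doc-{i}.tif" for i in range(12)]
serial_done, serial_seconds = migrate_folder(documents, workers=1)
threaded_done, threaded_seconds = migrate_folder(documents, workers=6)
print(f"1 worker: {serial_seconds:.2f}s, 6 workers: {threaded_seconds:.2f}s")
```

Threads pay off here because each transfer spends most of its time waiting on the network or disk, so several transfers can overlap. The best worker count depends on what the source server and the repository can sustain.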

In addition to multi-threading, the web layer also converts several image types (TIFF, JPEG, PNG, BMP, GIF) to PDF. This conversion adds a small amount of time to migration (this can be removed based on the migration requirements), but gives the benefit of allowing the documents to be easily viewed in HPI and also be annotated using TSG’s OpenAnnotate tool.
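The optional nature of the conversion step can be sketched as a small decision function. Everything here is hypothetical: the extension list is an assumption derived from the image types named above, `prepare` and its `convert` flag are invented for the example, and `to_pdf` stands in for whatever conversion library the real web layer uses.

```python
from pathlib import PurePosixPath
from typing import Callable

# Extensions for the image types the post lists; the spelling variants are assumptions.
CONVERTIBLE = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def needs_pdf_conversion(filename: str) -> bool:
    """True when the file extension matches one of the listed image types."""
    return PurePosixPath(filename).suffix.lower() in CONVERTIBLE


def prepare(
    filename: str,
    content: bytes,
    to_pdf: Callable[[bytes], bytes],
    convert: bool = True,
) -> tuple[str, bytes]:
    """Return the (name, content) pair to store, converting images when enabled."""
    if convert and needs_pdf_conversion(filename):
        return str(PurePosixPath(filename).with_suffix(".pdf")), to_pdf(content)
    return filename, content


def fake_to_pdf(data: bytes) -> bytes:
    """Stand-in converter used only to exercise the sketch; it does no real conversion."""
    return b"%PDF" + data


converted = prepare("scan.TIF", b"x", fake_to_pdf)
untouched = prepare("letter.docx", b"y", fake_to_pdf)
skipped = prepare("scan.TIF", b"x", fake_to_pdf, convert=False)
```

Keeping the converter as a parameter means the step can be switched off, or swapped for a different library, without touching the migration logic.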

Summary

With the new rolling migration web layer and its use of multi-threading, the client gets the benefits of a smaller, gradual rollout while the first-visit migration time for a 12-document claim folder drops from 31.39 seconds to 6.86 seconds.

Please let us know your thoughts in the comments below.
