TSG recently worked with a client to migrate nearly 4 million documents with metadata from 3 different FileNet systems to Alfresco using OpenMigrate. This post will focus on the technical approach used to migrate all of the content, as well as some of the unique challenges of the migration.
One of the unique characteristics of this migration is that it combined content and metadata from 3 different FileNet solutions into a single Alfresco repository and content model. The FileNet solutions include the following:
- FileNet Image Services – formerly Panagon Image Services
- FileNet Document Services – formerly Panagon Content Services, and even earlier Saros Mezzanine
- FileNet P8
FileNet Image Services
We’ve written posts in the past about migrating from FileNet Image Services (IS). See our previous post here. OpenMigrate has a multi-threaded source adapter for extracting content from IS. For this particular migration, the IS content was still stored on an Optical Storage and Retrieval (OSAR) jukebox. With a jukebox, content is stored on disks (more commonly referred to as surfaces) that are swapped in and out of drives inside the device as files are read and written. One of the challenges of working with this type of device is making sure that content is migrated in an order that minimizes the number of disk swaps. Reading content from a jukebox is slow in general, and frequent surface swaps would add significantly to the total migration time.
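To illustrate the ordering idea, here is a minimal Python sketch, not OpenMigrate's actual IS adapter. It assumes each document record carries a hypothetical `surface_id` field naming the surface it lives on, and it models a single drive, so every change of surface counts as one swap.

```python
from collections import defaultdict


def order_by_surface(docs):
    """Return the documents reordered so each surface is visited only once.

    `docs` is a list of dicts with hypothetical keys `doc_id` and `surface_id`.
    """
    by_surface = defaultdict(list)
    for doc in docs:
        by_surface[doc["surface_id"]].append(doc)

    ordered = []
    for surface_id in sorted(by_surface):
        # Within a surface, keep a stable, predictable order by document id.
        ordered.extend(sorted(by_surface[surface_id], key=lambda d: d["doc_id"]))
    return ordered


def count_swaps(docs):
    """Count surface loads for a single drive reading `docs` in the given order."""
    swaps = 0
    current = None
    for doc in docs:
        if doc["surface_id"] != current:
            swaps += 1
            current = doc["surface_id"]
    return swaps
```

Reading four documents that alternate between two surfaces costs four loads in the original order and two after reordering. A real jukebox has several drives, so the savings in practice depend on the hardware.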
To further add to the challenge, the client had outgrown their jukeboxes, so not all surfaces could be loaded at one time. To address this, we broke the migration into several batches, allowing time for surfaces to be ejected and loaded into the jukeboxes between batches.
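The batching step can be sketched the same way. This is an illustrative example under assumed names: `plan_batches` and the `slots` parameter (how many surfaces the jukeboxes can hold at once) are hypothetical and are not part of OpenMigrate.

```python
def plan_batches(surface_ids, slots):
    """Split the distinct surfaces into batches that fit the jukebox.

    `surface_ids` may contain duplicates (one entry per document);
    `slots` is the assumed number of surfaces that can be loaded at once.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    unique = sorted(set(surface_ids))
    # Each batch is one load/eject cycle of the jukebox.
    return [unique[i:i + slots] for i in range(0, len(unique), slots)]
```

Each returned batch corresponds to one set of surfaces to load before a migration run, with the eject and reload happening between batches.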
Another unique requirement for this migration was that all content extracted from IS had to be converted to PDF/A for archival purposes. Previous migrations from IS for other clients used OpenMigrate to automatically convert single-page TIFF files extracted from IS into multi-page PDF files. This migration added the requirement of compliance with the PDF/A standard, which was accomplished using the iText PDF library that was already built into OpenMigrate.
FileNet Document Services
Until this project, OpenMigrate had never been used to migrate content from FileNet Document Services (DS). Similar to Image Services, DS is built on legacy technologies and does not have a Java API to use with OpenMigrate to extract content. After some reverse engineering of DS, we found that it was pretty simple to use OpenMigrate’s JDBC database connector to query the SQL database that DS runs on. We used the connector to extract the metadata, and then pulled the content from the file system based on a path referenced in the database. We’ve had success using this approach with other legacy content management systems as well. Given that we were able to bypass legacy APIs and go directly to the database and file system, the DS migration ran very quickly with OpenMigrate’s multi-threading capabilities.
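The database-plus-file-system approach looks roughly like the sketch below. It is written in Python with `sqlite3` standing in for the JDBC connection, and the `documents` table and its `doc_id`, `title`, and `file_path` columns are invented for illustration; the real DS schema is not shown in this post.

```python
import shutil
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_documents(conn):
    """Read metadata and the content path for every document (hypothetical schema)."""
    rows = conn.execute(
        "SELECT doc_id, title, file_path FROM documents ORDER BY doc_id"
    ).fetchall()
    return [{"doc_id": r[0], "title": r[1], "file_path": r[2]} for r in rows]


def export_one(doc, target_dir):
    """Copy one document's content from the path stored in the database."""
    source = Path(doc["file_path"])
    target = Path(target_dir) / f"{doc['doc_id']}{source.suffix}"
    shutil.copyfile(source, target)
    return {**doc, "exported_path": str(target)}


def export_all(conn, target_dir, workers=4):
    """Query metadata once, then copy the content files on a thread pool."""
    # The query runs on the calling thread; only file copies are parallelized.
    docs = fetch_documents(conn)
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: export_one(d, target_dir), docs))
```

Because the file copies are independent of one another, adding worker threads helps until the disk or network becomes the bottleneck.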
FileNet P8
Similar to FileNet DS, this project was the first time that we had ever attempted to migrate content out of FileNet P8 using OpenMigrate. P8 is IBM FileNet’s modern content management platform that’s built on a Java stack.
After some investigation, we discovered that P8 also has a relatively simple SQL database model that made it easy to locate the content and metadata to be migrated. Not so simple, however, was determining the location of the binary content on the file system. Because P8 uses an obfuscated method for storing content, we found it necessary to use P8’s Java API for extracting the content from the system. We found the P8 API to be relatively developer friendly, and since it’s a Java API, it was fairly simple to create a P8 source connector for OpenMigrate. Using the P8 API’s JNDI connection protocol combined with OpenMigrate’s multi-threading capabilities, we were able to extract content from P8 at speeds comparable to other ECM platforms like Documentum and Alfresco.
2 Phase Migration
Further complicating this migration was the need to extract metadata from 2 additional databases prior to migrating the FileNet content to Alfresco. Utilizing OpenMigrate’s JDBC query event listener, we were able to easily pull metadata from these databases at migration time.
Due to the complexity added by 3 separate source content management systems, as well as 2 additional database systems providing metadata, we decided to perform this migration in 2 phases. The first phase involved extracting content and metadata from all of the source systems and exporting the content to a file system and the metadata to a temporary database. The second phase involved migrating the content and metadata from the file system and temporary database into Alfresco. We chose this approach primarily because the Alfresco target repository was not yet ready at the time of the migration. It also provided the opportunity for performing metadata verification and cleanup prior to migrating to Alfresco.
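A simplified sketch of the two phases follows. It uses `sqlite3` as the temporary database and a plain dictionary as a stand-in for the Alfresco target; the `staging` table, its columns, and the "title must not be blank" rule are assumptions made for the example, not the client's actual schema or validation rules.

```python
import sqlite3


def stage(conn, records):
    """Phase 1: write extracted metadata to the temporary staging database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "doc_id TEXT PRIMARY KEY, source TEXT, title TEXT, content_path TEXT)"
    )
    conn.executemany(
        "INSERT INTO staging VALUES (:doc_id, :source, :title, :content_path)",
        records,
    )
    conn.commit()


def find_invalid(conn):
    """Between phases: list staged documents that fail a sample verification rule."""
    rows = conn.execute(
        "SELECT doc_id FROM staging "
        "WHERE title IS NULL OR TRIM(title) = '' ORDER BY doc_id"
    )
    return [r[0] for r in rows]


def load(conn, target):
    """Phase 2: copy verified records into `target` (a dict standing in for Alfresco)."""
    invalid = set(find_invalid(conn))
    loaded = 0
    query = "SELECT doc_id, source, title, content_path FROM staging ORDER BY doc_id"
    for doc_id, source, title, content_path in conn.execute(query).fetchall():
        if doc_id in invalid:
            continue  # Left in staging for cleanup before a later run.
        target[doc_id] = {"source": source, "title": title, "content_path": content_path}
        loaded += 1
    return loaded
```

Keeping the staging data in a database means verification and cleanup can be done with ordinary SQL before anything is written to the target repository.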
Alfresco Target
This migration project was also one of the first to utilize OpenMigrate’s new Content Management Interoperability Services (CMIS) target connector for Alfresco. The previous version of the Alfresco target connector for OpenMigrate utilized the recently decommissioned SOAP API for Alfresco. We were pleased to discover that the CMIS connector also offers performance improvements over the legacy connector, as well as an increased level of configurability.
Summary
OpenMigrate has had a connector for migrating content from FileNet Image Services for quite some time. A recent client project brought about the need for additional connectors for FileNet Document Services and FileNet P8. This particular project involved migrating from FileNet to Alfresco in a 2-phase process, but the configurability of OpenMigrate would allow other target adapters to be swapped in to migrate to other systems, such as Documentum or a file system.