TSG just completed the bulk of the work to migrate over 4 billion documents for a large health insurance company. This was one of our most complex migrations ever, spanning both legacy FileNet Image Services and a large amount of content in FileNet P8. In this post we share our FileNet best practices and lessons learned.
FileNet Migration Background
Back in 2016 we wrote a post on best practices for a FileNet migration. At the time we were focused on migration to Alfresco for a very old FileNet customer (on optical platters with a jukebox). Since 2016 we have seen more interest from clients in migrating from FileNet, as many have struggled with their relationship with IBM. That struggle was reflected in 2017, when Gartner removed IBM from its Leaders quadrant for customer service and vision reasons. For this latest client we leveraged our NoSQL alternative to migrate to HBase/Hadoop, although Alfresco, AWS/DynamoDB, Google or Azure would all have been appropriate modern targets as well. Components of FileNet migrations can include:
- FileNet Image Services/COLD – In 2016, the FileNet migration included a very old OSAR jukebox and optical discs. By 2020, most FileNet customers have moved to all-magnetic storage, removing the physical loading (and performance issues) associated with optical platters. While the content has been moved to magnetic storage, the format and other issues with DAT files still exist.
- FileNet P8 – P8 is a more modern platform that many Image Services clients have adopted, although many of them still left Image Services in place.
Specific details for this 4 billion-plus migration included:
- Over 3.5 billion documents in Image Services DAT files
- Over 500 million documents in P8
- An unknown number of annotations, which were not migrated to the new system
- Conversion of FileNet COLD and TIFF formats to PDF
FileNet Image Services Migration – Best Practices
Despite content no longer being stored on optical platters, FileNet Image Services still relies on metadata in a database, with content stored in DAT files. DAT files contain multiple pages and documents, consistent with content that was previously written to optical platters. Best practices for migration include two distinct steps.
- Metadata migration – moving all the document metadata from FileNet to the new repository.
- Content migration – moving the documents from DAT to the new repository. Content migration could also involve converting FileNet TIFF and COLD to PDF. Note that the FileNet DAT file format can differ between installations.
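The two steps above can be sketched as two independent passes over a target repository. This is a minimal illustration only, not OpenMigrate's implementation: `TargetRepository`, `migrate_metadata`, `migrate_content` and `missing_content` are hypothetical names, and the in-memory dictionaries stand in for a real repository.

```python
from dataclasses import dataclass, field


@dataclass
class TargetRepository:
    """Hypothetical in-memory stand-in for the new repository."""
    metadata: dict = field(default_factory=dict)
    content: dict = field(default_factory=dict)


def migrate_metadata(source_rows, target):
    """Pass 1: copy every metadata row, keyed by document id."""
    for row in source_rows:
        target.metadata[row["doc_id"]] = dict(row)


def migrate_content(extracted_docs, target):
    """Pass 2: attach content to documents whose metadata already exists.

    Returns the ids that had content but no metadata, for follow-up.
    """
    orphans = []
    for doc_id, payload in extracted_docs:
        if doc_id in target.metadata:
            target.content[doc_id] = payload
        else:
            orphans.append(doc_id)
    return orphans


def missing_content(target):
    """Reconciliation: ids that have metadata but no content yet."""
    return sorted(set(target.metadata) - set(target.content))
```

Keeping the passes separate means each can be rerun or tuned on its own, and a reconciliation check such as `missing_content` shows what is still outstanding after either pass.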
TSG’s past FileNet migrations relied on calling the FileNet API from the command line to retrieve images from storage. This worked well for lower volume clients and clients that utilized optical disk jukebox storage devices (OSAR) that required constant swapping of disks. For higher volume customers utilizing magnetic storage (MSAR), the FileNet API proved to be too slow to extract all of the documents in a reasonable amount of time. Because of this, TSG developed an adapter for OpenMigrate to be able to extract content directly from the storage device, bypassing the FileNet API.
FileNet stores content by mashing hundreds of files together into a single blob file known as a DAT file. OpenMigrate’s adapter is able to deconstruct the DAT file into its individual file components and match those files with the metadata stored in FileNet’s database at very high speeds.
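To illustrate the general idea of deconstructing a container file and matching its parts with database metadata, here is a small sketch. The record layout used here (a 4-byte document id and a 4-byte length, followed by the payload) is entirely hypothetical and is not the FileNet DAT layout, which is not described in this post and can differ between installations. All function names are illustrative.

```python
import struct

# Hypothetical record header: big-endian document id and payload length.
# This is NOT the real FileNet DAT layout.
HEADER = struct.Struct(">II")


def pack_container(docs):
    """Build a toy container blob from (doc_id, payload) pairs."""
    return b"".join(HEADER.pack(doc_id, len(p)) + p for doc_id, p in docs)


def split_container(blob):
    """Yield (doc_id, payload) for each record in the toy container."""
    offset = 0
    while offset < len(blob):
        if offset + HEADER.size > len(blob):
            raise ValueError("truncated header")
        doc_id, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        payload = blob[offset:offset + length]
        if len(payload) != length:
            raise ValueError(f"truncated record for document {doc_id}")
        yield doc_id, payload
        offset += length


def match_with_metadata(blob, metadata_by_id):
    """Pair each extracted payload with its database metadata row."""
    return [
        (metadata_by_id[doc_id], payload)
        for doc_id, payload in split_container(blob)
        if doc_id in metadata_by_id
    ]
```

Reading the container sequentially like this avoids one API round trip per document, which is the kind of saving that direct storage access is meant to provide.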
For this large migration, TSG set up separate OpenMigrate jobs to move metadata and to move content. Originally we proposed a rolling migration as detailed in this post from last year. For this large client we eventually switched to a move-everything approach, as most of the content was archival.
We posted in detail about our FileNet Image Services updates, specifically around the COLD (Computer Output to Laser Disc) approach. Again, rather than leveraging the FileNet API, TSG built a COLD adapter that has the following benefits over other traditional conversion approaches.
- Native File Access – The COLD adapter does not require FileNet to access and convert COLD files. Native file access covers both the scenario where FileNet is still running and contention might cause performance issues for users, and the scenario where FileNet is to be retired as part of a rolling migration.
- Performance – Compared to the traditional “launch a viewer and print to PDF” approach, the TSG transformation utility works at the file level and can transform files much more quickly, which better supports both rolling and bulk migrations. This strategy brought migration speeds from an estimated 100 documents/hour to 100,000 documents/hour.
- Multi-Threaded Infrastructure – Rather than requiring multiple client machines or other complicated infrastructure, the adapter can take advantage of OpenMigrate’s multi-threaded capabilities to quickly process multiple COLD files at the same time from a single machine.
- Modern Format – FileNet’s COLD format is highly compressed. While PDF is a great universal standard for viewing documents, it also inflates the size of documents. TSG utilized JBIG2 compression to drastically reduce the amount of document size inflation.
See a short video that shows off the COLD adapter
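The multi-threaded point above can be sketched with a thread pool that converts several files at once on one machine. This is an assumption-laden illustration, not the COLD adapter itself: `convert_cold_to_pdf` is a hypothetical placeholder that does no real conversion, and `convert_batch` is an illustrative name. In Python specifically, CPU-heavy conversion would likely need a process pool instead; threads are used here only to mirror the description.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_cold_to_pdf(cold_bytes):
    """Hypothetical placeholder for the real COLD-to-PDF conversion."""
    return b"%PDF-placeholder\n" + cold_bytes


def convert_batch(cold_files, workers=8):
    """Convert many files concurrently; returns {name: pdf_bytes}.

    cold_files maps a file name to its raw bytes.
    """
    names = list(cold_files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = pool.map(convert_cold_to_pdf, (cold_files[n] for n in names))
        return dict(zip(names, results))
```

To put the quoted rates in perspective: one million documents would take 10,000 hours at 100 documents/hour, and 10 hours at 100,000 documents/hour.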
FileNet P8 Migration – Best Practices
We approached the P8 migration the same way as Image Services, looking for an underlying way to retrieve and convert the content without leveraging the P8 API. While we got close to cracking the P8 code, given the timing of the project we decided to leverage the API for this component of the migration. Because P8 is a more modern platform than Image Services, the speed at which we were able to extract content through the P8 API was adequate for the client’s needs.
Since we were leveraging the API, content and metadata were moved at the same time. One key finding was that, in this client’s scenario, common explanation of benefit documents were stored multiple times in the repository as duplicates. Rather than storing every copy, we were able to take advantage of deduplication to reduce the migration time and document storage.
Specific lessons learned for deduplication included:
- Subsets of the system took advantage of FileNet P8’s deduplication functionality, which allows the system to store only one copy of a document, even if that exact document is uploaded by multiple users. TSG took advantage of the data stored in FileNet to preserve deduplication during the migration.
- When a document is a duplicate of content that has already been migrated, TSG can speed up the migration by skipping calls to P8.
- During migration batches with high deduplication rates, migration speeds increased to 100 documents/second, compared to 50 documents/second using the API.
- Preserving deduplication reduced the number of pieces of content to migrate from over 530 million to 320 million.
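The deduplication lessons above can be sketched as a loop that fetches each distinct piece of content only once. This is a minimal sketch under stated assumptions: `content_key` is a hypothetical identifier shared by deduplicated documents (the post does not say what data FileNet stores for this), `fetch_content` stands in for the slower source API call, and `migrate_with_dedup` is an illustrative name.

```python
def migrate_with_dedup(documents, fetch_content, store):
    """Migrate documents, fetching each distinct piece of content once.

    documents: iterable of (doc_id, content_key) pairs.
    fetch_content: callable standing in for the slow source API call.
    store: dict with "content" and "documents" maps for the target.
    Returns the number of source fetches actually made.
    """
    fetches = 0
    for doc_id, content_key in documents:
        if content_key not in store["content"]:
            # First time this content is seen: pay for the API call.
            store["content"][content_key] = fetch_content(content_key)
            fetches += 1
        # Every document still gets its own reference to the content.
        store["documents"][doc_id] = content_key
    return fetches
```

Going from over 530 million to 320 million pieces of content is a reduction of roughly 40%, and every skipped fetch is an API call that never has to be made.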
Overall FileNet Migration Lessons Learned
- FileNet resource availability – FileNet, being a legacy system, will have fewer individuals over time that can support the system. Migration from FileNet becomes increasingly difficult if resources with knowledge of the system have limited availability or have moved on.
- Test Data – as a system that likely spanned decades, it is important to have representative test data that spans the life of the FileNet installation. If test data is incomplete or out of date, test with production data whenever possible.
- Migration planning and timing – as with any migration, understanding the data to be migrated is paramount. Planning migration activities around that understanding, especially with regard to FileNet usage and support, is essential. For example, if FileNet is going out of support, migration activities that use the FileNet API should be prioritized while the system is still under support.
Legacy ECM systems like FileNet typically have vast amounts of data spanning many years, which can make them difficult to move. Delaying migration from these aging systems puts core operating systems at increasing risk over time. Migrating off a legacy ECM solution like FileNet to a modern repository that will scale to accommodate millions or billions of documents needs to be prioritized.
TSG leveraged both the FileNet API and native file access approaches, as well as file conversion to an open standard (PDF) within OpenMigrate, to extract documents from a legacy ECM into a faster, modern solution.
Let us know your thoughts below.