As clients prepare for 2020, TSG has seen an uptick in requests for OpenMigrate support for migrating from legacy repositories (FileNet, ImagePlus, Mobius, CMOD and others) to a modern ECM 2.0 repository. TSG typically recommends a “one-step” migration, where OpenMigrate both retrieves documents from the legacy repository and stores the documents and metadata in the new repository. This approach provides many advantages over a “two-step” approach, where documents and metadata are first dumped to a file system to be uploaded later. This post discusses the benefits of the one-step approach for ensuring a smooth transition to a modern ECM 2.0 repository.
Migration Overview – Migration Infrastructure
Clients often try to use existing export processes from the legacy system or bulk import tools from the new repository vendor, with content and metadata placed in a file system between the export and import activities. Concerns with this approach include:
- Limitations of file systems
- Issue resolution responsibility between the two different processes
- Issues with the “dump and load” approach versus delta migrations
- The impact of multiple file stores and retrieval on migration speed and accuracy
Based on work with clients on both migrations and ongoing ingestion, TSG recommends treating migration as more than a one-time event and looking for a migration infrastructure like OpenMigrate that can own responsibility for both the extract and the import. Good migration infrastructure tools should also include:
- Ability to repeat the process for ongoing migration needs
- Ability to apply business logic throughout the migration process
- Ability to configure many different migrations quickly
- Ability to quickly address and retry documents/data that failed to migrate correctly
- Ability to repeat the process for different data sources
- Ability to provide accurate counts of documents migrated/failed for final decommissioning reports of the legacy system
While OpenMigrate can play a role in both one-step and two-step migrations, there are substantial advantages to the one-step approach. The remainder of this post focuses on how a one-step migration differs from a two-step migration for the above considerations.
Limitations of File Systems
With a two-step approach, a bulk download tool exports batches of files to a file system, where all metadata about the documents is stored either in the file name or in a separate data file in a format like CSV. OpenMigrate can then read the batches from the file system and load the documents and metadata into the new system. With an OpenMigrate one-step approach, the metadata, versions, renditions, and lifecycle values are read and mapped directly from the legacy repository to the new repository without any file system handoff. Specific issues with using the file system as a stopping point between export and import include:
- Bulk download tools have limited ability to pull in external data from other systems and can only export data from the legacy repository, limiting the ability to transform or store data in a new format.
- Storing every attribute in the correct place in the file system or naming each document correctly can be very difficult. File system restrictions on directory names, file names, and special characters can make the export problematic. One-step migrations do not have to be concerned with an intermediate file format, as documents are stored directly in the new ECM 2.0 repository.
- Versions, renditions, lifecycles, and custom attributes also need a place to be stored so that OpenMigrate can populate these values correctly in the new target repository. CSV or other data formats can get very complex and inflexible when unexpected data issues arise. One-step migrations do not require the complicated version/lifecycle metadata mapping, as the detail is stored directly in the ECM 2.0 repository.
- Disk space needs to be procured for the dump itself. For large repository migrations, this temporary space can be expensive and difficult to manage, because deleting exported documents has to be coordinated with the success of the import. One-step migrations do not require large temporary space.
- Migration speeds are slower with a two-step approach, as documents need to be both written to and read from the file system. While OpenMigrate provides robust multi-threaded and new high-speed ingestion for certain repositories, the limiting factor can be the legacy repository itself. Waiting for files to be written to disk can further limit the speed of the migration.
- Performance issues often arise when using export utilities to dump large amounts of content and metadata from legacy systems. Export tools are often designed for smaller volumes and often can’t handle the large batch sizes required for bulk migrations.
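The file naming concern above can be made concrete with a small sketch. It assumes a hypothetical export rule that builds a file name from a document title and version and replaces characters that Windows does not allow in file names; the function names and sample documents are illustrative and are not part of OpenMigrate or any export tool.

```python
import re

# Characters Windows does not allow in file names.
ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def to_file_name(title: str, version: str) -> str:
    """Hypothetical export naming rule: title and version joined with '_'."""
    return ILLEGAL.sub("_", f"{title}_{version}") + ".pdf"

def find_collisions(docs):
    """Group source documents by the file name an export would give them."""
    by_name = {}
    for doc_id, title, version in docs:
        by_name.setdefault(to_file_name(title, version), []).append(doc_id)
    # Keep only names that more than one source document maps to.
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}

# Illustrative data: two distinct titles that differ only in a special character.
docs = [
    ("doc-1", "Claim 2019/04", "1.0"),
    ("doc-2", "Claim 2019:04", "1.0"),
    ("doc-3", "Policy A", "2.0"),
]
collisions = find_collisions(docs)
```

Here two distinct source documents end up with the same exported file name, so one would overwrite the other or need an extra disambiguation rule. When attributes are mapped directly from repository to repository, this naming step does not exist.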
Issue Resolution Responsibility during Two-Step Migrations
One major concern with a two-step approach is problem solving and responsibility during a migration run. Regardless of sample testing, large migrations will often encounter unexpected document and metadata issues and anomalies due to the size and age of the legacy repository. In a two-step approach, a document export can fail without the import job ever knowing: the exported files simply do not exist, and export jobs often do a poor job of reporting exceptions. Responsibility for correcting the issues, particularly if the export job itself fails, can be problematic, as it might require a code change in either the dump process or the import process, which would delay the migration.
Clients also struggle with reconciling document counts between the export and import activities. The export tool often has trouble reporting how many documents were included in each export, and if there are errors during the exports, it is difficult to reconcile them with the separate reports from the import process.
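As a rough illustration of the reconciliation work a two-step migration pushes onto the client, the sketch below compares three ID lists that would come from three separate places: the legacy repository, the export report, and the import report. The `reconcile` function and the sample IDs are assumptions made for this example, not the output format of any particular tool.

```python
def reconcile(source_ids, exported_ids, imported_ids):
    """Compare three ID lists and report where documents went missing."""
    source = set(source_ids)
    exported = set(exported_ids)
    imported = set(imported_ids)
    return {
        "never_exported": sorted(source - exported),
        "exported_not_imported": sorted(exported - imported),
        "imported_not_in_source": sorted(imported - source),
    }

# Illustrative data only: IDs as they might appear in three separate reports.
source_ids = ["d1", "d2", "d3", "d4"]
exported_ids = ["d1", "d2", "d4"]  # d3 failed silently during export
imported_ids = ["d1", "d4"]        # d2 was exported but rejected on import

report = reconcile(source_ids, exported_ids, imported_ids)
```

The logic is simple, but it only works if both tools report document IDs reliably and in a comparable form, which is the difficulty described above.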
When run in one-step mode, OpenMigrate provides a complete error log in which all failed documents are recorded throughout the entire extraction and import process. OpenMigrate supports re-running the migration for only the documents in the error log. In this manner, an issue can be quickly addressed by taking any or all of the following actions:
- Making a change to the document/metadata in the source system.
- Modifying the OpenMigrate mappings to correct the data issue for the failed documents.
- Re-running a small job to migrate only the failed documents again, allowing the bulk of the other documents to continue migrating.
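The retry pattern above can be sketched in a few lines. This is a minimal illustration of the idea and not OpenMigrate's API or configuration: the `migrate` function, the in-memory `source` and `target` dictionaries, and the two mapping functions are all hypothetical stand-ins.

```python
from datetime import datetime

def migrate(doc_ids, read_source, write_target, mapping):
    """Move each document in one pass; record failures instead of stopping."""
    migrated, errors = [], {}
    for doc_id in doc_ids:
        try:
            write_target(mapping(read_source(doc_id)))
            migrated.append(doc_id)
        except Exception as exc:  # log the failure and keep the run going
            errors[doc_id] = str(exc)
    return migrated, errors

def strict_mapping(doc):
    """Initial mapping: expects ISO dates only."""
    created = datetime.strptime(doc["created"], "%Y-%m-%d").date()
    return {**doc, "created": created.isoformat()}

def tolerant_mapping(doc):
    """Corrected mapping: also accepts the unexpected legacy date format."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            created = datetime.strptime(doc["created"], fmt).date()
            return {**doc, "created": created.isoformat()}
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {doc['created']!r}")

# Hypothetical in-memory stand-ins for the legacy and target repositories.
source = {
    "a": {"id": "a", "created": "2019-01-05"},
    "b": {"id": "b", "created": "01/05/2019"},  # unexpected legacy format
    "c": {"id": "c", "created": "2018-11-30"},
}
target = {}

def write_target(doc):
    target[doc["id"]] = doc

# First run: the bulk of the documents migrate, failures land in the error log.
migrated, errors = migrate(source, source.__getitem__, write_target, strict_mapping)

# Second, small run: only the logged failures, using the corrected mapping.
retried, remaining = migrate(errors, source.__getitem__, write_target, tolerant_mapping)
```

Because a single process owns both the read and the write, the error log from the first run is the complete input for the second run, and no separate export has to be repeated.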
Summary
Migrating from legacy repositories to new ECM 2.0 repositories can be a daunting task. Leveraging legacy export tools to dump content can seem like a simple way to begin the migration, but issues with file mapping, file space, issue resolution and responsibility, and the complexity of the migration itself can make a dump and load more complex than it first appears.
TSG recommends a one-step migration where content is moved directly from the legacy repository to the new repository. Advantages of this approach include:
- Faster migrations by not relying on a file download and metadata mapping.
- No need for temporary file space for extracted documents and metadata.
- Simplicity by not having to manage versions, renditions and other document relationships in a dumped file format.
- Better documented migrations by having one tool, approach and responsibility for both document and metadata extraction and storage.
- Improved issue resolution by having one tool (and one set of resources) responsible for extraction and storage.
- Faster performance by going directly to the source system via the underlying DB and moving the content in one shot rather than through a separate export/import process.
Let us know your thoughts below.