TSG has been conducting a number of large migrations for clients (upwards of 1 billion documents) with our OpenMigrate software product. Alfresco offers the Bulk Import Tool as a free option for importing content into Alfresco. Although OpenMigrate, with its direct file linking option, incorporates all of the performance components of the Alfresco Bulk Import Tool, clients often want to understand the differences between the two options. This post describes the differences between OpenMigrate and the Alfresco Bulk Import Tool for large file migrations.
Similarities Between Alfresco Bulk Import Tool and OpenMigrate
Both the Alfresco Bulk Import Tool and OpenMigrate support the concept of storing the file in the Alfresco content store and then “linking” the content location in Alfresco to dramatically improve content ingestion throughput. This content linking process is also referred to as an “in-place” load. Content linking improves performance because it eliminates the need to stream content into Alfresco using the Alfresco API. Because performance depends heavily on the environment, results will vary, but both tools are capable of reaching 200 documents/second. Alfresco recently reported a bulk load on AWS that achieved up to 500 documents/second.
OpenMigrate also supports direct content linking when migrating to Alfresco in order to optimize migration performance.
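To put the throughput figures above in perspective, here is a minimal Python sketch that converts a sustained ingestion rate into wall-clock time. The function name and the 1-billion-document example are illustrative assumptions, and real migrations rarely hold peak throughput for the whole run, so the results are best read as a lower bound.

```python
def migration_days(total_docs: int, docs_per_second: float) -> float:
    """Estimate wall-clock days needed to load total_docs at a sustained rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    seconds = total_docs / docs_per_second
    return seconds / 86_400  # 86,400 seconds in a day


if __name__ == "__main__":
    # Rates taken from the figures quoted above; the document count is an example.
    for rate in (200, 500):
        days = migration_days(1_000_000_000, rate)
        print(f"{rate} docs/sec -> about {days:.1f} days")
```

At a sustained 200 documents/second, 1 billion documents would take roughly 58 days; at 500 documents/second, roughly 23 days.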
Key Differences between Alfresco Bulk Import Tool and OpenMigrate
Understanding the key differentiators between the Alfresco Bulk Import Tool and OpenMigrate requires an understanding of how both tools were initially developed. OpenMigrate was first developed by TSG to assist clients with Documentum migration and upgrade efforts. As a migration tool rather than just an import tool, OpenMigrate contains both source adapters for extracting content and metadata from a variety of platforms (Documentum, Alfresco, FileNet, OpenText, SQL database…) and target adapters for loading content into Alfresco, Documentum, Hadoop and Solr. The Alfresco Bulk Import Tool was built more recently as an Alfresco Community project to be a fast way to load documents into Alfresco. Based on their development histories, major differences between the two tools include:
- Source Adapters – OpenMigrate has a full suite of source adapters for extracting documents from other repositories, including Documentum, FileNet, OpenText, SQL database, file system, Hummingbird, XML, and others. Content can be exported to a filesystem or database, or can be migrated in real time to an Alfresco repository. The Alfresco Bulk Import Tool can only migrate content from a file system and has no other source adapter capabilities.
- Types of Migrations – Because of both the source and target adapters, OpenMigrate can support a variety of different migration scenarios, including big bang, delta, hybrid, and on-demand/rolling. See our Webinar with Alfresco on migrating from Documentum to Alfresco for a more detailed understanding. The Alfresco Bulk Import Tool only supports a big bang migration scenario.
- Folder Structure – Built to quickly import documents, the Alfresco Bulk Import Tool assumes that the folder structure the files are placed in prior to import is the folder structure the objects will be loaded into in Alfresco. OpenMigrate can remap the folder structure or create it on the fly based on other metadata.
- Metadata – the Alfresco Bulk Import Tool can only pull metadata from a flat XML file (one per document to be imported) with a very specific structure that must sit next to the content file. OpenMigrate can pull metadata from many different sources, including database tables, XML, and Excel/CSV. OpenMigrate can also perform transformations on the metadata using its mapping layer.
- Object Store – the Alfresco Bulk Import Tool can only do “in-place” migrations for content that’s in a filesystem-based Alfresco content store. OpenMigrate can do in-place migrations for content that is in an Alfresco filesystem-based store as well as other content store types, like S3 and Hitachi.
- Fault Tolerance – the Alfresco Bulk Import Tool stops if a failure occurs at any point during the import process. The problem has to be fixed and then the bulk load must be run again. OpenMigrate continues to run until all documents have been processed, recording any migration errors in database tables or log files.
- Logging – OpenMigrate has more sophisticated logging mechanisms, including the ability to log to CSV and/or a database table. The Alfresco Bulk Import Tool provides only minimal logging via Log4J.
- Contentless Objects – OpenMigrate supports the migration of contentless objects, something not supported by the Alfresco Bulk Import Tool.
- Version Numbering – the Alfresco Bulk Import Tool only supports major versioning (1.0, 2.0, 3.0) when migrating multiple versions. OpenMigrate supports migrating any combination of major/minor versions (1.0, 1.1, 2.0, 3.0, 3.1) and can be customized to version documents that already exist, creating the version tree from initial to final. See our latest post on TSG Chain Versioning for Alfresco for additional capabilities that will affect migration speed.
- Server Requirements – the Alfresco Bulk Import Tool must run directly within the Alfresco JVM. OpenMigrate can be run either as an embedded subsystem in the Alfresco JVM, or externally on a remote JVM using CMIS.
- Renditions – the Alfresco Bulk Import Tool relies on the Alfresco Transformation server to create PDF renditions of the migrated content. By default, the Alfresco Transformation server converts documents synchronously during the migration, slowing migration times considerably. TSG has created an add-on to OpenMigrate to move the Alfresco transformations to an asynchronous process to avoid a migration slow-down.
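Two of the differences above, fault tolerance and metadata-driven folder structure, are easier to see in code. The Python sketch below is a generic illustration of the pattern, not OpenMigrate's implementation or API: the `department` and `year` fields, the `/Migrated` root, and the `demo_load` function are all hypothetical. It derives each target folder from metadata, keeps going when a document fails, and records failures to a CSV log.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, TextIO

# Hypothetical document shape: metadata as string key/value pairs.
Document = Dict[str, str]


def target_folder(doc: Document) -> str:
    """Build the target folder from metadata rather than the source layout."""
    # "department" and "year" are hypothetical metadata fields.
    department = doc.get("department", "Unfiled")
    year = doc.get("year", "Unknown")
    return f"/Migrated/{department}/{year}"


def migrate_all(
    docs: Iterable[Document],
    load: Callable[[Document, str], None],
    error_log: TextIO,
) -> List[str]:
    """Load every document, logging failures to CSV instead of stopping."""
    writer = csv.writer(error_log)
    writer.writerow(["id", "target_folder", "error"])
    migrated: List[str] = []
    for doc in docs:
        doc_id = doc.get("id", "")
        folder = target_folder(doc)
        try:
            load(doc, folder)
            migrated.append(doc_id)
        except Exception as exc:
            # Record the failure and continue with the next document.
            writer.writerow([doc_id, folder, str(exc)])
    return migrated


def demo_load(doc: Document, folder: str) -> None:
    """Hypothetical stand-in for a real target adapter."""
    if "title" not in doc:
        raise ValueError("missing required metadata: title")


if __name__ == "__main__":
    sample = [
        {"id": "1", "title": "Policy", "department": "HR", "year": "2015"},
        {"id": "2", "department": "Legal", "year": "2014"},  # fails: no title
        {"id": "3", "title": "Invoice"},
    ]
    log = io.StringIO()
    print(migrate_all(sample, demo_load, log))
    print(log.getvalue())
```

A fail-fast importer would stop at document 2 here; this loop still migrates document 3 and leaves a CSV row explaining why document 2 was skipped, which can drive a targeted re-run.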
Summary
The Alfresco Bulk Import Tool is a simple and efficient tool for moving documents from a file system into a simple structure in Alfresco. OpenMigrate provides additional features and supports a variety of migration sources and scenarios appropriate for more complex migrations.
Let us know your thoughts below.