
Technology Services Group


File Formats Lessons Learned – Legacy ECM Migrations


July 11, 2019

As we presented yesterday, one of the bigger issues in FileNet migrations is clients that have leveraged the COLD format, and TSG has built our own adapter to help those clients move from FileNet COLD to the less proprietary PDF format.  The discussion got us thinking about which other formats need special attention during legacy ECM migrations.  This post covers several of those formats and our lessons learned from migrating them.

TIFF – Tagged Image File Format

For our legacy ECM clients, many imaging systems store documents as TIFF images.  TIFF grew popular alongside facsimile (fax), and fax-era images are typically compressed with CCITT Group 3 or Group 4 encoding.  TIFF remains a very popular format, supporting scanning, faxing, word processing and optical character recognition, as well as other processes.  We typically recommend converting TIFF images to the PDF Image format when migrating to modern ECM systems in order to support in-browser viewing of documents.  Some specific lessons learned about TIFF to PDF conversion include:

  • Image Libraries – There are many high-performance libraries available for this process.  While ImageMagick is a common image conversion and manipulation tool, TSG typically recommends iText for TIFF to PDF conversion for speed and performance reasons.  While many transformation tools (like Alfresco’s) can create renditions of many different formats, for our OpenMigrate product we have chosen specific libraries in order to optimize throughput for high-volume migrations.
  • Single Page TIFF to Multi Page PDF – FileNet Image Services stores every page of a document as a separate TIFF file, adding bloat to the repository.  Page-level storage was helpful 25 years ago, when these systems were originally implemented and network bandwidth was at a premium, because it allowed documents to be served to the viewer page-by-page rather than forcing a download of the entire document.  Now that bandwidth is much less of a concern, we typically recommend combining the single-page TIFF files into one multi-page PDF during migration.
  • Proprietary TIFF – FileNet adds proprietary image information to the header and footer of TIFF documents when they are stored on disk.  OpenMigrate removes the proprietary headers and footers as part of the migration.
  • Watermarks – When converting to PDF, there is an opportunity to add overlays on the documents.  It’s also possible to “burn in” certain metadata into the document, such as created date, author, etc.  OpenMigrate can add these overlays during migration or OpenOverlay can add them when viewing the documents/images in the new ECM system.
  • Optical Character Recognition – The majority of OCR tools work with TIFF.  If the document contains useful textual data, the migration can be an opportunity to convert the TIFF file not just to the PDF Image format but to a searchable PDF Text document, which allows full-text searching within many ECM systems.
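
The single-page to multi-page step can be sketched as follows. This is only an illustration, not OpenMigrate's implementation: the `(doc_id, page_number, path)` record layout and both function names are assumptions made for the example. The signature check relies only on the standard TIFF header (a byte-order mark followed by the number 42) and could be used to flag files that still carry a proprietary wrapper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Standard TIFF files start with "II" (little-endian) or "MM" (big-endian),
# followed by the number 42 stored in that byte order.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def is_standard_tiff(first_bytes: bytes) -> bool:
    """Return True when the data begins with a standard TIFF signature."""
    return first_bytes[:4] in TIFF_SIGNATURES


def group_pages(pages: Iterable[Tuple[str, int, str]]) -> Dict[str, List[str]]:
    """Group hypothetical (doc_id, page_number, path) records by document.

    The result maps each document ID to its page files in page order,
    ready to be handed to whatever TIFF-to-PDF converter is in use.
    """
    by_doc: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for doc_id, page_number, path in pages:
        by_doc[doc_id].append((page_number, path))
    return {
        doc_id: [path for _, path in sorted(entries)]
        for doc_id, entries in by_doc.items()
    }
```

Each ordered page list then becomes the input for one multi-page PDF; the conversion itself is left to the chosen library.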

Microsoft Office (Word, Excel and PowerPoint)

For the majority of legacy ECM migrations, TSG recommends migrating Word documents in their native format.  Sometimes clients will look to update the format, but this is rare.  While the migration itself is typically straightforward, many clients use the move to the new ECM system to introduce new Word or other templates that address the “Create New Document” requirement.

One issue for large migrations is what to do with renditions, which are typically PDF.  Should the migration move the renditions from the legacy system, or leverage the PDF renditioning capabilities of the new system to create new renditions of the Office documents?  TSG typically recommends generating PDF renditions for viewing Office documents, as a rendition is the best way to quickly view a document without having to launch Office or giving the user the impression that they can edit the document.  The PDF format also has the benefit that annotations can be stored in separate XFDF files and merged into the PDF view, or into the PDF document itself, at a later time.  TSG leverages this capability for our OpenAnnotate PDF annotations.

The choice between migrating existing renditions and generating all new renditions should be based on requirements as well as timing.  Migrating PDF renditions from the legacy system requires more migration effort but considerably fewer transformations in the new system.  Transforming in the new system produces consistent PDF renditions but could overwhelm the transformation servers during a large migration effort.  TSG recommends that each client weigh their own requirements and consider several options.
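
A rough estimate can make the timing side of this trade-off concrete. The sketch below is a back-of-the-envelope calculation only; the document count, per-document transformation time, and worker count are made-up numbers for illustration, not measurements from any real system.

```python
def rendition_hours(documents: int, seconds_per_transform: float, workers: int) -> float:
    """Estimate wall-clock hours to generate one rendition per document.

    Assumes transformations spread evenly across identical workers.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return documents * seconds_per_transform / workers / 3600


# Hypothetical inputs: 5 million Office documents, 2 seconds each, 8 workers.
hours = rendition_hours(documents=5_000_000, seconds_per_transform=2.0, workers=8)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # prints "347 hours (~14.5 days)"
```

If an estimate like this exceeds the migration window, migrating the legacy renditions or adding transformation capacity may be worth considering.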

Video and Audio File Formats

There are many proprietary video and audio formats.  TSG typically recommends migrating all files in their native formats, but clients should consider creating MP4 or MP3 renditions as part of the migration so that users can watch or listen in-browser in more open formats, without downloading plugins or players.
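
As one possible way to produce such renditions, the sketch below builds (but does not run) a command line for the open-source ffmpeg tool. Using ffmpeg is an assumption for this example and not a statement about what OpenMigrate uses; the extension list, the `.rendition` naming convention, and the function name are likewise hypothetical.

```python
from pathlib import Path
from typing import List

# Hypothetical list of audio-only source formats; extend as needed.
AUDIO_EXTENSIONS = {".wav", ".wma", ".aiff", ".flac"}


def rendition_command(source: str) -> List[str]:
    """Build an ffmpeg argument list for an MP3 or MP4 rendition.

    The native file is left untouched; the rendition gets its own name.
    """
    src = Path(source)
    if src.suffix.lower() in AUDIO_EXTENSIONS:
        target = src.with_name(src.stem + ".rendition.mp3")
        # -vn drops any video stream; libmp3lame is ffmpeg's MP3 encoder.
        return ["ffmpeg", "-i", str(src), "-vn", "-c:a", "libmp3lame", str(target)]
    target = src.with_name(src.stem + ".rendition.mp4")
    # H.264 video with AAC audio plays in most browsers; +faststart moves
    # the index to the front of the file so playback can begin sooner.
    return [
        "ffmpeg", "-i", str(src),
        "-c:v", "libx264", "-c:a", "aac",
        "-movflags", "+faststart",
        str(target),
    ]
```

The returned list can be passed to `subprocess.run` on a machine where ffmpeg is installed.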

Image File Formats

Similar to audio and video formats, there are a number of different formats for storing images: GIF, JPEG, PNG, and BMP, just to name a few.  TSG recommends migrating images from legacy systems in their native formats in order to preserve the original files in an unaltered state.

TSG also recommends generating renditions for image files if the native formats cannot be viewed in-browser without plugins.  Image renditions can be generated by OpenMigrate as part of the migration from the legacy system, or as a separate process after the migration is complete.

Some user interfaces, like TSG’s OpenContent Management Suite, offer thumbnail views that provide an enhanced user experience in systems where large numbers of images are processed. As with other image renditions, thumbnails can be generated by OpenMigrate as part of the migration from the legacy system, or as a separate process.
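
Deciding which images need a rendition can be as simple as checking the file extension against the formats browsers display natively. A minimal sketch, assuming a hypothetical extension list that should be adjusted to the browsers actually supported:

```python
from pathlib import PurePath
from typing import Iterable, List

# Assumed set of formats viewable in-browser without plugins.
BROWSER_VIEWABLE = {".gif", ".jpg", ".jpeg", ".png", ".bmp", ".webp", ".svg"}


def needs_rendition(filename: str) -> bool:
    """Return True when the native format is not in the viewable set."""
    return PurePath(filename).suffix.lower() not in BROWSER_VIEWABLE


def plan_renditions(filenames: Iterable[str]) -> List[str]:
    """Return the files that should be queued for rendition generation."""
    return [name for name in filenames if needs_rendition(name)]
```

For example, `plan_renditions(["scan.TIF", "logo.png", "raw.psd"])` would queue only the TIFF and PSD files, while every native file is still migrated unaltered.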

COLD File Formats

COLD, or “Computer Output to Laser Disc”, formats typically hold report output that was “printed” directly to the ECM system from another system.  See our previous post for how we address the FileNet COLD format specifically.  As each COLD format tends to be unique and to require a specific viewer, TSG recommends converting all COLD formats to PDF, which can be opened in a variety of viewers and adds annotation capabilities.

Annotations Formats

Along with legacy migrations comes the question of what to do with legacy viewers and annotations.  Many legacy ECM systems leverage annotation tools that store annotations in proprietary formats.  Just as with the files themselves, TSG recommends converting these annotations to the standard XFDF format to support the PDF transformations described above.  See our related posts about FileNet Annotations as well as Daeja Annotations.
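
To show what the target of such a conversion looks like, the sketch below writes sticky-note annotations as a small XFDF document using only the Python standard library. The input dictionary layout (`page`, `box`, `author`, `text`) is a hypothetical stand-in for whatever the legacy system exports, and the example assumes the box is already expressed in PDF coordinates (origin at the bottom-left, zero-based page numbers); real legacy annotations usually need a coordinate conversion first.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

XFDF_NS = "http://ns.adobe.com/xfdf/"


def to_xfdf(pdf_name: str, notes: Iterable[Dict]) -> str:
    """Serialize hypothetical legacy notes as XFDF text annotations."""
    ET.register_namespace("", XFDF_NS)  # emit XFDF as the default namespace
    root = ET.Element(f"{{{XFDF_NS}}}xfdf")
    annots = ET.SubElement(root, f"{{{XFDF_NS}}}annots")
    for note in notes:
        x, y, width, height = note["box"]
        text = ET.SubElement(annots, f"{{{XFDF_NS}}}text", {
            "page": str(note["page"]),  # zero-based page index
            "rect": f"{x},{y},{x + width},{y + height}",
            "title": note["author"],  # XFDF stores the author in "title"
        })
        contents = ET.SubElement(text, f"{{{XFDF_NS}}}contents")
        contents.text = note["text"]
    # The f element points at the PDF the annotations belong to.
    ET.SubElement(root, f"{{{XFDF_NS}}}f", {"href": pdf_name})
    return ET.tostring(root, encoding="unicode")
```

Other annotation types (highlights, stamps, redactions) map to different XFDF elements and would each need their own mapping.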

PDF or PDF/A

PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking (as opposed to font embedding) and encryption.  The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations. Clients should consider leveraging PDF/A when migrating documents that are purely archival, but should weigh the larger file sizes that embedded fonts cause.
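
During a migration it can be useful to know which existing PDFs already claim to be PDF/A. PDF/A files declare their level in XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below is a heuristic that scans the raw bytes for those properties. It assumes the XMP packet is stored uncompressed, and it only reports what the file declares; it does not validate conformance, which requires a dedicated validator such as veraPDF. The function name is hypothetical.

```python
import re
from typing import Optional

# XMP may carry the properties as elements or as attributes.
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def declared_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the declared level (e.g. "PDF/A-1b"), or None if absent."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(pdf_bytes)
    if conformance:
        level += conformance.group(1).decode("ascii").lower()
    return f"PDF/A-{level}"
```

Files that return `None` are candidates for conversion if the archive policy calls for PDF/A.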

Summary

Migrating from legacy ECM systems should also involve migrating from legacy file formats to more modern and less proprietary formats. TSG has experience with a large number of formats and recommends that clients plan their format upgrades carefully as part of their legacy ECM system migration.

Filed Under: FileNet, Migrations, OpenMigrate


