
Technology Services Group

FileNet Migration Findings


August 30, 2016

Editor’s Note: This article was originally posted in August of 2010. We’ve been answering quite a few questions lately from businesses looking to migrate off of their older FileNet systems. Having been away from this area for a bit, I was looking back over some older blog entries I had written. I came across this write-up, which described a unique engagement where we ended up pulling some of the FileNet migration knowledge we had put INTO OpenMigrate back OUT OF it, and using that knowledge as the basis for a pretty complex, but successful, migration. Most of our projects do not require quite this much engineering, reverse-engineering, and nerdy problem-solving, but it’s nice to know we were up to the task when we needed to be.

I recently had the opportunity to work with a defense company that was looking to migrate data out of FileNet using our OpenMigrate solution. Compared to the other FileNet migrations we’ve done, this one seemed much simpler at first, since they only wanted to migrate three doc classes, but we soon realized we had some challenges ahead of us.

Challenge One: A Very, Very Old AIX Server

First we found that the FileNet server they were using was a very old AIX machine, and that the latest version of Java supported on that version of AIX was Java 1.1. OpenMigrate (OM), on the other hand, had only ever been run on Java 1.4 and 1.5. With some effort we might have been able to back-port OM to Java 1.3, but anything lower than that would likely not have worked, since OM is built on the Spring framework. What we decided to do instead was take OpenMigrate’s FileNet logic and execute the steps manually.

The first step was to run queries against the database to extract all the metadata for each doc class into Excel spreadsheets (a database would have worked as well) for the client. Here are a few things we encountered that are good to remember when querying records in FileNet:

  • The F_DOCNUMBER column is populated by FileNet in sequential order, so a higher doc number signifies a record that was created later in time.

  • FileNet date fields, like F_ENTRYDATE, are integer Julian-style dates: the number of days since 1/1/1970. For example, 1/1/2010 is stored in the F_ENTRYDATE column as 14610. In our case, we had to convert the entry dates to a readable format to figure out how many documents were created in a given year, to assist with capacity planning; all we did was add the value of F_ENTRYDATE, in days, to 1/1/1970.

  • Some FileNet servers store a multi-page document as one document, and some store it as multiple documents, one per page. By looking at some sample data, we found that our FileNet server stored documents one per page and merged the pages together when serving a document up to the user for viewing. The number of pages in a document is recorded in the F_PAGES column, with one exception: for a single-page document, FileNet does not store a 1 in that column as you would expect. Instead, it stores NULL.
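The day-offset arithmetic in the second bullet is easy to sanity-check in a few lines (a minimal sketch; nothing is assumed beyond the 1/1/1970 epoch described above):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # FileNet's day zero for F_ENTRYDATE

def entrydate_to_date(f_entrydate: int) -> date:
    """Convert a FileNet F_ENTRYDATE day offset into a calendar date."""
    return EPOCH + timedelta(days=f_entrydate)

def date_to_entrydate(d: date) -> int:
    """Convert a calendar date back into an F_ENTRYDATE day offset."""
    return (d - EPOCH).days

print(entrydate_to_date(14610))             # 2010-01-01, matching the example above
print(date_to_entrydate(date(2010, 1, 1)))  # 14610
```

Bucketing records by `entrydate_to_date(...).year` is then enough to produce the per-year document counts used for capacity planning.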

After all the metadata had been retrieved, we generated Korn shell scripts to run on the FileNet server and download the FileNet content to the filesystem. The scripts leveraged the FileNet system tools to store each page of a document as a separate TIFF file in a folder specific to that document. If we had been able to use OM, it would have taken care of creating these scripts, running them, and deleting them once they completed.
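The script generation looked roughly like the sketch below. Note that `fetch_page` is a hypothetical stand-in for the actual FileNet system tool invocation, which depends on the install; the per-document folder layout and the NULL-means-one-page handling follow the notes above.

```python
def build_export_script(docs, export_root="/export"):
    """Generate a Korn shell script that pulls every page of every
    document into a folder per document.

    docs: iterable of (f_docnumber, f_pages) pairs, where f_pages may
    be None because FileNet stores NULL in F_PAGES for one-page docs.
    `fetch_page` is a hypothetical placeholder for the real FileNet
    system tool command.
    """
    lines = ["#!/bin/ksh"]
    for doc_id, f_pages in docs:
        pages = f_pages or 1  # NULL F_PAGES means a one-page document
        lines.append(f"mkdir -p {export_root}/{doc_id}")
        for page in range(1, pages + 1):
            lines.append(
                f"fetch_page {doc_id} {page} > {export_root}/{doc_id}/page_{page:04d}.tif"
            )
    return "\n".join(lines) + "\n"

script = build_export_script([(1000123, 3), (1000124, None)])
print(script)
```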

Challenge Two: Annotations

Now that we had all the metadata and content extracted, we had one more hurdle to face: annotations. The last requirement was to extract all the annotations on each document so that they could be maintained when moving into the new system, in their case OpenText. We had hoped that the annotation information, such as the text, width, height, angle, color, etc., would simply be stored in additional columns of the same Oracle database table that held the document records. However, we soon found the annotations were stored separately from the documents table, in their own annotations table in FileNet’s proprietary MKF database. This was a setback, since it meant we couldn’t just update our original queries to pull the annotation information. Instead, we had to launch the MKF tool to query for the annotations.

We went ahead with this tool, but found that most of the information we needed to reproduce an annotation in the new system was stored as a hex string that could be up to 800 characters long. It was definitely not an easy task, but after much trial and error, we were able to crack the encoding enough to at least get the text of the annotation out of the hex string, which, after talking to the business, seemed to be enough. There wasn’t much of a need to replicate annotations such as arrows and highlights in the new system. Here’s an example of what one of the hex values in the annotation table looked like:

1f00104119f15c8f01d011a87a00a0246922a504000200010100040190023a0b0004014c00130d000101100006ff000000ffff2400010
123000700020002ff0000250006417269616c002600050c0100000003002b6e65656473207375627061636b657220666f722065787065
6469746520656c6220313033303031202000002b00020000

From here, we found that this long hex string could be decoded into key/length/value combinations. The first 4 characters signify the key, or field identifier; the next 2 characters, converted to decimal, give the length of the value in bytes; and the next twice-that-many hex digits, converted to ASCII, are the value of the field.

Our analysis did not take us deep enough to identify every field in the hex string, but we did find that 0300 is the field identifier for the annotation’s text. The other field we know for certain is in the hex string is the name of the font used for the text. The font name and the annotation text appear to be the only variable-length fields; I would imagine the rest of the hex string encodes the positioning of the annotation, colors, and so on, but it would take further analysis to confirm that.
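Putting those observations together, a minimal decoder can walk the hex string as key/length/value triples. Everything here is inferred from the one sample string above, so treat it as a sketch: the length is read as a byte count, field 0300 carries the annotation text, and the field that happens to decode to “Arial” in the sample (2500) lines up with the font-name field mentioned above.

```python
def decode_annotation(hex_string: str) -> dict:
    """Split a FileNet annotation hex string into {field_id: hex_value}.

    Assumed layout, inferred from the sample in the post: 4 hex digits
    of field identifier, 2 hex digits giving the value length in bytes,
    then twice that many hex digits of value.
    """
    fields = {}
    pos = 0
    while pos + 6 <= len(hex_string):
        key = hex_string[pos:pos + 4]
        length = int(hex_string[pos + 4:pos + 6], 16)
        fields[key] = hex_string[pos + 6:pos + 6 + length * 2]
        pos += 6 + length * 2
    return fields

def hex_to_text(hex_value: str) -> str:
    """Decode a hex field value as ASCII, dropping NUL padding."""
    return bytes.fromhex(hex_value).decode("ascii", errors="replace").rstrip("\x00")

sample = (
    "1f00104119f15c8f01d011a87a00a0246922a504000200010100040190023a0b0004014c00130d000101100006ff000000ffff2400010"
    "123000700020002ff0000250006417269616c002600050c0100000003002b6e65656473207375627061636b657220666f722065787065"
    "6469746520656c6220313033303031202000002b00020000"
)
fields = decode_annotation(sample)
print(hex_to_text(fields["0300"]))  # the annotation text
print(hex_to_text(fields["2500"]))  # "Arial" -- consistent with the font-name field
```

Running this against the sample decodes the whole string cleanly into fixed-length fields plus the two variable-length ones, which is what gave us confidence in the byte-count reading of the length.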

In the end, we delivered the content, metadata, and annotations the business needed to get out of the stone age and into a more modern CMS. While we didn’t actually run OpenMigrate, we were able to leverage its “knowledge” to get the job done.


Filed Under: FileNet, OpenMigrate, Tech Tip


Comments

  1. Susan Yang says

    June 3, 2011 at 2:29 am

    I have the same requirement: getting content out of FileNet Image Services onto a local file system. You mentioned that you used Korn shell scripts to download content from the database to the file system; we have a lot of files (*.dat) to handle. Could you tell us the speed of retrieving content? Thanks.

    • Todd Pierzina says

      June 13, 2011 at 8:50 pm

      Hi Susan,

      We’ve had varying degrees of success, ranging from roughly 1 doc per second, up to 10 docs per second. It all depends on the power of the hardware and the network infrastructure. The lag is never with the database, but always with getting the content off the platters.

      We always run some benchmark migrations as early as possible to assist with the migration planning; that has proved quite valuable in the past. Not only can we get some timings down, but we can also get the business users some of their images and metadata to look at, to help define exactly what needs to be migrated and what can be bypassed.

      Please drop me a line, tpierzina@tsgrp.com, if you’d like to discuss this further.

  2. Brian says

    October 21, 2016 at 7:40 am

    Is there a faster way to get annotations off the system? Currently we are using this command to get a range of annotations:

    select annotations * where doc_id > 10,000,000 and doc_id < 20,000,000

    Because of the '*', I believe it is doing a table scan. Is there a way to use a range of annot_key?

    Thanks


Trackbacks

  1. Migrating from FileNet P8 4.0 to Alfresco (reading MSAR surface.dat files?) – CopyQuery says:
    November 15, 2013 at 5:05 am

    […] https://www.tsgrp.com/2010/08/24/filenet-migration-findings/ […]

  2. FileNet and CMOD – Will IBM finally sell them and potentially buy Box? says:
    February 20, 2019 at 7:17 am

    […] of 2008.  In our own experience over the last three years, anytime TSG sees a FileNet customers looking to migrate, typically the FileNet implementation has had only minor maintenance upkeep in the last 10 years if […]

  3. FileNet and Content Manager (CMOD) Migration to Amazon Web Services – Why Now? — Technology Services Group says:
    January 3, 2020 at 10:46 am

    […] readers know, TSG has been assisting clients with migrating from Legacy ECM platforms including FileNet, Documentum, OpenText and others for years.  Recently we have noticed an uptick in interest, RFPs, […]

  4. FileNet Migration – Not as hard as you think? — Technology Services Group says:
    January 23, 2020 at 2:48 pm

    […] from old, legacy FileNet systems.  We started blogging back in 2010 and included updates in 2016, as well as our more recent success with over 4 billion documents in 2020.  While TSG has […]

