Technology Services Group

Documentum 6.5 Upgrade – Character Encoding Issues

August 26, 2010

Special Note:  Anyone who is planning an upgrade from Documentum 5.3 to 6.5 should read this note closely, as some types of upgrades (clone or in-place) could leave content that was retrievable in 5.3 unavailable in 6.5.

This post was developed based on recent work for a major pharmaceutical client.  The client, on Documentum 5.3, was developing a consumer interface application leveraging Lucene.  As we mentioned in a previous post, the client chose Lucene over FAST based on benchmarking results for over 150,000 documents.

Background

For the application, the client was leveraging OpenMigrate with DFC 6.5 to retrieve content and metadata for nearly 1,000,000 documents from their 5.3 docbase to be indexed in Lucene.  Per the product release notes, using DFC 6.5 to access a 5.3 repository is a supported configuration.  An issue was identified when around 5,000 documents failed to migrate.  The OpenMigrate error logs showed that the DFC call IDfSession.getObject(), used to retrieve documents from the repository, was failing.  The stack trace made it apparent that the error was being thrown from within the DFC code itself.  The team was surprised by the error, since the same documents could be retrieved without a problem using client applications working with a 5.3 DFC, such as Webtop and Samson.  The DFC error messages that were encountered are shown below:

[DFC_OBJPROTO_BAD_NUMBER_FORMAT] Invalid number format for string length in serialized object

[DFC_OBJPROTO_BAD_STRING_FORMAT] Unknown string format in serialized object

After some further investigation, the team noticed some similarities in many of the documents that were failing to migrate.  All of the documents contained metadata with special characters.  After duplicating the error in a development environment, the team removed the special characters from the metadata, retried the migration, and the documents were retrieved successfully with DFC 6.5.
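
As a rough illustration of how such documents might be shortlisted, the sketch below flags metadata values that contain anything outside printable ASCII.  It is only a coarse first pass and not TSG's actual method: the class name, the attribute map, and the sample values are hypothetical, and a non-ASCII character is not necessarily an invalidly encoded one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpecialCharFinder {

    /**
     * Returns the attributes whose values contain at least one character
     * outside printable ASCII (0x20 to 0x7E). Control characters such as
     * tabs and newlines are flagged as well.
     */
    static Map<String, String> findSuspectValues(Map<String, String> attributes) {
        Map<String, String> suspects = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String value = entry.getValue();
            if (value != null && value.chars().anyMatch(c -> c < 0x20 || c > 0x7E)) {
                suspects.put(entry.getKey(), value);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Hypothetical metadata for one document; the values are made up.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("object_name", "Protocol 12");
        attrs.put("title", "Patient\u2019s Guide"); // curly apostrophe
        System.out.println(findSuspectValues(attrs).keySet()); // prints [title]
    }
}
```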

DFC 6.5 and Character Encoding

Upon review with Documentum support, it was noted that DFC 6.5 enforces character encoding more strictly than DFC 5.3.  This explained why the documents could be retrieved successfully with 5.3 client applications but not with DFC 6.5.  The team wondered how these documents were ever stored in the repository with invalid character encoding.  Our best guess was:

  • The documents were moved into the repository as part of a migration effort that took place a long time ago.  Most likely the loose enforcement of character encoding by legacy versions of the DFC was the culprit.
  • Users may have set metadata values on documents by copying and pasting from other applications, such as Microsoft Word, that may have used a different character encoding.
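
The difference between loose and strict enforcement can be illustrated with plain Java, independent of DFC.  The sketch below assumes, purely for illustration, that the expected encoding is UTF-8 and that the pasted text arrived as Windows-1252 bytes; the post does not state which encodings were actually involved, and this is not how DFC itself is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    /** Returns true only if the bytes are well-formed UTF-8 under strict decoding. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Windows-1252 stores the curly apostrophe as the single byte 0x92.
        byte[] cp1252 = "Patient\u2019s".getBytes(Charset.forName("windows-1252"));

        // Lenient decoding silently substitutes the replacement character U+FFFD.
        String lenient = new String(cp1252, StandardCharsets.UTF_8);
        System.out.println(lenient.contains("\uFFFD")); // prints true

        // Strict decoding rejects the same bytes outright.
        System.out.println(isValidUtf8(cp1252)); // prints false

        // The same text encoded as UTF-8 passes the strict check.
        System.out.println(isValidUtf8("Patient\u2019s".getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

A lenient reader keeps working with slightly damaged text, while a strict reader refuses the value entirely.  That is consistent with the same documents opening in 5.3 clients but failing under DFC 6.5.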

Since the client wasn’t upgrading, only indexing the content in Lucene, they decided to swap out the DFC 6.5 that OpenMigrate was using for DFC 5.3 in order to complete the migration.  Unfortunately, using DFC 5.3 requires a more invasive installation process that the client had been trying to avoid.  The issues with the 5,000+ documents will need to be addressed when the client upgrades to 6.5.

Character Encoding and Effect on Upgrade

This particular client was fortunate enough to be able to “test drive” DFC 6.5 with their process that indexed to Lucene.  This migration uncovered an issue that would have been significantly more serious had the client upgraded their entire Documentum system to 6.5.  Had the upgrade been completed, users would not have been able to access these documents with the upgraded Documentum client tools such as Webtop, or any other custom applications utilizing the DFC.  Since the number of documents with the character encoding problem is relatively small in relation to the total number of documents in the system, they might have gone unnoticed during testing.  Because of the migration, the client is now able to come up with a proactive plan to rectify the issue prior to their full Documentum 6.5 upgrade.

Possible Resolutions

To identify issues with existing data, such as the character encoding problem described above, prior to an upgrade, TSG would recommend several alternatives:

  1. Consider leveraging OpenMigrate or a similar application to “scan” your data with DFC 6.5 to determine if any encoding errors exist prior to the upgrade.
  2. During the upgrade, use OpenMigrate to migrate data into a clean repository instead of performing a typical in-place upgrade or dump and load.  Migrations are a great opportunity to “scrub” and validate existing data.  Because every document is touched during a migration, corrupt data can be more easily identified.
  3. Utilize database tools to help identify potential problems.  Oracle has a Character Set Scanner Utility that can scan an entire database to verify that all data stored in the database use the correct character encoding.
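
The “scan” idea in the first alternative can be sketched as a loop that attempts to fetch every object and records the failures instead of stopping at the first one.  This is a minimal sketch, not OpenMigrate's implementation: ObjectFetcher is a hypothetical stand-in for a DFC 6.5 call such as IDfSession.getObject(), and the object IDs and the simulated failure are made up so the example runs without a repository.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreUpgradeScan {

    /** Hypothetical wrapper around a DFC-backed fetch; not part of DFC. */
    interface ObjectFetcher {
        void fetch(String objectId) throws Exception;
    }

    /**
     * Attempts to fetch every ID and returns failed ID -> error message,
     * so that one bad object does not abort the whole scan.
     */
    static Map<String, String> scan(List<String> objectIds, ObjectFetcher fetcher) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String id : objectIds) {
            try {
                fetcher.fetch(id);
            } catch (Exception e) {
                failures.put(id, String.valueOf(e.getMessage()));
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Made-up object IDs; a real scan would take them from a repository query.
        List<String> ids = Arrays.asList("0900000180000001", "0900000180000002");

        // Simulated fetcher that fails for one ID with one of the errors quoted in this post.
        ObjectFetcher simulated = id -> {
            if (id.endsWith("2")) {
                throw new Exception(
                        "[DFC_OBJPROTO_BAD_STRING_FORMAT] Unknown string format in serialized object");
            }
        };

        scan(ids, simulated).forEach((id, msg) -> System.out.println(id + " -> " + msg));
    }
}
```

Collecting failures into a report, rather than halting, gives a complete list of affected documents to clean up before the upgrade.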

Check out TSG’s free Documentum upgrade planning guide for additional upgrade tips.

Filed Under: D6, D6.5, Documentum, Lucene, Migrations, OpenContent Management Suite, OpenMigrate, Product Suite, Tech Tip, Upgrades, Webtop

Comments

  1. Sorin Marinescu says

    December 15, 2010 at 10:05 am

    Hello,

    You suggest using OpenMigrate to “scan” the 5.3 repository with DFC 6.5 to determine if any encoding errors exist prior to the upgrade.

    I’ve been reading the documentation but I haven’t found a way to do that…

    Could you please offer more information about this?

    Thanks,
    Sorin.

    • TParz says

      December 15, 2010 at 10:13 pm

      Sorin,

      We’ve actually recently packaged a pre-configured version of OpenMigrate to perform the metadata validation with DFC 6.5. It can be downloaded from TSG’s download site. You’ll find a link to download the Documentum Metadata Validator at the bottom of the page in the Useful Tools section.

      Tony

      • Sorin Marinescu says

        December 16, 2010 at 2:52 am

        Hello Tony,

        Thank you for your prompt answer.
        I didn’t notice the Useful Tools section, you guys are great 🙂

        Regards,
        Sorin

Trackbacks

  1. Documentum 6.6 Upgrade – Character Encoding Fail – Part II « TSG Blog says:
    November 10, 2010 at 2:09 pm

    […] is an update to the original article that was written in August.  While the post highlighted character encoding issues and DFC 6.5, we […]
