This week we are urgently reminding clients, as part of their upgrade evaluation, to look seriously for character encoding issues in their current Documentum content and to consider the effect those issues will have on upgrades.
This is an update to the original article we wrote in August. While that post highlighted character encoding issues with DFC 6.5, we are not sure readers fully realized the impact on their upgrade efforts.
There are two scenarios that could result in bad characters in the docbase:
- Over time, users “cut and paste” from Word or other applications into Webtop fields, custom applications, or other Documentum interfaces. Within the browser (Internet Explorer, Firefox, or Netscape for the old-timers) the string looks fine, but the field may actually contain “special characters” that are passed through to the database.
- Migration efforts from previous upgrades or consolidations introduced character encoding issues that were never identified.
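To make the first scenario concrete, here is one common way that pasted text turns into invisible characters. This is our assumption; we have not confirmed that these are the exact characters that trip up DFC 6.5. Word's “smart” quotes and dashes are encoded as Windows-1252 bytes, and if those bytes are later read as ISO-8859-1 they become C1 control characters (U+0080 to U+009F) that a browser may display as nothing at all. `PasteMojibakeDemo` and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PasteMojibakeDemo {
    // Returns true if the string contains C1 control characters (U+0080..U+009F).
    // These rarely appear in legitimate text and often indicate Windows-1252
    // bytes that were decoded as ISO-8859-1 somewhere along the way.
    public static boolean hasC1Controls(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\u0080' && c <= '\u009F') {
                return true;
            }
        }
        return false;
    }

    // Simulates the mis-decoding: encode as Windows-1252 (as Word text often is),
    // then decode the same bytes as ISO-8859-1.
    public static String misdecode(String pasted) {
        byte[] cp1252 = pasted.getBytes(Charset.forName("windows-1252"));
        return new String(cp1252, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Curly quotes and an en dash, as Word's AutoFormat would produce them.
        String pasted = "\u201CQuarterly Report\u201D \u2013 Draft";
        String stored = misdecode(pasted);
        System.out.println(hasC1Controls(pasted)); // false
        System.out.println(hasC1Controls(stored)); // true
    }
}
```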
Before version 6.5, Documentum allowed these characters to be stored and retrieved without an error. As noted in the previous post, version 6.5 of the DFC does not support these characters and will throw errors on retrieval, such as:
[DFC_OBJPROTO_BAD_NUMBER_FORMAT] Invalid number format for string length in serialized object
[DFC_OBJPROTO_BAD_STRING_FORMAT] Unknown string format in serialized object
Why is this a Big Deal?
The potentially critical issues for clients are:
- An upgrade to 6.5/6.6 (either Migration, DB Clone or Upgrade in Place) that leaves these characters in the database.
- Any 6.5 interface (Webtop, xCP) throws an error when it tries to retrieve content with character encoding issues.
- xPlore will index content with character encoding issues, but very slowly.
The tough part is garbage in, garbage out: the approach would be to clean up all the metadata before the upgrade, or before any use of DFC 6.5 or 6.6.
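A minimal scrubbing sketch, assuming the goal is to replace common Word punctuation with plain ASCII and to drop C1 control characters before the data ever reaches DFC 6.5. The `MetadataScrubber` class and its replacement table are our own illustrative choices, not a Documentum or OpenMigrate feature.

```java
import java.util.Map;

public class MetadataScrubber {
    // Illustrative mapping of common Word "smart" punctuation to ASCII.
    // Extend or trim this table to match what you actually find in your data.
    private static final Map<Character, String> REPLACEMENTS = Map.of(
        '\u2018', "'",    // left single quote
        '\u2019', "'",    // right single quote / apostrophe
        '\u201C', "\"",   // left double quote
        '\u201D', "\"",   // right double quote
        '\u2013', "-",    // en dash
        '\u2014', "-",    // em dash
        '\u2026', "...",  // horizontal ellipsis
        '\u00A0', " "     // non-breaking space
    );

    // Returns a copy of the value with mapped characters replaced and
    // C1 control characters (U+0080..U+009F) removed.
    public static String scrub(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else if (c < '\u0080' || c > '\u009F') {
                out.append(c); // keep everything except C1 control characters
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "\u201CQ3 Report\u201D \u2013 Final\u0094";
        System.out.println(scrub(dirty)); // prints "Q3 Report" - Final
    }
}
```

Mapping to ASCII is the conservative choice when the database character set may not handle Unicode punctuation. If your database is already Unicode and the curly quotes are legitimate, you may prefer to drop only the control characters. Either way, we would run this against a copy and review the changed values before writing anything back.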
We should point out that we have only seen this issue with Oracle. We can neither confirm nor deny that SQL Server clients would have the same issue.
Consistent with the previous post, we recommend the following:
- Consider leveraging OpenMigrate or a similar application to “scan” your data with DFC 6.5 to determine whether any encoding errors exist prior to the upgrade. DFC 6.5 is compatible with 5.3, so the scan can run against your existing repository.
- During the upgrade, use OpenMigrate to migrate data into a clean repository instead of performing a typical in-place upgrade or dump and load. Migrations are a great opportunity to “scrub” and validate existing data. Because every document is touched during a migration, corrupt data can be more easily identified. We are working on adding a check for typical character encoding errors.
- Utilize database tools to help identify potential problems. Oracle has a Character Set Scanner Utility (CSSCAN) that can scan an entire database to verify that all data stored in the database use the correct character encoding.
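The “scan with DFC 6.5” recommendation amounts to touching every object and recording which ones fail. The loop below sketches that idea. The actual fetch is hidden behind a hypothetical `ObjectLoader` interface, since we are not reproducing DFC or OpenMigrate calls here; failures are classified by looking for the two error codes quoted earlier in the exception message, and the object IDs in `main` are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EncodingScan {
    // Hypothetical hook: in a real scan this would fetch the object through
    // DFC 6.5 and read its attributes. It is not a Documentum API.
    @FunctionalInterface
    public interface ObjectLoader {
        void load(String objectId) throws Exception;
    }

    // Error codes that DFC 6.5 reports for these retrieval failures.
    private static final List<String> ENCODING_ERRORS = List.of(
        "DFC_OBJPROTO_BAD_NUMBER_FORMAT",
        "DFC_OBJPROTO_BAD_STRING_FORMAT");

    // Loads every object and returns the IDs whose failure message mentions
    // one of the encoding-related error codes, mapped to that message.
    // Other exceptions are skipped in this sketch.
    public static Map<String, String> scan(List<String> objectIds, ObjectLoader loader) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String id : objectIds) {
            try {
                loader.load(id);
            } catch (Exception e) {
                String msg = String.valueOf(e.getMessage());
                for (String code : ENCODING_ERRORS) {
                    if (msg.contains(code)) {
                        failures.put(id, msg);
                        break;
                    }
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Fake loader standing in for DFC: any ID ending in "2" fails to load.
        ObjectLoader fake = id -> {
            if (id.endsWith("2")) {
                throw new Exception("[DFC_OBJPROTO_BAD_STRING_FORMAT] Unknown string format in serialized object");
            }
        };
        Map<String, String> bad = scan(List.of("0900000180000001", "0900000180000002"), fake);
        System.out.println(bad.keySet()); // [0900000180000002]
    }
}
```

In practice the loader would open a DFC 6.5 session against the 5.3 repository and read each object's attributes. Anything the sketch skips (timeouts, permission errors) deserves its own logging.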
As one last push, we are reaching out to Documentum with a simple question: “Why not return the string with the bad character encoding, as pre-6.5 DFC did, rather than throwing an error?” Given that DFC is eventually going away in favor of DFS, it is worth asking.
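To illustrate the behavior we are asking for, the sketch below contrasts strict and lenient decoding with the standard `java.nio` `CharsetDecoder`. This is only an analogy, since we do not know how DFC decodes its serialized objects: `CodingErrorAction.REPORT` throws on malformed bytes, much as DFC 6.5 does, while `CodingErrorAction.REPLACE` returns the rest of the string with U+FFFD in place of the bad bytes. The `LenientDecode` class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientDecode {
    // Strict: malformed UTF-8 raises a CharacterCodingException.
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Lenient: malformed sequences become U+FFFD and the rest of the string survives.
    public static String decodeLenient(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but decode() declares the exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Report" followed by a lone 0x93 byte (a Windows-1252 left quote),
        // which is not valid UTF-8.
        byte[] stored = {'R', 'e', 'p', 'o', 'r', 't', (byte) 0x93};
        System.out.println(decodeLenient(stored)); // "Report" plus a replacement character
        try {
            decodeStrict(stored);
        } catch (CharacterCodingException e) {
            System.out.println("strict decode failed: " + e);
        }
    }
}
```

The lenient path keeps applications running but hides data-quality problems, so even if DFC behaved this way, a cleanup pass before the upgrade would still be worthwhile.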
Please comment below with any thoughts.