TSG recently completed a move from an on-site Documentum installation to Documentum in the OpenText cloud for one of our long-time clients in a highly regulated industry. The client had previously considered Documentum SAAS in the Dell VMware cloud, back when VMware was part of EMC. This project was not the newly announced OpenText partnership with Google or the SAAS Documentum LEAP product set, but a more traditional move of existing applications (Webtop, Documentum Server) to dedicated servers in the OpenText environment as part of a SAAS agreement with OpenText. This post shares some of the background, the issues we faced, and our lessons learned.
Documentum Installation Background
Our client is a long-time Documentum customer that uses our OpenContent Form and Workflow solution (formerly called the Active Wizard) for the control and change management of plant documents (mostly SOPs). The client has complex change control and workflow requirements, including the ability to turn around change requests very quickly (often in less than an hour) as well as to queue up change requests for execution during a down period. The client used Documentum Webtop for most new document creation, but OCMS Form and Workflow was used exclusively for change control of existing documents.
Documentum SAAS – Simple Plan – Complex Execution
The plan for moving to the OpenText cloud was fairly simple: leave OCMS/Active Wizard running on the client’s current internal application server and point it to the new Documentum location in the OpenText cloud. While that sounds straightforward, several complexities emerged:
- Legacy Code – the version of the Active Wizard running for the client was 10 years old and is scheduled for an upgrade after the migration to the OpenText cloud. That older version was built on an early, SOAP-based version of OpenContent (it is now REST-based). This made debugging difficult and complicated re-compiling against the latest DFC and Java versions.
- Upgraded Content Server – in addition to the move to the OpenText Cloud, the underlying repository was upgraded, introducing another source of potential issues and complexities.
- Oracle versus SQL Server – while it might be tempting to say that Documentum was lifted and shifted to the cloud, the version hosted by OpenText uses SQL Server instead of Oracle. Experienced Documentum architects know that not all databases are created equal.
- Copy versus Migration – given the change of database, the team had to migrate documents rather than just copy files and database components. Adding to the complexity, the extended team decided to combine repositories as part of the migration effort. Any changed r_object_id values would have downstream effects, further complicating the migration.
- WebSphere to WebLogic – the on-premise servers were migrated as part of a corporate initiative. Combined with all the other moving parts, this change had one of the largest impacts, creating Jar dependency issues, problems with SOAP requests, and load balancing configuration problems.
- Dev Environment – When building the new development environment, the decision was made to copy from the existing Dev environment which (as many Dev environments are) was neglected and non-functional. This presented further complications when it came to testing the applications with the new content server in the cloud.
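The downstream effect of changed r_object_id values can be sketched in a few lines. This is a minimal illustration, not the tooling used on the project: it assumes the migration produces a mapping from old to new IDs, and the function name `remap_object_ids` and every ID shown are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def remap_object_ids(
    references: Iterable[str],
    id_map: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Translate old r_object_id values to their post-migration values.

    Returns (remapped, unmapped). remapped holds the new ID for every
    reference found in id_map, in input order. unmapped holds the old IDs
    that have no mapping entry and need manual follow-up.
    """
    remapped: List[str] = []
    unmapped: List[str] = []
    for old_id in references:
        new_id = id_map.get(old_id)
        if new_id is None:
            unmapped.append(old_id)
        else:
            remapped.append(new_id)
    return remapped, unmapped


# Hypothetical IDs for illustration only; a real mapping would come from
# the migration output.
id_map = {
    "0900000180000101": "0900000280000a01",
    "0900000180000102": "0900000280000a02",
}
# IDs stored by a downstream system (for example, a portal cache).
stored_references = ["0900000180000101", "0900000180000199", "0900000180000102"]

remapped, unmapped = remap_object_ids(stored_references, id_map)
print("remapped:", remapped)
print("needs follow-up:", unmapped)
```

Any ID that lands in the follow-up list points at a downstream reference that would silently break after the migration.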
Documentum SAAS – Significant Issues
During this project, some of the issues we encountered included:
- Significant Scope Creep – the initial effort was framed as “just repoint the servers.” In addition to the complexities described above, the removal of all direct database connections from our applications, the removal of superuser access, and the re-indexing of portal caches added complexity that required additional scope.
- Poor Testing Environment and Methodology – given the assumption of a simple “lift and shift” approach, testing lacked the rigor typical of new system development, where test data, user requirements and iterative testing are used to make sure the system works as required. We found the Documentum/OpenText testing to be more ad hoc, relying more on the user than on any automated process or consistent testing approach. Given later performance issues, the team’s inability to replicate realistic volumes in User Acceptance hurt its ability to test performance improvements anywhere other than production.
- Poor Debugging Methodology – early on, we noticed a tendency of the team, and particularly the OpenText team, to “blame TSG’s code” rather than look deeper for environment issues. While we as engineers can fall into the trap of looking inward first and changing code and queries, a debugging philosophy that looks only at code and ignores environment, database, network and other factors is incomplete. In one major issue, the system was unresponsive for workflow queries for 14 hours. After considerable discussion about code and other changes, the core issue turned out to be an unapproved database change; once it was backed out, the issue was resolved.
- Limited Documentum Server and Database Access – unlike typical Documentum installations, in the OpenText SAAS model clients do not have visibility into the Documentum servers and database, nor do they have superuser privileges, making it very difficult to monitor any performance other than end-user response time from the browser. Add the combining of repositories (and user bases), and understanding why performance issues were occurring was very difficult. For our jobs re-indexing content to a portal, initial jobs ran at about 80 records per minute, but several hours into the job the rate slowed to 15 records per minute. Without access to the servers, it was difficult if not impossible for us to determine where the slowdown was occurring.
- Inconsistent Data – TSG was not responsible for the repository copy and migration for the project. In each environment, we encountered several issues that traced back to something not copied correctly or at all from the source repository. This included ACLs, User Accounts, Group memberships, and even Lifecycles not copied correctly.
- Inconsistent Architecture – Generally, it is best practice to have all environments (Dev, Quality, Production) as close to identical as possible to eliminate issues that are only reproducible in one environment. With this project, the differences were great enough to cause problems. Some examples include differences in server OS and load balancer in Quality and Prod but not in Dev.
- Too Many Cooks – one area we struggled with was the division of responsibilities within the team. Typically TSG is responsible for our applications’ performance and can monitor, tune and adjust Documentum components as required. In the SAAS environment, given limited access, multiple resources from the client, TSG and OpenText needed to be involved, leading to frustration for all three. For the outage described earlier, at least 14 different resources were on a call, with considerable pointing of blame, before the DBA joined the call and reversed the change to solve the issue.
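The inconsistent data issue above (ACLs, user accounts, group memberships and lifecycles not copied correctly) is the kind of problem a simple inventory comparison can surface before users find it. The sketch below assumes that lists of object names can be exported from both the source and the target repository by whatever means is available. The function name `diff_inventories`, the category keys and all sample names are hypothetical.

```python
from typing import Dict, Set


def diff_inventories(
    source: Dict[str, Set[str]],
    target: Dict[str, Set[str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Compare per-category name sets from two repositories.

    For each category that differs, report the names present in the source
    but missing from the target, and the names present only in the target.
    Categories that match exactly are left out of the report.
    """
    report: Dict[str, Dict[str, Set[str]]] = {}
    for kind in sorted(set(source) | set(target)):
        src = source.get(kind, set())
        tgt = target.get(kind, set())
        missing = src - tgt
        unexpected = tgt - src
        if missing or unexpected:
            report[kind] = {"missing": missing, "unexpected": unexpected}
    return report


# Hypothetical exports; real names would come from each repository.
source_inventory = {
    "acls": {"sop_author_acl", "sop_reader_acl"},
    "users": {"jsmith", "mjones"},
    "groups": {"plant_approvers"},
    "lifecycles": {"sop_lifecycle"},
}
target_inventory = {
    "acls": {"sop_reader_acl"},
    "users": {"jsmith", "mjones"},
    "groups": {"plant_approvers"},
    "lifecycles": set(),
}

report = diff_inventories(source_inventory, target_inventory)
for kind, detail in report.items():
    print(kind, "missing:", sorted(detail["missing"]),
          "unexpected:", sorted(detail["unexpected"]))
```

Running the same comparison in each environment after every copy gives the team a repeatable check instead of relying on users to discover what is missing.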
Documentum SAAS – Lessons Learned
In hindsight, some of our lessons learned included:
- Simple is better than complex. As described above, in addition to moving to the OpenText Cloud, the project introduced an upgrade, a change of the on-premise application server (and physical servers), code updates, and a merging of repositories. Rolling these changes out sequentially would have greatly reduced the risk and greatly improved issue resolution.
- Service Level Agreements (SLAs) should not replace server access – TSG and our client fell into the trap of citing “the service level agreement states that OpenText will….” to justify why we didn’t need to monitor the back-end components. Regardless of SLAs, we would advise clients to have, at the very least, direct database access (with a tool such as Toad or DbVisualizer), and ideally access to the database server itself. This supports performance analysis and root cause analysis with facts rather than relying on anecdotal information relayed by SAAS personnel.
- UAT and Volume Testing – particularly with the database change, we should have pushed for more consistent UAT and volume testing to understand performance and user load, given the differences in repository volume as well as the size of the repository and database.
- Network Monitoring – since SAAS offerings require packets to travel a much greater distance, network performance can have a large impact on system performance. Having network administrators available to help trace packets and optimize performance can make a big difference.
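Even without server access, a client-side job can record its own throughput so that a slowdown like the drop from about 80 to 15 records per minute is documented with timestamps rather than recollection. The sketch below is one way to do that, not the monitoring used on the project. The class name `ThroughputLog`, the 50% threshold and the checkpoint values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ThroughputLog:
    """Holds (elapsed_seconds, total_records) checkpoints for a batch job."""

    # Assumed threshold: flag intervals running below 50% of the first rate.
    slowdown_ratio: float = 0.5
    checkpoints: List[Tuple[float, int]] = field(default_factory=list)

    def record(self, elapsed_seconds: float, total_records: int) -> None:
        """Add a checkpoint; elapsed time must strictly increase."""
        if self.checkpoints and elapsed_seconds <= self.checkpoints[-1][0]:
            raise ValueError("elapsed_seconds must increase between checkpoints")
        self.checkpoints.append((elapsed_seconds, total_records))

    def rates_per_minute(self) -> List[float]:
        """Records per minute for each interval between checkpoints."""
        pairs = zip(self.checkpoints, self.checkpoints[1:])
        return [(n1 - n0) * 60.0 / (t1 - t0) for (t0, n0), (t1, n1) in pairs]

    def first_slowdown(self) -> Optional[int]:
        """Index of the first interval slower than slowdown_ratio times the
        first interval's rate, or None if no interval qualifies."""
        rates = self.rates_per_minute()
        if not rates:
            return None
        baseline = rates[0]
        for index, rate in enumerate(rates):
            if rate < baseline * self.slowdown_ratio:
                return index
        return None


# Hypothetical checkpoints taken every 10 minutes during a re-index job.
log = ThroughputLog()
log.record(0, 0)
log.record(600, 800)     # 800 records in the first 10 minutes
log.record(1200, 1600)   # 800 more in the next 10 minutes
log.record(1800, 1750)   # only 150 in the following 10 minutes

rates = log.rates_per_minute()
slow_interval = log.first_slowdown()
print("records per minute by interval:", rates)
print("first slow interval:", slow_interval)
```

The interval index and its timestamps give the hosting provider a specific window to inspect on the servers and database the client cannot see.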
Clients considering moving Documentum to the OpenText cloud should consider all of the complexities that might arise rather than assuming it is a simple lift and shift that saves on internal hosting costs. Monitoring, SLAs, testing, new databases and combining repositories all introduce complexities that can turn what sounds simple into something complex. As we pointed out in our ECM Sales Myths post, SAAS is not always better than the alternatives, and offerings vary widely between SAAS providers. For clients that want to save on hosting costs but still want control of their Documentum environment, we recommend also considering infrastructure as a service (IAAS) providers like Amazon Web Services as an alternative to the OpenText SAAS model. See our related article on hosting Documentum with AWS.