
Technology Services Group


ECM Large Repositories – Volume Testing With the TSG Test Harness


July 23, 2019

One of the more interesting achievements of the TSG 11 Billion Document benchmark was our ability to quickly load a large repository and test search and retrieval performance as well as concurrent usage.  For clients considering a large document repository, TSG now has both the experience and the tools to help conduct a significant volume test at scale in a very short timeframe.  This post describes how our tools can be leveraged for large repository testing for DynamoDB, Alfresco, Documentum, Hadoop or other repositories.

Volume Testing – What are the issues?

Volume testing ECM solutions is difficult and often not worth the money and effort spent.  A volume test is frequently expensive to set up, not representative of actual production usage, and can delay a project.  In working with clients on testing, we typically see three types of issues arise around volume, performance or other types of testing.

  • Type 1 Issue – this is an issue that the test identifies that would have affected production and can be fixed and resolved before production.  This is the type of issue a volume or performance test was designed to catch.
  • Type 2 Issue – this is an issue that the test identifies and the team spends time and effort fixing, but that would not have affected production.
  • Type 3 Issue – this is an issue that the test does not identify but will affect production.  The team will have to address this issue quickly in production and will need the resources available to limit production issues and perceptions.
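
The three types reduce to two yes/no questions: did the test find the issue, and would it affect production? A minimal Python sketch of that classification follows. The names `IssueType` and `classify_issue` are hypothetical and used only for illustration; they are not part of any TSG tool.

```python
from enum import Enum
from typing import Optional


class IssueType(Enum):
    TYPE_1 = "found in test, would have affected production"
    TYPE_2 = "found in test, would not have affected production"
    TYPE_3 = "missed by test, affects production"


def classify_issue(found_in_test: bool, affects_production: bool) -> Optional[IssueType]:
    """Map the two questions onto the three issue types described above."""
    if found_in_test and affects_production:
        return IssueType.TYPE_1
    if found_in_test:
        return IssueType.TYPE_2
    if affects_production:
        return IssueType.TYPE_3
    # Not found in test and no production impact: not an issue at all.
    return None
```

Only Type 1 issues justify the cost of the test. Type 2 issues add cost, and Type 3 issues are the ones the test cannot help with.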

No matter how much testing is conducted, Type 3 issues will always exist. For an example, see our issue from one large client with Hazelcast and Alfresco: the problem was almost identified as a Type 1 Issue in testing but wasn’t fully revealed until users came onto the production system, making it a Type 3 Issue.

Regardless of the testing, we have found that all three types of issues arise for ECM solutions because production usage is so difficult to accurately predict and replicate in a test environment.  TSG advises clients to prepare for Type 1, 2 and 3 issues for any large production deployment.

Other issues with volume testing large repositories (hundreds of millions or billions of documents) include:

  • Finding Representative Data – Often the large repository will be loaded from a legacy ECM system (FileNet, ImagePlus…).  The actual migration of the data might take considerable time given mapping, volume, retrieval, clean-up and other migration activities.  Production data can also be highly confidential and require special security that is not consistent with building out a quick performance test.
  • Production Environment Availability – Ingesting hundreds of millions or even billions of documents is time consuming and would typically require a large production environment.
  • Loading the Documents – Typical migration or bulk ingestion jobs have to actually load the documents.  Leveraging ECM APIs can be slow, particularly when moving documents from their current location, through an application server, to the eventual storage.

Given the above, TSG recommends conducting quick and efficient testing where possible and is looking to provide tools and services to assist clients in this endeavor.

TSG Benchmark Test Harness

With the DynamoDB Benchmark, TSG has developed a Test Harness with AWS and the TSG products (OpenMigrate and OpenContent) to allow clients to spin up and load large volume scenarios on AWS very quickly to volume and performance test their solution.  Components of the test harness include:

  • Sample Data – TSG has curated 11 Billion unique addresses that we can use to load representative document models.  The test data can be manipulated to populate document fields and keep the values unique without exposing production client values.
  • Loading of Data – TSG leveraged OpenMigrate to load both DynamoDB and Elasticsearch in AWS.  TSG could also leverage these approaches for Hadoop, Alfresco or Documentum. 
  • Linking of Documents – For all clients, the performance test is focused on testing the metadata repository for search and retrieval and not the actual retrieval of a document.  The test harness can link to content without the delay of ingesting the content through the API.
  • Concurrent User Testing – TSG built JMeter test plans and ran them concurrently on AWS EC2 instances. Each test plan was built to replicate users performing case management actions against the claim data sets loaded in the 11 Billion benchmark.
  • Amazon Web Services – TSG’s partnership with AWS smooths the way to simulate massive scale for clients without having to procure a production environment on-premise or within their own cloud environments.  For our benchmark, we were able to procure a 96-CPU environment that could process 20,000 documents/second.
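
The Sample Data and Linking of Documents ideas above can be illustrated with a small Python sketch. It is not the OpenMigrate implementation. The record layout (`ClaimRecord`), the claim-number format and the `s3://example-bucket/...` URI are all assumptions made for illustration. Each synthetic address yields one unique metadata record, and every record points at content that is already staged rather than uploading a file per document.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class ClaimRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    doc_id: str
    claim_number: str
    address: str
    content_uri: str  # link to pre-staged content, nothing is uploaded


def build_record(index: int, address: str, shared_content_uri: str) -> ClaimRecord:
    # Derive a stable, unique id from the row index and the address.
    digest = hashlib.sha256(f"{index}:{address}".encode("utf-8")).hexdigest()
    return ClaimRecord(
        doc_id=digest[:16],
        claim_number=f"CLM-{index:011d}",  # 11 digits leaves room for billions of rows
        address=address,
        content_uri=shared_content_uri,
    )


def generate_records(addresses: Iterable[str], shared_content_uri: str) -> Iterator[ClaimRecord]:
    # A generator keeps memory flat no matter how many addresses are supplied.
    for index, address in enumerate(addresses):
        yield build_record(index, address, shared_content_uri)


# Usage with made-up addresses and a made-up content location.
sample_addresses = ["22 Example St, Springfield", "7 Sample Ave, Riverton"]
records = list(generate_records(sample_addresses, "s3://example-bucket/sample-claim.pdf"))
```

Because only metadata is generated and the content link is reused, the load exercises the metadata repository and search index without paying the cost of moving files through an ECM API.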

Leveraging our experience and the Test Harness, TSG can simulate production volumes and retrieval patterns with AWS quickly and safely without delaying the main development and migration activities.
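
To put "quickly" into rough numbers, the back-of-the-envelope Python sketch below estimates how long a load would take at the 20,000 documents/second rate quoted above. It assumes that rate is sustained for the whole load, which is a simplification and not a reported result. The function name `load_time_days` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def load_time_days(total_documents: int, docs_per_second: float) -> float:
    """Estimated load duration in days, assuming a constant ingest rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return total_documents / docs_per_second / SECONDS_PER_DAY


# 11 billion documents at 20,000 documents/second:
# 11,000,000,000 / 20,000 = 550,000 seconds, or roughly 6.4 days.
estimated_days = load_time_days(11_000_000_000, 20_000)
```

The same function can be used to size a smaller test, for example by plugging in a client's own expected document count.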

Summary

Volume testing large ECM repositories can be difficult, time consuming and expensive, and is often not worth the effort because simulated usage doesn’t always catch the issues that arise from actual production usage patterns.  For clients that are looking to quickly simulate a production environment, TSG can leverage our experience and tools from our 11 Billion Document benchmark to quickly simulate large volumes on AWS and avoid the delay and costs of a typical on-premise volume test.

Filed Under: Alfresco, Amazon, Documentum, ECM Solutions, Hadoop, Performance Tuning, Testing / Validation
