
Technology Services Group


DynamoDB 11 Billion Document Benchmark – Summary of Postings


June 24, 2019

TSG initiated our 11 Billion Document DynamoDB benchmark on Friday, May 10th 2019 and ended all of our testing activities on June 20th 2019. The benchmark was a resounding success, and our team learned many lessons about scaling AWS, DynamoDB, Elasticsearch and our OpenContent and OpenMigrate products. This post summarizes the benchmark activities and lessons learned.

DynamoDB Benchmark – What were the objectives?

TSG first announced our effort to create an ECM offering for DynamoDB in October 2018. Building on the success of our Hadoop offering (also a NoSQL approach), developing a product for DynamoDB was greatly simplified, and most of the development and testing was completed in a couple of months. While we have already had success with multiple Hadoop clients, our team thought an internal benchmark in partnership with Amazon would demonstrate the true power and massive scale potential of DynamoDB and AWS. Our goal was to simulate all the components of a massive repository to verify that our tools and approaches can scale for our large-volume case management clients. The benchmark focused on typical requirements for our highest-volume clients, such as our health and insurance claim repositories, but also included accounts payable and human resources scenarios.

To read more about the scope, environment and test data creation, view our initial post at https://tsgrp.wpengine.com/2019/05/13/dynamodb-amazon-web-services-11-billion-document-benchmark/

DynamoDB Benchmark – Phase 1 – Migration

The first phase of the benchmark aimed to build a large repository with our OpenMigrate ingestion tool and to prove access for OpenContent Search, OpenContent Case and OpenAnnotate. The initial ingestion phase concluded on May 17th with 11 billion documents ingested into DynamoDB at 20,000 documents per second, and the related folders indexed into Elasticsearch.
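As a sanity check on the Phase 1 numbers, a steady 20,000 documents per second works out to roughly six and a half days for 11 billion documents, which fits inside the May 10th to May 17th window. The minimal Python sketch below does that arithmetic; it assumes the rate was constant for the whole run, which a real migration would only approximate.

```python
# Back-of-envelope check of the Phase 1 ingestion figures quoted above.
# Assumption: the 20,000 docs/sec rate held steady for the entire run.

SECONDS_PER_DAY = 24 * 60 * 60


def ingestion_days(total_docs: int, docs_per_second: int) -> float:
    """Return the wall-clock days needed to ingest total_docs at a steady rate."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


days = ingestion_days(11_000_000_000, 20_000)
print(f"{days:.1f} days")  # prints: 6.4 days
```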

We posted daily during the migration run.  For additional detail, view the following posts and videos:

  • DynamoDB – Repository Walkthrough
  • DynamoDB Document and Folder Details
  • DynamoDB – AWS Walk Through
  • DynamoDB – Ingestion Success!!! – Lessons Learned

For our development community, we also published a post, How to build your own ECM capabilities for massive scale and performance, which explained how to simplify and build capabilities from the bottom up rather than with the top-down approach of typical legacy ECM vendors.

DynamoDB Benchmark – Phase 2 – Building Additional Search Indices

The second phase of the benchmark, which ended successfully on June 11th, focused on building the Elasticsearch indices required to search the documents already stored in DynamoDB. While the initial migration indexed all 925,837,980 folders in the repository into Elasticsearch, we wanted to show the ECM 2.0 approach of creating indices for specific scenarios rather than one massive search index for the entire repository. For this test we built a million-document index for accounts payable alone in about 33 minutes. We learned many lessons about AWS Lambda, DynamoDB Streams and the differences between scaling DynamoDB and scaling Elasticsearch.
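One common way to wire DynamoDB Streams and AWS Lambda to Elasticsearch is to have each item change arrive as a stream record and let a Lambda function translate the relevant records into an Elasticsearch `_bulk` request. The Python below is a minimal sketch of only that translation step, assuming a stream view type that includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`). The index name, the `objectId` key and the `docType` filter are hypothetical, and posting the body to the cluster is left out.

```python
import json

# Hypothetical names: index, key attribute and document-type filter are
# illustrative only, not TSG's actual schema.
INDEX_NAME = "accounts-payable"
KEY_ATTR = "objectId"
DOC_TYPE = "ap_invoice"


def unmarshal(attr: dict):
    """Convert one DynamoDB-typed value such as {"S": "abc"} to a Python value."""
    type_tag, value = next(iter(attr.items()))
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        # DynamoDB sends numbers as strings; keep integers as int when possible.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "NULL":
        return None
    raise ValueError(f"unsupported DynamoDB type in this sketch: {type_tag}")


def stream_records_to_bulk(records: list) -> str:
    """Translate DynamoDB Stream records into an Elasticsearch _bulk body (NDJSON)."""
    lines = []
    for record in records:
        ddb = record["dynamodb"]
        doc_id = str(unmarshal(ddb["Keys"][KEY_ATTR]))
        if record["eventName"] == "REMOVE":
            # Deleting an id that was never indexed just yields not_found for that item.
            lines.append(json.dumps({"delete": {"_index": INDEX_NAME, "_id": doc_id}}))
            continue
        # INSERT or MODIFY: index only the document type this index is scoped to.
        doc = {name: unmarshal(value) for name, value in ddb["NewImage"].items()}
        if doc.get("docType") != DOC_TYPE:
            continue
        lines.append(json.dumps({"index": {"_index": INDEX_NAME, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The _bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


def handler(event, context):
    """Hypothetical Lambda entry point; sending the body to Elasticsearch is omitted."""
    return stream_records_to_bulk(event["Records"])
```

Streams only carry item-level changes, so indexing documents that were already in the table would also need a separate backfill step; how TSG handled that is not described in this post.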

DynamoDB Benchmark – Phase 3 – Adding Documents

The third phase of the project focused on users adding documents and finished on June 12th.

A major lesson learned was when to use Elasticsearch versus DynamoDB for case document viewing. The team concluded that DynamoDB probably makes more sense for smaller case folders, while case folders containing large numbers of documents may call for Elasticsearch. OpenContent supports either approach, so clients can decide for themselves which works better for their scenario.
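The DynamoDB-or-Elasticsearch decision for case folder viewing can be pictured as a simple routing rule. The Python sketch below is hypothetical: the post does not give a cut-over size, so `LARGE_FOLDER_THRESHOLD` and the `CaseFolder` type are illustrative names, not OpenContent's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical cut-over point; the post gives no number, so this value is
# purely illustrative and would need tuning against real client data.
LARGE_FOLDER_THRESHOLD = 1_000


@dataclass
class CaseFolder:
    folder_id: str
    document_count: int


def choose_backend(folder: CaseFolder, threshold: int = LARGE_FOLDER_THRESHOLD) -> str:
    """Pick the store used to list a case folder's documents.

    Small folders: a DynamoDB Query on the folder's key (assuming documents are
    keyed by folder) returns the children cheaply. Large folders: Elasticsearch
    can sort, filter and page across many documents.
    """
    return "elasticsearch" if folder.document_count > threshold else "dynamodb"
```

For example, `choose_backend(CaseFolder("claim-123", 12))` routes to `"dynamodb"`, while a folder with 50,000 documents routes to `"elasticsearch"`.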

DynamoDB Benchmark – Phase 4 – 11,000 Concurrent Users

The fourth and last phase of the benchmark focused on concurrent testing of user volumes and finished on June 20th. 

Goals of the concurrent user test were to replicate some of the different issues we have seen from clients in production when a large number of document management users are accessing the system. 

The benchmark test was patterned after the most common use case for our insurance clients – Claim Viewing.   For the test we ran a batch of 11,000 users performing the claim viewing scenario across a selection of medical and auto claims.  
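The claim-viewing test can be sketched as a scenario function executed by many virtual users in parallel. TSG drove the benchmark with JMeter; the Python below is only an illustrative stand-in, and the step names, the `call_step` callback and the worker count are hypothetical. A thread pool caps true concurrency at `workers`, so reproducing 11,000 simultaneous sessions would still call for a distributed load tool such as JMeter.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical claim-viewing steps; in a real test each would be an HTTP call
# against the search, case-folder and viewer endpoints.
STEPS = ("search_claim", "open_case_folder", "view_document")


def view_claim(user_id: int, claim_ids: list, call_step) -> float:
    """Run one virtual user's claim-viewing scenario and return elapsed seconds."""
    claim_id = random.Random(user_id).choice(claim_ids)  # reproducible per user
    start = time.perf_counter()
    for step in STEPS:
        call_step(step, claim_id)
    return time.perf_counter() - start


def run_load(users: int, claim_ids: list, call_step, workers: int = 200) -> dict:
    """Drive `users` scenarios through a thread pool and summarise latency."""
    if users < 1:
        raise ValueError("need at least one virtual user")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        timings = sorted(pool.map(lambda u: view_claim(u, claim_ids, call_step), range(users)))
    return {
        "users": len(timings),
        "median_s": statistics.median(timings),
        "p95_s": timings[int(0.95 * (len(timings) - 1))],  # approximate 95th percentile
    }
```

Usage: `run_load(11_000, ["MED-1", "AUTO-1"], my_step_function)`, where `my_step_function(step, claim_id)` is whatever client code issues the request for each step.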

We did extensive performance tuning on our JMeter testing tool, the interface, DynamoDB and Elasticsearch. Each issue was resolved and retested in subsequent test runs.

Summary

With TSG’s “cranking it up to eleven” billion document benchmark, we have proven the scalability and benefits of both Amazon and a NoSQL approach with DynamoDB over traditional document management solutions built on relational databases.

During the benchmark, we received questions about why we chose 11 billion documents and 11 thousand concurrent users. In sizing the benchmark, we wanted to exceed the largest volumes we have seen at clients (7 billion documents for one prospect) by a wide margin. Unlike some older billion-document benchmarks conducted by ECM software vendors, this benchmark tested all of the scenarios required by our large-volume clients.

Combined with our rolling migration approach, TSG now has extensive experience and solutions for moving large clients to alternative platforms with our products and people. Some examples of how the benchmark influenced our work with current clients just this last week include:

  • As a test harness, we are proposing to use the migration approach and test data to quickly scale a client’s test repository up to production volume, so infrastructure and performance can be tested before real data is moved.
  • In our designs, we are combining lessons learned on creating indexes with NoSQL alternatives to offer multiple options for typical search scenarios.
  • We are saving the benchmark repository in S3 so clients can quickly restore it and test scenarios against it.
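The test-harness idea in the first bullet relies on generating large volumes of realistic but fake metadata quickly and reproducibly. One way to do that, sketched below in Python, is a deterministic generator keyed on a seed. The document types, field names and folder count are hypothetical, and this is not the generator TSG used for the benchmark data.

```python
import hashlib
from datetime import date, timedelta
from itertools import islice

# Hypothetical document types and attribute names, for illustration only.
DOC_TYPES = ("medical_claim", "auto_claim", "ap_invoice", "hr_record")
FOLDERS = 1_000_000  # hypothetical number of distinct case folders


def synthetic_documents(seed: str, start: date = date(2010, 1, 1)):
    """Yield an endless, reproducible stream of fake document metadata records.

    Every field is derived from a SHA-256 hash of (seed, sequence number), so
    rerunning with the same seed regenerates exactly the same repository.
    """
    n = 0
    while True:
        digest = hashlib.sha256(f"{seed}:{n}".encode()).digest()
        folder_no = int.from_bytes(digest[0:4], "big") % FOLDERS
        day_offset = int.from_bytes(digest[5:7], "big") % 3650  # spread over ~10 years
        yield {
            "objectId": f"{seed}-{n:012d}",
            "folderId": f"{seed}-F{folder_no:07d}",
            "docType": DOC_TYPES[digest[4] % len(DOC_TYPES)],
            "createdDate": (start + timedelta(days=day_offset)).isoformat(),
        }
        n += 1


# Example: the first three records for a hypothetical "claims-test" tenant.
sample = list(islice(synthetic_documents("claims-test"), 3))
```

Because the stream is deterministic, separate loader processes can each take a slice of sequence numbers and still produce one consistent repository.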

Thanks again to everyone who helped with the benchmark, particularly Amazon Web Services, Deep Analysis and Doculabs. Download the white paper written with Alan Pelz-Sharp of Deep Analysis and let us know your thoughts.

Filed Under: Amazon, DynamoDB, ECM Landscape, OpenContent Management Suite


