TSG initiated our 11 Billion Document DynamoDB benchmark on Friday, May 10th, 2019 and ended all testing activities on June 20th, 2019. The benchmark was an unbelievable success, with our team learning many lessons about scaling AWS, DynamoDB, Elasticsearch and our OpenContent and OpenMigrate products. This post presents a summary of the benchmark activities and lessons learned.
DynamoDB Benchmark – What were the objectives?
TSG initially announced our development efforts for creating an ECM offering for DynamoDB back in October 2018. Because we could build on the success of our Hadoop offering (also a NoSQL approach), developing a product for DynamoDB was greatly simplified, with most of our development and testing efforts completed in a couple of months. While we have already had some success with multiple Hadoop clients, our team thought an internal benchmark partnering with Amazon would show off the true power and massive scale potential of DynamoDB and AWS. Our goal was to simulate all the components of a massively large repository to verify that our tools and approaches can scale for our large-volume case management clients. The benchmark focused on typical requirements for large-volume clients like our health and insurance claim repositories but also included accounts payable and human resource scenarios.
To read more about the scope, environment and test data creation, view our initial post at https://tsgrp.wpengine.com/2019/05/13/dynamodb-amazon-web-services-11-billion-document-benchmark/
DynamoDB Benchmark – Phase 1 – Migration
The first phase of the benchmark was aimed at building a large repository with our OpenMigrate ingestion tool and proving access for OpenContent Search, OpenContent Case and OpenAnnotate. The initial ingestion phase concluded on May 17th with 11 billion documents ingested into DynamoDB at speeds of 20,000 documents per second, with the related folders indexed into Elasticsearch.
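As a rough sanity check on those figures, the short sketch below works out how long 11 billion documents would take at 20,000 documents per second. It assumes the reported speed was sustained for the whole run, which the post does not state; the function name is ours, for illustration only.

```python
from datetime import timedelta

TOTAL_DOCUMENTS = 11_000_000_000  # documents ingested, from the post
DOCS_PER_SECOND = 20_000          # reported ingestion speed, ASSUMED constant


def ingestion_time(total_docs: int, docs_per_second: int) -> timedelta:
    """Return how long an ingestion takes at a constant rate."""
    return timedelta(seconds=total_docs / docs_per_second)


elapsed = ingestion_time(TOTAL_DOCUMENTS, DOCS_PER_SECOND)
hours = elapsed.seconds // 3600  # whole hours beyond the full days
print(f"{elapsed.total_seconds():,.0f} seconds, about {elapsed.days} days {hours} hours")
```

At a constant rate this comes to a little under six and a half days, which is in line with a run that started on May 10th and concluded on May 17th.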
We posted daily during the migration run. For additional detail, view the following posts and videos:
- DynamoDB – Repository Walkthrough
- DynamoDB Document and Folder Details
- DynamoDB – AWS Walk Through
- DynamoDB – Ingestion Success!!! – Lessons Learned
For our development community, we also published a post, How to build your own ECM capabilities for massive scale and performance, which offers background on how to simplify and build capabilities from the bottom up rather than the top-down approach of typical legacy ECM vendors.
DynamoDB Benchmark – Phase 2 – Building Additional Search Indices
The second phase of the benchmark focused on building the Elasticsearch indices required for searching the documents already in DynamoDB, and successfully ended on June 11th. While the initial migration included an Elasticsearch index for all of the 925,837,980 folders in the repository, we wanted to show the ECM 2.0 approach of creating indices for specific scenarios rather than one massive search index for the entire repository. For this test we created a quick million-document index for just accounts payable in about 33 minutes. We learned many good lessons about AWS Lambda, DynamoDB Streams and the differences between scaling DynamoDB and Elasticsearch.
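This post does not show how the scenario index was built, so the following is only a minimal sketch of one common pattern: a Lambda-style handler that receives DynamoDB Streams records, keeps the documents for a single scenario, and turns them into the body of an Elasticsearch bulk request. The index name, the attribute names (`objectId`, `docType`) and the `accounts_payable` value are hypothetical, and the sketch only builds the request body rather than sending it.

```python
import json
from typing import Any, Dict, Iterable, List

INDEX_NAME = "accounts-payable"   # hypothetical index name
KEY_ATTR = "objectId"             # hypothetical key attribute
DOC_TYPE_ATTR = "docType"         # hypothetical attribute holding the document type
TARGET_TYPE = "accounts_payable"  # hypothetical value marking accounts payable documents


def unwrap(image: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten DynamoDB's typed attribute values, e.g. {"S": "x"} or {"N": "1"}."""
    flat = {}
    for name, typed in image.items():
        type_tag, value = next(iter(typed.items()))
        if type_tag == "N":  # numbers arrive as strings
            value = int(value) if value.lstrip("-").isdigit() else float(value)
        flat[name] = value   # S and BOOL pass through; other types are left as received
    return flat


def bulk_lines(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Build bulk API action/source line pairs for the target scenario only."""
    lines = []
    for record in records:
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # deletes are ignored in this sketch
        image = record.get("dynamodb", {}).get("NewImage")
        if not image:
            continue  # the stream must be configured to include new images
        doc = unwrap(image)
        if doc.get(DOC_TYPE_ATTR) != TARGET_TYPE:
            continue  # not part of this scenario's index
        lines.append(json.dumps({"index": {"_index": INDEX_NAME, "_id": doc[KEY_ATTR]}}))
        lines.append(json.dumps(doc))
    return lines


def handler(event: Dict[str, Any], context: Any = None) -> str:
    """Return the newline-delimited JSON body for a _bulk request (empty if nothing matched)."""
    lines = bulk_lines(event.get("Records", []))
    return "\n".join(lines) + "\n" if lines else ""


sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {
        "objectId": {"S": "doc-1"}, "docType": {"S": "accounts_payable"}, "amount": {"N": "125.50"}}}},
    {"eventName": "INSERT", "dynamodb": {"NewImage": {
        "objectId": {"S": "doc-2"}, "docType": {"S": "claim"}}}},
]}
body = handler(sample_event)
print(body)
```

Filtering before indexing is what keeps the index scoped to one scenario: only the accounts payable record from the sample event ends up in the bulk body, and the claim document is skipped.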
DynamoDB Benchmark – Phase 3 – Adding Documents
The third phase of the project focused on user addition of documents and finished on June 12.
One of the major lessons learned was when to use Elasticsearch versus DynamoDB for case document viewing. The team ended up deciding that DynamoDB probably makes more sense for smaller case folders, while case folders with a large number of documents might call for Elasticsearch. OpenContent has been developed to allow either approach, so clients can decide for themselves which works better for their scenario.
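To make that trade-off concrete, here is a small sketch of how a per-client setting might route case folder viewing to one backend or the other. It is not OpenContent's actual configuration: the `CaseViewConfig` class, the `large_folder_threshold` value of 1,000 documents and the `force_backend` override are all assumptions made up for illustration, and the benchmark did not publish a specific cutoff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Backend(Enum):
    DYNAMODB = "dynamodb"
    ELASTICSEARCH = "elasticsearch"


@dataclass(frozen=True)
class CaseViewConfig:
    """Hypothetical per-client settings; names and defaults are illustrative only."""
    large_folder_threshold: int = 1_000      # assumed cutoff, not a benchmark result
    force_backend: Optional[Backend] = None  # lets a client pin one approach


def choose_backend(document_count: int, config: CaseViewConfig) -> Backend:
    """Pick where to read a case folder's document list from."""
    if config.force_backend is not None:
        return config.force_backend
    if document_count >= config.large_folder_threshold:
        return Backend.ELASTICSEARCH  # large folders: lean on the search index
    return Backend.DYNAMODB           # small folders: read directly from DynamoDB


default_config = CaseViewConfig()
print(choose_backend(40, default_config).value)
print(choose_backend(25_000, default_config).value)
print(choose_backend(25_000, CaseViewConfig(force_backend=Backend.DYNAMODB)).value)
```

Keeping the decision in configuration rather than code mirrors the point above: each client can tune or override the choice for its own folder sizes.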
DynamoDB Benchmark – Phase 4 – 11,000 Concurrent Users
The fourth and last phase of the benchmark focused on testing concurrent user volumes and finished on June 20th.
The goal of the concurrent user test was to replicate some of the issues we have seen at clients in production when a large number of document management users are accessing the system.
The benchmark test was patterned after the most common use case for our insurance clients – Claim Viewing. For the test we ran a batch of 11,000 users performing the claim viewing scenario across a selection of medical and auto claims.
We did a lot of tuning and performance work around our JMeter testing tool, the interface, DynamoDB and Elasticsearch. Each issue was resolved and retested in subsequent test runs.
Summary
In TSG’s “cranking it up to eleven” billion document benchmark, TSG has been able to prove the scalability and benefits of both Amazon and a NoSQL approach with DynamoDB over traditional document management solutions based on relational database approaches.
During the benchmark, we received some feedback asking “why 11 billion documents and 11 thousand concurrent users?” In deciding on the size of the benchmark, we wanted to exceed the numbers we have seen at clients (7 billion for one prospect) by a large margin. Unlike some of the older billion-document benchmarks conducted by ECM software vendors in the past, this benchmark tested all of the scenarios required by our large-volume clients.
Combined with our rolling migration approach, TSG now has extensive experience and solutions to move large clients to alternative solutions with our products and people. Some examples of ways the benchmark has influenced our work with current clients this last week include:
- As a test harness, we are proposing leveraging the migration approach and test data to quickly scale up clients’ repositories to production volume, so that infrastructure and performance can be tested before real data is moved.
- In our design, we are able to combine lessons learned on creating indices with NoSQL-based alternatives to provide multiple options for typical search scenarios.
- We are saving the repository in S3 to allow clients to quickly restore and test scenarios against the repository.
Thanks again to everyone who helped us in the benchmark, particularly Amazon Web Services, Deep Analysis and Doculabs. Download the whitepaper with Alan Pelz-Sharpe from Deep Analysis and let us know your thoughts below.