TSG started an 11 Billion Document Benchmark with DynamoDB last Friday to test and verify the power of Amazon Web Services, as well as the TSG ECM products, at an unprecedented scale. As of this morning, we are pleased to announce we have fully ingested our goal of 11 billion documents! This post shares some of the lessons learned this week.
Posts this week
- Monday detailed the reasons and expectations for the 11 billion document benchmark
- Tuesday showed the interface and migration process
- Wednesday discussed the document and folder details for a NoSQL database
- Thursday walked through the AWS console for the benchmark instance
This post will talk through the overall lessons learned from the benchmark.
Lesson 1 – Have a Realistic Scope
It was important from the start for us to have a realistic and credible scope for our first massive ingestion effort. Rather than come out and say we are going to offer all ECM capabilities at a massive and unprecedented scale, we chose to pick a realistic large volume scenario (claim and case management) and build our first phase around that scope. By phasing the benchmark into steps, we were able to set concrete goals and design architecture around any issues we could foresee when scaling our OpenContent Management Suite on DynamoDB. Later phases will add additional capabilities and testing.
Lesson 2 – Iterate, iterate, iterate
Before starting the benchmark, we set a realistic goal for our ingestion: move 1 billion documents a day into DynamoDB. We then started small and iterated toward that goal. To hit a billion documents in 24 hours, we would need to move about 12,000 documents a second. Our iterations looked something like:
- 1 OpenMigrate (OM) instance on a t2.medium with default thread settings
  - 510 docs/sec – 8,000 docs moved
- 1 OM on a t2.medium with OM thread performance tweaks
  - 553 docs/sec – 8,000 docs moved
- 1 OM on an m5.24xlarge (96 CPUs) with OM thread performance tweaks
  - 3,447 docs/sec – 8,000 docs moved
- 1 OM on an m5.24xlarge (96 CPUs) with OM thread performance tweaks
  - 3,389 docs/sec – 70,000,000 docs moved
- (Continued to steadily iterate, increasing the documents moved, until the final test run before kicking off the benchmark)
- 2 OMs on m5.24xlarge instances with OM thread performance tweaks and Elasticsearch indexing performance updates
  - 22,367 docs/sec – 530,000,000 docs moved
Starting small and iterating up is a big reason why this benchmark was a success.
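As a back-of-the-envelope check on these targets (assuming steady, uninterrupted throughput), the arithmetic works out as sketched below. The numbers come from this post; the helper function itself is just illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def required_rate(docs: int, seconds: int = SECONDS_PER_DAY) -> int:
    """Minimum whole docs/sec needed to move `docs` documents in `seconds`."""
    return -(-docs // seconds)  # ceiling division

# 1 billion documents in 24 hours needs ~11,575 docs/sec -- hence the ~12,000 target
print(required_rate(1_000_000_000))

# At the final measured rate of 22,367 docs/sec, 11 billion documents
# take roughly 5.7 days of continuous ingestion
print(11_000_000_000 / 22_367 / SECONDS_PER_DAY)
```

That 5.7-day figure lines up with a benchmark kicked off on a Friday and completed within the week.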
Lesson 3 – Reduce Bloat in the Metadata
Metadata is always important for document management solutions. While we wanted enough metadata to make the benchmark realistic, it was important to reduce the bloat of the metadata and take advantage of the admin capabilities to map metadata to labels, as we demonstrated in our document and folder overview. We would recommend clients look critically at their metadata models and see if they can identify any metadata bloat that could affect performance.
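One simple way to picture the label mapping: store short attribute labels in the database (DynamoDB counts attribute names toward item size) and translate back to the readable model names at the application layer. This is only a sketch of the idea; the attribute names below are hypothetical examples, not TSG's actual claim model or OpenContent's admin mapping:

```python
# Hypothetical mapping of verbose model attribute names to short stored labels.
ATTR_MAP = {
    "claim_number": "a1",
    "policy_holder_name": "a2",
    "date_of_loss": "a3",
}
REVERSE_MAP = {v: k for k, v in ATTR_MAP.items()}

def to_storage(doc: dict) -> dict:
    """Shrink attribute names before writing the item (unmapped keys pass through)."""
    return {ATTR_MAP.get(k, k): v for k, v in doc.items()}

def from_storage(item: dict) -> dict:
    """Restore readable names when reading the item back."""
    return {REVERSE_MAP.get(k, k): v for k, v in item.items()}
```

At 11 billion items, shaving even a few dozen bytes of attribute-name overhead per item adds up to meaningful storage and throughput savings.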
Lesson 4 – Challenge Assumptions About Performance
The benchmark focused on using our product, OpenMigrate, for the ingestion/migration of the 11 billion documents. While TSG has done plenty of large migrations with OpenMigrate before, we had always been constrained by the performance of the ECM repository and underlying database. In our planning, we initially calculated that 10 OpenMigrate instances running concurrently would be needed to hit 12,000 documents per second. However, once we sat down and performance-tuned OpenMigrate on AWS – two EC2 m5.24xlarge servers (96 CPUs, 380 GB Java heap) writing to DynamoDB rather than a traditional SQL repository – OpenMigrate surpassed our wildest expectations at over 20,000 documents/second. We were able to complete the benchmark with only two instances of OpenMigrate and only a few small queue populator updates.
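The fan-out pattern behind that kind of write rate can be sketched roughly as below. This is not OpenMigrate's actual internals, just a minimal illustration: split the queue into batches and push them through DynamoDB's batch writer from a thread pool. The `table` argument would be a `boto3` DynamoDB Table resource, and the batch and worker counts are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunks(iterable, size):
    """Yield successive fixed-size batches from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def write_batch(table, items):
    """Write one batch via DynamoDB's batch writer, which pages into
    25-item BatchWriteItem calls and retries unprocessed items."""
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)

def parallel_load(table, items, workers=96, batch_size=5_000):
    """Fan batches out across threads -- one way to keep 96 vCPUs busy."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunks(items, batch_size):
            pool.submit(write_batch, table, batch)
```

Because the writers are IO-bound on network calls rather than CPU-bound, thread counts well above the core count are reasonable; the right numbers come out of exactly the kind of iteration described in Lesson 2.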
Lesson 5 – Elasticsearch needs tuning too
While this benchmark’s main focus is NoSQL at scale using DynamoDB, Elasticsearch and search indexes in general play an important role in modern document management solutions. We would sometimes get tunnel vision on DynamoDB and its performance for this benchmark and forget that Elasticsearch needed some attention as well. Even though we were only indexing folders for this phase of the benchmark, Elasticsearch needed around the same IO per second as DynamoDB to keep pace with indexing. In our next phase we are planning on indexing all documents, and we expect the IO-per-second needs of Elasticsearch to exceed DynamoDB’s.
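Two bulk-load adjustments commonly applied to Elasticsearch fit this lesson: pause index refreshes and drop replicas during ingestion, and feed documents through the bulk API rather than one at a time. This is a hedged sketch; the index name, field names, and settings values are illustrative, not our production configuration:

```python
# Index settings often applied for a heavy one-time load; refresh and
# replicas would be restored to normal values after ingestion completes.
BULK_LOAD_SETTINGS = {
    "index": {
        "refresh_interval": "-1",   # pause refreshes during the load
        "number_of_replicas": 0,    # re-add replicas once ingestion is done
    }
}

def folder_actions(folders, index="benchmark-folders"):
    """Generate actions in the shape elasticsearch.helpers.bulk() consumes."""
    for folder in folders:
        yield {"_index": index, "_id": folder["id"], "_source": folder}
```

With a live cluster, the actions generator would be passed to `elasticsearch.helpers.bulk()` so each HTTP round trip carries thousands of folder documents instead of one.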
Thank you so much for following along this week while we completed this phase of our benchmark. Please let us know what you think in the comments, and stay tuned for more information on the next benchmark phase!