Technology Services Group


DynamoDB 11 Billion Benchmark Ingestion Success!!! – Lessons Learned


May 17, 2019

TSG started an 11 Billion Document Benchmark with DynamoDB last Friday to test and verify the power of Amazon Web Services as well as the TSG ECM products on an unprecedented scale.  As of this morning, we are pleased to announce we have fully ingested our goal of 11 billion documents!!! This post will share some of the lessons learned this week.

Posts this week

  • Monday detailed the reasons and expectations for the 11 billion document benchmark
  • Tuesday showed the interface and migration process
  • Wednesday discussed the document and folder details for a NoSQL database
  • Thursday walked through the AWS console for the benchmark instance.

This post will talk through the overall lessons learned from the benchmark.

Lesson 1 – Have a realistic Scope

It was important from the start for us to have a realistic and credible scope for our first massive ingestion effort. Rather than claim we would offer all ECM capabilities at a massive and unprecedented scale, we picked a realistic large-volume scenario (claim and case management) and built our first phase around that scope. By phasing the benchmark into steps, we were able to set concrete goals and design the architecture around any issues we could foresee when scaling our OpenContent Management Suite on DynamoDB. Later phases will add additional capabilities and testing.

Lesson 2 – Iterate, iterate, iterate

Before starting the benchmark, we set a realistic goal for our ingestion: be able to move 1 billion documents a day into DynamoDB. We then started small and iterated toward that goal. In order to hit a billion documents in 24 hours, we would need to move about 12,000 documents a second (1,000,000,000 documents / 86,400 seconds ≈ 11,574). Our iterations (OM = OpenMigrate instance) looked something like:

  • 1 OM on a t2.medium with default thread setting
    • 510 docs/sec – 8,000 docs moved
  • 1 OM on a t2.medium with OM thread performance tweaks
    • 553 docs/sec – 8,000 docs moved
  • 1 OM on a m5.24xlarge (96 CPUs) with OM thread performance tweaks
    • 3,447 docs/sec – 8,000 docs moved
  • 1 OM on a m5.24xlarge (96 CPUs) with OM thread performance tweaks
    • 3,389 docs/sec – 70,000,000 docs moved
  • (Continue to steadily iterate and increase documents moved until we hit the final test run before kicking off the benchmark)
  • 2 OM on m5.24xlarge with OM thread performance tweaks and Elasticsearch indexing performance updates
    • 22,367 docs/sec – 530,000,000 docs moved

Starting small and iterating up is a big reason why this benchmark was a success.
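
The arithmetic behind the 1 billion documents a day target is easy to reproduce. The short Python sketch below is illustrative only: the function names are our own invention for this post, and the rates plugged in are simply the figures from the iteration list above. It assumes a sustained, constant rate, which a real ingestion run will not hold exactly.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_rate(total_docs: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Documents per second needed to move total_docs within the window."""
    return total_docs / seconds


def days_to_ingest(total_docs: int, docs_per_sec: float) -> float:
    """Days needed to move total_docs at a constant docs_per_sec rate."""
    return total_docs / docs_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    target = required_rate(1_000_000_000)
    print(f"Rate needed for 1 billion docs/day: {target:,.0f} docs/sec")

    # Rates taken from the iteration list in this post.
    iterations = [
        ("1 OM, t2.medium, defaults", 510),
        ("1 OM, m5.24xlarge, tuned", 3_389),
        ("2 OM, m5.24xlarge, tuned + ES updates", 22_367),
    ]
    for label, rate in iterations:
        days = days_to_ingest(11_000_000_000, rate)
        print(f"{label}: {days:,.1f} days for 11 billion docs")
```

Running the numbers this way at each iteration makes it obvious how far a given configuration is from the goal before committing to a long run.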

Lesson 3 – Reduce Bloat in the Metadata

Metadata is always important for document management solutions, and while we wanted enough metadata for our benchmark to be realistic, it was important to reduce the bloat of the metadata and take advantage of the admin capabilities to map metadata to labels, as we demonstrated in our document and folder overview. We would recommend clients critically look at their metadata models and see if they can identify any metadata bloat that could affect performance.
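
To make the idea concrete, here is a minimal Python sketch of storing short codes and expanding them to labels at read time. Everything in it is hypothetical: the attribute names, codes, labels, and the `to_display` and `approx_bytes` helpers are made up for illustration and are not the OpenContent admin configuration. The size figure is only a rough proxy (the length of the JSON encoding), not DynamoDB's own item size accounting, although DynamoDB does count attribute names as well as values toward item size.

```python
import json

# Hypothetical label tables: short stored code -> human-readable label.
LABELS = {
    "status": {
        "O": "Open - Awaiting Adjuster Review",
        "C": "Closed - Payment Issued",
    },
    "lob": {"PA": "Personal Automobile", "HO": "Homeowners"},
}


def to_display(stored: dict) -> dict:
    """Expand stored codes into labels; unmapped fields pass through as-is."""
    return {
        key: LABELS.get(key, {}).get(value, value)
        for key, value in stored.items()
    }


def approx_bytes(item: dict) -> int:
    """Rough size proxy: UTF-8 length of the compact JSON encoding."""
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


# The same (made-up) claim folder, stored verbosely and compactly.
verbose = {
    "claim_status_description": "Open - Awaiting Adjuster Review",
    "line_of_business_name": "Personal Automobile",
    "claim_number": "CLM-0001",
}
compact = {"status": "O", "lob": "PA", "claim_number": "CLM-0001"}

if __name__ == "__main__":
    print(approx_bytes(verbose), "bytes verbose vs",
          approx_bytes(compact), "bytes compact")
    print(to_display(compact))
```

A few dozen bytes per item looks trivial, but multiplied across billions of items it affects both storage and the throughput consumed per write.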

Lesson 4 – Challenge Assumptions in regards to Performance

The benchmark focused on using our product, OpenMigrate, for the ingestion/migration of the 11 billion documents. While TSG has done plenty of large migrations with OpenMigrate before, we had always been constrained by the performance of the ECM repository and underlying database. In our planning, we initially calculated that we would need 10 OpenMigrate instances running concurrently to hit 12,000 documents per second. However, once we sat down and were able to performance tune OpenMigrate on AWS, running on 2 EC2 m5.24xlarge instances (380 GB Java heap, 96 CPUs) with DynamoDB rather than traditional SQL repositories, OpenMigrate surpassed our wildest expectations with 20,000 documents/second. We were able to complete this benchmark with only two instances of OpenMigrate and only a few small queue populator updates.
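
The gap between our planning assumption and the tuned result can be expressed as a small sizing calculation. The Python sketch below is illustrative: `instances_needed` is a hypothetical helper, and it assumes throughput scales evenly across instances, which is a simplification rather than something the benchmark measured.

```python
import math


def instances_needed(target_rate: float, per_instance_rate: float) -> int:
    """Instances required to reach target_rate, assuming even scaling."""
    return math.ceil(target_rate / per_instance_rate)


TARGET_RATE = 12_000  # docs/sec needed for about 1 billion docs/day

# Per-instance rates implied by the figures in this post.
planned_per_instance = 12_000 / 10   # initial plan: 10 instances
observed_per_instance = 20_000 / 2   # tuned run: 2 instances

if __name__ == "__main__":
    print("Planned: ", instances_needed(TARGET_RATE, planned_per_instance))
    print("Observed:", instances_needed(TARGET_RATE, observed_per_instance))
    ratio = observed_per_instance / planned_per_instance
    print(f"Per-instance throughput vs. plan: {ratio:.1f}x")
```

The takeaway is that the per-instance assumption carried over from repository-bound migrations was the number worth challenging, since it drove the whole sizing estimate.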

Lesson 5 – Elasticsearch needs tuning too

While this benchmark’s main focus is on NoSQL at scale using DynamoDB, Elasticsearch and search indexes in general play an important role in modern document management solutions. We would sometimes get tunnel vision on DynamoDB and its performance for this benchmark, and forget that Elasticsearch needed some attention as well. Even though we were only indexing folders for this phase of the benchmark, Elasticsearch needed around the same IO per second as DynamoDB to keep pace in indexing. In our next phase we are planning on indexing all documents, and we expect the IO per second needed for Elasticsearch to exceed DynamoDB’s.

Thank you so much for following along this week while we completed this phase of our benchmark. Please let us know what you think in the comments, and stay tuned for more information on the next benchmark phase!

Filed Under: Amazon, DynamoDB, ECM Landscape


