
Technology Services Group

DynamoDB 11 Billion Benchmark Add Documents Success!!! – Lessons Learned

June 13, 2019

TSG initiated our 11 billion document benchmark on Friday, May 10th.  The first phase of the benchmark was aimed at building a large repository with our OpenMigrate ingestion tool and proving access for OpenContent Search, OpenContent Case and OpenAnnotate.  The initial ingestion phase concluded on May 17th with 11 billion documents ingested into DynamoDB at speeds of 20,000 documents per second, and the related folders indexed into Elasticsearch.  After taking some time to decompress, we started the second phase of the benchmark last week.  That phase focused on building the search indices required for document search on the DynamoDB documents, and it successfully ended June 11th.  Today we successfully completed the third phase of the project: adding documents.  This post highlights the results of this third phase and describes how the final phase, testing a large number of concurrent users, will proceed.
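As a rough consistency check on these figures, ingesting 11 billion documents at 20,000 documents per second works out to about 6.4 days, which fits inside the May 10 to May 17 window. This sketch assumes the rate was sustained for the whole run; the post does not say whether 20,000 per second was a peak or an average.

```python
# Back-of-envelope check: time to ingest 11 billion documents at a fixed rate.
# Assumption: the 20,000 docs/second rate held constant for the entire run.
SECONDS_PER_DAY = 24 * 60 * 60


def ingestion_days(total_docs: int, docs_per_second: float) -> float:
    """Return the wall-clock days needed to ingest total_docs at a fixed rate."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


seconds = 11_000_000_000 / 20_000
days = ingestion_days(11_000_000_000, 20_000)
print(f"{seconds:,.0f} s ≈ {days:.1f} days")  # 550,000 s ≈ 6.4 days
```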

11 Billion Document White Paper

Adding Documents – Benchmark Approach

When setting up the benchmark, we specifically chose to separate the first large-scale ingestion phase (11 billion documents in almost 1 billion folders) from the third phase, in which users add documents to folders.  This approach mirrors what many of our large clients do: they expose our OpenContent interfaces on their existing content after a large or rolling migration, before allowing users to add documents to the new repository.  (See the related webinar with Tony Parzgnat on how a rolling migration can help retire FileNet early.)

One of the key discussions for this phase was how to add content to, and retrieve content from, a folder.  As we mentioned in our initial post describing the benchmark, a key requirement was viewing case documents: users needed to be able to “view a listing of all documents or videos in the folder”.

As part of our benchmark, the team tested two approaches for displaying the contents of a folder.

  • JSON storage of document objects – The folder's DynamoDB object stores the ids of all of its documents in a repeating field.  This allows fast viewing of the objects in the folder, a typical requirement for case management/folder viewing.  Benefits included fast, scalable access to folder objects without a large Elasticsearch index.
  • Elasticsearch for documents – In our first ingestion phase, Elasticsearch was used only for access to folder objects.  Phase 2 of the benchmark indexed a portion of the 11 billion documents to test using Elasticsearch to display the objects contained in a folder.
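The two approaches can be contrasted with a minimal sketch. Plain Python dictionaries stand in for DynamoDB tables here, and the attribute names (`doc_ids`, `folder_id`, `name`) and the `es_folder_query` helper are hypothetical illustrations, not TSG's actual schema or code.

```python
import json
from dataclasses import dataclass, field


# In-memory stand-in for two DynamoDB tables keyed by object id.
# The attribute names (doc_ids, folder_id, name) are hypothetical.
@dataclass
class Store:
    folders: dict = field(default_factory=dict)
    documents: dict = field(default_factory=dict)

    def add_document(self, folder_id: str, doc_id: str, name: str) -> None:
        """Write the document item, then append its id to the folder's
        repeating field so the folder item always lists its contents."""
        self.documents[doc_id] = {"id": doc_id, "name": name, "folder_id": folder_id}
        folder = self.folders.setdefault(folder_id, {"id": folder_id, "doc_ids": []})
        folder["doc_ids"].append(doc_id)

    def list_folder(self, folder_id: str) -> list:
        """One read of the folder item yields every child id; the documents are
        then fetched by key (as a BatchGetItem call would), with no search index."""
        doc_ids = self.folders.get(folder_id, {}).get("doc_ids", [])
        return [self.documents[d] for d in doc_ids]


def es_folder_query(folder_id: str) -> dict:
    """Alternative approach: an Elasticsearch term query for every document
    whose (hypothetical) folder_id field matches the folder."""
    return {"query": {"term": {"folder_id": folder_id}}}


store = Store()
store.add_document("case-1", "doc-1", "claim.pdf")
store.add_document("case-1", "doc-2", "photo.jpg")
print(json.dumps(store.folders["case-1"]))
print([d["name"] for d in store.list_folder("case-1")])
```

Against real DynamoDB, the append would typically be an `UpdateItem` call with a `list_append` update expression, and the document fetch a `BatchGetItem` call, which returns at most 100 items per request.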

After testing, the team determined that the JSON object store made the most sense for our sample set, but that we would continue to offer both alternatives to customers depending on the number of documents in a folder.  (TSG has one client with 65,000 documents in a single folder.)
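The post does not state where the cut-over between the two approaches lies. One constraint that plausibly matters is that DynamoDB limits a single item to 400 KB, so a folder of 65,000 documents cannot hold 36-character ids in one item (65,000 × 36 bytes is roughly 2.3 MB). The sketch below assumes UUID-length ids and a hypothetical rule of staying under half the item limit; both the id length and the headroom are illustrative choices, not TSG's policy.

```python
# DynamoDB caps a single item at 400 KB; every id in the folder's repeating
# field counts toward that cap.
DYNAMODB_MAX_ITEM_BYTES = 400 * 1024


def estimated_id_list_bytes(doc_count: int, id_length: int = 36) -> int:
    """Rough lower bound: id characters only (ASCII ids), ignoring attribute
    names and DynamoDB's per-element list overhead."""
    return doc_count * id_length


def choose_listing_strategy(doc_count: int, id_length: int = 36, headroom: float = 0.5) -> str:
    """Hypothetical rule: keep ids in the folder item while the estimate stays
    within `headroom` of the item cap, otherwise fall back to Elasticsearch."""
    budget = DYNAMODB_MAX_ITEM_BYTES * headroom
    if estimated_id_list_bytes(doc_count, id_length) <= budget:
        return "dynamodb-json"
    return "elasticsearch"


print(choose_listing_strategy(1_000), choose_listing_strategy(65_000))
```

Under these assumptions, a 1,000-document folder stays in the DynamoDB folder item, while the 65,000-document folder mentioned above falls back to Elasticsearch.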

Add Document Results

The video below shows documents being added to multiple types of folders with a variety of ingestion options, highlighting adding and annotating content in the large repository.

Lessons Learned – Elasticsearch versus DynamoDB

We found the current pricing of Elasticsearch and DynamoDB to be very different, with Elasticsearch more expensive because of the size of the cores needed to support large-scale ingestion and indices. In our benchmark, DynamoDB stored around 13 times as many nodes as Elasticsearch did, yet Elasticsearch cost about 1.3 times as much as DynamoDB over the course of the benchmark.
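Combining the two approximate ratios gives a per-node comparison: Elasticsearch cost roughly 13 × 1.3 ≈ 17 times as much per stored node as DynamoDB. This is derived arithmetic on the rounded figures above, not a separately measured result.

```python
def relative_cost_per_node(node_ratio: float, cost_ratio: float) -> float:
    """How many times more Elasticsearch paid per stored node than DynamoDB.

    node_ratio: DynamoDB nodes / Elasticsearch nodes (about 13 in the benchmark)
    cost_ratio: Elasticsearch cost / DynamoDB cost (about 1.3 in the benchmark)
    """
    # (ES cost / ES nodes) / (DDB cost / DDB nodes)
    #   = (ES cost / DDB cost) * (DDB nodes / ES nodes)
    return cost_ratio * node_ratio


print(round(relative_cost_per_node(13, 1.3), 1))  # 16.9
```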

Unlike DynamoDB, where we could scale up read/write units for ingestion and then reduce them once the large migration was complete, our approach required Elasticsearch servers to be maintained and operational for both ingestion and later access. DynamoDB read/write units are priced and managed very differently from Elasticsearch EC2 instances. Maintaining the folder objects/documents in Elasticsearch was cost-prohibitive for a simple use case that DynamoDB can handle on its own.
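The scale-up-then-scale-down pattern for DynamoDB can be sketched as a per-phase capacity plan that produces the arguments for a provisioned-throughput change. The table name `documents` and all capacity numbers below are hypothetical placeholders, not the benchmark's settings.

```python
# Hypothetical capacity plan: high write throughput during the bulk migration,
# then far fewer provisioned units once ingestion completes.
CAPACITY_PLAN = {
    "ingestion": {"ReadCapacityUnits": 1_000, "WriteCapacityUnits": 40_000},
    "steady_state": {"ReadCapacityUnits": 2_000, "WriteCapacityUnits": 500},
}


def update_table_params(table_name: str, phase: str) -> dict:
    """Build keyword arguments for boto3's DynamoDB client.update_table()."""
    units = CAPACITY_PLAN[phase]
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": units["ReadCapacityUnits"],
            "WriteCapacityUnits": units["WriteCapacityUnits"],
        },
    }


params = update_table_params("documents", "steady_state")
print(params)
```

With boto3 installed and AWS credentials configured, the change would be applied with `boto3.client("dynamodb").update_table(**params)`. There is no equivalent one-call knob for a fleet of Elasticsearch EC2 instances that must stay up to serve queries.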

Given the pricing and the added overhead, we concluded that leveraging DynamoDB for most folder-viewing requirements made the most sense from both a cost and a scaling perspective.

Phase 4 – What’s Next and Last

We are looking to finish the benchmark within the next week or two.  Our last test will be a concurrent user test of 11,000 threads performing standard document management tasks, including searching, annotating and adding documents.
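A concurrent-user test of this kind can be sketched as a thread pool replaying a weighted mix of operations. The operation weights and the `simulated_user_action` stub are hypothetical; a real harness would call the repository's search, annotation and add-document services in place of the stub. Python threads suit this I/O-bound pattern because blocking network calls release the interpreter lock.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operation mix; the weights are placeholders, not the
# benchmark's actual workload.
OPERATIONS = ["search", "annotate", "add_document"]
WEIGHTS = [0.6, 0.25, 0.15]


def simulated_user_action(task_id: int, seed: int = 42) -> str:
    """Pick one operation for this task. A real harness would issue the
    corresponding request here and record its latency."""
    rng = random.Random(seed + task_id)  # per-task RNG keeps runs reproducible
    return rng.choices(OPERATIONS, weights=WEIGHTS, k=1)[0]


def run_load_test(threads: int, tasks: int) -> Counter:
    """Run `tasks` simulated actions across `threads` worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return Counter(pool.map(simulated_user_action, range(tasks)))


counts = run_load_test(threads=50, tasks=1_000)
print(counts)
```

The benchmark's 11,000 threads would correspond to `threads=11_000`; the example uses a smaller pool so it runs quickly on a laptop.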

Stay tuned as we look to wrap up the benchmark.  Thanks again for all of your questions and thoughts.

Filed Under: Amazon, DynamoDB, ECM Landscape, OpenContent Management Suite, Product Suite

Trackbacks

  1. DynamoDB 11 Billion Benchmark 11 Thousand Concurrent Users Success!!! – Lessons Learned says:
    June 20, 2019 at 1:47 pm

    […] in DynamoDB which successfully ended June 11th.  The third phase of the project focused on user addition of documents and finished on June 12.  This last phase focused on testing a large number of users concurrently accessing the system […]

  2. The Deep Analysis Podcast – The 11 Billion File Benchmark says:
    June 25, 2019 at 9:52 pm

    […] documents in Elasticsearch for the Accounts Payable scenario. We’re going to blog about that next week. And then the Phase 3 will include adding documents and then Phase 4 is going to be something our […]

Contact

22 West Washington St
5th Floor
Chicago, IL 60602

inquiry@tsgrp.com

312.372.7777

Copyright © 2023 · Technology Services Group, Inc.
