
Technology Services Group

DynamoDB 11 Billion Benchmark – Document and Folder Details


May 15, 2019

TSG started an 11 Billion Document Benchmark with DynamoDB last Friday to test and verify the power of Amazon Web Services, as well as the TSG ECM products, at an unprecedented scale.  As of this morning, we have migrated approximately 9 billion documents. This post presents some underlying detail of the DynamoDB repository, with a particular focus on document and folder objects.

Monday’s post detailed the reasons and expectations for the 11 billion document benchmark, and Tuesday’s post showed the interface and migration process.  This post adds detail on the document and folder objects and how they tie into the OpenContent Management Suite’s ability to configure the user experience.

DynamoDB Object Model – NoSQL Approach

One of the big advances for big data is the introduction of NoSQL (commonly read as “Not Only SQL”, where SQL is Structured Query Language) as a data storage and retrieval approach.  Developed as an alternative to relational databases, this approach offers benefits including:

  • Simplicity of design
  • Simpler “horizontal” scaling to clusters of machines (which is a problem for relational databases), and finer control over availability.
  • The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL.
  • Sometimes the data structures used by NoSQL databases are also viewed as more flexible than relational database tables.

Specifically for Document Management customers, there is a simple difference between the two approaches.

  • Relational Database – Stores the attributes in columns/rows of the relational database, with a pointer to the document file location in a SAN or object store.
  • NoSQL – Stores the attributes in an entry with tags/metadata that describe the document, possibly along with the document content itself, either in the repository or in a SAN or object store.  Tags can be XML, JSON or a variety of other alternatives.
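As a rough illustration of the two approaches, the sketch below stores the same document once as a relational row and once as a JSON entry. It is only an assumption-laden example: the table, the field names (`claim_number`, `doc_type`, `content_location`, `tags`) and the bucket path are hypothetical, and an in-memory dictionary stands in for the NoSQL store rather than DynamoDB itself.

```python
import json
import sqlite3

# Relational style: attributes live in fixed columns, and the row only
# points at the content file (hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id TEXT PRIMARY KEY, claim_number TEXT, doc_type TEXT, content_location TEXT)"
)
conn.execute(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    ("doc-001", "CLM-1001", "Claim Auto Document", "s3://example-bucket/doc-001.pdf"),
)
row = conn.execute(
    "SELECT claim_number, content_location FROM documents WHERE id = ?",
    ("doc-001",),
).fetchone()

# NoSQL style: one self-describing JSON entry per document. Attributes can
# vary from entry to entry without a schema change.
nosql_item = {
    "id": "doc-001",
    "claim_number": "CLM-1001",
    "doc_type": "Claim Auto Document",
    "tags": ["auto", "photo"],  # extra attribute, no ALTER TABLE needed
    "content_location": "s3://example-bucket/doc-001.pdf",
}

# A plain dict keyed by id stands in for the key-value store.
store = {nosql_item["id"]: json.dumps(nosql_item)}
loaded = json.loads(store["doc-001"])

print(row)
print(loaded["claim_number"], loaded["tags"])
```

In both cases the content file itself sits outside the attribute store; the difference is whether the attributes are bound to a fixed column layout or travel with the entry.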

For our DynamoDB approach, we are using JSON with the following layout for documents:

[Image: JSON layout of a document object]

As well as a similar layout for folders:

[Image: JSON layout of a folder object]

(Note: these are examples of our Claim Auto Document and Folder types from the benchmark. Other types may have slightly different metadata fields.)

One advantage the OpenContent Management Suite (OCMS) has over traditional document management interfaces is the ability to configure all the portions of the interface without requiring any code.  While the JSON object can have all of the detail to describe each attribute, OCMS will map the name to a label for display in the interface, allowing names/interfaces to change and adapt for different users and languages without requiring the underlying repository to change.
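The name-to-label mapping described above can be pictured with a small sketch. This is not OCMS configuration or code; the `LABELS` table, the `display_fields` helper and the attribute names are hypothetical, chosen only to show how stored names can stay fixed while display labels vary by user or language.

```python
# Hypothetical label configuration: stored attribute name -> display label,
# per locale. The repository items never change when these labels change.
LABELS = {
    "en": {"claim_number": "Claim Number", "loss_date": "Date of Loss"},
    "fr": {"claim_number": "Numéro de sinistre", "loss_date": "Date du sinistre"},
}


def display_fields(item, locale, labels=LABELS):
    """Return (label, value) pairs for an item, falling back to the raw
    attribute name when no label is configured for the locale."""
    locale_labels = labels.get(locale, {})
    return [(locale_labels.get(name, name), value) for name, value in item.items()]


item = {"claim_number": "CLM-1001", "loss_date": "2019-05-01"}

for label, value in display_fields(item, "en"):
    print(f"{label}: {value}")
for label, value in display_fields(item, "fr"):
    print(f"{label}: {value}")
```

Because the labels live in configuration rather than in the stored JSON, adding a language or renaming a field in the interface does not require touching the repository.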

DynamoDB – Folder Detail

One interesting component of the Folder model is that the folder itself includes detail of its document objects: the rel_children_ss attribute from the folder picture above is a list of all document ids in the folder.

As part of our benchmark, the team is going to test two models for displaying the contents of a folder.

  • JSON storage of document objects – Currently the folder object contains all of the document ids in a repeating field.  This allows for fast viewing of the objects in the folder, a typical requirement for case management/folder viewing.  Benefits include fast, scalable access to folder objects without a large Elasticsearch index.  Downsides would be having to update the folder object every time a document is added to or deleted from the folder, and the handling of large folders (TSG has one client with 65,000 documents in a folder).
  • Elasticsearch for documents – Currently, Elasticsearch is only being used for access to folder objects.  Phase 2 of the benchmark will include indexing all or part of the 11 billion documents to test the use of Elasticsearch for displaying the objects contained in a folder.  Benefits include not having to update folder objects when adding or removing documents.  Downsides include having to maintain an index for all documents and the additional Elasticsearch resources that requires.
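The trade-off between the two models can be sketched in a few lines. This is a simplified illustration, not the benchmark code: in-memory dictionaries stand in for DynamoDB and for the Elasticsearch index, and apart from rel_children_ss every class, method and field name here is hypothetical.

```python
from collections import defaultdict


class FolderListModel:
    """Model 1: the folder item carries the ids of its documents."""

    def __init__(self):
        self.folders = {}
        self.documents = {}

    def add_folder(self, folder_id):
        self.folders[folder_id] = {"id": folder_id, "rel_children_ss": []}

    def add_document(self, folder_id, doc):
        self.documents[doc["id"]] = doc
        # Extra write: the folder item must be updated for every new document.
        self.folders[folder_id]["rel_children_ss"].append(doc["id"])

    def list_folder(self, folder_id):
        # One folder read, then direct key lookups; no search index involved.
        child_ids = self.folders[folder_id]["rel_children_ss"]
        return [self.documents[doc_id] for doc_id in child_ids]


class SearchIndexModel:
    """Model 2: a separate index (stand-in for a search engine) maps
    folder id -> document ids; the folder item is never touched."""

    def __init__(self):
        self.folders = {}
        self.documents = {}
        self.index = defaultdict(list)

    def add_folder(self, folder_id):
        self.folders[folder_id] = {"id": folder_id}

    def add_document(self, folder_id, doc):
        self.documents[doc["id"]] = doc
        # The index entry is maintained instead of the folder item.
        self.index[folder_id].append(doc["id"])

    def list_folder(self, folder_id):
        return [self.documents[doc_id] for doc_id in self.index.get(folder_id, [])]


docs = [{"id": "doc-001", "title": "Police report"},
        {"id": "doc-002", "title": "Repair estimate"}]

model_a, model_b = FolderListModel(), SearchIndexModel()
for model in (model_a, model_b):
    model.add_folder("folder-1")
    for doc in docs:
        model.add_document("folder-1", doc)

print([d["title"] for d in model_a.list_folder("folder-1")])
print([d["title"] for d in model_b.list_folder("folder-1")])
```

Both models return the same listing. The first pays for it with a folder update on every document change and a folder item that grows with its contents; the second pays with an index that has to cover every document.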

TSG is planning on testing both options and may provide both to DynamoDB customers, since one might make more sense than the other depending on the customer’s use case.  Below is a daily video of our progress to date on the benchmark with some more detail on how the object model is configured in the OpenContent Management Suite.

Let us know your thoughts below and look for another entry tomorrow.



