DynamoDB and Hadoop / HBase – What are some of the differences?


November 15, 2018

As an Amazon Web Services Partner, TSG has begun building out our document management solution on DynamoDB, and we have been surprised by some of the implementation differences between DynamoDB and Hadoop / HBase. This post will dive into those differences with examples we have encountered while building out our ECM/Content Process Services solution.

Creating a Table – HBase versus DynamoDB

In the HBase code, the table name and column families are passed in and then the table is created.

public static void createTable(Admin admin, String tableName, String[] tableFamilies) {
  try {
    // Table names must be converted to bytes and wrapped in a TableName/HTableDescriptor
    HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf(Bytes.toBytes(tableName)));
    for (String family : tableFamilies) {
      tableDescriptor.addFamily(new HColumnDescriptor(family));
    }
    admin.createTable(tableDescriptor);
  } catch (IOException e1) {
    throw new OCRuntimeException("Error creating table " + tableName, e1);
  }
}

While the HBase code is very succinct, it gives the false impression of being fairly simple. Column families are a difficult concept to grasp, and converting table names to bytes does not come naturally; it seems like an awkward argument to pass into an API.

While the DynamoDB code is more verbose, it is more intuitive.

// Key Schema
List<KeySchemaElement> keySchema = new ArrayList<KeySchemaElement>();
keySchema.add(new KeySchemaElement().withAttributeName(DynamoConstants.PROP_USER_NAME).withKeyType(KeyType.HASH));

// Attribute Definition
List<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
attributeDefinitions.add(new AttributeDefinition().withAttributeName(DynamoConstants.PROP_USER_NAME).withAttributeType("S"));

CreateTableRequest request = new CreateTableRequest()
    .withTableName(tableName)
    .withKeySchema(keySchema)
    .withAttributeDefinitions(attributeDefinitions)
    .withProvisionedThroughput(new ProvisionedThroughput()
        .withReadCapacityUnits(5L)
        .withWriteCapacityUnits(5L));

The initial steps are to create the key schema (primary key) and the attribute definitions for the table. The table is then created with that key schema, those attribute definitions, and the provisioned read and write throughput.
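The request above only describes the table. As a minimal sketch of actually issuing it (assuming dynamoConfig.getDynamo() returns the document-API DynamoDB client, the same accessor pattern used later in this post), creating the table and waiting for it to become usable might look like:

// Submit the CreateTableRequest built above (the client accessor is an assumption)
Table table = dynamoConfig.getDynamo().createTable(request);

// Table creation is asynchronous; wait for the table to reach ACTIVE before using it
// (waitForActive() throws InterruptedException)
table.waitForActive();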

Deleting Content – HBase versus DynamoDB

In HBase, the code to delete an object is pretty straightforward, but it requires the added step of converting the row key to bytes because the API does not accept strings.

Delete delete = new Delete(Bytes.toBytes(groupToRemove));
table.delete(delete);

DynamoDB requires the table's primary key attribute name as well as the id of the item, but the delete itself is straightforward.

DeleteItemSpec group = new DeleteItemSpec().withPrimaryKey(DynamoConstants.PROP_GROUP_NAME, groupToRemove);
groups.deleteItem(group);

While the two deletes seem roughly equal, DynamoDB is considerably easier when adding content, since its API for putting items is simple and concise. Items are also represented in JSON, so users can see the content they added in an easy and familiar format.
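As an illustrative sketch (not code from the solution itself; the table accessor follows the pattern used later in this post, and the user values are made up), a put with the DynamoDB document API might look like:

Table users = dynamoConfig.getDynamo().getTable(DynamoConstants.TABLE_USERS);

// Build the item with a primary key and attributes, then put it into the table
Item user = new Item()
    .withPrimaryKey(DynamoConstants.PROP_USER_NAME, "jsmith")
    .withString(DynamoConstants.PROP_USER_DISPLAY_NAME, "John Smith");
users.putItem(user);

// The stored item can be read back and displayed as JSON
Item stored = users.getItem(DynamoConstants.PROP_USER_NAME, "jsmith");
System.out.println(stored.toJSONPretty());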

Scanning Tables – HBase versus DynamoDB

HBase scanning can be very confusing because of the way column families work and because they must be passed as byte arrays into the functions that interact with HBase.

public void scanTable(Table table) throws IOException {

  // Column families and qualifiers must be converted to byte arrays
  byte[] displayNameQualifier = Bytes.toBytes(HBaseConstants.PROP_USER_DISPLAY_NAME + HBaseConstants.PROPERTY_TYPE_STRING);
  byte[] propertiesColumnFamily = Bytes.toBytes(HBaseConstants.COL_FAM_PROPERTIES);

  Scan scan = new Scan();
  ResultScanner scanner = table.getScanner(scan);
  List<UserBean> userBeans = new ArrayList<UserBean>();
  for (Result item : scanner) {
    String userName = Bytes.toString(item.getRow());
    Get user = new Get(Bytes.toBytes(userName));
    Result getResult = table.get(user);
    byte[] userBytes = getResult.getValue(propertiesColumnFamily, displayNameQualifier);
    String displayName = Bytes.toString(userBytes);
  }
}

With DynamoDB, scanning is very simple, requires less code, and the meaning of the code is easy for a programmer to follow.

List<Map<String, AttributeValue>> users = DynamoUtil.scanTable(DynamoConstants.TABLE_USERS, null, dynamoConfig);
for (Map<String, AttributeValue> user : users) {
  UserBean userBean = new UserBean();
  String displayName = user.get(DynamoConstants.PROP_USER_DISPLAY_NAME).getS();
}

Another benefit of DynamoDB: HBase can only scan against a single row key, which makes sorted access slower, whereas DynamoDB supports both a partition key and a sort key.
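As a hedged sketch of what the sort key enables (the documents table, folderId partition key, createdDate sort key, and sample values below are hypothetical, not tables from the solution), a query can return one partition's items already ordered by the sort key:

// Hypothetical table with partition key "folderId" and sort key "createdDate"
Table documents = dynamoConfig.getDynamo().getTable("documents");
String folderId = "claim-1234";     // made-up sample values
String startDate = "2018-01-01";
String endDate = "2018-12-31";

QuerySpec querySpec = new QuerySpec()
    .withHashKey("folderId", folderId)
    .withRangeKeyCondition(new RangeKeyCondition("createdDate").between(startDate, endDate))
    .withScanIndexForward(false); // return results in descending sort-key order

for (Item item : documents.query(querySpec)) {
  String documentName = item.getString("documentName");
}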

Searching for Objects – HBase versus DynamoDB

For this example, both databases are queried for an object by its group id. The HBase version of this code is considerably more verbose and more difficult to understand. The table name must be of type TableName, and there are multiple HBase utility calls that must happen before the desired item can be retrieved.

Table table = hbaseConfig.gethConnection().getTable(TableName.valueOf(HBaseConstants.TABLE_GROUPS));
Get get = new Get(Bytes.toBytes(groupId));
Result getResult = table.get(get);

As an alternative to HBase, DynamoDB has a very straightforward method of querying tables. By obtaining the table from the client itself, an item can be fetched by establishing its primary key in a GetItemSpec and fetching the item from the table.

Table table = dynamoConfig.getDynamo().getTable(DynamoConstants.TABLE_GROUPS);
GetItemSpec getItemSpec = new GetItemSpec().withPrimaryKey(DynamoConstants.GROUP_ID, groupId);
return table.getItem(getItemSpec);

The DynamoDB version of item querying is much easier to understand and demonstrates the power of the database.

Summary

Overall, we have found interacting with DynamoDB has been much easier than interacting with HBase, mostly due to the readability of the code, the excellent documentation and the natural interaction with the tables.  For AWS customers, TSG would recommend DynamoDB as a powerful alternative to HBase.

Filed Under: Amazon, DynamoDB, HBase
