As an Amazon Web Services Partner, TSG has begun building out our document management solution for DynamoDB, and we have been surprised by some of the implementation differences between DynamoDB and Hadoop / HBase. This post walks through some examples of the differences we have encountered while building out our ECM/Content Process Services solution.
Creating a Table – HBase versus DynamoDB
In the HBase code, the table name and column families are passed in and then the table is created.
public static void createTable(Admin admin, String tableName, String[] tableFamilies) {
    try {
        HTableDescriptor tabledescriptor = new HTableDescriptor(TableName.valueOf(Bytes.toBytes(tableName)));
        for (String family : tableFamilies) {
            tabledescriptor.addFamily(new HColumnDescriptor(family));
        }
        admin.createTable(tabledescriptor);
    } catch (IOException e1) {
        throw new OCRuntimeException("Error creating table " + tableName, e1);
    }
}
While the HBase code is very succinct, it gives the false impression of being simple. Column families are a difficult concept to grasp, and converting table names to bytes does not come naturally and makes for an awkward API argument.
While the DynamoDB code is more verbose, it is more intuitive.
// Key Schema
List<KeySchemaElement> keySchema = new ArrayList<KeySchemaElement>();
keySchema.add(new KeySchemaElement()
    .withAttributeName(DynamoConstants.PROP_USER_NAME)
    .withKeyType(KeyType.HASH));

// Attribute Definition
List<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
attributeDefinitions.add(new AttributeDefinition()
    .withAttributeName(DynamoConstants.PROP_USER_NAME)
    .withAttributeType("S"));

CreateTableRequest request = new CreateTableRequest()
    .withTableName(tableName)
    .withKeySchema(keySchema)
    .withAttributeDefinitions(attributeDefinitions)
    .withProvisionedThroughput(new ProvisionedThroughput()
        .withReadCapacityUnits(5L)
        .withWriteCapacityUnits(5L));
The first steps are to define the key schema (the primary key) and the attribute definitions for the table. The create request then combines the key schema and attribute definitions with the provisioned read and write throughput.
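One detail worth illustrating is that every attribute named in the key schema also needs a matching attribute definition, otherwise DynamoDB rejects the create request. The following is a minimal sketch of that check in plain Java. It does not use the AWS SDK: the class KeySchemaCheck, the method undefinedKeys and the attribute names are hypothetical stand-ins chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the AWS SDK: models the rule that each
// key schema attribute must also appear in the attribute definitions.
public class KeySchemaCheck {

    // Returns the key attribute names that have no attribute definition.
    public static Set<String> undefinedKeys(List<String> keyAttributeNames,
                                            Map<String, String> attributeTypes) {
        Set<String> missing = new LinkedHashSet<>();
        for (String name : keyAttributeNames) {
            if (!attributeTypes.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed attribute name and type ("S" = string), for illustration only.
        Map<String, String> definitions = Collections.singletonMap("userName", "S");

        // Key schema fully covered by the definitions: nothing is missing.
        System.out.println(undefinedKeys(Arrays.asList("userName"), definitions));

        // A sort key without a definition would be reported as missing.
        System.out.println(undefinedKeys(Arrays.asList("userName", "createdAt"), definitions));
    }
}
```

Running a check like this before building the request can surface a mismatch earlier than the service call would.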
Deleting Content – HBase versus DynamoDB
In HBase, the code to delete an object is pretty straightforward but requires an added step of converting components to bytes because the API does not support strings.
Delete delete = new Delete(Bytes.toBytes(groupToRemove));
table.delete(delete);
DynamoDB requires both the name of the primary key attribute and its value, but the delete is equally straightforward.
DeleteItemSpec group = new DeleteItemSpec().withPrimaryKey(DynamoConstants.PROP_GROUP_NAME, groupToRemove);
groups.deleteItem(group);
While both solutions seem roughly equal for deletes, DynamoDB is considerably easier when adding content, since its API for adding content is simple and concise. The stored content is also displayed in JSON format, so users can see what they added in an easy and familiar way.
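To make the byte conversion step more concrete, here is a small sketch of what happens to a string row key. HBase's Bytes.toBytes(String) and Bytes.toString(byte[]) encode and decode strings as UTF-8, so the sketch uses only the Java standard library as a stand-in. The class RowKeyBytes, its method names and the sample key are hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the HBase Bytes utility, using only the JDK.
public class RowKeyBytes {

    // Mirrors the idea of Bytes.toBytes(String): encode the key as UTF-8.
    public static byte[] encodeRowKey(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Mirrors the idea of Bytes.toString(byte[]): decode UTF-8 back to text.
    public static String decodeRowKey(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample key, for illustration only.
        byte[] rowKey = encodeRowKey("administrators");

        // The raw bytes are what an HBase Delete, Get or Put actually receives.
        System.out.println(Arrays.toString(rowKey));

        // Decoding restores the original string.
        System.out.println(decodeRowKey(rowKey));
    }
}
```

Every row key, column family and qualifier goes through this kind of round trip, which is where much of the extra HBase code in this post comes from.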
Scanning Tables – HBase versus DynamoDB
HBase scanning can be very confusing because of the way column families work and how they must be passed as byte arrays into the functions that interact with HBase.
public void scanTable(Table table) throws IOException {
    byte[] displayNameQualifer = Bytes.toBytes(HBaseConstants.PROP_USER_DISPLAY_NAME + HBaseConstants.PROPERTY_TYPE_STRING);
    byte[] propertiesColumnFamily = Bytes.toBytes(HBaseConstants.COL_FAM_PROPERTIES);
    Scan scan = new Scan();
    // The scanner holds server-side resources, so close it when the scan is done
    try (ResultScanner scanner = table.getScanner(scan)) {
        List<UserBean> userBeans = new ArrayList<UserBean>();
        for (Result item : scanner) {
            // The row key comes back as bytes and must be converted to a string
            String userName = Bytes.toString(item.getRow());
            Get user = new Get(Bytes.toBytes(userName));
            Result getResult = table.get(user);
            // Values are addressed by column family and qualifier, both as bytes
            byte[] usersBytes = getResult.getValue(propertiesColumnFamily, displayNameQualifer);
            String displayName = Bytes.toString(usersBytes);
        }
    }
}
Using DynamoDB, scanning is very simple, requires less code and the meaning of the code can be easily understood by a programmer.
List<Map<String, AttributeValue>> users = DynamoUtil.scanTable(DynamoConstants.TABLE_USERS, null, dynamoConfig);
for (Map<String, AttributeValue> user : users) {
    UserBean userBean = new UserBean();
    String displayName = user.get(DynamoConstants.PROP_USER_DISPLAY_NAME).getS();
}
Another benefit of DynamoDB is its key structure. HBase can only scan with one primary key (the row key), making sorting slower than in DynamoDB, which supports both a partition (hash) key and a sort key.
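The idea behind a partition key combined with a sort key can be sketched with an in-memory model: items are grouped by partition key and kept ordered by sort key within each group, so reading one partition returns its items already sorted. This is only a conceptual illustration in plain Java, not how DynamoDB is implemented. The class CompositeKeySketch and the sample keys and values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a table with a partition key and a sort key.
public class CompositeKeySketch {

    // partition key -> (sort key -> item value); TreeMap keeps sort keys ordered
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    public void put(String partitionKey, String sortKey, String value) {
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(sortKey, value);
    }

    // Returns the values of one partition, ordered by sort key.
    public List<String> query(String partitionKey) {
        TreeMap<String, String> items = partitions.get(partitionKey);
        return items == null ? new ArrayList<>() : new ArrayList<>(items.values());
    }

    public static void main(String[] args) {
        CompositeKeySketch table = new CompositeKeySketch();
        // Assumed sample data: document type as partition key, date as sort key.
        table.put("invoices", "2017-03-01", "March invoice");
        table.put("invoices", "2017-01-15", "January invoice");
        table.put("contracts", "2017-02-10", "February contract");

        // Items come back in sort-key order without a separate sorting step.
        System.out.println(table.query("invoices")); // prints [January invoice, March invoice]
    }
}
```

With a single row key, the same ordering has to be designed into the key itself, for example by prefixing the key with the value to sort on.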
Search for Objects – HBase versus DynamoDB
For this example, both databases are queried for an object by group id. The HBase version of this code is more verbose and more difficult to understand. The table name must be of type TableName, and there are multiple HBase utility calls that must happen before retrieving the desired item.
table = hbaseConfig.gethConnection().getTable(TableName.valueOf(HBaseConstants.TABLE_GROUPS));
Get get = new Get(Bytes.toBytes(groupId));
Result getResult = table.get(get);
As an alternative to HBase, DynamoDB has a very straightforward method of querying tables. By obtaining the table from the client itself, an item can be fetched by establishing its primary key in a GetItemSpec and fetching the item from the table.
Table table = dynamoConfig.getDynamo().getTable(DynamoConstants.TABLE_GROUPS);
GetItemSpec getItemSpec = new GetItemSpec().withPrimaryKey(DynamoConstants.GROUP_ID, groupId);
return table.getItem(getItemSpec);
The DynamoDB version of item lookup is much easier to follow and demonstrates the power of the database.
Summary
Overall, we have found interacting with DynamoDB much easier than interacting with HBase, mostly due to the readability of the code, the excellent documentation and the natural interaction with the tables. For AWS customers, TSG would recommend DynamoDB as a powerful alternative to HBase.