Technology Services Group

Microsoft Azure HDInsight for ECM on Hadoop

October 14, 2016

As we have posted in the past, Amazon Web Services (AWS) has been a very attractive option for companies looking to outsource some or all of their infrastructure. Microsoft, while a little late to the market, has assembled a suite of tools comparable to the AWS offerings, including a robust managed Hadoop service. This post outlines how cloud services such as Azure HDInsight have matured into an offering that deserves strong consideration as an ECM repository when you evaluate where to store your enterprise content.

Hadoop has typically been thought of as a “Data Lake” for all of your unstructured content. In its infancy, Hadoop was designed for exactly that use case: dumping large numbers of files into a distributed filestore spread across many inexpensive computers. In these “big data” use cases, developers would write MapReduce code that split the unstructured data into small chunks of work that individual commodity machines could handle, then combined the results. This is a great paradigm for running large batches of data through a “schema-on-read” mechanism to explore data. Unfortunately, this approach doesn’t fit the usage pattern of a traditional ECM system, whose users require quick retrieval of stored documents. The Hadoop ecosystem addresses this with HBase, which began as a Hadoop sub-project and provides real-time access to data stored in Hadoop. This allows document metadata to be stored alongside the content of the document.
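To make the contrast concrete, here is a minimal sketch in plain Python. It does not use the real Hadoop or HBase APIs; the sample records, the `map_reduce` function, and the `RowStore` class are hypothetical stand-ins invented for illustration. The first half mimics the batch, schema-on-read pattern, where every question requires a scan of all the input. The second half mimics an HBase-style layout, where a row key leads directly to column families holding the metadata and the content.

```python
from collections import defaultdict

# Hypothetical raw records, as they might be dumped into a data lake.
RAW_LINES = [
    "claim-001|invoice|2016-09-30",
    "claim-002|photo|2016-10-01",
    "claim-003|invoice|2016-10-02",
]


def map_line(line):
    """Map step: apply the schema at read time and emit (doc_type, 1)."""
    _doc_id, doc_type, _date = line.split("|")
    yield doc_type, 1


def map_reduce(lines):
    """Batch pattern: answering a question means scanning every line."""
    grouped = defaultdict(list)
    for line in lines:  # map, then group by key (the "shuffle")
        for key, value in map_line(line):
            grouped[key].append(value)
    # Reduce step: collapse each group of values into one result.
    return {key: sum(values) for key, values in grouped.items()}


class RowStore:
    """Toy HBase-like layout: row key -> column family -> qualifier -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, qualifier, value):
        self._rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        """Keyed lookup: no scan of the other rows is needed."""
        return self._rows.get(row_key)


store = RowStore()
store.put("claim-001", "meta", "doc_type", "invoice")
store.put("claim-001", "meta", "created", "2016-09-30")
store.put("claim-001", "content", "bytes", b"<document bytes>")

print(map_reduce(RAW_LINES))
print(store.get("claim-001")["meta"])
```

The keyed lookup is the access pattern an ECM user expects when opening a single document, while the batch function is closer to what an analyst would run across the whole repository.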

Hadoop’s reliance on many lower-powered computing resources translates perfectly to the model of Amazon Web Services and Microsoft Azure, where computing resources are rented from managed data centers. The ability to spin up a cluster of 10 servers in a matter of minutes was a tipping point for users wanting to leverage this technology. While Ambari and other tools have made it much easier to install and configure Hadoop/HBase in the cloud, doing so still required extensive technical knowledge of how Hadoop works, and often required users to run Linux commands to perform the install.

Microsoft has addressed this pain point with its HDInsight platform. With a few clicks, a user can create a multi-terabyte repository that is fully managed and maintained by Microsoft, a process that used to be reserved for highly skilled technical resources. It was also surprising to us at TSG that Microsoft has fully embraced Linux in its Azure platform. Microsoft has done a great job wrapping up its Hadoop as a Service offering in a way that is approachable for end users.

By leveraging the scale of the cloud, it is now very easy to take advantage of the same technology that powers Facebook and Netflix. As more of our clients warm to the idea of leveraging Hadoop for document management, Microsoft’s entry into the space is a welcome addition. Let us know your thoughts in the comments below.

Filed Under: Amazon, Amazon EC2, Cloud Computing, ECM 101, ECM Solutions, Hadoop, HBase, Microsoft Azure Tagged With: Azure, ECM
