Technology Services Group
Documentum, Alfresco or Hadoop – How to get more out of your Object Store

June 7, 2017

TSG conducted our annual client briefing on Monday, June 5th.  One of the more interesting presentations and discussions covered how clients are finding innovative ways to leverage the capabilities of various object storage devices and technologies.  This post presents some of our experiences and best practices regarding object storage and ECM.

Object Storage and ECM

Most ECM products grew up in a pre-object storage world, where the ECM system stored documents in mounted file stores or on a SAN that was controlled and accessed only by the ECM system.  In most systems, the file names and storage areas were created and managed exclusively by the ECM software.  Typically, the ECM interface or application would call an ECM API to create the database entries for the meta-data and store the file in a mounted file store.  In the old client-server days, the API would then ship the document file to the server running the ECM software.

As browser-based systems evolved, interfaces used the browser's standard file-transfer capabilities to send the document file to the application server, which then called the ECM API to store it in the mounted file store.  Storing the file could also kick off other activities, such as PDF renditioning or full-text indexing.

Object storage lets ECM systems move beyond the mounted file system to more secure and efficient storage.  Rather than relying on a file path and file name, a storage device or application can simply store the file and return a pointer that is saved in the ECM system.  Some extra components, typically called connectors, are usually required for the ECM system to store content in, and access content from, the object store.

As object storage has evolved, object stores have gained many capabilities that an ECM system can creatively take advantage of, including:

  • De-duplication – rather than storing two copies of the same file, the object store can recognize that two files are identical and keep only one copy, while multiple pointers make it appear that both files still exist.
  • High Speed Ingestion – because the object store is a separate layer with its own hardware and software, it can store files quickly and in parallel.
  • Object Store and File Path – the Object Store can leverage both object storage with simple pointers as well as emulate an actual filesystem with folders/paths and metadata.
  • Encryption – Object Stores can be set up with encryption embedded within the object store’s hardware, removing the performance impact of performing the encryption on the already overloaded ECM system.
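De-duplication is commonly built on content addressing: the store hashes the bytes, keeps one physical copy per distinct hash, and counts how many pointers refer to it. The sketch below is a minimal illustration under the assumption that identity is decided by a SHA-256 digest and that pointers are reference-counted; the class name `DedupObjectStore` is hypothetical, and products such as Amazon S3 or HCP implement (or do not implement) de-duplication internally in their own ways.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupObjectStore:
    """Hypothetical content-addressed store: identical bytes are kept once."""

    blobs: dict[str, bytes] = field(default_factory=dict)    # digest -> stored content
    refcounts: dict[str, int] = field(default_factory=dict)  # digest -> number of pointers

    def put(self, content: bytes) -> str:
        """Store content and return a pointer (here, its SHA-256 digest)."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = content  # only the first copy is physically stored
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        return digest

    def get(self, pointer: str) -> bytes:
        return self.blobs[pointer]

    def delete(self, pointer: str) -> None:
        """Drop one pointer; remove the bytes only when no pointers remain."""
        self.refcounts[pointer] -= 1
        if self.refcounts[pointer] == 0:
            del self.refcounts[pointer]
            del self.blobs[pointer]


if __name__ == "__main__":
    store = DedupObjectStore()
    first = store.put(b"quarterly report")
    second = store.put(b"quarterly report")  # an identical second "file"
    print(first == second, len(store.blobs), store.refcounts[first])  # True 1 2
```

Both "files" resolve to the same pointer, so the ECM system can hold two references while the store keeps one copy; deleting one reference leaves the other readable.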

Storing and Linking with ECM and Object Stores

For a typical migration or ingestion, most tools call the ECM API to ingest the file and create the meta-data in the ECM repository.  For Alfresco, our migration tool, OpenMigrate, supports a different approach: leverage the high-speed ingestion of the object store to store the file, then add the link into Alfresco rather than having the Alfresco API store the file.  Letting the object store handle the file ingestion removes the overhead of transmitting the file to the application server and of storing it through the Alfresco createNode API.  From a repository perspective, the entry in the Alfresco DB is exactly the same as if the API had been called.
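The difference between the two ingestion paths can be sketched with in-memory stand-ins. This is a hypothetical illustration, not the Alfresco or OpenMigrate API: `ObjectStore`, `Repository`, `bulk_link`, and the `objstore://<id>` content URL scheme are all invented for the example. It shows the point made above: whether the repository stores the bytes itself or only links to an object the store already holds, the resulting repository record has the same shape.

```python
import uuid
from dataclasses import dataclass


class ObjectStore:
    """Hypothetical stand-in for an object store that returns an opaque pointer."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        object_id = uuid.uuid4().hex  # the store, not the ECM system, picks the ID
        self._objects[object_id] = content
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


@dataclass(frozen=True)
class NodeRecord:
    """Repository DB entry: metadata plus a pointer to the content."""
    name: str
    properties: tuple  # sorted (key, value) pairs
    content_url: str   # hypothetical "objstore://<id>" scheme


class Repository:
    """Hypothetical ECM repository used only for illustration."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.nodes: list[NodeRecord] = []

    def create_with_content(self, name: str, properties: dict, content: bytes) -> NodeRecord:
        """API-style path: the bytes pass through the repository, which stores them."""
        object_id = self.store.put(content)
        return self.link(name, properties, object_id)

    def link(self, name: str, properties: dict, object_id: str) -> NodeRecord:
        """Link-style path: the content is already stored; only record the pointer."""
        node = NodeRecord(name, tuple(sorted(properties.items())), f"objstore://{object_id}")
        self.nodes.append(node)
        return node

    def read(self, node: NodeRecord) -> bytes:
        return self.store.get(node.content_url[len("objstore://"):])


def bulk_link(repo: Repository, store: ObjectStore, docs: list) -> list:
    """Ingest (name, properties, bytes) tuples: object store first, then link."""
    # In a real migration, the object-store uploads are the part that would run in parallel.
    object_ids = [store.put(content) for _, _, content in docs]
    return [repo.link(name, props, oid) for (name, props, _), oid in zip(docs, object_ids)]
```

Because `create_with_content` simply delegates to `link` after storing the bytes, the two paths produce identical record shapes; only the route the bytes take differs.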

With the Alfresco API, throughput in typical client environments is limited by network bandwidth as well as memory and CPU constraints within Alfresco.  By storing directly in the object store, we have seen typical throughput grow from 20 documents/second to approximately 250 documents/second or more.  We are working on ways to embed the linking approach within our OpenContent Web Services and JavaScript to add bulk upload linking capabilities to our typical user upload functions.
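To put the two throughput figures in perspective, the arithmetic below converts them into wall-clock time. The 10 million document volume is a hypothetical example, and the calculation assumes a steady ingestion rate with no retries or pauses.

```python
def migration_hours(document_count: int, docs_per_second: float) -> float:
    """Wall-clock hours to ingest document_count at a steady docs_per_second rate."""
    return document_count / docs_per_second / 3600


for rate in (20, 250):
    print(f"{rate:>3} docs/s -> {migration_hours(10_000_000, rate):,.1f} hours")
# prints:
#  20 docs/s -> 138.9 hours
# 250 docs/s -> 11.1 hours
```

At these rates a job that would take most of a week through the API finishes in roughly half a day, a 12.5x reduction.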

Preserving Existing Integrations with Object Store File Mapping Features

Another great use of the object store is its ability to work both as a mounted file system and as an object store.  For one of our clients, this provides a best-of-both-worlds approach: existing integrations continue to use a fairly complex file system structure to store documents, while the object store passes the object ID to the ECM system for linking.  Existing integrations remain unchanged, and the ECM repository is not cluttered with long, unnecessary file locations and file names.  As the integrations are changed, either to leverage the file store or to use different directory/file naming, the ECM system will remain consistent with the object store IDs.
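The dual file-system/object view can be modeled as two maps: path to object ID, and object ID to content. In this hypothetical sketch (`DualAccessStore` and its methods are invented for illustration, not a vendor API), legacy integrations write and rename by path, while the ECM system stores only the object ID, so a rename on the file-system side never requires an ECM update.

```python
import uuid


class DualAccessStore:
    """Hypothetical object store that also emulates a file system view."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}  # object ID -> content
        self._paths: dict[str, str] = {}      # path -> object ID

    def write_file(self, path: str, content: bytes) -> str:
        """File-system style write used by existing integrations; returns the object ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = content
        self._paths[path] = object_id
        return object_id

    def rename(self, old_path: str, new_path: str) -> None:
        """Change only the file-system view; the object ID stays the same."""
        self._paths[new_path] = self._paths.pop(old_path)

    def read_by_path(self, path: str) -> bytes:
        return self._objects[self._paths[path]]

    def read_by_id(self, object_id: str) -> bytes:
        return self._objects[object_id]


if __name__ == "__main__":
    store = DualAccessStore()
    oid = store.write_file("/claims/CLM-0042/scan_final_v2.tif", b"scan bytes")
    ecm_link = {"name": "Claim scan", "object_id": oid}  # all the ECM repository keeps
    store.rename("/claims/CLM-0042/scan_final_v2.tif", "/claims/2017/CLM-0042/scan.tif")
    print(store.read_by_id(ecm_link["object_id"]) == store.read_by_path("/claims/2017/CLM-0042/scan.tif"))
```

Keeping the path map separate from the object map is what lets the directory layout change freely while the ECM link stays valid.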

Consumer and Browser Integration to the Object Store Rather than to the ECM System

Another innovative approach, similar to object linking, has users or other systems store their content directly in the object store and post the link in either the ECM system or another application.  For one client, the scenario involves uploading large video files to Amazon's S3 object store.  The non-ECM application lets users upload a video from their smartphones or other devices directly to Amazon, storing it in S3 and using Amazon's CloudFront service to stream it quickly, without any ECM involvement.  For clients worried about the network constraints of having hundreds of users stream video from within their ECM architecture, this is an easy way to leverage the scale of Amazon's infrastructure and offload the storage and network strain from the ECM repository.

From the ECM application, the browser interface queries the non-ECM system for any videos associated with the current case and streams them directly from S3.  In discussing TSG's work on video annotations with clients, we are considering storing the annotations outside the ECM system in the object store, with meta-data about the video annotations added to the non-ECM system.
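The post does not describe the interface between the ECM front end and the non-ECM application, so the following is only a hypothetical sketch of the lookup step: the non-ECM system keeps a record of each upload's S3 key and case, and the ECM browser interface asks it for playable URLs served by the CDN. `VideoRecord`, `VideoIndex`, `stream_urls_for_case`, and the `videos.example.com` domain are placeholders; the upload itself (via the AWS SDK) and access control such as CloudFront signed URLs are omitted.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Placeholder CDN domain assumed to front the S3 bucket; not a real distribution.
CDN_BASE_URL = "https://videos.example.com"


@dataclass(frozen=True)
class VideoRecord:
    case_id: str
    s3_key: str       # where the device upload landed in S3
    uploaded_by: str


class VideoIndex:
    """Hypothetical stand-in for the non-ECM application's video metadata."""

    def __init__(self) -> None:
        self._records: list[VideoRecord] = []

    def register_upload(self, record: VideoRecord) -> None:
        self._records.append(record)

    def videos_for_case(self, case_id: str) -> list[VideoRecord]:
        return [r for r in self._records if r.case_id == case_id]


def stream_urls_for_case(index: VideoIndex, case_id: str) -> list[str]:
    """URLs the ECM browser UI would play; bytes come from the CDN, not the ECM server."""
    # quote() keeps "/" by default, so key prefixes stay intact while spaces are escaped.
    return [f"{CDN_BASE_URL}/{quote(r.s3_key)}" for r in index.videos_for_case(case_id)]
```

The ECM system never touches the video bytes here; it only asks for references, which is what keeps streaming load off the ECM architecture.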

Summary

Early ECM systems had to manage both the meta-data and the file storage requirements.  With the continued evolution in the performance and capabilities of object stores, including Amazon S3 and Hitachi Content Platform (HCP), TSG recommends that clients consider innovative ways to leverage these capabilities to improve performance and expand capabilities for document ingestion and retrieval.

Let us know your thoughts below.

Filed Under: Alfresco, Documentum, ECM Landscape, Hadoop
