Technology Services Group

Hadoop PDF Annotations with OpenAnnotate

May 19, 2015

The Hadoop Distributed File System (HDFS) can store very large numbers of files redundantly. The first release of OpenContent for Hadoop includes the ability to annotate PDF documents with OpenAnnotate and to store and retrieve the PDF annotation layers in Hadoop. This post describes the integration with Hadoop as the ECM repository and highlights some benefits of an annotation tool built on open specifications.

PDF Annotations – What is involved?

Too often, users with Acrobat or other client-based PDF tools treat annotation as something that is done only with desktop tools. In a Hadoop environment, storing an annotation back in the Hadoop repository as a separate, secure layer cannot be accomplished with Acrobat or other PDF client tools without installing software on the client machine. The IT support burden of a client-based tool was one of the reasons Documentum, an ECM vendor, discontinued support for its own Acrobat annotation software.

OpenAnnotate supports adding annotations in the browser, using the XFDF standard from Adobe. With OpenAnnotate, Hadoop users can:

  • View a document in the browser window
  • Add their own annotations
  • See other users' annotations appear as separate layers as they are added (real-time collaboration)
  • Store their annotations back in Hadoop
  • Download the document with the annotations embedded in the PDF for printing or distribution

OpenAnnotate supports all modern browsers, including IE9+, Chrome, Safari, and Firefox.

Hadoop Repository – How are Annotations Stored?

Annotations are stored in Hadoop as XFDF files, one per reviewer, and the files are assembled when a document is requested. Because each reviewer writes only to their own file, users can add and edit only their own annotations. It also lets users work on a document together without having to worry about who has the document "checked out".
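As a rough illustration of the "assemble on request" step, the sketch below merges several per-reviewer XFDF documents into a single XFDF document using only the Python standard library. It is not OpenAnnotate's actual implementation: the function name `merge_xfdf` and the sample reviewers are hypothetical, and reading the files from HDFS is left out, so the inputs are plain strings.

```python
import xml.etree.ElementTree as ET

# XFDF namespace; registering it with an empty prefix keeps the output readable.
XFDF_NS = "http://ns.adobe.com/xfdf/"
ET.register_namespace("", XFDF_NS)


def merge_xfdf(reviewer_docs):
    """Combine the <annots> children of several XFDF documents into one.

    `reviewer_docs` is a list of XFDF strings, one per reviewer (hypothetical
    input shape; a real system would load these from the repository).
    """
    merged_root = ET.Element(f"{{{XFDF_NS}}}xfdf")
    merged_annots = ET.SubElement(merged_root, f"{{{XFDF_NS}}}annots")
    for xml_text in reviewer_docs:
        root = ET.fromstring(xml_text)
        annots = root.find(f"{{{XFDF_NS}}}annots")
        if annots is None:
            continue  # this reviewer has no annotations yet
        merged_annots.extend(list(annots))
    return ET.tostring(merged_root, encoding="unicode")


# Two made-up reviewer files, each holding one sticky-note annotation.
alice = (
    '<xfdf xmlns="http://ns.adobe.com/xfdf/"><annots>'
    '<text page="0" rect="10,10,30,30" title="alice">'
    "<contents>Check this figure</contents></text>"
    "</annots></xfdf>"
)
bob = (
    '<xfdf xmlns="http://ns.adobe.com/xfdf/"><annots>'
    '<text page="1" rect="50,50,70,70" title="bob">'
    "<contents>Approved</contents></text>"
    "</annots></xfdf>"
)

merged = merge_xfdf([alice, bob])
```

Because the stored files are never modified by the merge, each reviewer's layer stays independent and can be included or left out per request.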

Because the files are separate, each user's XFDF file can also have its own security settings. As a quick example:

  • User 1 might be able to see all annotations
  • User 2 might be able to see all users’ annotations except User 1’s
  • User 3 might not be able to see (or store) any annotations
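The three-user example above can be sketched as a small filter that decides which annotation layers to include for a given viewer. This is only an illustration under stated assumptions: the `READABLE` policy table, the `visible_layers` function, and the file paths are hypothetical, and a real deployment would rely on the repository's own permissions rather than an in-memory dictionary.

```python
# Hypothetical policy: viewer -> owners whose annotation layers the viewer may read.
READABLE = {
    "user1": {"user1", "user2", "user3"},  # sees all annotations
    "user2": {"user2", "user3"},           # sees everyone's except user1's
    "user3": set(),                        # sees no annotations
}

# Hypothetical stored layers (user3 cannot store annotations, so has none).
LAYERS = [
    {"owner": "user1", "path": "annotations/doc42/user1.xfdf"},
    {"owner": "user2", "path": "annotations/doc42/user2.xfdf"},
]


def visible_layers(viewer, layers, readable=READABLE):
    """Return the layers whose owner the viewer is allowed to read."""
    allowed = readable.get(viewer, set())  # unknown viewers see nothing
    return [layer for layer in layers if layer["owner"] in allowed]


for viewer in ("user1", "user2", "user3"):
    owners = [layer["owner"] for layer in visible_layers(viewer, LAYERS)]
    print(viewer, "can see layers from:", owners)
```

Only the layers returned for a viewer would then be assembled into the document that viewer receives.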

XFDF versus proprietary annotation formats

Many other annotation tools do not follow the open XFDF specification and instead use their own proprietary formats. In our work with clients, converting these proprietary formats as part of a migration can be very difficult. With support from Adobe, XFDF is the recognized industry standard and has benefits not found in other formats, including:

  • Ability to view the annotations with Acrobat Reader
  • Open specification, which prevents vendor lock-in
  • Ease of migration to/from other tools

Summary

Hadoop users who store PDF documents should consider adding PDF annotations to their business processes so that their content can be used for collaboration and review. For documents that are stored in Word or other print formats, please see our integration with Adlib, which creates PDF renditions of those documents in Hadoop.

Filed Under: Hadoop, OpenAnnotate, Product Suite Tagged With: Hadoop, Open Source, OpenAnnotate
