Hadoop Web Service REST API for Enterprise Content Management using TSG's OpenContent

Many of our ECM clients develop their own Web Services layer to isolate their applications from the back-end repository and to provide a vehicle for adding their own services that talk to other non-ECM systems. OpenContent was developed as part of our Documentum practice to give clients a standard web services architecture with an open source approach. OpenContent is now available for the Hadoop NoSQL database, HBase. This post details the web services available in our first release, with examples and explanations.

CRUD Web Services through OpenContent

The main focus of the first release of OpenContent with Hadoop was enabling CRUD (‘Create’, ‘Read’, ‘Update’, ‘Delete’) operations for documents in Hadoop. The primary endpoints that represent these operations are as follows: ‘upload’, ‘properties/content’, ‘checkout/checkin’, and ‘delete’. In this section, we will explore the usage of these endpoints to create, read, update, and delete a document.

upload – The upload endpoint is used to create a document using the content of a file and some provided metadata about the document. The OpenContent upload endpoint uses a POST REST call. The body of the POST request is the file to add to our Hadoop ECM repository. This endpoint returns a string that represents the unique document ID of the document that was created in the Hadoop repository:

/OpenContent/rest/content/upload?objectType=sop&prop-title=ProcedureXYZ&prop-department=Labeling

Where:

objectType – parameter describing the type of the document that we wish to create in the Hadoop repository
prop-* – parameters that represent the attributes on the object type whose values we want to set
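
As a rough illustration of the upload call described above, the following Python sketch builds the POST request using only the standard library. The host in BASE_URL, the helper names, and the `application/octet-stream` content type are assumptions made for this example; only the endpoint path and the objectType and prop-* parameters come from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your OpenContent server


def build_upload_request(content: bytes, object_type: str, properties: dict) -> Request:
    """Build the POST request for the upload endpoint (hypothetical helper)."""
    params = {"objectType": object_type}
    # Each attribute value is sent as a prop-<attribute> query parameter.
    for name, value in properties.items():
        params[f"prop-{name}"] = value
    url = f"{BASE_URL}/OpenContent/rest/content/upload?{urlencode(params)}"
    # Assumption: the raw file bytes are the request body, sent with a generic content type.
    headers = {"Content-Type": "application/octet-stream"}
    return Request(url, data=content, headers=headers, method="POST")


def upload(content: bytes, object_type: str, properties: dict) -> str:
    """Send the upload request and return the document ID string from the response."""
    with urlopen(build_upload_request(content, object_type, properties)) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `upload(file_bytes, "sop", {"title": "ProcedureXYZ", "department": "Labeling"})` would produce the same URL as the example above.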

properties – The properties endpoint is used to retrieve the properties of a document from the Hadoop repository. This endpoint uses a GET REST call. This endpoint returns a JSON object that contains all of the properties of the requested document.

/OpenContent/rest/content/properties?id=9adc0f50-b1f9-4336-bf68-44e0c2d51f27

Where:

id – represents the unique ID number of the document whose properties will be returned
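
A minimal Python sketch of the properties call follows. BASE_URL and the function names are hypothetical. The shape of the returned JSON is not documented in this post, so the sketch simply decodes whatever object the server returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your OpenContent server


def properties_url(doc_id: str) -> str:
    """Return the GET URL for the properties endpoint."""
    query = urlencode({"id": doc_id})
    return f"{BASE_URL}/OpenContent/rest/content/properties?{query}"


def get_properties(doc_id: str) -> dict:
    """Fetch the document's properties and decode the JSON response body."""
    with urlopen(properties_url(doc_id)) as resp:
        return json.load(resp)
```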

content – The content endpoint is used to retrieve the content of a document. This endpoint uses a GET REST call. The endpoint then streams back the content of the requested rendition of the document, or the “native” rendition if none is requested:

/OpenContent/rest/content/content?id=9adc0f50-b1f9-4336-bf68-44e0c2d51f27&contentType=pdf

Where:

id – represents the unique identifier number of the document whose content will be returned
contentType (optional) – the rendition of the document whose content is requested (for example, pdf)
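
Because the content endpoint streams its response, a client can copy it straight to disk rather than holding it in memory. The Python sketch below shows one way to do that. BASE_URL and the helper names are assumptions for illustration. The contentType parameter is omitted when no rendition is requested, so that the server falls back to the “native” rendition.

```python
import shutil
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your OpenContent server


def content_url(doc_id: str, content_type: Optional[str] = None) -> str:
    """Return the GET URL for the content endpoint."""
    params = {"id": doc_id}
    if content_type is not None:
        # Only ask for a specific rendition when the caller names one.
        params["contentType"] = content_type
    return f"{BASE_URL}/OpenContent/rest/content/content?{urlencode(params)}"


def download_content(doc_id: str, dest_path: str, content_type: Optional[str] = None) -> None:
    """Stream the document content into a local file."""
    with urlopen(content_url(doc_id, content_type)) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```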

checkout – The checkout endpoint is used to check out a document for editing. This endpoint uses a POST REST call. This endpoint returns a boolean, which is true if the checkout was successful and false otherwise:

/OpenContent/rest/content/checkout?id=9adc0f50-b1f9-4336-bf68-44e0c2d51f27

Where:

id – represents the unique identifier number of the document to check out from the repository

checkin – The checkin endpoint is used to check in a document after the document has been checked out for editing. This endpoint uses a POST REST call. This endpoint returns a string that represents the id of the version of the document that was created:

/OpenContent/rest/content/checkin?id=9adc0f50-b1f9-4336-bf68-44e0c2d51f27&majorVersion=false

Where:

id – represents the unique identifier number of the document being checked back into the Hadoop repository
majorVersion – indicates whether the check-in creates a major version (for example, moving from version 1.1 to 2.0) rather than a minor version (for example, moving from version 1.1 to 1.2).
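
The checkout and checkin calls are normally used as a pair, so the Python sketch below covers both. BASE_URL and the helper names are hypothetical. The sketch also assumes that the boolean comes back as the literal text true or false, and it sends only the documented query parameters. This post does not describe how updated content is supplied at check-in, so no request body is included.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your OpenContent server


def checkout_request(doc_id: str) -> Request:
    """Build the POST request for the checkout endpoint."""
    query = urlencode({"id": doc_id})
    return Request(f"{BASE_URL}/OpenContent/rest/content/checkout?{query}", method="POST")


def checkin_request(doc_id: str, major_version: bool = False) -> Request:
    """Build the POST request for the checkin endpoint."""
    query = urlencode({"id": doc_id, "majorVersion": "true" if major_version else "false"})
    return Request(f"{BASE_URL}/OpenContent/rest/content/checkin?{query}", method="POST")


def parse_boolean(body: bytes) -> bool:
    """Interpret a response body as a boolean (assumes the text 'true' or 'false')."""
    return body.strip().lower() == b"true"


def checkout(doc_id: str) -> bool:
    """Check the document out; True means the checkout succeeded."""
    with urlopen(checkout_request(doc_id)) as resp:
        return parse_boolean(resp.read())


def checkin(doc_id: str, major_version: bool = False) -> str:
    """Check the document in and return the ID of the version that was created."""
    with urlopen(checkin_request(doc_id, major_version)) as resp:
        return resp.read().decode("utf-8").strip()
```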

delete – The delete endpoint is used to delete a document. This endpoint uses a DELETE REST call. This endpoint returns a boolean, which is true if the deletion was successful and false otherwise:

/OpenContent/rest/content/delete?id=9adc0f50-b1f9-4336-bf68-44e0c2d51f27&allVersions=false

Where:

id – represents the unique identifier number of the document to be deleted from the Hadoop repository
allVersions – indicates whether all of the versions associated with the document should be deleted. If false, only the version indicated by the id is deleted.
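
Unlike the other calls, delete uses the DELETE HTTP method, which Python's standard library supports through the `method` argument of `Request`. The sketch below is illustrative only: BASE_URL and the helper names are hypothetical, and the response is assumed to be the literal text true or false.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your OpenContent server


def delete_request(doc_id: str, all_versions: bool = False) -> Request:
    """Build the DELETE request for the delete endpoint."""
    query = urlencode({"id": doc_id, "allVersions": "true" if all_versions else "false"})
    return Request(f"{BASE_URL}/OpenContent/rest/content/delete?{query}", method="DELETE")


def delete_document(doc_id: str, all_versions: bool = False) -> bool:
    """Delete the document (or all of its versions); True means the deletion succeeded."""
    with urlopen(delete_request(doc_id, all_versions)) as resp:
        # Assumption: the boolean result is returned as the text 'true' or 'false'.
        return resp.read().strip().lower() == b"true"
```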

Search/Query Hadoop with a REST API

OpenContent provides a search and query web service so that front-end applications can build a UI/UX that quickly returns document metadata to users. As referenced in a previous post, Hadoop is not designed to execute typical ECM queries (such as “show me all the documents where ‘department’ is ‘Labeling’”). This is where the Lucene-based Solr search appliance (itself a REST API) is an excellent complement to Hadoop. OpenContent leverages Solr to do the fulltext indexing and searching for documents stored in the Hadoop repository.

search – This endpoint queries the data stored in Hadoop by leveraging the Solr index that is created when a document is stored. The endpoint returns a JSON structure that can easily be traversed to create a rich UI for search results, such as TSG’s HPI interface:

/OpenContent/rest/search/{type}?paramName=department&paramValue=Labeling&fulltext=Product XYZ&sortAttr=objectName

Where:

type – object type of the documents to be included in the search
paramName / paramValue (optional) – pairs of metadata fields to search on and the values to match. Additional parameters support more advanced criteria such as date ranges and greater-than/less-than comparisons.
fulltext (optional) – text to search for in the fulltext indexed “content” of the document
sortAttr – attribute in the Hadoop repository to sort the search results by
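
A search request can be assembled in much the same way as the CRUD calls. The Python sketch below assumes that the search parameters are passed as an ordinary query string after the object type, as with the other endpoints. BASE_URL and the function names are hypothetical, and only a single paramName/paramValue pair is handled. Values are URL-encoded, so a space in the fulltext term is sent as `+`.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # hypothetical host; replace with your OpenContent server


def search_url(object_type: str, sort_attr: str,
               param_name: Optional[str] = None, param_value: Optional[str] = None,
               fulltext: Optional[str] = None) -> str:
    """Return the URL for the search endpoint (query-string layout is an assumption)."""
    params = {}
    if param_name is not None and param_value is not None:
        # One metadata field/value pair; the optional criteria are simply left out if unset.
        params["paramName"] = param_name
        params["paramValue"] = param_value
    if fulltext is not None:
        params["fulltext"] = fulltext
    params["sortAttr"] = sort_attr
    type_segment = quote(object_type, safe="")  # the object type is part of the path
    return f"{BASE_URL}/OpenContent/rest/search/{type_segment}?{urlencode(params)}"


def search(object_type: str, sort_attr: str, **criteria):
    """Run the search and decode the JSON result, whose structure is not documented here."""
    with urlopen(search_url(object_type, sort_attr, **criteria)) as resp:
        return json.load(resp)
```

For example, `search("sop", "objectName", param_name="department", param_value="Labeling", fulltext="Product XYZ")` corresponds to the sample URL above with `sop` as the type.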

Please let us know your thoughts on your approach to a REST API for storing/retrieving documents in Hadoop using the comments below.
