Amazon Web Services provides not only cost benefits but also a variety of capabilities that can enhance an Alfresco implementation compared to on-premise implementations. This post presents the unique features described in our “Alfresco – Moving to Amazon Web Services Whitepaper”.
Taking Advantage of Amazon Services for your ECM Implementation
As mentioned throughout this paper, hosting Alfresco on AWS provides many benefits over on-premise solutions. For innovative Alfresco customers, other benefits are available, particularly for storing the documents themselves in S3. This figure illustrates AWS services used by several of our clients. As described in earlier sections, a successful deployment requires only a minimal set of components. This diagram shows how our clients are innovating on AWS beyond that minimum.
This section will present some of those ideas for consideration.
- Glacier Archives & Vault Lock – With the large volumes of data stored in S3, there is the potential to save money by moving content to Glacier’s low-cost storage tier. Glacier data is stored within Archives that are then stored within Vaults. Beyond the simple storage of content, Glacier works with S3’s content lifecycle rules, and Glacier Vaults can implement a Vault Lock policy to enforce compliance controls concerning the retention and disposition of documents.
- S3 API – S3 has a robust API for directly accessing stored objects. TSG has taken advantage of this capability within our OpenContent Management Suite to upload/download content directly from the S3 object store to increase performance, particularly for large files and video streaming, while reducing the load on the Alfresco server.
- Metadata on S3 – Amazon supports user-defined metadata on S3 objects, up to a 2 KB limit. TSG still recommends all metadata be stored in Alfresco, but having some of the metadata on S3 allows for some creative solutions, including replication as well as the potential for searching S3 directly for objects. An additional method is to store metadata as JSON or XML files alongside the content files in the S3 bucket. By storing the metadata in a separate file, it is available to additional AWS services such as Athena, CloudSearch, and Redshift.
- S3 Lifecycles – Lifecycles are an optional S3 feature for controlling the storage behavior of an object within an S3 bucket. For example, a lifecycle can specify that after 60 days an object should move to S3 Infrequent Access (S3-IA) storage, then after an additional 30 days move to Glacier, and finally be destroyed 2,555 days (7 years) after its creation. A good lifecycle can enforce compliance rules and save significant storage costs.
- AWS CloudFront – CloudFront is a Content Delivery Network (CDN) with capabilities to publish S3 objects to edge locations around the world. CloudFront provides for streaming and quick access to the S3 object store without going through the Alfresco API to store or retrieve the object, a feature not easily replicated with an on-premise Alfresco solution. TSG has taken advantage of this capability within our OpenContent Management Suite to upload/download content directly from the S3 object store to increase performance, particularly for large files and video streaming, while reducing the load on the Alfresco server.
- AWS Elastic Transcoder – Amazon has been processing video for years. AWS Elastic Transcoder can handle a myriad of video and audio formats, transforming them from one file format to another. This is ideal for rendering videos into formats for streaming or annotating. It becomes possible to accept several common video formats from users and transcode them into a single format for consistency and use in OpenAnnotate Video as mp4 files. Elastic Transcoder uses AWS Simple Queue Service (SQS) to process transcoding jobs, moving the files to and from S3 buckets.
- AWS AutoScaling – Over time, the typical document management solution has a slow and steady increase of content and usage. However, for scenarios which require a large ingestion of content initially or a huge increase in users, Amazon EC2 provides the flexibility to scale up or down as needed. Alfresco’s Quick Start uses Chef to bootstrap and dynamically add and remove instances from the auto-scaling group.
- AWS CloudWatch – AWS CloudWatch is used to monitor the health and behavior of EC2 instances and other AWS services. Applications may also send their own metrics to CloudWatch so they can be observed. AWS AutoScaling can be triggered by CloudWatch metrics. For example, a decision to launch another EC2 instance can be made if CPU usage on the existing instances exceeds 80% for 5 minutes. Multiple thresholds can be defined, with alarms set to notify a Simple Notification Service (SNS) topic, which might send an email or text message to alert an administrator. The CloudWatch service provides the toolset to establish proactive management of an AWS solution.
- Encryption – AWS provides encryption within several services. For Alfresco solutions, AWS offers encryption within the S3 object store, EBS volumes, and RDS databases. AWS Certificate Manager provides a hassle-free means for creating and managing SSL/TLS certificates. With AWS encryption, additional software components like Alfresco encryption are no longer required. Encryption keys can be controlled, rotated, and renewed by AWS or by the customer using AWS Key Management Service (KMS).
- AWS CloudTrail & VPC Flow Logs – These two AWS services provide the means to monitor and respond to low-level application and network traffic. AWS CloudTrail records AWS API-level traffic and ships the logs to S3. These logs can be used to track actions against services and to troubleshoot issues. The VPC Flow Logs are available in AWS CloudWatch and will monitor the IP traffic going into and out of the VPC. This data can help with troubleshooting configurations and knowing what traffic is coming into the instances and from where.
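The S3 lifecycle example in the list above (S3-IA after 60 days, Glacier 30 days later, deletion after 2,555 days) can be sketched as a lifecycle configuration. This is a minimal sketch, not the whitepaper's own configuration: the rule ID and the empty prefix are assumptions for illustration, and the dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`. Note that S3 counts each `Days` value from object creation, so "an additional 30 days" becomes day 90.

```python
import json


def build_lifecycle_rule(ia_after_days=60, glacier_after_more_days=30,
                         expire_after_days=2555, prefix=""):
    """Build one S3 lifecycle rule as a plain dict."""
    # Lifecycle "Days" are measured from object creation, so the Glacier
    # transition happens at 60 + 30 = 90 days, not 30.
    glacier_day = ia_after_days + glacier_after_more_days
    if not ia_after_days < glacier_day < expire_after_days:
        raise ValueError("transitions must occur in order, before expiration")
    return {
        "ID": "alfresco-content-retention",  # hypothetical rule name
        "Filter": {"Prefix": prefix},        # "" applies to the whole bucket
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_day, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


config = {"Rules": [build_lifecycle_rule()]}
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the rule could be applied
# with something like (bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-alfresco-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review against a retention policy before it is applied to a bucket.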
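The two metadata approaches from the Metadata on S3 item can be combined: keep small metadata on the object itself and fall back to a JSON file alongside the content when it would exceed the 2 KB limit. The sketch below assumes the limit is measured as the sum of the UTF-8 byte lengths of all keys and values; the `.metadata.json` suffix and the function names are hypothetical, chosen only for illustration.

```python
import json

USER_METADATA_LIMIT_BYTES = 2 * 1024  # the 2 KB limit on user-defined metadata


def metadata_size(metadata):
    """Sum of the UTF-8 byte lengths of every key and value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())


def plan_metadata(object_key, metadata):
    """Return (inline_metadata, sidecar_key, sidecar_body).

    Small metadata travels with the object. Oversized metadata is written
    to a JSON sidecar object next to the content file instead.
    """
    if metadata_size(metadata) <= USER_METADATA_LIMIT_BYTES:
        return metadata, None, None
    sidecar_key = object_key + ".metadata.json"  # hypothetical naming scheme
    return {}, sidecar_key, json.dumps(metadata, sort_keys=True)


small = {"doc-type": "invoice", "owner": "jsmith"}
print(plan_metadata("contracts/2017/a.pdf", small))

large = {"notes": "x" * 3000}
inline, sidecar_key, sidecar_body = plan_metadata("contracts/2017/a.pdf", large)
print(inline, sidecar_key, len(sidecar_body))
```

A sidecar file has the added advantage mentioned above: services that query files in S3 can read it without touching the content object.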
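The CloudWatch scaling trigger described in the list (CPU above 80% for 5 minutes) might be expressed as the following alarm definition. This is a sketch under assumptions: the group name and ARN are placeholders, and the dictionary uses the parameter names of boto3's `put_metric_alarm`. It treats "for 5 minutes" as one 300-second period whose average exceeds the threshold; five consecutive 60-second periods would be an equally reasonable reading.

```python
def build_cpu_alarm(group_name, action_arns, threshold_percent=80.0, minutes=5):
    """Build keyword arguments for a CloudWatch CPU alarm on a scaling group."""
    return {
        "AlarmName": f"{group_name}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # Aggregate CPU across every instance in the Auto Scaling group.
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": minutes * 60,   # seconds per evaluation window
        "EvaluationPeriods": 1,   # one full window above threshold fires
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanThreshold",
        # Scaling-policy and/or SNS topic ARNs to invoke when the alarm fires.
        "AlarmActions": list(action_arns),
    }


alarm = build_cpu_alarm(
    "alfresco-repo-asg",                                 # placeholder group
    ["arn:aws:sns:us-east-1:123456789012:alfresco-ops"]  # placeholder ARN
)
print(alarm["AlarmName"], alarm["Period"], alarm["Threshold"])

# With boto3 installed and credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Listing both a scaling policy ARN and an SNS topic ARN in `AlarmActions` would let the same alarm add an instance and notify an administrator.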
To download the full whitepaper, which includes a step-by-step guide on how to install Alfresco in Amazon Web Services and migrate to it from legacy or on-premise systems, click the download button below.