We have recently had some interesting discussions with clients about storing documents and objects on premise while using cloud options for the metadata. One advantage of the TSG solutions is that they support a variety of alternatives for both on-premise and cloud-based document and object storage. This post discusses those alternatives.
Cloud Based Document and Object Storage – What are the concerns?
When it comes to storing documents and other objects in the cloud, clients typically raise the following concerns:
- Performance – depending on cloud connectivity and use case, on-premise solutions that are located closer to the user will typically perform faster than options in the cloud.
- Ownership – given a focus on privacy, many clients are concerned about allowing certain content to be stored in the cloud.
- Cost – given the new options and pricing for large on-premise storage, putting all content into the cloud is not always cost-effective. This is particularly true for content that will be accessed often, as most cloud vendors have a low initial storage cost but charge monthly for storage and for each retrieval of your own content.
- Migration – large repository customers will need to move large amounts of content to the cloud, either over the network or with a bulk transfer solution like AWS Snowball. This process can be expensive and time consuming.
While TSG will work with clients to reduce these concerns and costs, we have seen a comeback of sorts from the on-premise object storage vendors, whose pricing and hyper-converged options mean the cloud is not always the cheaper option. We are also seeing clients consider cloud options on premise. See solutions like AWS Storage Gateway or AWS Outposts, which look to take advantage of cloud capabilities on premise.
Cloud for Metadata Storage – What are the options?
As we have mentioned before regarding ECM 2.0, typical legacy ECM repositories rely on a relational database for storing metadata, relationships, security and other data components surrounding document storage. For modern approaches, NoSQL options are available, like TSG’s efforts with HBase/Hadoop and DynamoDB. Storing the database component of the system in the cloud opens up a number of hosted options that take advantage of the cloud’s ease of management and scaling, without having to move all the content to the cloud and pay to store it there.
For legacy relational database approaches, we would recommend clients consider the following options:
- Amazon Web Services – Amazon provides a Relational Database Service (RDS) for running multiple types of databases in a managed service environment. Both open source (PostgreSQL, MySQL, MariaDB) and commercial (Oracle, SQL Server) databases are available. AWS also has two flavors of Aurora (MySQL and PostgreSQL). Moving to a different database vendor is also easier with the AWS Database Migration Service.
- Microsoft Azure – Microsoft provides Azure SQL options for a managed Microsoft SQL Server instance on Azure. It also offers PaaS databases for MySQL and PostgreSQL.
- Google Cloud – GCP offers fully managed instances of MySQL, PostgreSQL and SQL Server through Cloud SQL.
For modern NoSQL approaches, we would recommend clients consider the following options:
- Amazon Web Services – TSG recommends DynamoDB. See our results from our 11 billion document benchmark.
- Microsoft Azure – TSG has had success with clients leveraging Azure’s HDInsight platform to quickly spin up a managed HBase environment that stores the metadata for TSG’s ECM.
- Google Cloud – TSG is currently conducting a benchmark, coordinated with Michigan State, to test Google’s Bigtable NoSQL database with TSG’s ECM on NoSQL database offering. Our initial experience is that it is consistent with the promise of AWS’s DynamoDB and Azure’s HDInsight: a simple, scalable NoSQL database.
Whether a relational or a NoSQL database is being moved to the cloud, all of these solutions store only a pointer to the actual content file, so the migration is typically just a shift of the database to the new database system. Because the APIs abstract access to the files regardless of where the content is stored, users can reach the content with the same access patterns they are used to. Responsiveness for content retrieval also stays the same, since the content is still located on-premise.
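The pointer-based approach described above can be illustrated with a minimal Python sketch. This is not TSG's API: the names `DocumentRecord`, `ContentResolver`, `LocalFileStore`, `migrate_metadata`, and the `onprem://` URI scheme are all hypothetical, and plain dictionaries stand in for the on-premise and cloud databases. The sketch only shows why moving the metadata does not require moving the content.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Protocol


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical metadata row: it holds a pointer to the content, not the content."""
    doc_id: str
    title: str
    content_uri: str  # e.g. "onprem:///shares/ecm/invoice-001.txt" (made-up scheme)


class ContentStore(Protocol):
    """Anything that can return the bytes behind a location string."""
    def read(self, location: str) -> bytes: ...


class LocalFileStore:
    """Stand-in for on-premise storage: reads from the local file system."""
    def read(self, location: str) -> bytes:
        return Path(location).read_bytes()


class ContentResolver:
    """Hides where content lives; callers only ever pass a metadata record."""
    def __init__(self, stores: Dict[str, ContentStore]) -> None:
        self._stores = stores

    def fetch(self, record: DocumentRecord) -> bytes:
        scheme, sep, location = record.content_uri.partition("://")
        if not sep or scheme not in self._stores:
            raise ValueError(f"No content store registered for {record.content_uri!r}")
        return self._stores[scheme].read(location)


def migrate_metadata(source: Dict[str, DocumentRecord]) -> Dict[str, DocumentRecord]:
    """Copy metadata rows to a new 'database'; the content pointers are unchanged."""
    return dict(source)


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "invoice-001.txt"
        path.write_bytes(b"invoice body")  # the content stays where it is

        on_prem_db = {"doc-1": DocumentRecord("doc-1", "Invoice 001", f"onprem://{path}")}
        cloud_db = migrate_metadata(on_prem_db)  # only metadata moves

        resolver = ContentResolver({"onprem": LocalFileStore()})
        return resolver.fetch(cloud_db["doc-1"])  # same call as before the move
```

Calling `demo()` reads the file through the record held in the migrated dictionary. Registering a further store under another scheme (for example, a cloud object store) would let the same `fetch` call serve content from either location, which is the kind of mixed deployment discussed here.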
Those considering the cloud for their ECM 2.0 solution don’t necessarily have to migrate every component if they are more comfortable with on-premise options that are cost-effective and perform well. By leveraging solutions that support both cloud and on-premise storage, ECM 2.0 clients can choose either storage option (or both) to best support their use case.
Let us know your thoughts below.