Alfresco 4.0 Community Edition has been out since October, with an Enterprise release expected in the next month or so. Alfresco has dubbed 4.0 “the most significant release” of Alfresco to date. Rather than summarizing every major new feature and enhancement, this post looks at some new features in the core Alfresco repository that will ship in the upcoming release, and how they compare to functionality provided on the Documentum platform:
File System Transfer Receiver / Social Content Publishing
Alfresco has always provided the ability to publish to a file system as part of the AVM. This functionality has finally been added to the core DM repository, which is great for purposes such as WCM, where static images and content need to be published. It is an extension of the existing Transfer Service, which was developed to publish to runtime Alfresco instances. Additionally, Social Content Publishing provides a framework for publishing to external content delivery services such as Flickr, Twitter, Facebook, YouTube, SlideShare, etc. If required, the framework allows custom publishing channels to be created and registered, whether they are internal or external in nature.
In the Documentum product suite, Interactive Deployment Services (formerly Site Caching Services) drives content publishing. By default, content is published to a file system and metadata to a set of database tables. The release of Interactive Deployment Services added an integration with xDB to allow for two-way communication, similar to the use of an Alfresco Runtime repository.
Interactive Deployment Services is licensed as a separate repository component, whereas Alfresco has included this as part of the core repository functionality. The File System Transfer Receiver does not publish metadata to a set of database tables by default, but this could easily be done by extending the Transfer Service or leveraging an Alfresco Runtime instance. Publishing metadata to Solr or a NoSQL database is a common approach as well.
Solr Integration
Alfresco has now added Solr integration to the Alfresco platform. Lucene has long been an integral part of the core Alfresco repository, indexing content and metadata for search. In some situations, in-transaction indexing could hamper repository performance during bulk imports. Solr can now be deployed separately from the repository for better performance and scalability.
This reminds me of the progression of full text search in the “old” days when Documentum used to leverage Verity for full text search. Verity was deployed as part of the repository, and not separated out. It wasn’t until FAST was introduced that a separate server was required. With the release of xPlore, this is still the case, which makes sense given the memory and I/O resources required to index content. So far, xPlore is a huge improvement over FAST with regard to performance and scalability.
From a feature perspective, the xPlore and Alfresco Solr integrations are very similar. First, both can scale independently as the repository grows. Second, both index metadata, content (for full text), and security ACLs to provide faster search performance. Third, both follow a model of eventual consistency, indexing content asynchronously with a delay that may vary depending on the load on the indexer. Finally, both provide faceted search capabilities, giving guided search based on defined attributes.
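To make the ACL indexing and faceting points concrete, here is a minimal sketch in plain Python. It is not the xPlore or Solr API; the document list, the `readers` field, and the `facet_counts` helper are all made up for illustration. It only shows the idea of counting attribute values across the documents a given user is allowed to see.

```python
from collections import Counter

# Hypothetical index entries: metadata plus the readers allowed by the ACL.
INDEXED_DOCS = [
    {"name": "plan.doc", "type": "contract", "year": 2011, "readers": {"alice", "bob"}},
    {"name": "logo.png", "type": "image", "year": 2011, "readers": {"alice"}},
    {"name": "terms.pdf", "type": "contract", "year": 2012, "readers": {"bob"}},
]


def facet_counts(docs, user, attribute):
    """Count values of one attribute across the documents the user may read."""
    visible = [d for d in docs if user in d["readers"]]
    return Counter(d[attribute] for d in visible)


alice_types = facet_counts(INDEXED_DOCS, "alice", "type")
bob_types = facet_counts(INDEXED_DOCS, "bob", "type")
print(dict(alice_types), dict(bob_types))
```

Because the ACL information sits next to the metadata in the index, the permission filter and the facet counts can be computed in one pass, without going back to the repository for each hit.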
Although xPlore and Alfresco’s Solr integration are generally similar in terms of functionality provided, there are differences in the underlying architecture. xPlore is composed of both Lucene and an integration with xDB (formerly the X-Hive database), whereas Alfresco is leveraging Solr, an open source search platform based on Lucene. xPlore therefore uses XQuery, translated to Lucene queries, as its primary query language, while Alfresco leverages Lucene directly.
One difference we’ve noticed between the repositories is that, since Alfresco’s Solr integration is “eventually consistent”, it is not trivial to guarantee the accuracy of search results relative to the data in the system at a given moment. While Documentum’s xPlore index is also designed to be eventually consistent, DQL provides a means of querying transactionally consistent metadata by querying directly against the database. This is something for application developers to consider; it can be worked around, and it is not necessarily an issue for end users.
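The gap between the two query paths can be illustrated with a toy model. This is a sketch only: `ToyRepository` and its methods are hypothetical names, not Alfresco or Documentum APIs, and the "indexer" is a simple queue drained on demand rather than a real background process.

```python
from collections import deque


class ToyRepository:
    """Toy model: metadata commits synchronously, indexing happens later."""

    def __init__(self):
        self.database = {}        # transactionally consistent metadata store
        self.index = {}           # search index, updated asynchronously
        self.pending = deque()    # node ids waiting to be indexed

    def save(self, node_id, title):
        # The write is visible in the database as soon as it commits.
        self.database[node_id] = title
        self.pending.append(node_id)

    def run_indexer(self):
        # Stands in for the background process that catches the index up.
        while self.pending:
            node_id = self.pending.popleft()
            self.index[node_id] = self.database[node_id]

    def search_index(self, term):
        return sorted(n for n, t in self.index.items() if term in t)

    def query_database(self, term):
        return sorted(n for n, t in self.database.items() if term in t)


repo = ToyRepository()
repo.save("doc-1", "Quarterly report")
before = repo.search_index("report")      # index has not caught up yet
direct = repo.query_database("report")    # database already sees the write
repo.run_indexer()
after = repo.search_index("report")       # index is now consistent
print(before, direct, after)
```

An application that needs to read its own writes immediately would use the database path (as DQL allows on the Documentum side) or wait for the indexer to catch up, which is the kind of workaround mentioned above.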
Attribute Encryption
In Alfresco, attributes may now be encrypted, providing an additional layer of security to prevent viewing of secure metadata unless you have the proper access. Alfresco provides mechanisms to generate the keystore, apply it, and register it with the repository. Documentum does not provide individual attribute encryption; instead, Documentum Trusted Content Services encrypts the file store where the physical content is located. Alfresco has not yet exposed encryption at the file store, but this could be handled at the operating system or storage level.
Documentum offers Trusted Content Services, which has always included file store level encryption of content, but not encryption at the metadata layer. Alfresco, for its part, has not yet exposed encryption at the file store level. It seems that both database and file store encryption could be accomplished at the OS or database level, but it would be nice to see both implemented in both repositories.
In the next post I’ll focus on Alfresco 4.0 Workflow and Share features and functionality.
Thank you for the post. Can you explain how xPlore uses Lucene, but then doesn’t run into scalability issues when multiple xPlore servers are implemented in a cluster? A straight Lucene implementation has to sync all of the indices stored in the cluster, which hampers scalability.
xPlore isn’t a straight Lucene implementation. It leverages an XML database (xDB) as well. In one example, you can create a multi-instance xPlore deployment and separate which content gets indexed in specific “collections”. So if you have 3-4 different Documentum applications running against a single docbase, search can be separated out for each content type dedicated to that application. xDB actually controls the indexing to Lucene and stores ACL info and groups.
For me it’s not clear what is meant by “A straight Lucene implementation has to sync all of the indices stored in the cluster”.
We do not share lucene indexes between nodes. Each server-node works with a separate index.
Yes – that is how the current Alfresco repositories function in a clustered configuration. Each server manages its own Lucene index. On the Documentum xPlore side if you have a multi-instance, some type of NAS/SAN must be implemented and shared if the multi-instances are on different servers.
Thanks for the post. I found it very useful, as some of the encryption details are not at all clear on the Alfresco Wiki. I currently use Alfresco and have a new requirement to do file store encryption, to prevent folks with file system access from reading the contents of the Alfresco content directories. If I understood your post correctly, Alfresco encryption cannot do this, as it can only encrypt metadata. Do you have any recommendation on whether it is better to look at an alternative like Documentum, or to look at standalone file system encryption software and use that in conjunction with Alfresco? Your response is much appreciated.