I was talking yesterday with a client who is building a custom application, and the topic of viewing performance came up, specifically: “Which viewer is the fastest?” The client has looked at a number of common options. This post discusses the many components of document viewing performance, which are similar for Documentum and Alfresco.
Documentum or Alfresco Viewing Performance Background
Typically, to initiate a view, a user needs to perform a search, open a folder or review an inbox/alert listing. In all of these scenarios, a list of documents or the document itself is displayed. It is always assumed that if the user can see the search result, inbox item, or alert, then the user has read access to the document. This post will not address performance of search, folder or inbox listing but focus on what happens once the document link is clicked.
Once the document is selected, typically six steps occur:
- The request for the document is sent to the application server.
- The application server requests the document from Documentum or Alfresco. We’ll call this the “ECM server.”
- The Documentum or Alfresco database is accessed to find the pointer to the content, typically in a SAN.
- The document is retrieved from the SAN. If the document needs to be converted to a viewing format (for example, Word to an image file), this typically happens in this step. Most “off-the-shelf” annotation tools will convert the documents into a common format for their annotation server at this step.
- The application server streams the document contents back to the browser. Based on the MIME type of the returned content, the browser determines how to view it.
- The browser opens the content in the appropriate viewer.
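As a small illustration of the last two steps, here is a minimal sketch of how an application server might label the streamed content so the browser can pick a viewer. The function name `build_view_headers` is hypothetical and not part of Documentum, Alfresco or any product mentioned here. It only uses Python's standard `mimetypes` module to guess the MIME type from the file name.

```python
import mimetypes


def build_view_headers(filename: str, size_bytes: int) -> dict:
    """Build response headers that tell the browser what it is receiving.

    Hypothetical helper for illustration only.
    """
    # Guess the MIME type from the file extension (e.g. .pdf -> application/pdf).
    mime_type, _encoding = mimetypes.guess_type(filename)
    return {
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": mime_type or "application/octet-stream",
        "Content-Length": str(size_bytes),
        # "inline" asks the browser to display the content rather than download it.
        "Content-Disposition": f'inline; filename="{filename}"',
    }
```

With a PDF file name, the browser would receive `application/pdf` and open its PDF viewer. With an unknown extension, it receives the generic binary type and will usually offer a download instead.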
Some easy performance improvements that can be bought rather than built include:
- Network speed – Depending on document size, this typically accounts for about 95% of the performance bottleneck. It is not as much of an issue now, but early on we would see great applications limited in performance by bad networks. At one client, our application had to compete against other web traffic over a satellite link. It doesn’t matter how fast a car is if it is stuck in a traffic jam. Spending money on network bottlenecks always results in better performance.
- Memory – Increasing the memory available to the application server cache, as well as to the Documentum or Alfresco database, is also an inexpensive way to improve performance. For the fastest performance, we recommend using Ehcache, an open source caching framework that caches the most recently used documents in memory and writes the least accessed documents to disk.
- Additional CPU processing power – Improving processing power for the transformation engine will improve performance, especially if transformations happen on every view.
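The caching behavior described in the memory bullet (recent documents in memory, less recently used ones written to disk) can be sketched roughly as follows. This is not Ehcache's API. `TwoTierCache` and its methods are hypothetical names, and the sketch assumes document IDs are safe to use as file names.

```python
import os
from collections import OrderedDict


class TwoTierCache:
    """Keep the most recently used documents in memory; spill the rest to disk."""

    def __init__(self, max_in_memory: int, spill_dir: str):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir
        self._memory = OrderedDict()  # doc_id -> bytes, least recently used first

    def _path(self, doc_id: str) -> str:
        # Assumption: doc_id contains only characters valid in a file name.
        return os.path.join(self.spill_dir, doc_id + ".bin")

    def in_memory(self, doc_id: str) -> bool:
        return doc_id in self._memory

    def put(self, doc_id: str, content: bytes) -> None:
        self._memory[doc_id] = content
        self._memory.move_to_end(doc_id)  # mark as most recently used
        # Write the least recently used entries to disk once memory is full.
        while len(self._memory) > self.max_in_memory:
            old_id, old_content = self._memory.popitem(last=False)
            with open(self._path(old_id), "wb") as f:
                f.write(old_content)

    def get(self, doc_id: str):
        if doc_id in self._memory:
            self._memory.move_to_end(doc_id)
            return self._memory[doc_id]
        path = self._path(doc_id)
        if os.path.exists(path):
            with open(path, "rb") as f:
                content = f.read()
            os.remove(path)
            self.put(doc_id, content)  # promote back into memory
            return content
        return None  # cache miss: caller falls back to the ECM server
```

A caller would create the cache with a memory limit and a scratch directory, call `put` after each retrieval from the ECM server, and call `get` before going back to the repository.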
Predictive Retrieval – How to improve user viewing performance
Documentum or Alfresco viewing performance is all about improving each of the six steps listed previously. Outside of the network or the requesting client device, the best performance improvement comes from Predictive Retrieval. Predictive retrieval starts from a business scenario and asks: can the application retrieve the documents from Documentum or Alfresco, convert them to the correct format, and place them on the application server BEFORE the user selects the document? To illustrate this point, compare a normal retrieval to a predictive retrieval.
[Table: Normal Retrieval compared with Predictive Retrieval]
The key to predictive retrieval is anticipating what will be viewed and having it ready on the application server before the user requests the document. In this manner, the user isn’t waiting for the repository actions or for movement of the document from the SAN to the application server. Some common predictive retrieval strategies:
- Retrieve all documents (or newest ones) from a folder once the folder is requested.
- Retrieve all documents (or newest ones) from an inbox once the inbox is retrieved or at the beginning of each day.
- Retrieve annotation images when the document is being viewed.
- Retrieve all documents (or newest ones) from a search result when the search result is requested.
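The strategies above share one pattern: start the retrieval and conversion when the listing is displayed, then hand over the finished result when the user clicks. Below is a minimal sketch of that pattern. `PredictiveRetriever`, `on_listing_displayed`, `on_document_clicked` and the `fetch_and_convert` callback are all hypothetical names standing in for whatever the application uses to pull content from the ECM server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PredictiveRetriever:
    """Start retrieving documents when a listing is shown, not when one is clicked."""

    def __init__(self, fetch_and_convert, max_workers: int = 4):
        # fetch_and_convert is a hypothetical callable: doc_id -> viewable bytes.
        # It stands in for the ECM lookup, SAN read and format conversion.
        self._fetch_and_convert = fetch_and_convert
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}  # doc_id -> Future holding the converted content
        self._lock = threading.Lock()

    def on_listing_displayed(self, doc_ids) -> None:
        """Called when a folder, inbox or search result is displayed."""
        with self._lock:
            for doc_id in doc_ids:
                if doc_id not in self._pending:
                    self._pending[doc_id] = self._pool.submit(
                        self._fetch_and_convert, doc_id
                    )

    def on_document_clicked(self, doc_id: str) -> bytes:
        """Return the content, waiting only if the prefetch has not finished."""
        with self._lock:
            future = self._pending.get(doc_id)
            if future is None:
                # Not predicted: fall back to a normal, on-demand retrieval.
                future = self._pool.submit(self._fetch_and_convert, doc_id)
                self._pending[doc_id] = future
        return future.result()
```

A real implementation would also need to bound how much is prefetched and evict results that are never viewed. The "or newest ones" qualifier in the list above is one way to keep that bounded.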
HPI and OpenContent – Document Viewing Performance
For our product, the High Performance Interface (HPI), we focus on PDF viewing for most document types if they are not image formats like JPG or GIF. To speed up viewing performance, we typically recommend that the repository be configured to rendition all documents into PDF upon import so that users aren’t waiting on the documents to be converted at view time.
The PDF format also has one advantage over other document types for viewing large documents: PDFs can be optimized for web viewing, a feature called Fast Web View. This format allows for page-at-a-time downloading (byte serving) from web servers. Users can view the first page of a large document while the other pages are being retrieved.
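Byte serving relies on standard HTTP range requests: the viewer asks for a slice of the file with a `Range` header, and the server answers with `206 Partial Content`. The sketch below shows the server side for a single range. `serve_byte_range` is a hypothetical helper rather than code from HPI or OpenContent, and it deliberately ignores multi-range requests.

```python
def serve_byte_range(content: bytes, range_header):
    """Return (status, headers, body) for an optional single-range request."""
    total = len(content)
    full = (200, {"Accept-Ranges": "bytes", "Content-Length": str(total)}, content)

    if not range_header or not range_header.startswith("bytes="):
        return full
    spec = range_header[len("bytes="):]
    start_text, sep, end_text = spec.partition("-")
    if "," in spec or not sep:
        return full  # multi-range or malformed: this sketch sends everything

    try:
        if start_text == "":
            # Suffix form "bytes=-N": the last N bytes of the document.
            start = max(total - int(end_text), 0)
            end = total - 1
        else:
            start = int(start_text)
            # Open-ended form "bytes=N-" runs to the end of the document.
            end = min(int(end_text), total - 1) if end_text else total - 1
    except ValueError:
        return full

    if start > end or start >= total:
        # Requested range lies outside the document.
        return 416, {"Content-Range": f"bytes */{total}"}, b""

    body = content[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body
```

For example, a request with `Range: bytes=0-3` against a 10-byte document yields status 206 with `Content-Range: bytes 0-3/10` and the first four bytes as the body.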
HPI can also utilize the browser’s built-in caching abilities. For example, the first time a user requests a document, we can return the associated content. However, the next time the user requests the document, OpenContent can detect whether or not the document has changed. If not, OpenContent will return a 304 (Not Modified) response. The browser then displays the document from its cache. This approach avoids two of the “heavy lifting” steps. If the document was not modified, it is not retrieved again from Documentum or Alfresco, and there is no need to stream the document contents over the network to the browser.
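The 304 exchange described above is the standard HTTP conditional request. A minimal sketch follows, assuming the server can cheaply learn the document's current version label without loading the content. How OpenContent actually detects changes is not described here, so `make_etag`, `respond_to_view` and the `load_content` callback are hypothetical.

```python
import hashlib


def make_etag(doc_id: str, version_label: str) -> str:
    """Derive a validator that changes whenever the document version changes."""
    digest = hashlib.sha256(f"{doc_id}:{version_label}".encode()).hexdigest()[:16]
    return f'"{digest}"'  # entity tags are quoted strings


def respond_to_view(doc_id, version_label, if_none_match, load_content):
    """Return (status, headers, body) for a view request.

    if_none_match is the value of the browser's If-None-Match header, or None.
    load_content is a hypothetical callable: doc_id -> bytes from the ECM server.
    """
    etag = make_etag(doc_id, version_label)
    if if_none_match is not None:
        # The header may list several tags; weak tags carry a "W/" prefix.
        candidates = [t.strip() for t in if_none_match.split(",")]
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if etag in candidates:
            # Unchanged: no repository read and no content sent over the network.
            return 304, {"ETag": etag}, b""
    body = load_content(doc_id)
    headers = {
        "ETag": etag,
        # "no-cache" lets the browser store the copy but revalidate before reuse.
        "Cache-Control": "private, no-cache",
    }
    return 200, headers, body
```

On the first request the browser has no tag, so it receives a 200 with the content and the `ETag`. On later requests it sends that tag back in `If-None-Match` and receives a 304 with an empty body as long as the version is unchanged.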
OpenAnnotate – Document Viewing Performance
OpenAnnotate is a browser-based viewing and annotation tool. To allow viewing and annotation of documents by the browser, PDF documents are converted to multiple single-page PNGs before being retrieved by the browser. To improve performance, this conversion can be predicted and take place before the documents are requested by the user.
If the conversion isn’t predicted, OpenAnnotate will request conversion of all pages (in parallel) when the first page is requested. In this manner, similar to the web-optimized PDF document, the user gets very quick paging through the pages after page 1.
Additionally, OpenContent will cache these images on the application server. This way, when the document is requested again, either by the same user or a different user, the page images are reused. Only when the document is updated does the cache need to be refreshed.
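The combination described in this section (convert all pages in parallel on the first request, then reuse the images until the document changes) might look something like the sketch below. `PageImageCache` and the `render_page` callback are hypothetical names. The actual PDF-to-PNG conversion is out of scope and is represented only by that callback.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PageImageCache:
    """Render all pages in parallel on first request and reuse them afterwards."""

    def __init__(self, render_page, max_workers: int = 4):
        # render_page is a hypothetical callable:
        # (doc_id, version, page_number) -> PNG bytes for that page.
        self._render_page = render_page
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pages = {}  # (doc_id, version) -> list of Futures, one per page
        self._lock = threading.Lock()

    def get_page(self, doc_id, version, page_count, page_number) -> bytes:
        key = (doc_id, version)
        with self._lock:
            # An updated document invalidates images cached for older versions.
            stale = [k for k in self._pages if k[0] == doc_id and k[1] != version]
            for old_key in stale:
                del self._pages[old_key]
            if key not in self._pages:
                # First request for this version: queue every page at once.
                self._pages[key] = [
                    self._pool.submit(self._render_page, doc_id, version, n)
                    for n in range(1, page_count + 1)
                ]
            future = self._pages[key][page_number - 1]
        # Blocks only if this particular page has not finished rendering yet.
        return future.result()
```

Because the cache is keyed by document and version rather than by user, a second user opening the same document gets the already rendered page images.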
Summary
In addition to normal ECM tuning, memory and network performance, ECM architects should look for ways to utilize predictive retrieval and conversion of documents from the repository to reduce the time users spend waiting for ECM activities. Performance can be improved by converting or renditioning into viewable formats before the user requests the view, as well as by caching within the browser or application server.
If you have any other thoughts on performance, please share them below.