Technology Services Group

Content Deletion in Alfresco – More than Meets the Eye

April 28, 2014

An important and often overlooked component of implementing a content management system is the lifecycle of content once it has been deleted in the repository.  There are a number of things to consider when coming up with a plan for deleting content, including recoverability, performance, and system resource usage.  This article outlines the way that Alfresco handles content deletion and will hopefully bring to light some key decision points that are often ignored during the initial implementation of an Alfresco repository.


It’s worth noting that this article was inspired by a client that has a relatively large-scale Alfresco implementation with a repository of over 10 million documents that occupy multiple terabytes of storage space.  Over time, the client began to struggle with the scalability and performance of a very large database, as well as concerns about the rapidly growing demands for additional storage.  After some investigation, it was discovered that the repository contained an excessive amount of content that had been deleted and was no longer visible from the user interface, but was still consuming valuable database and file system resources.

It probably doesn’t come as a surprise that when content is deleted from the user interface in Alfresco, the metadata and binary content are not immediately deleted from the database and file system.  This provides a safety net so that if content is deleted by mistake, it can be recovered.  It’s important for repository administrators to understand what happens to this deleted content over time and make adjustments as needed in order to ensure that the system will continue to perform well in the future, while using minimal system resources (memory, hard drive space, etc.). Below is a list of the stages of content deletion in an Alfresco repository:

Stage 1 – Move to Trash

  • User deletes content from the user interface (e.g. Alfresco Share).
  • Content is moved to the trash.  In Alfresco 4.1.x and earlier, content could only be restored from the trash by an admin user.  In Alfresco 4.2 and later, users are able to see their own trashcan and can restore any content that they’ve deleted.
  • Behind the scenes, content is moved from the main content store to the archive store.
  • Metadata is still stored in the database, and content is still stored on the file system in the contentstore directory.
  • Content remains in the trash (archive store) until the trash is emptied.  By default, there is no process that automatically purges content from the archive store.

Stage 2 – Empty Trash

  • Content can be individually removed from the trash, or the trash can be emptied entirely.
  • If the trash is not managed properly, it can grow in size very quickly.  It’s important to note that for performance reasons, emptying the trash from the user interface will only delete the first 1000 items in the trash.
  • When the trash is emptied, the metadata is still stored in the database, but is marked as deleted.  Binary content is still stored on the file system in the contentstore directory.
  • Content remains in the database and on the file system until jobs run to purge the content.

Stage 3 – Content Store Cleaner Job Runs

  • Alfresco has a Content Store Cleaner job that runs daily.  Its purpose is to take binary content that has been removed from the trash (archive store) and move it out of the contentstore directory on the file system and into the contentstore.deleted directory.
  • The Content Store Cleaner job runs daily at 4:00 a.m. by default.  This schedule can be adjusted via configuration (the system.content.orphanCleanup.cronExpression property) if desired.  The job should run during off-peak hours.
  • The Content Store Cleaner job does not move content to the contentstore.deleted directory until 14 days after it was removed from the trash.  This provides another safety net in case content was inadvertently deleted.  The 14-day window can be adjusted via configuration as well (the system.content.orphanProtectDays property).
  • Once content is moved to the contentstore.deleted directory, it remains there permanently.  If desired, this directory can be safely purged manually by a system administrator.
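
Since nothing ever cleans out contentstore.deleted, a small script can take over the manual purge.  The following is only a sketch, not an Alfresco tool: the function name, the directory path and the age threshold are all hypothetical, and it judges a file's age by its modification time, which may reflect when the content was first written rather than when it was moved.  It defaults to a dry run so nothing is removed until you ask for it.

```python
import time
from pathlib import Path


def purge_deleted_store(root, min_age_days=0, dry_run=True, now=None):
    """Find (and optionally delete) files under `root` older than `min_age_days`.

    Age is based on file modification time (an assumption, see the note above).
    Returns a tuple of (list of matching paths, total bytes they occupy).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    matched, reclaimed = [], 0
    # Materialize the listing first so deleting files does not disturb the walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_mtime > cutoff:
            continue  # too recent, keep it
        matched.append(path)
        reclaimed += info.st_size
        if not dry_run:
            path.unlink()
    return matched, reclaimed
```

Calling it with dry_run=True first reports how much space would be recovered.  Running it again with dry_run=False performs the delete.  Empty subdirectories are left in place.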

Stage 4 – Database Node Cleaner Job Runs

  • Alfresco has a Database Node Cleaner job that runs daily.  The purpose of this job is to remove all traces of a piece of content from the database once it’s been deleted from the trash.
  • The Database Node Cleaner job runs daily at 9:00 p.m. by default.  This schedule can be adjusted via configuration if desired.  The job should run during off-peak hours.
  • The Database Node Cleaner job does not remove an item until 30 days after it was removed from the trash.  This 30-day window can be adjusted via configuration as well.
  • Once the metadata is removed from the database, the metadata can be considered to be permanently deleted.  The only way to recover the metadata would be to restore the database from backup.
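
To make the timing of Stages 3 and 4 concrete, here is a rough sketch that estimates when each cleanup would happen for a single item, using the defaults described above (14 days and a 4:00 a.m. run for binaries, 30 days and a 9:00 p.m. run for metadata).  The function names are hypothetical, and the assumption that each job picks the item up on its first run after the waiting period expires is mine; the real jobs may behave differently in edge cases.

```python
from datetime import datetime, time, timedelta


def next_run_at_or_after(moment, run_time):
    """Return the first daily run at `run_time` that is not before `moment`."""
    candidate = datetime.combine(moment.date(), run_time)
    if candidate < moment:
        candidate += timedelta(days=1)
    return candidate


def purge_timeline(removed_from_trash,
                   orphan_protect_days=14, content_cleaner_time=time(4, 0),
                   node_purge_days=30, node_cleaner_time=time(21, 0)):
    """Estimate when an item's binary is moved and its metadata is purged.

    `removed_from_trash` is when the item left the trash (Stage 2).
    Defaults mirror the values quoted in this article.
    """
    binary_eligible = removed_from_trash + timedelta(days=orphan_protect_days)
    metadata_eligible = removed_from_trash + timedelta(days=node_purge_days)
    return {
        "binary_moved": next_run_at_or_after(binary_eligible, content_cleaner_time),
        "metadata_purged": next_run_at_or_after(metadata_eligible, node_cleaner_time),
    }
```

For an item removed from the trash at 10:00 a.m. on April 28, 2014, this sketch gives May 13 at 4:00 a.m. for the move to contentstore.deleted and May 28 at 9:00 p.m. for the metadata purge.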

As you can tell, the process for deleting content in Alfresco is way more than meets the eye.  Most of the time, the default settings and configurations work well, but there are a couple of sticking points that should be considered:

  • Content must be manually deleted from the trash in order for the other cleanup processes to kick in.  Many clients are not even aware of the trash, since it was only available to admin users in Share until version 4.2 was released.  If content is deleted frequently in your repository, it might be a good idea to implement a scheduled job to automatically purge content from the trash after a certain number of days to prevent buildup.
  • Content must be manually deleted from the contentstore.deleted directory on the Alfresco server in order to recover the hard drive space that is consumed by deleted content.  It is safe to delete the contents of the contentstore.deleted directory, provided there are no business rules that require deleted content to be retained for an extended period of time.
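
One way a scheduled trash purge could work around the 1000-item limit of the user interface is to delete in small pages until nothing is left.  The sketch below is an illustration only.  It assumes a newer Alfresco release (5.2 or later) that exposes the public REST API "deleted-nodes" endpoint, which did not exist in the versions discussed in this article, and the function names and the Basic authentication setup are hypothetical.  It also empties the whole trash rather than filtering by age.

```python
import base64
import json
import urllib.request

# Assumption: public REST API path available in Alfresco 5.2 and later.
API_PATH = "/alfresco/api/-default-/public/alfresco/versions/1/deleted-nodes"


def empty_trash(list_page, delete_node, page_size=100):
    """Delete trashed items page by page until the trash is empty.

    `list_page(n)` returns up to n IDs from the trash and `delete_node(id)`
    permanently removes one. Always re-reading the first page works because
    deleted items drop out of the listing. Returns the number deleted.
    """
    deleted = 0
    while True:
        node_ids = list_page(page_size)
        if not node_ids:
            return deleted
        for node_id in node_ids:
            delete_node(node_id)  # any HTTP error propagates and stops the run
            deleted += 1


def rest_callables(base_url, user, password):
    """Build list/delete callables backed by the REST API (hypothetical setup)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}

    def list_page(page_size):
        url = f"{base_url}{API_PATH}?maxItems={page_size}"
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        return [item["entry"]["id"] for item in body["list"]["entries"]]

    def delete_node(node_id):
        url = f"{base_url}{API_PATH}/{node_id}"
        request = urllib.request.Request(url, headers=headers, method="DELETE")
        with urllib.request.urlopen(request):
            pass

    return list_page, delete_node
```

Against a real server, the two pieces would be combined as empty_trash(*rest_callables(base_url, user, password)) and run from cron or a similar scheduler during off-peak hours.  Keeping the paging loop separate from the HTTP calls makes it easy to try the logic against a fake trash before pointing it at a repository.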

Hopefully this article has shed some light on how Alfresco handles deleted content by default, and what adjustments and manual interventions can be made to modify the default process.  Feel free to share your thoughts below.

Filed Under: Alfresco, Tech Tip

Comments

  1. Enrique Garcia says

    December 23, 2016 at 1:26 am

    Is there another way to empty trash can avoiding the limit of 1000 items?
