What businesses need to know about data decay
Data decay is the aging and obsolescence of data to the point where it is no longer usable because it has lost integrity, completeness, or accuracy. Data that can no longer be easily understood cannot be effectively leveraged and therefore lacks value.
More than 180 zettabytes of data are expected to be created in the next five years, which means that data decay is set to happen at an even faster rate than today.
We live in a world fueled by data, and the worth of many organizations depends on the data they collect. Business success therefore often hinges on avoiding the risks and damage caused by poorly maintained data and the decay that results. This is why it is critical for company leaders to understand data decay and how to manage it.
How does data decay occur?
There are several scenarios that can lead to data decay. The most common is when customer records, such as sales, marketing and CRM data, are not maintained. In systems that constantly change to meet business needs, the links between data sets and the completeness of those sets can quickly break or fall out of date if not properly maintained. Typically, an organization has no single source of data; instead, its data repositories span multiple platforms, formats, and views.
Another factor leading to data decay is the human element. At some point in its journey, data is often entered manually. The moment a typo or incorrect information enters a system, data inconsistency, poor data hygiene and decay can follow.
Enterprises copy each file an average of 12 times, which means that a single mistake can be replicated across every copy and its impact compounded.
Furthermore, all data has a lifecycle: it is created, used and monitored and, at some point, it is no longer appropriate to store and must be securely disposed of.
Why should businesses care about decayed data?
Data decay is often a symptom of poor data management, with few or no data lifecycle processes in place. The resulting dangers include poor visibility into data across the business.
From a security and regulatory standpoint, the ability to ensure the safety and protection of data assets under an organization’s care is built on having the fundamental knowledge of where all data is located, regardless of whether it is poorly maintained or actively updated and integrated within primary business applications.
Finding and remediating decayed data
Finding data, whether it is in a decayed state or still intact, requires the ability to perform discovery across each data repository used within the business. This will include unstructured data locations such as files, emails, cloud storage and big data repositories. Structured locations (e.g., live databases and live cloud data analysis platforms) are more likely to be vulnerable to data decay.
Another approach to avoiding data decay is to seek out ROT (redundant, obsolete, and trivial) data, which is typically older, underused data that is no longer important to the business.
In these circumstances, the best strategy is to delete such data permanently and irreversibly. Doing so also reduces the risk of violating major privacy and security legislation, which requires organizations to store data only where a justified business need exists.
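As a rough illustration of how ROT candidates might be surfaced on a file share, the sketch below walks a directory tree and reports three groups. The rules are assumptions made for this example, not an established standard: "redundant" means files with identical SHA-256 content hashes, "obsolete" means files not modified within a configurable number of days, and "trivial" means empty files. The function name `find_rot_candidates` and its parameters are hypothetical. The sketch only reports candidates; deletion should remain a separate, reviewed step.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash file contents in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rot_candidates(root: Path, max_age_days: float = 3 * 365,
                        now: Optional[float] = None) -> dict:
    """Report ROT candidates under `root` without deleting anything."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # oldest acceptable modification time
    by_hash = defaultdict(list)
    obsolete, trivial = [], []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        info = path.stat()
        if info.st_size == 0:
            trivial.append(path)   # assumed rule: empty files carry no information
            continue
        if info.st_mtime < cutoff:
            obsolete.append(path)  # assumed rule: untouched for too long
        by_hash[sha256_of(path)].append(path)
    # Any hash shared by more than one file marks a group of exact duplicates.
    redundant = [paths for paths in by_hash.values() if len(paths) > 1]
    return {"redundant": redundant, "obsolete": obsolete, "trivial": trivial}
```

Modification time is only a proxy for whether data is still needed, so a report like this is best treated as a review list for data owners rather than a deletion list.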
What’s the best way to prevent data decay?
Data decay is bound to happen in almost every organization. Aging files and the hoarding of an overabundance of them are not uncommon, but businesses should still take proactive steps to prevent decay. The following processes are suggested:
- Reducing manual processes through automation
- Ensuring all data creation occurs at its source (e.g., from the customer) with strong input validation and, where possible, independent verification (e.g., address database, checksum validation of government IDs)
- Validating strong, secure linkage between all record sets, with data integrity checks run regularly across all data stores
- Continuous monitoring of all data locations to ensure that the personnel responsible for verifying data integrity and volume know where the data is in the first place
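Two of the processes above lend themselves to a short sketch: checksum validation at the point of entry, and a recurring linkage check between record sets. The example assumes an identifier scheme that uses the Luhn check digit (several card and national ID formats do, but any given ID format should be confirmed against its own specification) and assumes records are held as plain dictionaries. The names `luhn_is_valid` and `find_orphans`, and the field names in the usage lines, are hypothetical.

```python
def luhn_is_valid(identifier: str) -> bool:
    """Return True if the digits in `identifier` pass the Luhn checksum."""
    cleaned = identifier.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False  # reject empty input, letters, and stray symbols
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def find_orphans(child_rows, parent_rows, foreign_key, parent_key="id"):
    """Return child records whose foreign key matches no parent record."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row.get(foreign_key) not in parent_ids]


# Usage with made-up records: order 11 points at a customer that no longer exists.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"order": 10, "customer_id": 1}, {"order": 11, "customer_id": 7}]
broken_links = find_orphans(orders, customers, foreign_key="customer_id")
```

A checksum catches most single-digit typos at entry time but says nothing about whether the identifier belongs to the person supplying it, which is why the list pairs it with independent verification.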
The value of data insights is driving business growth in every industry. However, the ability to manage that data and ensure its confidentiality, integrity and availability remains one of the greatest challenges that organizations face.
With data used and stored across various endpoints, servers, emails, business applications, third parties, and cloud storage, data decay and data loss are real threats.
When embarking on the journey of improving data quality, data management, or data lifecycle initiatives, an often overlooked but fundamental step is to first understand all data that exists. To achieve this visibility, organizations should apply data discovery tools to all data repositories. Then, data should be organized and catalogued into segregated groups according to sensitivity level. By accomplishing this level of visibility, organizations will be better equipped to avoid the risk of data decay through ongoing awareness and subsequent management of all data.
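The cataloguing step described above could look something like the following sketch, which groups discovered locations by an assigned sensitivity level. Everything specific here is an assumption for illustration: the level names ("restricted", "confidential", "internal"), the two regular expressions, and the idea that discovery has already produced a mapping of location to extracted text. Real discovery tools apply far more rigorous detection than these patterns.

```python
import re
from collections import defaultdict

# Hypothetical detection rules, ordered from most to least sensitive.
RULES = [
    ("restricted", re.compile(r"\b(?:\d[ -]?){13,16}\b")),      # card-like digit runs
    ("confidential", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),  # email addresses
]
DEFAULT_LEVEL = "internal"  # assumed fallback when no rule matches


def classify(text: str) -> str:
    """Return the first (most sensitive) level whose pattern appears in the text."""
    for level, pattern in RULES:
        if pattern.search(text):
            return level
    return DEFAULT_LEVEL


def build_catalogue(documents: dict) -> dict:
    """Group data locations by sensitivity level."""
    catalogue = defaultdict(list)
    for location, text in documents.items():
        catalogue[classify(text)].append(location)
    return dict(catalogue)


# Usage with made-up locations and contents.
discovered = {
    "share/billing.csv": "card 4111 1111 1111 1111 on file",
    "mail/inbox/42.eml": "please contact jane@example.com",
    "wiki/lunch-menu.txt": "soup of the day",
}
catalogue = build_catalogue(discovered)
```

Ordering the rules from most to least sensitive means a location containing several kinds of data is filed under its highest level, which is the safer default when the catalogue drives access and retention decisions.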