How much of the data created and replicated should be stored?
The amount of data created and replicated experienced unusually high growth in 2020 due to the dramatic increase in the number of people working, learning, and entertaining themselves from home, according to IDC.
However, less than 2% of this new data was saved and retained into 2021 – the rest was either ephemeral (created or replicated primarily for the purpose of consumption) or temporarily cached and subsequently overwritten with newer data.
“In 2020, 64.2ZB of data was created or replicated, defying the systemic downward pressure asserted by the COVID-19 pandemic on many industries and its impact will be felt for several years,” said Dave Reinsel, Sr VP, IDC’s Global DataSphere.
“The amount of digital data created over the next five years will be greater than twice the amount of data created since the advent of digital storage. The question is: How much of it should be stored?”
Global data creation and replication is forecast to grow at a compound annual growth rate (CAGR) of 23% over the 2020-2025 forecast period.
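To make the forecast concrete, the short sketch below compounds the 64.2ZB figure for 2020 at the stated 23% CAGR. It assumes steady year-over-year compounding from the 2020 base, which is a simplification, since a CAGR only describes the average rate across the period. It also turns "less than 2%" into an upper bound in zettabytes. The variable and function names are illustrative and do not come from IDC.

```python
# Figures quoted in the article (IDC Global DataSphere).
DATA_2020_ZB = 64.2        # data created or replicated in 2020, in zettabytes
CAGR = 0.23                # forecast compound annual growth rate, 2020-2025
RETAINED_SHARE_MAX = 0.02  # "less than 2%" of 2020 data was retained into 2021


def project(base, rate, years):
    """Compound `base` at `rate` per year for `years` years."""
    return base * (1 + rate) ** years


# Assumption: growth is applied evenly each year starting from the 2020 base.
projection = {2020 + n: project(DATA_2020_ZB, CAGR, n) for n in range(6)}

for year, zettabytes in projection.items():
    print(f"{year}: {zettabytes:6.1f} ZB")

# Upper bound on the 2020 data that survived into 2021.
retained_upper_bound_zb = DATA_2020_ZB * RETAINED_SHARE_MAX
print(f"Retained into 2021: less than {retained_upper_bound_zb:.2f} ZB")
```

Under these assumptions, annual data creation reaches roughly 180.7ZB by 2025, close to 2.8 times the 2020 level, while less than about 1.3ZB of the 2020 total was kept.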
Other key findings
- IoT data (not including video surveillance cameras) is the fastest-growing data segment, followed by social media.
- Data created in the cloud is not growing as fast as data stored in the cloud, but it is still growing faster than the aggregate DataSphere.
- Data creation at the edge is growing almost as fast as that in the cloud.
- The enterprise DataSphere will grow two times faster than the consumer DataSphere due to the increasing role of the cloud for storage and consumption.
Driven by the steady growth in the amount of data created and replicated, the unabated expansion of the StorageSphere is expected to produce a five-year CAGR of 19.2% in the installed base of storage capacity across the globe. While not all data created or replicated is saved (or needs to be saved), growth of data creation does ultimately drive growth of the StorageSphere installed base.
“The Global StorageSphere installed base of storage capacity reached 6.7ZB in 2020, and is steadily growing, but at a slower annual growth rate than that of the Global DataSphere, meaning we are saving less of the data we create each year,” said John Rydning, research VP, IDC’s Global DataSphere.
“Organizations should consider preparing now to store more data as they seek to achieve digital transformation milestones and improve business metrics by accelerating innovative data analytics initiatives.”
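The gap between the two growth rates can be illustrated with a rough calculation. The sketch below compounds the 6.7ZB installed base at 19.2% and the 64.2ZB of data created at 23%, then compares them year by year. It assumes both rates apply evenly over the same 2020-2025 window, and it uses installed capacity relative to annual data creation only as a loose proxy. That ratio is not the share of data actually saved, which the article puts at under 2% for 2020. All names are illustrative.

```python
# Figures quoted in the article (IDC Global DataSphere and StorageSphere).
DATA_2020_ZB = 64.2     # data created or replicated in 2020
DATA_CAGR = 0.23        # DataSphere CAGR, 2020-2025
STORAGE_2020_ZB = 6.7   # installed base of storage capacity in 2020
STORAGE_CAGR = 0.192    # five-year CAGR of the installed base


def capacity_to_data_ratio(years):
    """Installed capacity divided by data created, `years` after 2020.

    Assumption: both quantities compound evenly at their stated CAGRs.
    """
    storage = STORAGE_2020_ZB * (1 + STORAGE_CAGR) ** years
    data = DATA_2020_ZB * (1 + DATA_CAGR) ** years
    return storage / data


ratios = {2020 + n: capacity_to_data_ratio(n) for n in range(6)}

for year, ratio in ratios.items():
    print(f"{year}: installed capacity is {ratio:.1%} of data created that year")
```

Under these assumptions, installed capacity falls from about 10.4% of annual data creation in 2020 to about 8.9% in 2025, which is consistent with the observation that a shrinking share of new data can be retained.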
Three reasons why the world should store more of the data it creates
First, data is crucial to any organization’s efforts to establish digital resiliency – the ability of an organization to adapt rapidly to business disruptions by using digital capabilities not only to restore business operations, but also to capitalize on the changed conditions. Data enables digital resiliency because business is dependent on data.
Second, digitally transformed companies use data to develop new and innovative solutions for the future enterprise. Companies are quickly discovering that having more data not only helps affirm the direction they are heading, but also creates opportunities to launch new revenue streams in their seemingly saturated product portfolios.
Third, companies must monitor the pulse of their employees, partners, and customers to maintain the high levels of trust and empathy that ensure customer satisfaction and loyalty. Data is the source for this pulse.
Many organizations believe there is latent, unmined value in analyzing older data. Yet the cost of storing more data holds them back from changing their data retention policies to keep data longer. This cost is expected to remain a headwind for faster expansion of the Global StorageSphere until organizations begin to show a positive ROI on data analytics initiatives, especially those involving older data.
Proven ROI on analytics initiatives would strengthen the case for storing more data or retaining data longer.