Storing and Protecting Data
Given massive data growth across all industries, Information Lifecycle Management (ILM) has become accepted as a critical business goal that many organisations hope to achieve over time. Most organisations recognise that they cannot simply continue to store and then blindly manage data of all types on primary storage. Data that has immediate relevance to active business processes merits a place on high-performance, high-availability primary storage. It also warrants special attention in the form of frequent or continuous data protection and business continuance processes.
Most data, however, is not immediately relevant to ongoing operations and does not need to be highly available or immediately recovered in response to failures or disaster. Last year’s ILM Survey by eMedia and BridgeHead Software suggests that 80% of data has not been accessed within the last 90 days and at least 60% will not be accessed ever again. Obviously, this data need not be stored on the most expensive storage technology, nor need it incur the equipment and operational costs of continuous or frequent data protection and recovery infrastructure. Clearly, as the likelihood that information will be accessed again falls over time, the underlying data should be migrated to progressively less expensive media. The question is how an IT organisation determines which data should be migrated. And, if data is to be migrated to secondary storage, how should it be stored, protected, or secured? Is the data required by law or corporate practices to be available for long periods of time? Based on the answers to these questions, different storage policies will have to be selected.
The difficulty in implementing ILM is that it requires the entire organisation to be disciplined, systematically classifying information so that IT management of the underlying data can be clearly defined and automated. Few organisations have even remotely reached this level of information management. In the meantime, data is still growing exponentially, and IT has had to develop an alternative approach. This approach is called Data Lifecycle Management (DLM). DLM is an automated approach to optimising the placement of data, and the data management techniques applied to it, throughout its lifecycle. It operates on what we already know about data from its attributes and from textual or other analysis of its content. From the resulting data classification, policies can be created to automate the repositioning of data and to correctly apply other data management rules for creating the appropriate number of spare data copies to ensure data protection, business continuance, long-term retention, and compliance.
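As an illustration of attribute-based classification, the following minimal sketch assigns a file to a storage tier from its last-access time and a regulatory flag. Everything in it is an assumption made for illustration: the tier names, the `FileInfo` type, the 90-day and 365-day thresholds, and the copy counts are hypothetical and are not taken from any particular DLM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Attributes a DLM scan might collect for one file (hypothetical)."""
    path: str
    last_accessed: datetime
    regulated: bool = False  # assumed to be set by content analysis


# Hypothetical policy table: number of copies to keep for each tier.
COPIES_PER_TIER = {
    "primary": 1,
    "secondary": 2,
    "archive": 3,
    "compliance-archive": 3,
}


def classify(info: FileInfo, now: datetime) -> str:
    """Return the storage tier for a file based on its attributes."""
    if info.regulated:
        # Regulated data is routed to the compliance tier regardless of age.
        return "compliance-archive"
    age = now - info.last_accessed
    if age <= timedelta(days=90):
        return "primary"
    if age <= timedelta(days=365):
        return "secondary"
    return "archive"
```

A policy engine could then look up `COPIES_PER_TIER[classify(info, now)]` to decide how many copies of the data to create when it is repositioned.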
Protected Data Lifecycle Management takes DLM one step further and is a more comprehensive and disciplined approach to managing the data lifecycle. The goals of Protected DLM include:
1. Protect data throughout its lifecycle – whether online or in the archive, data must be protected. Traditional HSM products may relocate data to less expensive storage, but they still require routine backup of the repository and therefore do not save much in the way of storage management costs. With Protected DLM, the archive is written as multiple copies, potentially to multiple media types and locations, so that it automatically backs itself up and provides rapid accessibility for disaster recovery scenarios.
2. Secure data that is copied into an archive – prevent unauthorised access, encrypt it, and place it on a secure medium such as WORM.
3. Manage data retention and destruction – automatically select what needs to be retained and apply a retention policy that ensures both that the data is accessible during its lifecycle and that all instances of it are immediately destroyed upon expiration (after all, data is not only an asset but, after its useful lifecycle, often a liability).
4. Assimilate data for corporate governance – most data, particularly end-user data, is not truly under corporate control. A DLM archive should provide indexing and searching on both attributes and contents to give the organisation the ability to rapidly find the underlying information assets within the archive.
5. Guarantee data authenticity – keep data secure in a non-editable, non-deletable environment, with proof of authenticity provided by digital hashing algorithms applied upon retrieval.
6. Ensure regulatory compliance – regulated data requires set levels of retention, accessibility, access control, and authentication. If removable media is used (as it often must be to meet certain regulatory requirements), the physical location of media should be managed. Also, regulations often call for retention far beyond the lifetime of the media, requiring a reliable strategy for data migration over time to updated media.
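Several of the goals above (multiple archive copies, hash-based proof of authenticity on retrieval, and destruction of all instances on expiry) can be sketched together in a few lines. This is a toy in-memory model under stated assumptions, not a description of any product: the `Archive` class, its method names, and the use of SHA-256 are illustrative choices, and each "repository" is a plain dictionary standing in for a media type or location.

```python
import hashlib
from datetime import date


class Archive:
    """Toy archive that keeps every item in several repositories."""

    def __init__(self, repository_names):
        # One dict per repository, e.g. "disk" and "tape" (hypothetical names).
        self.repos = {name: {} for name in repository_names}
        # item id -> (SHA-256 digest recorded at store time, expiry date)
        self.catalogue = {}

    def store(self, item_id, data: bytes, expires: date) -> str:
        """Write one copy to every repository and record the digest."""
        digest = hashlib.sha256(data).hexdigest()
        for repo in self.repos.values():
            repo[item_id] = data
        self.catalogue[item_id] = (digest, expires)
        return digest

    def retrieve(self, item_id) -> bytes:
        """Return the first copy whose digest matches the recorded one."""
        digest, _ = self.catalogue[item_id]
        for repo in self.repos.values():
            data = repo.get(item_id)
            if data is not None and hashlib.sha256(data).hexdigest() == digest:
                return data
        raise ValueError(f"no authentic copy of {item_id!r} found")

    def expire(self, today: date) -> list:
        """Destroy every copy of each item whose retention has ended."""
        expired = [i for i, (_, exp) in self.catalogue.items() if exp <= today]
        for item_id in expired:
            for repo in self.repos.values():
                repo.pop(item_id, None)
            del self.catalogue[item_id]
        return expired
```

Because the digest is checked on every retrieval, a damaged or altered copy in one repository is skipped and an intact copy from another repository is returned instead.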
Protected DLM begins with the automated analysis of data structures to identify data that should be copied or repositioned to a secondary storage archive. The data can involve any number of formats and applications, from raw, unstructured user files and email to generic databases and specialised applications. Since the automated decision to move data off primary storage may not always be correct, the DLM system must be able to provide access to the data in the event it is wanted. Archiving products usually make this possible at various levels: transparent access via stubs or placeholders within the file system or database application, or alternative access through a specialised interface to the archive repository. Either way, if data that has been repositioned to alternative storage is needed, it should be easy, if not transparent, to bring back.
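The stubbing idea can be illustrated with a minimal file-based sketch. It assumes a hypothetical `.stub` placeholder containing a small JSON pointer to the archived copy; real archiving products implement placeholders inside the file system or application, so the function names and stub format here are illustrative only.

```python
import json
import shutil
from pathlib import Path

STUB_SUFFIX = ".stub"  # hypothetical marker for placeholder files


def archive_file(path: Path, archive_dir: Path) -> Path:
    """Move a file to the archive and leave a small stub in its place."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / path.name
    shutil.move(str(path), str(target))
    stub = path.with_name(path.name + STUB_SUFFIX)
    stub.write_text(json.dumps({"archived_at": str(target)}))
    return stub


def read_file(path: Path) -> bytes:
    """Read a file, recalling it from the archive if only a stub remains."""
    if path.exists():
        return path.read_bytes()
    stub = path.with_name(path.name + STUB_SUFFIX)
    if stub.exists():
        target = Path(json.loads(stub.read_text())["archived_at"])
        # Recall: copy the data back; the archive copy is left in place.
        shutil.copy2(str(target), str(path))
        stub.unlink()
        return path.read_bytes()
    raise FileNotFoundError(path)
```

A caller that always goes through `read_file` does not need to know whether the data was on primary storage or in the archive, which is the sense in which access via stubs is transparent.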
The Protected DLM model intelligently integrates archiving with the critical functions of backup. The process highlights the distinction between archiving and backup and the need for both technologies to address different business problems. The purpose of backup is to create copies of the online environment that can be recovered rapidly in the event of failure or data loss. Backup is oriented towards storing and moving large amounts of data, and it does not purport to make data in backup savesets immediately available. The purpose of archiving is to provide an alternate, secure place for data that must be kept for long periods of time. Archiving provides a granular level of management over data that backup does not. Not only can each data entity placed in the archive be retained, migrated, and stored according to its own rules, but the archive also ensures that the data can be quickly located and restored. With Protected DLM, archived data does not need to be backed up routinely because the archive consists of multiple repository copies, some of which can be removed or located offsite alongside backup tapes.
The differences between backup and archiving are not stressed here to discredit either approach, but rather to emphasise the importance of both.
This is why Protected DLM is fundamentally different to both traditional HSM utilities and data classification products. Protected DLM integrates data protection, business continuance, and disaster recovery strategies into the long-term retention and management of data, as its lifecycle requirements cause it first to be copied into a secondary storage archive and subsequently to be repositioned there entirely. It does this by allowing archives to be defined as multiple copies on multiple media types, and it uses a distributed architecture to allow these copies to be written and managed at different network locations. Protected DLM represents the full integration of archiving with other vital storage management processes into a single enterprise-wide facility. That facility ensures that data is available for both operational and disaster recovery, that it is protected and compliantly retained for suitable periods, and that the most cost-effective storage technology can be leveraged to minimise storage and storage management costs.