Closing the data divide: How to create harmony among data scientists and privacy advocates
Balancing data privacy within an organization is no easy task, particularly between data scientists who need quick access to data and the security and governance teams whose job it is to protect it. Too many of our customers have told us they are inundated with tickets requesting data access, a task that sounds simple and easy to accommodate but, in practice, is not.
In typical cloud data architectures, there is no magic button for IT or data architects to gain instant access to the different data sets that are created by users across the enterprise and often distributed across different cloud services. As our customers will attest, this is a real need that forces organizations to search for a solution that won’t compromise data privacy and security requirements.
The real problem in the quest for both data usability and privacy is the difficulty of deploying a scalable compliance solution across multiple cloud services. Migration to the cloud is an unstoppable evolution of data storage, analytics, and reporting. Yet, this plethora of cloud data services creates an untenable nightmare for those looking to keep tabs on — and maintain access control over — the data being created, stored, and shared.
According to Gartner, the worldwide public cloud services market is expected to grow 18.4% in 2021 to total $304.9 billion, up from $257.5 billion in 2020. McKinsey & Company points to the same trend, saying it expects about 35% of all enterprise workloads to be on the public cloud by 2021 and anticipates that 40% of companies will use two or more infrastructure-as-a-service (IaaS) and software-as-a-service (SaaS) providers.
With regulatory authorities around the world moving toward tighter control of personally identifiable information (PII), enterprises must embark on a journey that not only migrates data to the cloud but also satisfies the privacy and security requirements outlined by data governance teams and chief security officers.
Clearing a path to fast, scalable data privacy and security
To effectively leverage and protect all data assets, data privacy and security policies must be implemented consistently across the enterprise. This is because, regardless of the infrastructure, the same regulations apply to the data itself. For example, social security numbers cannot be divulged, only specific portions of a credit card number can be displayed, and customers’ names and addresses cannot be combined in a way that would allow hackers to easily steal someone’s identity.
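The rules in that example can be sketched as a single masking policy that is applied the same way wherever a record lives. This is a minimal illustration, not any vendor’s implementation: the function names, the field names (`ssn`, `card_number`), and the choice to show the last four card digits are all assumptions made for the example.

```python
import re

def mask_ssn(ssn: str) -> str:
    """Hide every digit of a social security number, keeping separators."""
    return re.sub(r"\d", "*", ssn)

def mask_card_number(card: str, visible: int = 4) -> str:
    """Show only the last `visible` digits of a card number (assumed rule)."""
    digits = re.sub(r"\D", "", card)  # drop spaces and dashes
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]

# Hypothetical policy: field name -> masking function.
# Fields not listed here pass through unchanged.
POLICY = {"ssn": mask_ssn, "card_number": mask_card_number}

def apply_policy(record: dict, policy: dict) -> dict:
    """Return a copy of `record` with each field run through its masking rule."""
    return {
        field: policy[field](value) if field in policy else value
        for field, value in record.items()
    }

record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "card_number": "4111 1111 1111 1111",
}
masked = apply_policy(record, POLICY)
print(masked)
```

Keeping the policy in one place, separate from any particular data store, is what lets the same rule follow the data from one cloud service to another.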
Unfortunately, IT and security teams are often left with a piecemeal approach that straddles different methods and user interfaces to implement data security and compliance policies across a myriad of cloud services. The speed at which an organization deploys a data access control solution is important, but a fast rollout must never come at the expense of the solution’s ability to handle petabytes of data. A solution that deploys quickly yet stops working once data volumes grow is a futile effort, and often an expensive one to correct.
The long-term success of a data access control solution often lies in its ability to scale, due in part to the sheer volume of data migrating to the cloud. Intuitive and visually appealing user interfaces are an equally necessary part of a complete solution.
When it comes to selecting a data access control solution to manage the privacy of customer account data (payment card and healthcare data, social security numbers, membership points, credit scores, and bank account info), IT and security teams must ask themselves how much risk is acceptable. For instance, a solution that works in pilot projects may not work as effectively in production-scale environments.
To ensure success from the outset, here are some key questions to ask when selecting a data access control solution:
- How long does it take to deploy across all cloud services?
- What deployment options are available? Any limitations on future usability?
- Does the performance of the solution change based on data volume?
- Can all data types be secured by the solution?
- Once implemented, can data scientists run queries at the expected level of performance?
The key to closing the data divide lies in adequate planning and verification of a solution’s capabilities, because data teams need timely access to information. They also need to run queries without the data privacy and security solution itself limiting performance.
Failure to balance the needs of data scientists with those of the teams in charge of data privacy and security can prevent the organization from uncovering “the next best decision” or from gaining long-term benefits due to a lack of scalability.