Google on scaling differential privacy across nearly three billion devices
In this Help Net Security interview, Miguel Guevara, Product Manager, Privacy, Safety and Security at Google, discusses the complexities involved in scaling differential privacy technology across large systems. He emphasizes the need to build products that are secure, private, and user-controlled while addressing the challenges of integrating such technologies into existing systems.
Guevara also outlines the rigorous process of optimizing these technologies to ensure user data is protected without sacrificing functionality.
Google recently achieved the largest known differential privacy application across nearly three billion devices. Can you elaborate on the challenges of scaling differential privacy technology to that level and how Google addressed issues like computational cost and scalability?
As developers, it’s our responsibility to help keep our users safe online and protect their data. This starts with building products that are secure by default, private by design, and put users in control. Everything we make at Google is underpinned by these principles, and we’re proud to be an industry leader in developing, deploying, and scaling new privacy-preserving technologies (PETs) that make it possible to unlock valuable insights and create helpful experiences while protecting our users’ privacy.
Differential privacy is one of the key PETs we’ve invested in over the past decade. We’re proud to have achieved this most recent feat, but it wasn’t without its challenges along the way. One of the primary roadblocks in building the infrastructure was doing it in an efficient and scalable manner. We went through several iterations of the architecture until we landed on one that was efficient enough for our purposes. Often when deploying PETs we are breaking new ground, and there is no prior art to guide this type of on-the-fly development.
That’s why iteration is key, along with testing our architectures along the way to find the optimal solution. We did a lot of testing to make sure the infrastructure could handle the massive amount of data generated by almost 3 billion devices. On-device differential privacy adds its own challenges. To preserve differential privacy, we need to create synthetic observations, which can increase on-device processing and drive up battery and memory usage. So we needed to make sure we optimized resources across the device fleet and could also absorb the increased data load that results from the synthetic observations.
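To make the idea of synthetic observations concrete, here is a minimal Java sketch, with hypothetical class and method names rather than Google's actual implementation: a device pads its real report with randomly drawn dummy records before upload, which is what drives the extra processing, battery, and bandwidth cost Guevara describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch only: a device report that pads its real observation
 * with synthetic ("dummy") observations before submission. Class and method
 * names are hypothetical and not Google's implementation.
 */
public class DeviceReporter {
    private final Random rng = new Random();

    /** Builds the payload a single device would upload. */
    public List<String> buildReport(String realObservation, String[] domain, double dummyRate) {
        List<String> payload = new ArrayList<>();
        payload.add(realObservation);                     // the true signal
        // Draw a random number of synthetic observations; their presence makes
        // any individual record plausibly deniable once reports are aggregated.
        int dummies = samplePoisson(dummyRate);
        for (int i = 0; i < dummies; i++) {
            payload.add(domain[rng.nextInt(domain.length)]);
        }
        return payload;                                   // extra records = extra bandwidth and battery
    }

    /** Simple Poisson sampler (Knuth's method) for the dummy count. */
    private int samplePoisson(double lambda) {
        double l = Math.exp(-lambda), p = 1.0;
        int k = 0;
        do {
            k++;
            p *= rng.nextDouble();
        } while (p > l);
        return k - 1;
    }

    public static void main(String[] args) {
        String[] domain = {"event_a", "event_b", "event_c"};
        System.out.println(new DeviceReporter().buildReport("event_a", domain, 2.0));
    }
}
```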
Differential privacy is known to be challenging to integrate due to its technical complexity. What were the primary technical hurdles Google faced in integrating differential privacy across products like Google Home and Google Search, and how were they overcome?
There are various settings of differential privacy: the local, central and shuffler models, each with its own complexities. Integrating differential privacy with Trends in Google Search was challenging because we needed to build it into an existing system with constraints on how much change we could introduce. It therefore took our team of researchers some time, working with the Trends team, to come up with an algorithm that fit well.
Furthermore, whenever we add new features based on differential privacy, we need to find opportunity spaces where it provides a net user benefit. With Google Trends, we identified one of these opportunities, enabling a use case that wasn’t previously possible. Our team learned that many local reporters struggle to find insights in Trends when searching for fairly niche queries that don’t meet prior thresholds. After talking with the Trends team, we came up with a solution that relied on differential privacy to unlock these new use cases. This was a natural win-win where implementing differential privacy unlocked value for a new set of users – and that’s always our end goal when deploying PETs.
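As a rough illustration of the central model mentioned above, the following Java sketch adds Laplace noise to per-query counts and publishes only those that clear a release threshold. The parameters are illustrative, and this is not the algorithm used in Google Trends.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Minimal sketch of the central model: a trusted aggregator adds Laplace noise
 * to each query's count and releases it only if the noisy value clears a
 * threshold. Parameter choices are illustrative, not Google Trends' algorithm.
 */
public class NoisyThresholdRelease {
    private static final Random RNG = new Random();

    /** Samples Laplace(0, scale) noise by inverse transform sampling. */
    static double laplace(double scale) {
        double u = RNG.nextDouble() - 0.5;
        return -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
    }

    /**
     * Assuming each user contributes at most once per query (sensitivity 1),
     * Laplace(1/epsilon) noise on each count satisfies epsilon-DP; the threshold
     * keeps very small, high-variance counts out of the published results.
     */
    static Map<String, Long> release(Map<String, Long> rawCounts, double epsilon, long threshold) {
        Map<String, Long> published = new TreeMap<>();
        for (var entry : rawCounts.entrySet()) {
            long noisy = Math.round(entry.getValue() + laplace(1.0 / epsilon));
            if (noisy >= threshold) {
                published.put(entry.getKey(), noisy);
            }
        }
        return published;
    }

    public static void main(String[] args) {
        Map<String, Long> raw = Map.of("niche local query", 40L, "popular query", 100_000L);
        System.out.println(release(raw, /* epsilon= */ 1.0, /* threshold= */ 20));
    }
}
```

Because the noise has a known distribution, a fixed threshold can be set much lower than a raw-count cutoff would need to be, which is how niche queries become publishable without exposing individuals.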
Google used differential privacy to improve the reliability of Matter-compatible devices in Google Home. Can you explain how differential privacy insights helped improve connectivity and user experience in these devices?
For the Google Home example, we used the infrastructure we built for shuffle differential privacy. This infrastructure is versatile in the sense that it can perform local, central and shuffle differential privacy, along with other privacy-preserving mechanisms, when collecting data. The Home team faced the challenge of identifying a number of Matter-related crashes, and they relied on our shuffle infrastructure to obtain insights into those devices.
By relying on this infrastructure, the Home team was able to identify Matter devices that were trying to connect using Google Home but failing to do so. This insight into the devices that were crashing and how they were crashing allowed them to isolate the problems and quickly release fixes for them. It’s part of the behind-the-scenes magic we like to shine a light on for our PET work.
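The shuffle model Guevara refers to can be sketched in a few lines of Java: an intermediary strips device identifiers and randomly permutes the incoming reports, so the analyzer can still count failure types but cannot attribute any single report to a device. The record and class names here are hypothetical, not Google Home's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative sketch of the shuffle model: an intermediary strips device
 * identifiers and randomly permutes reports so the analyzer only sees an
 * anonymous, unordered batch.
 */
public class Shuffler {

    /** A report as it arrives from a device. */
    record DeviceReport(String deviceId, String payload) {}

    /** Strips identifiers and shuffles, breaking the link between device and report. */
    static List<String> shuffle(List<DeviceReport> incoming) {
        List<String> anonymous = new ArrayList<>();
        for (DeviceReport report : incoming) {
            anonymous.add(report.payload());   // deviceId is dropped here
        }
        Collections.shuffle(anonymous);        // order no longer reveals which device sent what
        return anonymous;
    }

    public static void main(String[] args) {
        List<DeviceReport> batch = List.of(
                new DeviceReport("device-a", "matter_pairing_failed"),
                new DeviceReport("device-b", "matter_pairing_ok"),
                new DeviceReport("device-c", "matter_pairing_failed"));
        // The analyzer can still spot clusters of failures,
        // but cannot attribute any single report to a device.
        System.out.println(shuffle(batch));
    }
}
```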
Google has made strides in open-sourcing privacy-enhancing technologies like Fully Homomorphic Encryption (FHE) and federated learning. How does Google envision these resources impacting the cybersecurity community, particularly for developers working with sensitive data?
We understand that the barriers to entry for some of these technologies are high. Our primary motivation is to reduce the cost for developers to experiment with them. FHE is a particular case in that its community is still nascent. With FHE, we want to provide tools that can help accelerate development and demonstrate its usefulness across a range of applications.
We also hope that by showing a variety of examples where we use these technologies, we can encourage others to apply them to similar classes of problems. Our long-term goal is to create a virtuous cycle in which some of our implementations of PETs incentivize others to implement PETs in new fields, which in turn gives us new ideas for using PETs in our own features. As we’ve noted in the past, a rising tide lifts all boats – and this is especially true for deploying PETs more widely to make the internet a safer place for all.
With the release of PipelineDP4j, differential privacy is now accessible to Java developers. What were the primary motivations behind this Java Virtual Machine (JVM) release, and how does it aim to broaden the adoption of differential privacy among developers?
We have a long history of open-sourcing differential privacy libraries. Over the years, our goal has been to be transparent about our algorithms so they can be inspected by independent researchers, and to reduce the barrier to entry for those trying to use differential privacy (and other privacy technologies). Making the libraries freely available reflects our commitment to increasing adoption around the world, which is why we’ve focused on releasing them in as many developer languages as possible.
We open-sourced PipelineDP, which is written in Python, a couple of years ago in collaboration with OpenMined. We know that many developer workflows exist in Java, and we wanted to build a solution for those developers. We hope more developers will now be able to build applications with differential privacy, and we’re excited about all the new use cases to come, especially as these technologies continue to evolve every day.
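For a sense of what a differential privacy pipeline automates, here is a plain Java sketch of a DP count per partition with per-user contribution bounding. It deliberately does not use PipelineDP4j's actual API; all names and parameters are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Sketch of the kind of aggregation a DP pipeline library automates:
 * bound each user's contributions per partition, then add noise calibrated
 * to that bound. Plain Java for illustration only; not PipelineDP4j's API.
 */
public class DpCountPerPartition {
    private static final Random RNG = new Random();

    record Event(String userId, String partition) {}

    /** Samples Laplace(0, scale) noise by inverse transform sampling. */
    static double laplace(double scale) {
        double u = RNG.nextDouble() - 0.5;
        return -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
    }

    static Map<String, Long> dpCount(List<Event> events, int maxPerUser, double epsilon) {
        // 1. Contribution bounding: keep at most maxPerUser events per (user, partition).
        Map<String, Integer> perUserPartition = new HashMap<>();
        Map<String, Long> counts = new HashMap<>();
        for (Event e : events) {
            String key = e.userId() + "|" + e.partition();
            int seen = perUserPartition.merge(key, 1, Integer::sum);
            if (seen <= maxPerUser) {
                counts.merge(e.partition(), 1L, Long::sum);
            }
        }
        // 2. Noise calibrated to the bounded sensitivity (maxPerUser / epsilon).
        counts.replaceAll((partition, c) ->
                Math.max(0, Math.round(c + laplace(maxPerUser / epsilon))));
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("u1", "query:weather"),
                new Event("u1", "query:weather"),
                new Event("u2", "query:weather"),
                new Event("u3", "query:news"));
        System.out.println(dpCount(events, /* maxPerUser= */ 1, /* epsilon= */ 1.0));
    }
}
```

Pipeline libraries take care of these steps, contribution bounding, noise calibration and partition handling, over large-scale data processing frameworks, which is what lowers the barrier to entry Guevara describes.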