ChatGPT and data protection laws: Compliance challenges for businesses
In this Help Net Security interview, Patricia Thaine, CEO at Private AI, reviews the main privacy concerns when using ChatGPT in a business context, as well as the risks that businesses can face if they betray customers’ trust.
Thaine also discusses the steps businesses can take to ensure they are using ChatGPT in a way that respects customer privacy and aligns with data protection laws.
What are the main privacy concerns when using ChatGPT in a business context?
When using ChatGPT in a business context, one major concern revolves around the potential sharing of Personally Identifiable Information (PII). Although it may appear convenient to input customer service queries into the tool and receive personalized responses within seconds, this process transmits personal customer details to OpenAI, such as names, addresses, and phone numbers, and potentially even sensitive information like sexual orientation.
While ChatGPT undoubtedly streamlines the drafting of email responses, it simultaneously exposes PII to a third party. This exposure could lead to compliance violations and compromise customer privacy and confidentiality, increasing the risk of data breaches and unauthorized access.
Another significant privacy concern involves the safeguarding of business secrets. For instance, if employees upload sensitive code or proprietary information to ChatGPT, there is a possibility of unintended disclosure. As generative AI models like ChatGPT can learn from user inputs, the system may inadvertently include proprietary data in its generated responses. This could jeopardize a company’s competitive advantage, compromise intellectual property, or violate confidentiality agreements.
What risks do businesses face regarding compliance with data protection laws when using ChatGPT?
ChatGPT is not exempt from data protection laws, such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), the Payment Card Industry Data Security Standard (PCI DSS), and the Consumer Privacy Protection Act (CPPA).
Many data protection laws require explicit user consent for the collection and use of personal data. For example, the GDPR mandates that businesses have a lawful basis, such as consent, for every processing activity involving users’ personal data. By utilizing ChatGPT and sharing personal information with a third-party organization like OpenAI, businesses relinquish control over how that data is stored and used. This lack of control increases the risk of non-compliance with consent requirements and exposes businesses to regulatory penalties and legal consequences.
Additionally, data subjects have the right to request the erasure of their personal data under the GDPR’s “right to be forgotten.” When using ChatGPT without proper safeguards, businesses lose control of their information and no longer have mechanisms to respond promptly and thoroughly to such requests and delete all personal data associated with the data subject. Failure to comply with these requests can result in non-compliance issues and potential fines.
Moreover, businesses must consider the security risks associated with utilizing ChatGPT. Incidents like the bug that exposed ChatGPT users’ chat history highlight the importance of data security and the potential impact on compliance. Data breaches not only compromise the confidentiality and integrity of personal data but can also lead to severe compliance violations and reputational damage.
What steps can businesses take to ensure they’re using ChatGPT in a way that respects customer privacy and aligns with data protection laws?
To ensure businesses are using ChatGPT in a way that is compliant and respects customer privacy, here are some guidelines:
Provide comprehensive employee training: Businesses must prioritize data privacy training for all employees who handle user data. This training should cover the safe and legal use of data, including specific considerations related to ChatGPT and its privacy concerns.
Obtain explicit customer consent: Prior to collecting any Personally Identifiable Information (PII) from customers and utilizing it within ChatGPT, businesses should obtain clear and explicit consent. Customers should be informed about the purpose of data collection, how their data will be used, and any third parties involved in the process. Providing customers with the option to opt out of data collection is also crucial to respect their privacy preferences.
Anonymize PII before processing: To safeguard customer privacy and minimize the risk of compliance violations, businesses should implement effective data minimization measures. This involves removing PII before feeding text into ChatGPT. Data minimization is a requirement of the GDPR whereby companies must collect and use only the PII necessary to accomplish a given task.
Furthermore, wherever possible, companies should aim to anonymize the data and bring in experts to assess the robustness of the anonymization process. By removing personal identifiers from their data, companies can vastly limit the risk of customer information exposure while still benefiting from the insights generated by the model (a minimal code sketch of this step appears at the end of this answer).
Regularly review and update data protection policies: It is essential for businesses to maintain up-to-date data protection policies that explicitly address the use of AI models like ChatGPT. These policies should encompass data retention and deletion procedures, data minimization practices, incident response plans, and guidelines for handling customer inquiries or requests related to data privacy. Regular reviews and updates ensure that policies remain aligned with evolving privacy regulations and industry best practices.
By following these steps, businesses can demonstrate their commitment to customer privacy and mitigate the risks associated with ChatGPT usage.
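To make the redaction step concrete, here is a minimal, hypothetical sketch in Python. The regular expressions are simplistic stand-ins for a production-grade PII-detection service and will miss many entity types (names in particular); the function names and patterns are assumptions for illustration only.

```python
import re

# Hypothetical, minimal redaction pass. Real deployments should use a
# dedicated PII-detection service; regexes alone miss many entities.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with numbered markers, returning the redacted
    text plus a mapping so responses can be re-identified locally."""
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        for i, match in enumerate(pattern.findall(text), start=1):
            marker = f"[{label}_{i}]"
            mapping[marker] = match
            text = text.replace(match, marker, 1)
    return text, mapping

redacted, mapping = redact("Reach me at maria@example.com or +1 555 010 7788.")
print(redacted)  # -> "Reach me at [EMAIL_1] or [PHONE_1]."
# The mapping stays inside the business's own environment; only the
# redacted prompt would ever be sent to a third-party model.
```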
How important is it for businesses to obtain customer consent before collecting their PII and putting it through ChatGPT?
Obtaining customer consent before collecting their PII and utilizing it with ChatGPT is of utmost importance for businesses, as it is required by a number of data protection regulations worldwide, including the GDPR, which applies to the personal data of individuals in the EU regardless of where the processing organization is based. Failure to comply may result in substantial fines, not to mention reputational damage.
Other select requirements from data protection regulations include:
Right to be forgotten: Data protection regulations also grant users the right to request that all instances of their personally identifiable information be deleted. Businesses should be able to respond to such requests promptly and appropriately, which means they need to keep track of where personal data resides within their organization.
Data minimization: Data protection principles require companies to remove all unnecessary PII from the data they collect. Businesses must ensure they have a valid legal basis for collecting and processing specific data, avoiding unnecessary collection of PII and reducing privacy risks.
Pseudonymization or anonymization: Pseudonymization involves replacing identifying information in data with placeholders that can be linked back to the original data, while anonymization ensures the data cannot be linked back to an individual at all. The GDPR explicitly calls for pseudonymizing data where possible as a safeguard (a brief sketch of the distinction follows below).
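The following sketch illustrates the distinction in miniature; the structure and names are hypothetical, and real pseudonymization systems keep the lookup table under separate, strict access controls.

```python
import secrets

# Pseudonymization: identifying values become tokens, but a separately
# stored lookup table retains the link back to the individual.
pseudonym_table: dict[str, str] = {}

def pseudonymize(value: str) -> str:
    token = f"USER_{secrets.token_hex(4)}"
    pseudonym_table[token] = value  # the link back is retained; store securely
    return token

# Anonymization: the value is discarded outright; no mapping exists,
# so nothing in the output can be linked back to the individual.
def anonymize(value: str) -> str:
    return "[REDACTED]"

print(pseudonymize("Maria Lopez"))  # e.g. "USER_3f9a1c2b"; reversible via the table
print(anonymize("Maria Lopez"))     # "[REDACTED]"; irreversible
```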
How does using a synthetic PII generator contribute to data privacy when using ChatGPT?
Synthetic PII generation replaces real personally identifiable information with realistic but artificial stand-ins. For example, “Maria went to the store” can become “Anna went to the store,” and “My account number is 894234” can become “My account number is 054274.”
It helps create more natural-looking data to train LLMs on, while making it possible to keep the maximum amount of the original context without sacrificing privacy. In addition, if any PII is missed by the detection step, the fact that the detected PII has been replaced with synthetic data makes it very difficult to distinguish the remaining original data from the fake data.
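As a hedged illustration of what a synthetic PII generator does, the sketch below swaps pre-detected real values for realistic fakes using the open-source Faker library. The detection step is assumed to have already happened upstream, and the specific entities are invented for the example.

```python
from faker import Faker  # open-source library for generating realistic fake data

fake = Faker()
Faker.seed(42)  # deterministic fakes so the example is reproducible

# Assume an upstream PII-detection step already located these real values.
detected_pii = {
    "Maria": fake.first_name(),                                 # a synthetic first name
    "894234": str(fake.random_number(digits=6, fix_len=True)),  # a synthetic 6-digit number
}

text = "Maria went to the store. My account number is 894234."
for real_value, synthetic_value in detected_pii.items():
    text = text.replace(real_value, synthetic_value)

print(text)  # same sentence shape and context, but no real PII remains
```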
That said, we have found that replacing PII with markers like [NAME_1], [PHONE_NUMBER_1], etc. also makes it possible to maximize the amount of preserved context and minimize privacy risk without sacrificing the utility of the data at inference time (i.e., upon response to the prompt).
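To make the marker approach concrete, here is a minimal sketch of the inference-time round trip. Here, call_llm is a placeholder for whatever model API a business actually uses, and the mapping would normally come from the earlier redaction step rather than being written by hand.

```python
def reidentify(response: str, mapping: dict[str, str]) -> str:
    """Swap markers in the model's response back for the original values,
    entirely within the business's own environment."""
    for marker, original in mapping.items():
        response = response.replace(marker, original)
    return response

# Mapping produced by a prior redaction step (hand-written here for clarity).
mapping = {"[NAME_1]": "Maria Lopez", "[PHONE_NUMBER_1]": "+1 555 010 7788"}

prompt = "Draft a short reply to [NAME_1] confirming we will call [PHONE_NUMBER_1]."
# response = call_llm(prompt)  # only markers ever leave the premises
response = "Hi [NAME_1], we will call you at [PHONE_NUMBER_1] today."  # stand-in output

print(reidentify(response, mapping))
# -> "Hi Maria Lopez, we will call you at +1 555 010 7788 today."
```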