AI and generative AI are changing how software works, creating opportunities to increase productivity, find new solutions and produce unique, relevant information at scale. However, as gen AI becomes more widespread, concerns around data privacy and ethical quandaries will grow with it.
AI can augment human capabilities today, but it shouldn’t replace human oversight yet, especially as AI regulations are still evolving globally. Let’s explore the potential compliance and privacy risks of unchecked gen AI use, how the legal landscape is evolving and best practices to limit risks and maximize opportunities for this very powerful technology.
Risks of unchecked generative AI
The allure of gen AI and large language models (LLMs) stems from their ability to consolidate information and generate new ideas, but these capabilities also come with inherent risks. If not carefully managed, gen AI can inadvertently lead to issues such as:
- Disclosing proprietary information: Companies risk exposing sensitive proprietary data when they feed it into public AI models. That data can then be used to answer a future query from a third party, or by the model owner itself. Companies are addressing part of this risk by localizing the AI model on their own systems and training those models on their company's own data, but this requires a well-organized data stack for the best results.
- Violating IP protections: Companies may unwittingly find themselves infringing on the intellectual property rights of third parties through improper use of AI-generated content, leading to potential legal issues. Some companies, like Adobe with Adobe Firefly, are offering indemnification for content generated by their LLM, but the copyright issues will need to be worked out in the future if we continue to see AI systems “reusing” third-party intellectual property.
- Exposing personal data: Data privacy breaches can occur if AI systems mishandle personal information, especially sensitive or special category personal data. As companies feed more marketing and customer data into an LLM, the risk that this data leaks out inadvertently increases.
- Violating customer contracts: Using customer data in AI may violate contractual agreements — and this can lead to legal ramifications.
- Risk of deceiving customers: Current and potential future regulations are often focused on proper disclosure for AI technology. For example, if a customer is interacting with a chatbot on a support website, the company needs to make it clear when an AI is powering the interaction, and when an actual human is drafting the responses.
The legal landscape and existing frameworks
The legal guidelines surrounding AI are evolving rapidly, but not as fast as AI vendors launch new capabilities. If a company tries to minimize all potential risks and wait for the dust to settle on AI, they could lose market share and customer confidence as faster moving rivals get more attention. It behooves companies to move forward ASAP — but they should use time-tested risk reduction strategies based on current regulations and legal precedents to minimize potential issues.
So far, AI giants have been the primary targets of several lawsuits revolving around their use of copyrighted data to create and train their models. Recent class action lawsuits filed in the Northern District of California, including one filed on behalf of authors and another on behalf of aggrieved citizens, raise allegations of copyright infringement, consumer protection violations and violations of data protection laws. These filings highlight the importance of responsible data handling, and may point to a future need to disclose training data sources.
However, AI creators like OpenAI aren’t the only companies exposed to the risks of gen AI models. When an application relies heavily on a model, a model that has been illegally trained can pollute the entire product.
For example, when the FTC charged Everalbum, the maker of the photo app Ever, with deceiving consumers about its use of facial recognition technology and its retention of the photos and videos of users who deactivated their accounts, the company was required to delete the improperly collected data and any AI models or algorithms it developed using that data. This essentially erased the company’s entire business, and the Ever app shut down in 2020.
At the same time, states like New York have introduced, or are introducing, laws and proposals that regulate AI use in areas such as hiring and chatbot disclosure. The EU AI Act, which is currently in trilogue negotiations and is expected to be passed by the end of the year, would require companies to transparently disclose AI-generated content, ensure the content is not illegal, publish summaries of the copyrighted data used for training, and meet additional requirements for high-risk use cases.
Best practices for protecting data in the age of AI
It is clear that CEOs feel pressure to embrace gen AI tools to augment productivity across their organizations. However, many companies lack a sense of organizational readiness to implement them. Uncertainty abounds while regulations are hammered out and the first cases prepare for litigation.
But companies can use existing laws and frameworks as a guide to establish best practices and to prepare for future regulations. Existing data protection laws have provisions that can be applied to AI systems, including requirements for transparency, notice and adherence to personal privacy rights. That said, much of the regulation has been around the ability to opt out of automated decision-making, the right to be forgotten or have inaccurate information deleted.
This may prove challenging to deploy given the current state of LLMs. But for now, best practices for companies grappling with responsibly implementing gen AI include:
- Transparency and documentation: Clearly communicate the use of AI in data processing, document AI logic, intended uses and potential impacts on data subjects.
- Localizing AI models: Localizing AI models internally and training the model with proprietary data can greatly reduce the data security risk of leaks when compared to using tools like third-party chatbots. This approach can also yield meaningful productivity gains because the model is trained on highly relevant information specific to the organization.
- Starting small and experimenting: Use internal AI models to experiment before moving to live business data from a secure cloud or on-premises environment.
- Focusing on discovering and connecting: Use gen AI to discover new insights and make unexpected connections across departments or information silos.
- Preserving the human element: Gen AI should augment human performance, not remove it entirely. Human oversight, review of critical decisions and verification of AI-created content helps mitigate risk posed by model biases or data inaccuracy.
- Maintaining transparency and logs: Capturing data movement transactions and saving detailed logs of personal data processed can help determine how and why data was used if a company needs to demonstrate proper governance and data security.
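The logging practice above can be sketched as a small structured audit record. This is a minimal illustration, not a prescribed schema: the `DataProcessingRecord` fields, identifiers and function names are assumptions chosen to reflect the kinds of details (purpose, model, legal basis) a governance team would want to reconstruct later.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DataProcessingRecord:
    """One audit-log entry describing why personal data was processed."""
    timestamp: str
    data_subject_id: str        # pseudonymous identifier, never raw PII
    data_categories: list       # e.g. ["email", "support_ticket_text"]
    purpose: str                # why the data was sent to the model
    model: str                  # which AI system processed it
    legal_basis: str            # e.g. "consent", "contract", "legitimate interest"

def log_processing(log: list, subject_id: str, categories: list,
                   purpose: str, model: str, legal_basis: str) -> dict:
    """Append a structured record so the company can later demonstrate
    how and why each piece of personal data was used."""
    record = DataProcessingRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        data_subject_id=subject_id,
        data_categories=categories,
        purpose=purpose,
        model=model,
        legal_basis=legal_basis,
    )
    entry = asdict(record)
    log.append(entry)
    return entry

# Example: record that an internal LLM summarized a support ticket
audit_log = []
log_processing(audit_log, "cust-8f3a", ["support_ticket_text"],
               "ticket summarization", "internal-llm-v1", "legitimate interest")
print(json.dumps(audit_log[0], indent=2))
```

In practice these records would be written to append-only, access-controlled storage rather than an in-memory list, but the key design choice is the same: log the purpose and legal basis at the moment of processing, when that context is cheap to capture.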
Between Anthropic’s Claude, OpenAI’s ChatGPT, Google’s Bard and Meta’s Llama, we’re going to see amazing new ways to capitalize on the data that businesses have been collecting and storing for years, and to uncover new ideas and connections that can change the way a company operates. Change always comes with risk, and lawyers are charged with reducing risk.
But the transformative potential of AI is so close that even the most cautious privacy professional needs to prepare for this wave. By starting with robust data governance, clear notification and detailed documentation, privacy and compliance teams can best react to new regulations and maximize the tremendous business opportunity of AI.
Nick Leone is product and compliance managing counsel at Fivetran, the leader in automated data movement.
Seth Batey is data protection officer, senior managing privacy counsel at Fivetran.