April 20, 2024
How transparent are AI models? Stanford researchers found out.

Today Stanford University’s Center for Research on Foundation Models (CRFM) took a big swing at evaluating the transparency of a variety of AI large language models (which it calls foundation models). It released a new Foundation Model Transparency Index to address the fact that while AI’s societal impact is rising, the public transparency of LLMs is falling, even though transparency is necessary for public accountability, scientific innovation and effective governance.

The Index results were sobering: according to the researchers, no major foundation model developer came close to providing adequate transparency (the highest overall score was 53%), revealing a fundamental lack of transparency in the AI industry. Open models led the way, with Meta’s Llama 2 and Hugging Face’s BloomZ earning the highest scores. But a proprietary model, OpenAI’s GPT-4, came in third, ahead of Stability’s Stable Diffusion.

CRFM Society Lead Rishi Bommasani and his team, including CRFM Director Percy Liang, evaluated 10 major foundation model developers: OpenAI, Anthropic, Google, Meta, Amazon, Inflection, AI21 Labs, Cohere, Hugging Face, and Stability. The team designated a single flagship model for each developer and rated each on how transparent it is about its model, how the model was built, and how it is used. The team broke the scores down into 15 categories, including data, labor, compute, and downstream impact. In a recent related effort, the team also evaluated model compliance with the EU AI Act.

An ‘expansive notion’ of transparency

Liang pointed out that the Index focused on a “much more expansive notion” of transparency than simply whether a model is proprietary or open.

“It’s not that the open source models are gaining 100% and everyone else is getting zero; there is quite a bit of nuance here,” he explained. “That’s because we consider the whole ecosystem — the upstream dependencies, what data, what labor, what compute went into building the model, but also the downstream impact of these models.”

LLM companies are not homogenous

While Amazon’s Titan model received the lowest score, Bommasani explained that this doesn’t mean there is anything wrong with the model. “There is really no reason those scores couldn’t be higher; I think it’s just a matter of Amazon coming into this later than, say, OpenAI.” Up until now, there may not have been norms around some of the transparency categories, he added. “Hopefully once this is out, some people inside these companies will go, ‘Hey, we really should be doing this because all of our competitors are.’ I hope this will become a basic thing that people come to expect.”

Overall, “the basic point is that transparency matters,” he continued, adding that transparency is not a monolithic concept. “The companies are not homogenous about what they’re doing,” he said. “It’s not like all of them are good at data and bad at disclosing some compute.” For example, he explained that BLOOM, Hugging Face’s model, includes a risk evaluation. “But when they built BloomZ from it, they didn’t carry over this kind of analysis of risk and mitigation,” he said.

A transparency ‘pop quiz’

Liang added that the Index is also a framework for thinking about transparency — and the results are simply a snapshot in time.

“This is 2023, where companies didn’t see this coming,” he explained. “This is actually kind of a pop quiz in some sense. I’m sure that over the next few months things will improve; there will be more pressure to be more transparent, and naturally, companies will want to do more of the right thing.”

In addition, he pointed out that some changes would be easy to make. “Others are harder, but I think there’s just low- or medium-hanging fruit that companies really ought to be going after,” he said. “I’m optimistic that we’re going to see some change in the coming months.”
