Why compliance for LLMs is so important (and yet so hard)

ChatGPT was released in late November 2022. It quickly became the fastest product to reach one million registered users, doing so in a mere five days.

Even if you don’t know the numbers, you already know the story.

AI was a dream, and then a sudden reality. Every corner of the world has been affected: from the Hollywood actors’ strike to ServiceNow attributing a large portion of its 27% YoY growth to GenAI. Understanding AI use was already more than an “approaching need” in 2023, and in 2024 it is a necessity.

AI is already here. Regulation–despite calls from every conceivable side–can only race to catch up.

AI in compliance

Contrast the meteoric rise of ChatGPT and all-things-AI with another corner of technology: information security. Compliance orthodoxy is not known for moving quickly. ISO 27001 was introduced in 2005. HITRUST rolled out in 2007. SOC 2 launched in 2010. Yet these frameworks are only now becoming universal standards.

We take these frameworks for granted now as established standards, but consider what was happening in 2010 when SOC 2 came to market.

It emerged in response to increasing pressure to provide a method of verifying good infosec practices. It was instituted by financial accountants (the AICPA) as an offshoot of financial controls, not as a direct response to the unfolding business environment of the time.

The business environment in 2010? Cloud. And Cloud, even then, was big business.

Before AI, Cloud was the most recent major shift in the tech industry, and it was well on its way to becoming the predominant paradigm for operating a technology business. Purchasing software online was well established even before acronyms like SaaS and WFH had taken root.

SOC 2 was, and still is, an excellent compliance framework for ensuring essential security in a Cloud-based world. Even so, it was already behind when it came to market, and it wasn’t truly adopted as the standard it is today until the mid-2010s.

This story should sound familiar. The need for agreed-upon compliance frameworks for AI has arrived, but the frameworks themselves have not. We must ask ourselves why.

Technological opacity, black boxes, and other hurdles

One of the main culprits is technological opacity: LLM (large language model) technology is hard to understand, and in some cases literally incomprehensible to humans. The first companies to train AI models add to this opacity by remaining intentionally secretive about their underlying training data. Partly this secrecy is standard industrial protection, but, more uniquely to AI, there is risk in using authors’ works without their consent. Companies that admit to using an author’s work could face copyright claims, bad publicity, or accusations of biased training data.

A more fundamental concern is the black box problem with AI. The deep learning techniques used to train models mean that the models are fundamentally beyond human comprehension; people cannot trace how connections are made, nor why they are made. Two ideas are helpful in understanding this: complication and complexity.

A system is complicated if it has many moving parts but the inputs and outputs of the entire system can ultimately be grasped. In particular, the impact of changes to and within a complicated system is predictable.

In contrast, a system is complex if the ways in which its elements interact are unpredictable. Complex systems are characterized by small alterations that can lead to dramatically different outcomes.

Complicated systems, such as a 747, take a great deal of time to study and understand, but can ultimately be described with near-deterministic predictive accuracy. Complex systems, such as the weather the 747 is flying through, resist prediction and description because tiny changes to initial conditions can lead to dramatically different outcomes.
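This sensitivity can be made concrete with a toy example. The logistic map below is a classic one-line chaotic system (an illustration chosen here, not drawn from any compliance standard): two starting points that differ by one part in a billion quickly end up bearing no resemblance to each other.

```python
def trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x), a classic chaotic system."""
    xs, x = [], x0
    for _ in range(steps):
        x = r * x * (1 - x)
        xs.append(x)
    return xs

# Two initial conditions differing by one part in a billion.
a = trajectory(0.200000000)
b = trajectory(0.200000001)

# Within ~50 iterations the trajectories have fully diverged.
max_gap = max(abs(p - q) for p, q in zip(a, b))
print(f"largest divergence: {max_gap:.3f}")
```

A 747’s flight dynamics can be modeled to arbitrary precision given enough effort; this system, despite being a single line of arithmetic, cannot be usefully predicted far ahead. That is the distinction any AI compliance framework inherits.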

Making complicated and complex compliant

“Traditional” digital systems are extremely complicated. Anyone who has worked with an app of even moderate size can tell war stories of factors coalescing to produce infernal bugs. But there is also the understanding that even the most difficult issue can be resolved through time, focus, and exploration. The system is, after all, understandable and deterministic. 

The construction of LLMs and the use of AI is not just complicated, however, but also complex. There are countless interacting factors: a small change in a prompt, training on slightly different data, adjusting the temperature, or submitting the same question multiple times. Any of these can produce a dramatically different answer.
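One of those factors, temperature, is easy to sketch. The snippet below is a minimal illustration with made-up toy logits, not a real model’s API: it shows the standard temperature-scaled softmax sampling step, where a low temperature makes repeated runs converge on one answer and a high temperature lets identical requests drift apart.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample a token index from a temperature-scaled softmax distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, cum = rng.random(), 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(exps) - 1

rng = random.Random(0)
logits = [2.0, 1.5, 0.5, 0.1]  # toy next-token scores for one "prompt"

# The same "prompt", sampled 20 times at two temperatures.
cold = [sample_token(logits, 0.1, rng) for _ in range(20)]
hot = [sample_token(logits, 5.0, rng) for _ in range(20)]
print("cold:", cold)  # concentrates on the top-scoring token
print("hot: ", hot)   # spreads across many tokens
```

The point for compliance is that this nondeterminism is a feature, not a bug: the same audited system, given the same input twice, can legitimately return different outputs.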

These challenges around the black box of AI are precisely what makes it so difficult to establish effective regulations and compliance standards. Conversely, it’s also what makes the task so important. 

Cloud technology, in hindsight, feels quaint compared to the technology underpinning AI. Cloud is still very complicated, but it is AI’s complexity that resists regulation. SOC 2’s 80-some controls will probably not suffice to ensure this new technology is employed safely and ethically.

For now, adapting the old standards is a good start: identify which past solutions will not work, document why, and create new standards where necessary.

Smart regulation and standards are hard even in the best scenarios, as they require experienced practitioners with deep knowledge of how the technology works. LLMs resist easy explanation of how they work, yet have long since passed the adoption threshold that warrants real regulation. The potential for complex systems to have outsized impacts, good and bad, means that smart constraints are necessary.
