Claude failed to help brainstorm names for this feline. Photo courtesy of Nick J.

A couple of months back, my good friend Nick J. tried using generative AI to brainstorm names for his family’s new kittens. Rather than generating a list of names, Nick’s brainstorming buddy flagged the query as inappropriate due to a misunderstood context and denied Nick’s request. It was a simple ask that raised a red flag and highlighted the fact that AI can unexpectedly fail. At the time, I wasn’t too concerned, but it does open up a set of questions about reliability and oversight.

*Claude refuses to come up with names for Nick’s cats. This is a cat-tastrophe*

If at first you don’t succeed, you try again.

*Access to server denied. This is purr-plexing.*

Annoyance sets in:

*Is this a meow-stake in the LLM’s guardrails?*

As I finished reading Yuval Noah Harari’s Nexus: A Brief History of Information Networks from the Stone Age to AI and am in the middle of Mustafa Suleyman’s The Coming Wave: Technology, Power, and the Twenty-First Century’s Greatest Dilemma, their dystopian tone is a bit concerning. I’m a techno-optimist, but the nameless kitten kerfuffle and the potential doomsday scenarios do present safety, bias, and governance challenges for tech leaders.

For CDAOs and CIOs, this raises a couple of questions.

Given that we’re not privy to AI’s inner workings, how do organizations balance AI’s utility versus risks without sacrificing safety or reliability?
And philosophically, who decides what’s safe and what’s not?

In other words, who watches the watchman?

This blog will explore how to balance the competing demands of utility and safety in large language models (LLMs), drawing from OpenAI’s red teaming framework, governance best practices, and a few AI gotchas.

The Powers and Perils of LLMs

The utility side

It shouldn’t come as a shock that investment in AI is at an all-time high and LLMs are said to be the panacea of productivity. The social media banter posits that AI Agents will autonomously automate just about everything, finally realizing the longstanding dream of data-driven decision-making at scale.

As far as I can tell, AI is currently being used for:

Distilling large corpora of PDFs to some mediocre bullet points
Writing and debugging code
Frustrating actual people with new and improved customer service chatbots that are not all that helpful
Creating emoji-laden LinkedIn posts that are getting longer and longer

But there’s hope. LLMs are being used for some interesting use cases. In healthcare, they can assist with diagnostics, and in finance, they automate compliance processes. However, even as these systems excel in structured applications, the double entendres and unpredictability of human language introduces challenges.

The risk side

Human language isn’t just structured communication; it’s an amalgamation of context, culture, and nuance. And don’t forget, 90% of it is non-verbal. Consider the example from the MIT Technology Review article How OpenAI Stress-Tests Its Large Language Models which pointed out that a seemingly simple word like “eggplant” can trip up models because of its varied cultural meanings and slang connotations.^[1] If you ask for a picture of one, it’s ok to generate one if it’s part of a meal, but it may not be appropriate to generate one with a female holding one. But what if she was a farmer? If a model struggles with something as mundane as Solanum melongena, how can it handle an environment where questions are as varied as the people who ask them? And let’s not forget the bad actors who try to jailbreak these systems for fun.

This complexity creates specific risks for LLM deployments:

Bias: Training data reflects cultural and societal biases that affect outputs.
Confabulation: Models make up sh** and generate plausible but false information like quotes, facts, case studies, and nonexistent references.
Security Risks: Adversarial actors can exploit linguistic nuance to manipulate outputs, as seen in sophisticated prompt injection attacks.
Costs: Just because you have a generative AI hammer doesn’t mean everything is a nail.

The Generative AI Hammer. Is Everything a Nail?

How can CDAOs and CIOs ensure that their gen AI chatbots navigate the complexity of language while delivering on their utilitarian potential?

Who Is the Watchman? Defining Oversight Roles

The role of the CDAO

Data stewardship

One of the CDAO’s many responsibilities is to be the guardian of good data. They need to ensure the quality, representativeness, and fairness of the data used to train AI models. But, since the models are prohibitively expensive to train and maintain, they’ll need to work with vendors to understand the guardrails they have in place to prevent unwanted AI outputs. They’ll need to direct their technical teams to design evaluation metrics, drawing inspiration from OpenAI’s stress-testing process, to catch issues like the “eggplant problem” before they escalate.

AI governance and ethics

The CDAO is also a key player when it comes to AI governance and ethics. They’ll need to work with cross-functional teams to keep tabs on the ever-evolving landscape of AI regulations and ethics frameworks in order to implement governance frameworks to ensure that the pursuit of innovation doesn’t compromise values. They can follow OpenAI’s diverse and automated external red teaming exercises and overall testing approach to discover and address ethical blind spots in enterprise AI applications.^[2]^{,^[3]}

The role of the CIO

Technology integration

The CIO’s role is to make sure that LLMs integrate with the existing data estate and IT ecosystem without introducing instability or breaking the bank. They develop the infrastructure to support experimentation and sandbox environments. But, their job number one is to work with the CISO to make sure that core mission-critical systems are safe and secure. Balancing innovation with security and safety is certainly a balancing act, but one that’s essential for the sustainable adoption of AI.

Risk mitigation

The CIO is also the organization’s first line of defense against AI risks. Similar to the CDAO, they’ll need to be adept at the latest red teaming approaches to establish guardrails for operational use. In partnership with the business, they deploy access controls to prevent misuse and leverage content moderation APIs to filter problematic outputs. After all, you don’t want your AI quoting incorrect bereavement fares and causing a PR nightmare.^[4]

Collaboration: A shared accountability model

AI oversight is a team sport where the entire business needs to play. CDAOs, CIOs, and functional areas of the business must collaborate to balance objectives with operational realities. Business stakeholders and leadership set the strategic agenda. The CDAO can then focus on the ethics of using customer data to fine-tune an LLM or what data is appropriate to use in RAG implementations while the CIO makes sure that the infrastructure is secure and compliant.

Nick’s nameless kittens and the “David Mayer” debacle is a prime example of why shared oversight matters.^[5] Who decided that “David Mayer”, “Jonathan Turley”, “David Faber”, “Jonathan Zittrain”, and “Brian Hood” should not be uttered in ChatGPT-land? Now imagine if the stakes were higher, like in healthcare or finance. AI that makes up words in medical transcripts or a compliance breach could be more consequential.^[6]

In the end, the watchman’s role is not to put a framework in place for responsible, safe innovation. CDAOs, CIOs, and the business can unlock AI’s utility while mitigating its risks by closely collaborating with one another.

Balancing Safety and Utility in Practice

Red-teaming

Cross-functional internal teams are core to simulating LLM risks and testing for misuse. These teams act as the first line of defense by identifying potential vulnerabilities and edge cases. The OpenAI paper has some great tips and provides guidance on structuring and leveraging these teams.

However, internal teams are not enough as they often have inherent biases or blind spots that emanate from the culture within the organization. Engaging third-party experts brings a different perspective and helps uncover scenarios that may have been overlooked. This external validation augments the internal efforts and strengthens the overall risk mitigation strategy.

Practical governance frameworks

Step 1 – Identify high-risk use cases: Not all LLM applications carry equal risk. It’s crucial to prioritize governance efforts based on the potential impact of failures. Creating an Eisenhower matrix helps map out areas where LLM failures would have severe consequences, such as compliance breaches or customer-facing tool malfunctions. For example, creating marketing offers is certainly less risky than opening bank accounts for providing insurance policy information.

Step 2 – Establish automated testing protocols: Once the high-stakes use cases are identified, organizations must decide on the optimal testing approach. Depending on the specific use case, this could involve relying solely on internal red-teaming, collaborating with external experts, or, most likely, adopting a hybrid model. The choice depends on the organization’s resources, legal obligations, and risk tolerance.

Step 3 – Continuous monitoring and iteration: AI governance is a continuous process. As foundation models evolve and are updated, new risks emerge. This means that testing frameworks also need to be updated. Automated tools can help monitor new risks in real time and periodic feedback from internal and external red teams can provide qualitative insights.

Metrics for balancing safety and utility metrics

Measuring the effectiveness of LLMs involves evaluating key utility metrics such as accuracy, speed, and cost-efficiency. However, defining and quantifying these metrics is not straightforward and warrants a separate discussion. Do you have an automated system in place to fact-check everything that is spit out by an LLM? What is accuracy in this day and age? With the proliferation of misinformation and AI-generated drivel, this seems quite challenging.

Equally as problematic are safety metrics. Measuring confabulation rates, bias detection, and compliance adherence help ensure that LLMs are not generating inaccurate, offensive, or non-compliant content. However, measuring these things presents its own set of unique challenges. Take image generation, for example; how can you monitor if it reinforces stereotypes or infringes on someone else’s IP? The same goes for audio and video.

Balancing safety and utility in LLMs is a never-ending battle. It requires a proactive approach and constant vigil. CDAOs and CIOs can start by prioritizing use cases that present the most amount for risk, establishing testing protocols, and implementing an AI governance framework.

Conclusion

Circling back to Nick and his kitty conundrum, the quest to find the real “David Mayer”, and what types of images will LLMs serve up when asked about eggplants. It’s a simple reminder that AI is far from perfect and can stumble over seemingly simple tasks due to the ambiguity of language and misunderstood contexts. The fact that we are building in guardrails is a good sign and may serve as a foundation to stave off the dystopian future in the books I’m reading. But, the question still remains: for the frontier models, who is the ultimate decider or right and wrong? And how do those decisions affect your business?

The path forward isn’t about choosing between innovation and responsibility, but rather, it’s about finding the right balance. For CDAOs and CIOs, this means creating frameworks that capture AI’s goodness while mitigating its badness. It means fostering a culture of continuous improvement, where each “AI-gotcha” becomes an opportunity to refine systems and practices.

If we get this right, AI can become truly helpful. It can help us brainstorm, innovate, and solve problems in ways we’ve only imagined. Now that Nick has found the perfect Scottish Gaelic names for his cats Dolaigh and Dubhsinth and “David Mayer” is back in the ChatGPT-o-sphere, we can all rest easy.

Is your organization ready to lead responsibly in this AI-powered world?

Nick J’s cat Dubhsith – Photo courtesy of Nick J.

If you enjoyed this article, buy a book, like, comment, share, and follow me on Medium and LinkedIn.

^[1] Heaven, Will Douglas. “How OpenAI Stress-Tests Its Large Language Models.” MIT Technology Review, November 21, 2024. https://www.technologyreview.com/2024/11/21/1107158/how-openai-stress-tests-its-large-language-models/.

^[2] OpenAI. Diverse and Effective Red Teaming. 2023. https://cdn.openai.com/papers/diverse-and-effective-red-teaming.pdf.

^[3] OpenAI. OpenAI’s Approach to External Red Teaming. 2023. https://cdn.openai.com/papers/openais-approach-to-external-red-teaming.pdf.

^[4] Garcia, Marisa. “What Air Canada Lost in Remarkable ‘Lying’ AI Chatbot Case.” Forbes, February 19, 2024. https://www.forbes.com/sites/marisagarcia/2024/02/19/what-air-canada-lost-in-remarkable-lying-ai-chatbot-case/.

^[5] Metz, Cade. “David Mayer, a Driving Force Behind ChatGPT, Dies at 45.” The New York Times, December 6, 2024. https://www.nytimes.com/2024/12/06/us/david-mayer-chatgpt-openai.html.

^[6] Burke, Garance, and Matt O’Brien. “Researchers Say AI Transcription Tool Used in Hospitals Invents Things No One Ever Said.” Associated Press, November 2, 2023. https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14.