Three AI Guardrail Strategies for Managing the Unpredictable in AI

Implement Guardrails Before Crossing the AI Chasm — Photo by Author David E Sweenor

Introduction

If you’re in the market for a new car, you may want to consider interacting with the dealer’s chatbot. One customer chatting with the chatbot of a California Chevrolet dealer was advised to buy a rival Ford F-150 instead.^[1] In a separate interaction, curiously enough at the same dealership, customers negotiating with the chatbot got the deal of a lifetime on a truck: they persuaded it to offer a hefty $58,000 discount on a new vehicle, lowering its price to a mere $1.^[2] Sadly, this sales price wasn’t honored. These innocuous AI failures highlight the risks inherent in deploying generative AI technologies, but there are countless other examples that are far more serious and damaging.

For example, you may have caught the recent headlines about sexually explicit deepfake images of Taylor Swift circulating on the internet, or the abhorrent bullying of high school teenagers with similar images.^[3],^[4] If you want to understand the scope, type, and prevalence of AI incidents, I’d encourage you to visit the AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) repository.^[5] Through the perpetuation of stereotypes and biases, discrimination, and intellectual property (IP) infringement, AI can exacerbate society’s ills and continue to marginalize the disenfranchised. AI creates a number of new risks for organizations to manage, which I outlined in my article The 12 Hidden Risks of ChatGPT and Generative AI.

As an increasing number of companies deploy generative AI, they must take special precautions to continually prevent, detect, mitigate, and react to these risks, which are ever-present and ever-evolving. CIOs, CISOs, CDOs, and other business leaders face the dual challenge of harnessing generative AI’s upside while simultaneously preventing harm. The question that looms large is: can you responsibly harness the power of generative AI, or are you chasing a mirage of control in an unpredictable digital landscape?

The concept of AI guardrails is not a theoretical framework; it’s a practical necessity when implementing both traditional and generative AI technology. Savvy organizations understand the strategic need for AI guardrails. These guardrails represent the strategies and mechanisms needed so that AI technologies are developed, used, and deployed responsibly and effectively by your organization. AI guardrails help balance innovation with risk management–ensuring an ethical and trustworthy approach to AI adoption.

This article clarifies the concept of AI guardrails and provides actionable insights and strategies for their implementation. We will explore how these guardrails can be integrated into your existing business models and IT, ensuring that you are not only prepared for the challenges of today but also equipped for the opportunities of tomorrow. As the technological landscape continues to evolve, your role as a CIO is more crucial than ever. It’s time to see beyond the mirage and navigate your organization to the oasis of opportunity.

Types of AI Guardrails

What are AI guardrails?

AI guardrails are a set of frameworks and guidelines designed to ensure that AI is researched, developed, deployed, monitored, and used ethically and responsibly. They act as safety mechanisms, providing guidelines and limits to make sure that AI systems are developed and used in a manner that is lawful, ethical, and robust.^[6]

These guardrails are essential for maximizing the potential of AI and for preventing unintended consequences and harm. AI guardrails involve implementing policies, creating prohibited use policies, and embedding protections into AI features by default. AI guardrails help provide security and protection for AI usage by organizations. However, as AI systems become more sophisticated, the complexities, nuances, and ethical dilemmas associated with their use also become more pronounced. And it’s becoming increasingly clear that risks can never be 100% eliminated–especially with the commonly used foundation models in the market today.

Part of this truism stems from how the models were trained. Because they were trained on the world’s data, which contains biases, stereotypes, and intellectual property, these influences will always exist, no matter what you do to mitigate their effect.

Currently, the responsibility for implementing AI guardrails is shared by the generative AI provider and the adopting organization; each has a role to play. Guardrails typically include a combination of technology-centric, policy-centric, and human-centric controls.

The next sections discuss each.

Policy-Centric Controls

Policy-centric controls are written into the terms and conditions of the service that you are using. OpenAI has a number of prohibited use clauses in its policy, which illustrate what a policy-centric control looks like.^[7] One example states: “Don’t misuse our platform to cause harm by intentionally deceiving or misleading others, including: Generating or promoting disinformation, misinformation, or false online engagement (e.g., comments, reviews).”

In addition to provider policies, organizations should also develop a set of policies that dictate how AI should be used. These policies ensure that AI practices align with legal requirements, ethical standards, business objectives, and your organization’s way of doing business. Policy-centric controls extend beyond compliance; they shape the culture and mindset surrounding AI in an organization.

Implementing policy-centric controls involves drafting clear and robust policies that cover data usage, privacy protection, user consent mechanisms, and transparency in AI decision-making processes. This also entails establishing comprehensive guidelines for AI ethics and conduct, ensuring that the deployment and utilization of AI technologies align with ethical and moral standards.

Because generative AI can create text, code, and other content at an unbounded pace, new vulnerabilities emerge, exposing systems to increasingly complex cyber threats and attacks. Compliance also grows harder as AI systems navigate evolving legal and ethical landscapes. Hence, policy frameworks must be agile and comprehensive: they must cover these new risk areas and ensure adherence to both current and forthcoming regulations, safeguarding against emerging threats and maintaining legal and ethical integrity. Regularly reviewing and updating AI policies keeps them relevant and effective.

Suggested Actions

Regularly conduct AI policy audits to ensure alignment with current technological and legal landscapes.
Establish a cross-functional AI ethics committee to oversee policy implementation and adherence.
Engage in continuous learning and training programs for all employees to understand and comply with AI policies.

Policy-centric controls are a foundational AI guardrail element, providing a clear framework for responsible AI deployment and usage.

Technology-Centric Controls

Technology-centric controls, probably the most frequently discussed, are the second area of focus. These are the technical mechanisms and tools implemented to ensure the integrity and security of AI systems. Most AI providers have built-in protections on both the inputs (prompts) and the generated outputs to help prevent unwanted content. Recently, in response to a growing number of lawsuits, OpenAI has tried to block the creation of copyrighted characters.

Figure 1.1: ChatGPT is Preventing the Creation of Copyrighted Characters

However, researchers have shown that these are easily circumvented.^[8] Gary Marcus also demonstrated how easy it was to generate copyrighted images.^[9]

Additional controls include encryption for data security, algorithms for anomaly detection to prevent misuse, and robust access controls to safeguard sensitive information. Implementing these controls often involves deploying advanced cybersecurity measures and ensuring AI systems are designed with security in mind from the outset.
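The article does not prescribe a particular anomaly-detection algorithm. As one minimal sketch, assuming you log how many requests each user sends per time window, a simple z-score test can flag users whose current volume is far above their own baseline, which may indicate scripted misuse. The data shape, the function name `flag_anomalous_users`, and the thresholds are hypothetical.

```python
from statistics import mean, stdev

def flag_anomalous_users(history, current, z_threshold=3.0, min_history=5):
    """Flag users whose request count in the current window is unusually high.

    history: {user_id: [request counts from past windows]}  (hypothetical shape)
    current: {user_id: request count in the current window}
    """
    flagged = []
    for user, count in current.items():
        past = history.get(user, [])
        if len(past) < min_history:
            continue  # not enough baseline data to judge this user yet
        mu = mean(past)
        # Floor the standard deviation so a very steady baseline doesn't
        # turn every small uptick into an alert.
        sigma = max(stdev(past), 1.0)
        if (count - mu) / sigma > z_threshold:
            flagged.append(user)
    return flagged
```

Flagged users could then be rate-limited or routed to a security review rather than blocked outright; the right response is a policy decision, not a technical one.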

Staying Compliant with Regulatory and Corporate Guidelines

As companies embed generative AI functionality into their business applications, most providers offer a set of AI guardrails that can be built into the API calls. IT leaders need to consider two questions: 1) Are you in compliance with the service’s usage guidelines? and 2) Are you in compliance with your corporate guidelines?

To comply with a service’s usage guidelines, start by examining some of the moderation features of the popular OpenAI ChatGPT service. These include categories like hate, harassment, and violence.

OpenAI’s Moderation Capabilities^[10]

hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
harassment: Content that expresses, incites, or promotes harassing language towards any target.
harassment/threatening: Harassment content that also includes violence or serious harm towards any target.
self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
self-harm/intent: Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
self-harm/instructions: Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
sexual/minors: Sexual content that includes any individual under the age of 18.
violence: Content that depicts death, violence, or physical injury.
violence/graphic: Content that depicts death, violence, or physical injury in graphic detail.
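In practice, the two compliance questions can be layered. OpenAI’s moderation endpoint returns, for each input, an overall `flagged` value plus per-category booleans (`categories`) and scores (`category_scores`) keyed by category names like those listed above. The sketch below assumes you already have one such result as a Python dict (for example, parsed from the endpoint’s JSON response). It honors whatever the provider flagged (service guidelines) and then applies stricter, organization-specific thresholds (corporate guidelines). The function name and threshold values are hypothetical and would need tuning against your own data.

```python
# Hypothetical corporate thresholds layered on top of the provider's own flags.
# Keys follow the moderation category names; values are illustrative only.
CORPORATE_THRESHOLDS = {
    "hate": 0.2,
    "harassment": 0.3,
    "self-harm": 0.1,
    "sexual": 0.2,
    "violence": 0.5,
}

def review_moderation(result, thresholds=CORPORATE_THRESHOLDS):
    """Return ("block" | "review" | "allow", reasons) for one moderation result.

    `result` is assumed to look like one entry of the moderation response:
    {"flagged": bool, "categories": {...}, "category_scores": {...}}.
    """
    # 1) Service usage guidelines: honor anything the provider flagged.
    provider_hits = sorted(c for c, hit in result.get("categories", {}).items() if hit)
    if result.get("flagged") or provider_hits:
        return "block", provider_hits
    # 2) Corporate guidelines: stricter, organization-specific thresholds.
    scores = result.get("category_scores", {})
    corporate_hits = sorted(
        c for c, limit in thresholds.items() if scores.get(c, 0.0) >= limit
    )
    if corporate_hits:
        return "review", corporate_hits
    return "allow", []
```

Routing corporate-threshold hits to "review" rather than "block" is one design choice among many; it keeps stricter internal rules from silently discarding content the provider considered acceptable.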

In addition to the above, post-processing capabilities are also needed to detect any potential hallucinations. The startup Galileo has this set of AI guardrail metrics.^[11]

AI Guardrail Capabilities from Galileo

Uncertainty: Measures the model’s certainty in its generated responses. Uncertainty works at the response level as well as at the token level. It has shown a strong correlation with hallucinations or made-up facts, names, or citations.
Groundedness: Measures whether a model’s response was based purely on the context provided. This metric is intended for RAG users, requires a {context} or {document} slot in the data, and will incur additional LLM calls to compute.
Factuality: Measures whether the facts stated in the response are based on real facts. This metric requires additional LLM calls. Combined with uncertainty, factuality is a good way of uncovering hallucinations.
Context relevance: Measures how relevant the context provided was to the user query. This metric is intended for RAG users and requires a {context} or {document} slot in the data. If computing relevance with embeddings is desired, then they can be added to input data.
Personally identifiable information (PII): This guardrail metric surfaces any instances of PII in a model’s responses.
Tone: Classifies the tone of the response into eight different emotion categories: joy, love, fear, surprise, sadness, anger, annoyance, and confusion.
Sexism: Measures how sexist a comment might be perceived to be, on a scale from 0 to 1 (1 being more sexist).
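Galileo’s documentation cited here does not spell out its exact uncertainty formula, so what follows is a generic sketch, not Galileo’s method. Many LLM APIs can optionally return per-token log-probabilities. Assuming you have those, a response-level score can be computed as perplexity (the exponential of the average negative log-probability), and individual low-probability tokens can be flagged at the token level. The function name and the 0.5 threshold are placeholders.

```python
import math

def uncertainty_report(token_logprobs, token_threshold=0.5):
    """Summarize model uncertainty from per-token natural-log probabilities.

    Returns (perplexity, indices of tokens with probability < token_threshold).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    # Response level: perplexity = exp(mean negative log-probability).
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    perplexity = math.exp(avg_nll)
    # Token level: low-probability tokens are candidates for made-up names or facts.
    uncertain = [i for i, lp in enumerate(token_logprobs) if math.exp(lp) < token_threshold]
    return perplexity, uncertain
```

A high perplexity, or a cluster of flagged tokens around a name or citation, is a signal to check groundedness and factuality, not proof of a hallucination.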

Organizations should certainly implement these guardrails, but the jury is still out on how effective they will be. In fact, researchers from Carnegie Mellon University discovered that such guardrails can be easily circumvented.^[12],^[13]

So, in addition to implementing AI guardrails at the API level, organizations should also consider implementing them at the application level. For example, some generative AI services offer plug-ins for popular end-user tools like Chrome and Figma that highlight suspicious content within the end-user application. This works much like a spellchecker, with text highlighted in different colors to represent the different types of guardrails that may have been violated.
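To make the spellchecker analogy concrete, here is a minimal sketch of the detection half of such a plug-in: it scans text and returns labeled character spans that a user interface could underline. The detectors are deliberately simple, hypothetical regular expressions for a few PII patterns (email addresses, US-style phone and Social Security numbers). A production tool would need far more robust detection.

```python
import re
from dataclasses import dataclass

@dataclass
class Highlight:
    start: int   # character offset where the flagged span begins
    end: int     # character offset just past the flagged span
    label: str   # which (hypothetical) guardrail fired
    text: str

# Illustrative detectors only; real PII detection needs much broader coverage.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def highlight(text, detectors=DETECTORS):
    """Return non-overlapping flagged spans in order, like a spellchecker's underlines."""
    hits = []
    for label, pattern in detectors.items():
        for m in pattern.finditer(text):
            hits.append(Highlight(m.start(), m.end(), label, m.group()))
    # Earliest span first; for ties, prefer the longer span.
    hits.sort(key=lambda h: (h.start, -(h.end - h.start)))
    result, last_end = [], -1
    for h in hits:
        if h.start >= last_end:  # drop spans that overlap an earlier one
            result.append(h)
            last_end = h.end
    return result
```

Each `label` could map to a highlight color in the UI, mirroring the different colors a spellchecker uses for spelling versus grammar.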

Ultimately, organizations need to think about how to monitor AI output at scale. As AI becomes more pervasive, relying on a human-in-the-loop (HITL) to check every piece of content cannot scale and is untenable. Even so, organizations must make sure that an executive is responsible and accountable for compliance with corporate standards.

Suggested Actions:

Conduct regular security audits of AI systems to identify and address vulnerabilities.
Invest in state-of-the-art cybersecurity tools and technologies.
Foster collaboration between AI developers and cybersecurity teams to ensure security is a core component of all AI projects.
Invest in a data intelligence platform like Alation so you can map policies to specific data elements and enforce them at the point of consumption.

These technology-centric controls are essential in creating a secure and reliable AI environment, allowing CIOs to leverage AI capabilities with confidence.

Human-Centric Controls

Human-centric controls acknowledge the limitations of technology and the invaluable role of human judgment. They ensure that AI decisions and processes are continually overseen and guided by human expertise, particularly in areas requiring ethical considerations, medical decisions, and complex decision-making.

Given the internet scale of AI deployments, it is physically impossible for any company to have a human in the loop reviewing every AI output. This has given rise to the popular “Thumbs-up”, “Thumbs-down”, and “Report an Issue” buttons we see in many large-scale applications. It’s a weak nod to the problem, but I suppose it’s better than nothing. One way to overcome this challenge may be to take a page from manufacturing’s playbook: implement a sampling methodology to continually monitor your (data) products for quality control.
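Here is a minimal sketch of that manufacturing-style approach, assuming each AI output has an ID you can log. It draws a simple random sample of outputs at a fixed rate for human review, then estimates the overall defect rate from what reviewers find. The Wilson score interval is a standard way to put error bars on a proportion. The function names and the 2% review rate in the example are hypothetical.

```python
import math
import random

def sample_for_review(output_ids, rate, seed=None):
    """Pick a simple random sample of outputs (at roughly `rate`) for human QC review."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ids = list(output_ids)
    k = max(1, round(len(ids) * rate)) if ids else 0
    rng = random.Random(seed)  # seeded so the sample can be reproduced in audits
    return rng.sample(ids, k)

def defect_rate_interval(defects, reviewed, z=1.96):
    """Wilson score interval (95% by default) for the defect rate in the reviewed sample."""
    if reviewed == 0:
        raise ValueError("no reviewed items")
    p = defects / reviewed
    denom = 1 + z**2 / reviewed
    center = (p + z**2 / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For example, `sample_for_review(range(1000), 0.02)` selects 20 of 1,000 outputs. If reviewers find 3 problematic outputs in a sample of 200, `defect_rate_interval(3, 200)` returns roughly 0.5% to 4.3%, a range that can be tracked over time much like a control chart.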

For human-centric controls, companies should establish guidelines that specify where human intervention is mandatory, such as in critical decision-making processes or when AI outputs have significant consequences. Human-centric controls also include training AI systems with human-in-the-loop approaches to improve learning and accuracy.
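Guidelines like these can be written down as an explicit, testable escalation rule. The sketch below shows one hypothetical way to do it. Certain decision types always go to a person (for instance, a chatbot’s pricing offer, which might have caught the $1 truck from the introduction), and anything below a confidence threshold is escalated as well. The category names, the `AIDecision` type, and the 0.8 threshold are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical decision types the organization has declared high-stakes.
MANDATORY_REVIEW = {"medical_advice", "credit_decision", "pricing_offer"}

@dataclass
class AIDecision:
    kind: str          # e.g. "pricing_offer" or "faq_answer"
    confidence: float  # model or guardrail confidence in [0, 1]

def requires_human(decision, min_confidence=0.8):
    """Return (needs_review, reason) under a simple, illustrative escalation policy."""
    if decision.kind in MANDATORY_REVIEW:
        return True, "mandatory review category"
    if decision.confidence < min_confidence:
        return True, "low confidence"
    return False, "auto-approved"
```

Keeping the rule explicit makes it easy to audit and to revise as the balance between oversight and autonomy shifts. The hard part remains choosing the categories and thresholds.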

Balancing human oversight with AI autonomy is challenging, as excessive intervention can undermine the efficiency of AI, while too little can lead to ethical and practical risks.

Suggested Actions

Develop and implement a clear framework for when and how human intervention should occur in AI processes.
Invest in training programs for employees to understand AI outputs and intervene effectively.
Regularly review and update human-centric control mechanisms to adapt to new developments in AI technology.
Implement a sampling methodology, similar to manufacturers, to ensure quality control of your content and data outputs.

By integrating human-centric controls, CIOs can help ensure that AI systems function with a level of human understanding and ethical oversight, essential for responsible AI deployment.

Conclusion

In the end, no single type of implementation will mitigate all of the risks and problems. But by addressing the policy, technology, and human-centric aspects of AI, you’ll create a stronger framework than you would by relying on any one of them alone.

It’s clear that the journey towards effective AI guardrails is multifaceted, and the role of CIOs and technology leaders in this journey is pivotal. By implementing guardrails, you can steer your organization toward a future where AI is more trusted than it is today. The balance between harnessing AI’s capabilities and managing its risks is delicate but achievable.

In navigating the unpredictable terrain of generative AI, CIOs may find that the true illusion is not the control itself, but the belief that any single strategy offers a complete solution. As we’ve seen, the effective use of AI guardrails requires a blend of policy, technology, and human oversight. By engaging with these diverse controls, CIOs can transform the mirage of control into a tangible strategy, steering their organizations toward a future where AI is both an innovative force and a responsible ally.

If you enjoyed this article, please like it, highlight interesting sections, and share comments. Consider following me on Medium and LinkedIn.

If you’re interested in this topic, consider TinyTechGuides’ latest books, including The CIO’s Guide to Adopting Generative AI: Five Keys to Success, Mastering the Modern Data Stack, or Artificial Intelligence: An Executive Guide to Make AI Work for Your Business.

^[1] Howard, Phoebe Wall. 2023. “A Chevrolet Dealer Offered an AI Chatbot on Its Website. It Told Customers to Buy a Ford.” USA TODAY. December 19, 2023. https://www.usatoday.com/story/money/cars/2023/12/19/chevy-of-watsonville-chatgpt-use/71976591007/.

^[2] Masse, Bryson. 2023. “A Chevy for $1? Car Dealer Chatbots Show Perils of AI for Customer Service.” VentureBeat. December 19, 2023. https://venturebeat.com/ai/a-chevy-for-1-car-dealer-chatbots-show-perils-of-ai-for-customer-service/.

^[3] Conger, Kate, and John Yoon. 2024. “Explicit Deepfake Images of Taylor Swift Elude Safeguards and Swamp Social Media.” The New York Times, January 26, 2024, sec. Arts. https://www.nytimes.com/2024/01/26/arts/music/taylor-swift-ai-fake-images.html.

^[4] Hadero, Haleluya. 2023. “AI-Generated Nude Images of Teen Girls Spur Families to Push for Protections: ‘We’re Fighting for Our Children.’” Fortune. December 2, 2023. https://fortune.com/2023/12/02/ai-generated-nude-images-teen-girls-deepfakes-tech-safety-children-parents/.

^[5] “AIAAIC – AIAAIC Repository.” n.d. www.aiaaic.org. https://www.aiaaic.org/aiaaic-repository.

^[6] Sweenor, David. 2023. “Generative AI Ethics.” Medium. July 28, 2023. https://medium.com/towards-data-science/generative-ai-ethics-b2db92ecb909.

^[7] OpenAI. 2023. “Usage Policies.” Openai.com. March 23, 2023. https://openai.com/policies/usage-policies.

^[8] Qi, Xiangyu, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. 2023. “Fine-Tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!” ArXiv.org. October 5, 2023. https://doi.org/10.48550/arXiv.2310.03693.

^[9] Marcus, Gary. 2024. “No, Multimodal ChatGPT Is Not Going to ‘Trivially’ Solve Generative AI’s Copyright Problems.” Marcus on AI. January 24, 2024. https://garymarcus.substack.com/p/no-multimodal-chatgpt-is-not-going.

^[10] “Moderation.” OpenAI Platform. Accessed October 14, 2023. https://platform.openai.com/docs/guides/moderation/overview.

^[11] “Guardrail Metrics – Galileo.” 2023. Rungalileo.io. 2023. https://docs.rungalileo.io/galileo/how-to-and-faq/ml-research-algorithms/guardrail-metrics.

^[12] Zou, Andy, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. 2023. “Universal and Transferable Adversarial Attacks on Aligned Language Models.” arXiv:2307.15043. July 27, 2023. https://doi.org/10.48550/arXiv.2307.15043.

^[13] Kahn, Jeremy. 2023. “Researchers Find a Way to Easily Bypass Guardrails on OpenAI’s ChatGPT and All Other A.I. Chatbots.” Yahoo Finance. July 28, 2023. https://finance.yahoo.com/news/researchers-way-easily-bypass-guardrails-183009628.html.
