with a bit of control and assurance of security. Guardrails provide exactly that for AI applications. But how can they be built into an application?
A few guardrails are established even before application coding begins. First, there are legal guardrails provided by the government, such as the EU AI Act, which highlights acceptable and banned use cases of AI. Then there are policy guardrails set by the company. These guardrails indicate which use cases the company finds acceptable for AI usage, both in terms of security and ethics. These two guardrails filter the use cases for AI adoption.
After crossing the first two types of guardrails, an acceptable use case reaches the engineering team. When the engineering team implements the use case, they incorporate technical guardrails to ensure the safe use of data and maintain the expected behavior of the application. We will explore this third type of guardrail in this article.
Top technical guardrails at different layers of an AI application
Guardrails are created at the data, model, and output layers. Each serves a unique purpose:
- Data layer: Guardrails at the data layer ensure that any sensitive, problematic, or incorrect data doesn’t enter the system.
- Model layer: Guardrails at this layer ensure the model itself behaves as expected and is protected from unauthorized changes.
- Output layer: Output layer guardrails ensure the model doesn't provide incorrect answers with high confidence, a common threat with AI systems.
1. Data layer
Let's go through the must-have guardrails at the data layer:
(i) Input validation and sanitization
The first thing to check in any AI application is whether the input data is in the correct format and doesn't contain any inappropriate or offensive language. This is fairly easy to do, since most databases offer built-in SQL functions for pattern matching. For instance, if a column is supposed to be alphanumeric, you can validate that its values are in the expected format with a simple regex pattern. Similarly, cloud platforms like Microsoft Azure offer functions to perform a profanity check (for inappropriate or offensive language), and you can always build a custom function if your database doesn't have one.
Data validation:
-- The query below returns only those rows from the customers table where customer_email_id is in a valid format
SELECT * FROM customers WHERE REGEXP_LIKE(customer_email_id, '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$');
Data sanitization:
-- Creating a custom offensive_language_check function to detect offensive language
CREATE OR REPLACE FUNCTION offensive_language_check(input_text VARCHAR)
RETURNS BOOLEAN
LANGUAGE SQL
AS $$
SELECT REGEXP_LIKE(
    input_text,
    '\\b(abc|...)\\b' -- list of offensive words separated by pipes
)
$$;
-- Using the custom offensive_language_check function to filter out comments containing offensive language
SELECT user_comments FROM customer_feedback WHERE NOT offensive_language_check(user_comments);
(ii) PII and sensitive data protection
Another key consideration in building a secure AI application is making sure none of the PII data reaches the model layer. Most data engineers work with cross-functional teams to flag all PII columns in tables. There are also PII identification automation tools available, which can perform data profiling and flag the PII columns with the help of ML models. Common PII columns are: name, email address, phone number, date of birth, social security number (SSN), passport number, driver’s license number, and biometric data. Other examples of indirect PII are health information or financial information.
A common way to prevent this data from entering the system is by applying a de-identification mechanism. This can be as simple as removing the data completely, or it can involve more sophisticated masking or pseudonymization techniques such as hashing, which produce values the model can't interpret.
-- Hashing PII data of customers for data privacy
SELECT SHA2(customer_name, 256) AS hashed_customer_name, SHA2(customer_email, 256) AS hashed_customer_email, … FROM customer_data;
(iii) Bias detection and mitigation
Before the data enters the model layer, another checkpoint is to validate whether it is accurate and bias-free. Some common types of bias are:
- Selection bias: The input data is incomplete and doesn’t accurately represent the full target audience.
- Survivorship bias: There is more data for the happy path, making it tough for the model to handle failure scenarios.
- Racial or association bias: The data favors a certain gender or race due to past patterns or prejudices.
- Measurement or label bias: The data is incorrect due to a labelling mistake or bias in the person who recorded it.
- Rare event bias: The input data lacks all edge cases, giving an incomplete picture.
- Temporal bias: The input data is outdated and doesn’t accurately represent the current world.
While I also wish there were a simple system to detect such biases, this is actually grunt work. The data scientist has to sit down, run queries, and test data for every scenario to detect any bias. For example, if you are building a health app and do not have sufficient data for a specific age group or BMI, then there is a high chance of bias in the data.
-- Identifying whether any age group or BMI group is missing or under-represented
SELECT age_group, COUNT(*) FROM users_data GROUP BY age_group;
SELECT BMI, COUNT(*) FROM users_data GROUP BY BMI;
(iv) On-time data availability
Another aspect to verify is data timeliness. The right, relevant data must be available for the models to function well. Some models need real-time data, a few require near real-time, and for some, batch is enough. Whatever your requirements are, you need a system that monitors whether the latest required data is available.
For instance, if category managers refresh product pricing every midnight based on market dynamics, then your model must work with data refreshed after midnight. You can have systems in place to alert whenever data is stale, or you can build proactive alerting around the data orchestration layer, monitoring the ETL pipelines for timeliness.
-- Flagging the table as STALE if today's data is not available
SELECT CASE WHEN MAX(TO_DATE(last_updated_timestamp)) = TO_DATE(CURRENT_TIMESTAMP()) THEN 'FRESH' ELSE 'STALE' END AS table_freshness_status FROM product_data;
(v) Data integrity
Maintaining integrity is also crucial for model accuracy. Data integrity refers to the accuracy, completeness, and reliability of data. Any old, irrelevant, or incorrect data in the system will make the output go haywire. For instance, if you are building a customer-facing chatbot, it must have access to only the latest company policy files. Access to incorrect documents may result in hallucinations where the model merges terms from multiple files and gives a completely inaccurate answer to the customer. And you can still be held legally liable for it, as when Air Canada had to honor a refund its chatbot had wrongly promised.
There are no straightforward methods to verify integrity. It requires data analysts and engineers to get their hands dirty, verify the files and data, and ensure that only the latest, relevant data is sent to the model layer, as in the sketch below. Maintaining data integrity is also the best way to control hallucinations: garbage in, garbage out applies to AI models too.
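For example, if policy documents are versioned in a table, a simple filter can restrict the model's knowledge base to the latest, active version of each document. This is only a sketch; policy_documents, policy_id, version, document_text, and is_archived are hypothetical names:
-- A minimal sketch: feed the model only the latest, non-archived version of each policy
-- (policy_documents, policy_id, version, document_text, and is_archived are hypothetical names)
SELECT policy_id, document_text
FROM policy_documents p
WHERE is_archived = FALSE
  AND version = (SELECT MAX(version) FROM policy_documents WHERE policy_id = p.policy_id);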
2. Model layer
After the data layer, the following checkpoints can be built into the model layer:
(i) User permissions based on role
Safeguarding the model layer is important to prevent unauthorized changes that may introduce bugs or bias into the system. It also helps prevent data leakage. You must control who has access to this layer. A standardized approach is role-based access control (RBAC), where only employees in authorized roles, such as machine learning engineers, data scientists, or data engineers, can access the model layer.
For instance, DevOps engineers can have read-only access as they are not supposed to change model logic. ML engineers can have read-write permissions. Establishing RBAC is an important security practice for maintaining model integrity.
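In a SQL-based platform, RBAC can be as simple as granting different privileges to different roles. Below is a minimal sketch using Snowflake-style syntax (the exact GRANT syntax varies by platform); the role names and the model_registry schema are hypothetical:
-- A minimal RBAC sketch (role and schema names are hypothetical; syntax varies by platform)
CREATE ROLE ml_engineer;
CREATE ROLE devops_engineer;
-- ML engineers can read and modify model artifacts and configuration
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA model_registry TO ROLE ml_engineer;
-- DevOps engineers get read-only access for monitoring and deployment checks
GRANT SELECT ON ALL TABLES IN SCHEMA model_registry TO ROLE devops_engineer;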
(ii) Bias audits
Bias handling remains a continuous process. Bias can creep into the system later, even if you did all the necessary checks at the data layer. In fact, some biases, particularly confirmation bias, tend to develop at the model layer when a model overfits to its training data, leaving no room for nuance. In case of overfitting, the model requires calibration. Spline calibration is a popular method: it fits a smooth curve through the model's predictions so the output follows the overall trend of the data rather than chasing every individual point, as in the sample code below.
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt
from sklearn.metrics import brier_score_loss
# High-level steps:
# 1. Define input (x) and output (y) data for spline fitting
# 2. Set B-spline parameters: degree and number of knots
# 3. Use splrep to compute the B-spline representation
# 4. Evaluate the spline over a range of x values to generate a smooth curve
# 5. Plot the original data and the spline curve for visual comparison
# 6. Calculate the Brier score to assess prediction accuracy
# 7. Use eval_spline_calibration to evaluate the spline on new x values
# 8. Finally, analyze the plot: check fit quality (good fit, overfitting, underfitting),
#    validate consistency with expected trends, and interpret the Brier score for model performance
######## Sample Code for the steps above ########
# Sample data: Adjust with your actual data points
x_data = np.array([...]) # Input x values, replace '...' with actual data
y_data = np.array([...]) # Corresponding output y values, replace '...' with actual data
# Fit a B-Spline to the data
k = 3 # Degree of the spline, typically cubic spline (cubic is commonly used, hence k=3)
num_knots = 10 # Number of knots for spline interpolation, adjust based on your data complexity
knots = np.linspace(x_data.min(), x_data.max(), num_knots) # Equally spaced knot vector over data range
# Compute the spline representation
# The function 'splrep' computes the B-spline representation of a 1-D curve
tck = interpolate.splrep(x_data, y_data, k=k, t=knots[1:-1])
# Evaluate the spline at the desired points
x_spline = np.linspace(x_data.min(), x_data.max(), 100) # Generate x values for smooth spline curve
y_spline = interpolate.splev(x_spline, tck) # Evaluate spline at x_spline points
# Plot the results
plt.figure(figsize=(8, 4))
plt.plot(x_data, y_data, 'o', label='Data Points') # Plot original data points
plt.plot(x_spline, y_spline, '-', label='B-Spline Calibration') # Plot spline curve
plt.xlabel('x')
plt.ylabel('y')
plt.title('Spline Calibration')
plt.legend()
plt.show()
# Calculate the Brier score for comparison
# The Brier score measures the accuracy of probabilistic predictions;
# it assumes y_data holds binary outcomes (0/1) and predictions lie in [0, 1]
y_pred = np.clip(interpolate.splev(x_data, tck), 0, 1)  # Evaluate spline at original x values, clipped to valid probabilities
brier_score = brier_score_loss(y_data, y_pred)  # Brier score between observed outcomes and calibrated predictions
print("Brier Score:", brier_score)
# Helper function for calibration
# This function evaluates the fitted spline at arbitrary x values
def eval_spline_calibration(x_val):
    return interpolate.splev(x_val, tck)  # Return the evaluated spline for input x_val
(iii) LLM as a judge
LLM (Large Language Model) as a judge is an interesting approach to validating models, where one LLM is used to judge the output of another LLM. It reduces manual intervention and makes response validation feasible at scale.
To implement LLM as a judge, you need to build a prompt that evaluates the output. The prompt must return a measurable result, such as a score or a rank.
A sample prompt for reference:
Assign a helpfulness score for the response based on the company’s policies, where 1 is the highest score and 5 is the lowest
This prompt output can be used to trigger the monitoring framework whenever outputs are unexpected.
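Here is a minimal sketch of the pattern in Python. The call_llm() helper is a hypothetical stand-in for whichever model API you use (hosted or on-premises) and is assumed to return the judge's raw text reply:
# A minimal LLM-as-a-judge sketch. call_llm() is a hypothetical helper that wraps
# your model API of choice and returns the judge's raw text reply.
JUDGE_PROMPT = """You are evaluating a customer support response against company policy.
Response: {response}
Assign a helpfulness score from 1 (best) to 5 (worst). Reply with the number only."""

def judge_response(response: str, call_llm) -> int:
    # Ask the judge model for a score; treat unparseable output as the worst score
    reply = call_llm(JUDGE_PROMPT.format(response=response))
    try:
        return int(reply.strip())
    except ValueError:
        return 5

def needs_review(response: str, call_llm, threshold: int = 3) -> bool:
    # Trigger the monitoring framework when the judge's score is worse than the threshold
    return judge_response(response, call_llm) > threshold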
Tip: The best part of recent technological advancements is that you don't even have to build an LLM from scratch. There are plug-and-play solutions available, like Meta Llama, which you can download and run on-premises.
(iv) Continuous fine-tuning
For the long-term success of any model, continuous fine-tuning is essential: the model is regularly refined for accuracy. A simple way to achieve this is Reinforcement Learning from Human Feedback (RLHF), where human reviewers rate the model's output and the model learns from that feedback. But this process is resource-intensive; to do it at scale, you need automation.
A common fine-tuning method is Low-Rank Adaptation (LoRA). In this technique, you freeze the base model and train a small set of additional low-rank weight matrices (an adapter) on top of it, so you can increase output accuracy without modifying the base model. For example, say you are building a recommendation system for a streaming platform and the current recommendations are not resulting in clicks. In the LoRA adapter, you train separate logic that groups viewers with similar viewing habits into clusters and uses the cluster data to make recommendations. This adapter can keep serving recommendations until the desired accuracy is achieved; a sketch follows below.
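Below is a minimal sketch of attaching a LoRA adapter with Hugging Face's peft library. The base model name ("gpt2") and the target_modules value are illustrative assumptions; the right choices depend on your own model architecture:
# A minimal LoRA sketch using Hugging Face peft. The base model and target_modules
# are illustrative assumptions; choose ones that match your architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # Placeholder base model

lora_config = LoraConfig(
    r=8,                        # Rank of the low-rank update matrices
    lora_alpha=16,              # Scaling factor for the adapter weights
    lora_dropout=0.05,          # Dropout applied to the adapter inputs
    target_modules=["c_attn"],  # Which modules get adapters; depends on the architecture
)

# The base model's weights stay frozen; only the small adapter matrices are trained
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
# ...train peft_model on your domain data as usual, then save just the adapter:
# peft_model.save_pretrained("recommendation_adapter")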
3. Output layer
These are some final checks done at the output layer for safety:
(i) Content filtering for language, profanity, keyword blocking
Similar to the data layer, filtering is also performed at the output layer to detect any offensive language. This double-checking ensures that a bad response never reaches the end user; the same check built for the data layer can be reused, as shown below.
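For example, the offensive_language_check function created earlier can be applied to generated responses before they are returned. The model_responses table and its columns are hypothetical names:
-- Reusing the data-layer profanity check on generated output before it reaches users
-- (model_responses, response_id, and response_text are hypothetical names)
SELECT response_id, response_text FROM model_responses WHERE NOT offensive_language_check(response_text);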
(ii) Response validation
Some basic checks on model responses can also be done by creating a simple rule-based framework. These can be simple checks, such as verifying the output format and acceptable values. They can be implemented easily in both Python and SQL.
-- Simple rule-based checking to flag invalid responses
SELECT
  CASE
    WHEN <condition_1> THEN 'INVALID'
    WHEN <condition_2> THEN 'INVALID'
    ELSE 'VALID'
  END AS output_status
FROM
  output_table;
(iii) Confidence threshold and human-in-loop triggers
No AI model is perfect, and that’s okay as long as you can involve a human wherever required. There are AI tools available where you can hardcode when to use AI and when to initiate a human-in-the-loop trigger. It’s also possible to automate this action by introducing a confidence threshold. Whenever the model shows low confidence in the output, reroute the request to a human for an accurate answer.
import numpy as np
import scipy.interpolate as interpolate
# One option for generating a confidence score is to evaluate the calibration B-spline on the input data
# scipy's interpolate.splev function takes two main inputs:
# 1. x: the x values at which you want to evaluate the spline
# 2. tck: the tuple (t, c, k) of knots, coefficients, and spline degree, as returned by splrep (or constructed manually)
# Generate the confidence scores and clip them to the [0, 1] range
predicted_probs = np.clip(interpolate.splev(input_data, tck), 0, 1)
# Zip the score with input data
confidence_results = list(zip(input_data, predicted_probs))
# Choose a threshold and collect every input whose confidence falls at or below it for manual verification
threshold = 0.5
filtered_results = [(i, score) for i, score in confidence_results if score <= threshold]
# Records that can be routed for manual/human verification
for i, score in filtered_results:
    print(f"x: {i}, Confidence Score: {score}")
(iv) Continuous monitoring and alerting
Like any software application, AI models also need a logging and alerting framework that can detect the expected (and unexpected) errors. With this guardrail, you have a detailed log file for every action and also an automated alert when things go wrong.
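A minimal sketch using Python's standard logging module is shown below; the send_alert() function is a hypothetical stand-in for whatever paging or messaging integration you use, and the 0.5 confidence cutoff is just an illustrative value:
# A minimal logging-and-alerting sketch using Python's standard logging module.
# send_alert() is a hypothetical stand-in for your paging/messaging integration.
import logging

logging.basicConfig(
    filename="ai_app.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("ai_guardrails")

def send_alert(message: str) -> None:
    # Replace with your real integration (email, Slack, PagerDuty, ...)
    print(f"ALERT: {message}")

def log_model_call(request_id: str, confidence: float, status: str) -> None:
    # Log every model action and alert when something looks wrong
    logger.info("request_id=%s confidence=%.2f status=%s", request_id, confidence, status)
    if status != "VALID" or confidence < 0.5:
        logger.warning("Guardrail triggered for request_id=%s", request_id)
        send_alert(f"Guardrail triggered for request {request_id} (status={status}, confidence={confidence:.2f})")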
(v) Regulatory compliance
A lot of compliance handling happens well before the output layer. Legally acceptable use cases are finalized in the initial requirement-gathering phase, and any sensitive data is hashed at the data layer. Beyond this, if there are additional regulatory requirements, such as encrypting certain fields, they can be handled in the output layer with a simple rule-based framework.
Balance AI with human expertise
Guardrails help you make the best of AI automation while still retaining control over the process. I've covered the common types of guardrails you may have to set at the different layers of an AI application.
Beyond this, if you encounter any other factor that could impact the model's expected output, you can set a guardrail for that too. This article is not a fixed formula, but a guide to identifying (and fixing) the common roadblocks. In the end, your AI application must do what it's meant to do: automate the busy work without creating new headaches. Guardrails help you achieve exactly that.