
Tracing OpenAI Agent Responses Using MLflow

Understanding the Target Audience

The target audience for this tutorial includes data scientists, machine learning engineers, and business managers interested in implementing AI solutions. Their pain points often revolve around the complexity of tracking and managing machine learning experiments, ensuring reproducibility, and debugging multi-agent systems. Their goals include optimizing AI workflows, improving collaboration between agents, and meeting safety standards. They are typically interested in practical applications of AI, technical specifications, and best practices, and they prefer clear, concise, technical language supplemented with code examples and case studies.

Introduction to MLflow

MLflow is an open-source platform designed for managing and tracking machine learning experiments. When integrated with the OpenAI Agents SDK, MLflow automatically:

  • Logs all agent interactions and API calls
  • Captures tool usage, input/output messages, and intermediate decisions
  • Tracks runs for debugging, performance analysis, and reproducibility

This functionality is particularly beneficial when developing multi-agent systems where different agents collaborate or dynamically call functions.

Tutorial Overview

This tutorial will guide you through two key examples: a simple handoff between agents and the implementation of agent guardrails, all while tracing their behavior using MLflow.

Setting Up Dependencies

Installing the Libraries

pip install openai-agents mlflow pydantic python-dotenv

OpenAI API Key

To obtain an OpenAI API key, visit the OpenAI API Keys page (https://platform.openai.com/api-keys) and generate a new key. New users may need to provide billing details and make a minimum payment of $5 to activate API access.

Once the key is generated, create a .env file and enter the following:

OPENAI_API_KEY=<YOUR_API_KEY>

Replace <YOUR_API_KEY> with the key you generated.
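
If you want to confirm that the key is picked up before running the examples, a minimal check (assuming the .env file sits in your working directory) looks like this:

import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from the local .env file into the environment

if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY not found - check your .env file")

The OpenAI Agents SDK picks the key up from the OPENAI_API_KEY environment variable, so nothing else needs to be configured in the scripts below.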

Multi-Agent System Example

Script: multi_agent_demo.py

This script builds a simple multi-agent assistant using the OpenAI Agents SDK, designed to route user queries to either a coding expert or a cooking expert. By enabling mlflow.openai.autolog(), all agent interactions with the OpenAI API are automatically traced and logged, including inputs, outputs, and agent handoffs. MLflow is configured to use a local file-based tracking URI (./mlruns) and logs all activity under the experiment name “Agent-Coding-Cooking.”

import asyncio

import mlflow
from agents import Agent, Runner
from dotenv import load_dotenv

load_dotenv()  # load OPENAI_API_KEY from .env into the environment

mlflow.openai.autolog()  # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Coding-Cooking")

coding_agent = Agent(name="Coding agent", instructions="You only answer coding questions.")
cooking_agent = Agent(name="Cooking agent", instructions="You only answer cooking questions.")

triage_agent = Agent(
    name="Triage agent",
    instructions="If the request is about code, handoff to coding_agent; if about cooking, handoff to cooking_agent.",
    handoffs=[coding_agent, cooking_agent],
)

async def main():
    res = await Runner.run(triage_agent, input="How do I boil pasta al dente?")
    print(res.final_output)

if __name__ == "__main__":
    asyncio.run(main())
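
With the .env file in place, run the script (saved as multi_agent_demo.py, as named above):

python multi_agent_demo.py

The triage agent should hand the pasta question off to the cooking agent and print its answer to the console.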

Viewing the MLflow UI

To open the MLflow UI and view all logged agent interactions, run the following command in a new terminal:

mlflow ui

This command starts the MLflow tracking server and prints the URL and port where the UI is available (http://localhost:5000 by default). The interaction flow can be viewed in the Tracing section, which shows the decision-making, handoffs, and outputs of each run and helps with debugging and optimizing agent workflows.
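
If you prefer to inspect traces programmatically instead of through the UI, recent MLflow versions also expose a search API. Below is a minimal sketch, assuming an MLflow release that provides mlflow.search_traces (2.14 or later):

import mlflow

mlflow.set_tracking_uri("./mlruns")
experiment = mlflow.get_experiment_by_name("Agent-Coding-Cooking")

# Returns a pandas DataFrame with one row per logged trace
traces = mlflow.search_traces(experiment_ids=[experiment.experiment_id])
print(traces.head())

Each row summarizes one traced run (request, response, status, and timing), which is handy for bulk analysis or regression checks.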

Tracing Guardrails Example

Script: guardrails.py

This example implements a guardrail-protected customer support agent using the OpenAI Agents SDK with MLflow tracing. The agent assists users with general queries but is restricted from answering medical-related questions. A dedicated guardrail agent checks for such inputs, blocking the request if detected. MLflow captures the entire flow, including guardrail activation, reasoning, and agent response, ensuring full traceability and insight into safety mechanisms.

import asyncio

import mlflow
from pydantic import BaseModel
from dotenv import load_dotenv
from agents import (
    Agent, Runner,
    GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    input_guardrail, RunContextWrapper,
)

load_dotenv()

mlflow.openai.autolog()
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Guardrails")

class MedicalSymptoms(BaseModel):
    medical_symptoms: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you for medical symptoms.",
    output_type=MedicalSymptoms,
)

@input_guardrail
async def medical_guardrail(ctx: RunContextWrapper[None], agent: Agent, input):
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.medical_symptoms,
    )

agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[medical_guardrail],
)

async def main():
    try:
        await Runner.run(agent, "Should I take aspirin if I'm having a headache?")
        print("Guardrail didn't trip - this is unexpected")
    except InputGuardrailTripwireTriggered:
        print("Medical guardrail tripped")

if __name__ == "__main__":
    asyncio.run(main())

Viewing the MLflow UI for Guardrails

As before, start the MLflow UI (or refresh it if it is already running) to view the logged guardrail interactions:

mlflow ui

In this example, when the agent is asked, “Should I take aspirin if I’m having a headache?”, the guardrail is triggered. The MLflow UI clearly shows that the input was flagged, along with the reasoning provided by the guardrail agent for why the request was blocked.
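
To see the other side of the guardrail in the trace view, it helps to also send a clearly non-medical question through the same agent. A small, optional addition to guardrails.py (a sketch; the question text is just an example) could look like this:

async def run_benign_query():
    # A non-medical question should pass the guardrail and be answered normally
    try:
        result = await Runner.run(agent, "How do I track the status of my order?")
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        print("Unexpected: guardrail tripped on a non-medical question")

Calling asyncio.run(run_benign_query()) at the bottom of the script should produce a second trace in which the guardrail agent returns medical_symptoms=False and the support agent answers normally.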

Conclusion

This tutorial has demonstrated how to trace OpenAI agent responses using MLflow, showcasing both a multi-agent system and the implementation of guardrails. By leveraging MLflow’s capabilities, developers can enhance the reliability and safety of AI applications.

For further exploration, check out the OpenAI Agents SDK and MLflow documentation.