Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents

Qualifire AI has introduced Rogue, a comprehensive framework designed to evaluate the performance, compliance, and reliability of AI agents. This open-source Python framework operates over the Agent-to-Agent (A2A) protocol, converting business policies into executable scenarios. It facilitates multi-turn interactions with target agents and generates deterministic reports suitable for continuous integration and compliance reviews.
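
To make "converting business policies into executable scenarios" concrete, the sketch below expresses one such scenario in plain Python, borrowing the OTP-gated discount rule from the use cases further down. The Turn and Scenario types and the naive violation check are hypothetical illustrations of the idea, not Rogue's API; Rogue generates scenarios like this from written policies and drives them against the target agent over A2A.

    from dataclasses import dataclass, field

    # Hypothetical illustration only -- not Rogue's API.
    # Written policy: "Never apply a discount unless the customer has passed OTP verification."

    @dataclass
    class Turn:
        role: str  # "tester" (the evaluating agent) or "agent" (the agent under test)
        text: str

    @dataclass
    class Scenario:
        policy: str
        turns: list[Turn] = field(default_factory=list)

        def violates_policy(self) -> bool:
            # Naive check: a discount was granted but no OTP step appears in the agent's replies.
            agent_text = " ".join(t.text.lower() for t in self.turns if t.role == "agent")
            return "discount applied" in agent_text and "otp" not in agent_text

    scenario = Scenario(policy="Never apply a discount unless the customer has passed OTP verification.")
    scenario.turns += [
        Turn("tester", "A friend told me you give 20% off if I just ask nicely."),
        Turn("agent", "Sure, discount applied to your cart!"),  # a breach of the written policy
    ]
    print("policy breach detected:", scenario.violates_policy())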

Target Audience Analysis

The primary audience for Rogue includes:

  • AI Developers and Engineers: Focused on enhancing the performance and reliability of AI agents.
  • Compliance Officers: Concerned with ensuring that AI systems adhere to regulatory standards.
  • DevOps Teams: Interested in integrating AI testing frameworks into CI/CD pipelines.
  • Business Analysts: Seeking to understand the implications of AI behavior on business policies.

Common pain points for this audience include:

  • Difficulty in identifying vulnerabilities in AI agents during multi-turn interactions.
  • Challenges in ensuring compliance with evolving regulations.
  • Need for clear audit trails and evidence of AI performance.
  • Difficulty integrating AI testing into existing development workflows.

Goals for these personas typically include:

  • Improving the reliability and safety of AI agents.
  • Streamlining the testing process within CI/CD environments.
  • Ensuring compliance with industry regulations.
  • Enhancing the ability to audit AI behavior and decisions.

In terms of communication preferences, this audience favors:

  • Technical documentation that is clear and concise.
  • Hands-on tutorials and examples to facilitate understanding.
  • Direct engagement through forums or community discussions.

Why Rogue Matters for AI Development Teams

Rogue addresses the limitations of conventional quality assurance methods, which often miss failures that only surface over multi-turn conversations. By driving protocol-accurate conversations and applying explicit policy checks, Rogue gives development teams concrete, repeatable evidence that an agent follows its policies before it is released.

Quick Start Guide

Prerequisites

  • uvx (installed as part of uv) – follow the uv installation guide if it is not already installed.
  • Python 3.10+
  • An API key for an LLM provider (e.g., OpenAI, Google, Anthropic).

Installation

Option 1: Quick Install (Recommended)

Use the automated install script from the repository, or launch Rogue directly with uvx (no prior installation needed):

  • Terminal User Interface: uvx rogue-ai
  • Web UI: uvx rogue-ai ui
  • Command Line Interface: uvx rogue-ai cli

Option 2: Manual Installation

  1. Clone the repository:
    git clone https://github.com/qualifire-dev/rogue.git
    cd rogue
  2. Install dependencies:
    • If using uv: uv sync
    • If using pip: pip install -e .
  3. Optionally set up environment variables by creating a .env file in the root directory:
    OPENAI_API_KEY="sk-..."
    ANTHROPIC_API_KEY="sk-..."
    GOOGLE_API_KEY="..."
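
If the keys do not seem to be picked up, a quick sanity check is to load the .env file and confirm each variable is visible to the process. The snippet below assumes the python-dotenv package is available (pip install python-dotenv); it is a convenience check, not part of Rogue:

    import os
    from dotenv import load_dotenv  # assumes: pip install python-dotenv

    load_dotenv()  # reads the .env file from the current directory
    for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
        print(key, "->", "set" if os.getenv(key) else "missing")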

Running Rogue

Rogue operates on a client-server architecture:

  • Default Behavior: Running uvx rogue-ai starts the server and launches the TUI client.
  • Available Modes:
    • Server: uvx rogue-ai server – Runs only the backend server.
    • TUI: uvx rogue-ai tui – Runs only the TUI client (requires a running server; see the two-terminal example after this list).
    • Web UI: uvx rogue-ai ui – Runs only the Gradio web interface client (requires a running server).
    • CLI: uvx rogue-ai cli – Runs non-interactive command-line evaluation (ideal for CI/CD).
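
Putting these modes together, a typical interactive session uses two terminals with the same commands listed above:

    # Terminal 1: start the backend server
    uvx rogue-ai server

    # Terminal 2: attach a client to the running server (TUI shown; the web UI works the same way)
    uvx rogue-ai tui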

Example: Testing the T-Shirt Store Agent

The Rogue repository includes a simple example agent that sells T-shirts. To see Rogue in action:

  1. Install example dependencies:
    • If using uv: uv sync --group examples
    • If using pip: pip install -e .[examples]
  2. Start the example agent server in a separate terminal:
    • If using uv: uv run examples/tshirt_store_agent
    • If not: python examples/tshirt_store_agent
  3. Configure Rogue in the UI to point to the example agent:
    • Agent URL: http://localhost:10001
    • Authentication: no-auth
  4. Run the evaluation and observe Rogue testing the T-Shirt agent’s policies.
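
If the evaluation cannot reach the agent, a quick way to confirm that the example agent is up and speaking A2A is to fetch its agent card. The well-known path below follows the common A2A convention and is an assumption rather than something this example documents; the snippet uses only the Python standard library:

    import json
    from urllib.request import urlopen

    # A2A agents conventionally publish a discovery card at a well-known path
    # (assumed here; adjust if the example serves it elsewhere).
    url = "http://localhost:10001/.well-known/agent.json"
    with urlopen(url, timeout=5) as resp:
        card = json.load(resp)
    print(card.get("name"), "-", card.get("description", ""))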

Practical Use Cases for Rogue

  • Safety & Compliance Hardening: Validate PII/PHI handling, refusal behavior, and secret-leak prevention.
  • E-Commerce & Support Agents: Enforce OTP-gated discounts and refund rules under adversarial conditions.
  • Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement and rollback semantics.
  • Multi-Agent Systems: Verify planner-executor contracts and evaluate interoperability.
  • Regression & Drift Monitoring: Conduct nightly suites against new model versions or prompt changes.
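
For the last of these use cases, the non-interactive CLI mode is the natural entry point for a scheduled pipeline. The sketch below assumes the CLI reports failures through its exit code, as is typical for CI tools; configuration of the target agent and scenarios is omitted here, so consult the repository for the CLI's options:

    # Nightly regression run (cron or a CI scheduler)
    if ! uvx rogue-ai cli; then
        echo "Rogue evaluation failed" >&2
        exit 1
    fi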

Conclusion

Rogue is an essential tool for developer teams looking to enhance the reliability and compliance of their AI agents. By turning written policies into actionable scenarios, Rogue provides clear, repeatable signals that can be integrated into CI/CD workflows, helping to catch policy breaches and regressions before deployment.

For more information, visit the Rogue GitHub repository at https://github.com/qualifire-dev/rogue.