Step-by-Step Guide to AI Agent Development Using Microsoft Agent-Lightning
This tutorial walks through setting up an AI agent with Microsoft’s Agent-Lightning framework. Everything runs directly in Google Colab, so both the server and client components can be experimented with in a single environment.
Understanding the Target Audience
The target audience for this guide includes:
- Business managers and decision-makers interested in AI implementation.
- Developers and data scientists seeking to enhance their AI capabilities.
- Researchers focused on AI applications in business contexts.
Pain Points: Limited understanding of AI frameworks, challenges in integrating AI solutions, and the need for practical, hands-on guidance.
Goals: To develop effective AI agents, streamline business processes, and enhance decision-making through AI technologies.
Interests: AI technology trends, practical applications of AI in business, and tutorials that provide clear, actionable steps.
Communication Preferences: Clear, concise instructions with an emphasis on practical applications and real-world examples.
Setting Up the Environment
To begin, we install the required libraries and import the necessary modules for Agent-Lightning. We also set the OpenAI API key securely, choose the model used throughout the tutorial, and define the host and port for the local server.
```python
!pip -q install agentlightning openai nest_asyncio python-dotenv > /dev/null

import os, threading, time, asyncio, nest_asyncio, random
from getpass import getpass

from agentlightning.litagent import LitAgent
from agentlightning.trainer import Trainer
from agentlightning.server import AgentLightningServer
from agentlightning.types import PromptTemplate
import openai

# Allow asyncio.run() inside Colab's already-running event loop.
nest_asyncio.apply()

# Prompt for the API key if it is not already set in the environment.
if not os.getenv("OPENAI_API_KEY"):
    try:
        os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (leave blank if using a local/proxy base): ") or ""
    except Exception:
        pass

MODEL = os.getenv("MODEL", "gpt-4o-mini")

# Host and port for the local Agent-Lightning server (assumed local defaults; adjust if needed).
HOST, PORT = "127.0.0.1", 9997
```
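Before building the agent, it can help to confirm that the key and model are reachable. Below is an optional sanity check, assuming OPENAI_API_KEY is set and MODEL points at a chat-capable model; it simply requests one throwaway completion.

```python
# Optional connectivity check: one throwaway completion.
# Assumes OPENAI_API_KEY is set and MODEL is a chat-capable model.
r = openai.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    temperature=0,
)
print(r.choices[0].message.content)
```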
Defining the QA Agent
Next, we define a simple QA agent by extending LitAgent. On each training rollout, the agent sends the user prompt to the LLM along with the current system prompt and scores the response against the predefined answer, combining an exact-match check, token overlap, and a small brevity bonus.
```python
class QAAgent(LitAgent):
    def training_rollout(self, task, rollout_id, resources):
        # Pull the current system prompt from the server-provided resources.
        sys_prompt = resources["system_prompt"].template
        user = task["prompt"]
        gold = task.get("answer", "").strip().lower()
        try:
            r = openai.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": sys_prompt},
                    {"role": "user", "content": user},
                ],
                temperature=0.2,
            )
            pred = r.choices[0].message.content.strip()
        except Exception as e:
            pred = f"[error]{e}"

        def score(pred, gold):
            # Reward = exact-match base + token-overlap bonus + brevity bonus, clamped to [0, 1].
            P = pred.lower()
            base = 1.0 if gold and gold in P else 0.0
            gt = set(gold.split())
            pr = set(P.split())
            inter = len(gt & pr)
            denom = (len(gt) + len(pr)) or 1
            overlap = 2 * inter / denom
            brevity = 0.2 if base == 1.0 and len(P.split()) <= 8 else 0.0
            return max(0.0, min(1.0, 0.7 * base + 0.25 * overlap + brevity))

        return float(score(pred, gold))
```
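To get a feel for how the reward behaves, here is the same scoring heuristic pulled out as a standalone function with a few illustrative inputs; the printed values follow directly from the formula above.

```python
def score(pred: str, gold: str) -> float:
    # Mirror of QAAgent's reward: exact-match base, token-overlap bonus, brevity bonus.
    P = pred.lower()
    base = 1.0 if gold and gold in P else 0.0
    gt, pr = set(gold.split()), set(P.split())
    overlap = 2 * len(gt & pr) / ((len(gt) + len(pr)) or 1)
    brevity = 0.2 if base == 1.0 and len(P.split()) <= 8 else 0.0
    return max(0.0, min(1.0, 0.7 * base + 0.25 * overlap + brevity))

print(score("Paris", "paris"))                           # 1.0  (correct and terse)
print(score("The capital of France is Paris", "paris"))  # ~0.97 (correct but wordier)
print(score("Lyon", "paris"))                            # 0.0  (wrong)
```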
Creating Tasks and Prompts
We create a benchmark with three QA tasks and curate several candidate system prompts to optimize our agent's performance.
```python
TASKS = [
    {"prompt": "Capital of France?", "answer": "Paris"},
    {"prompt": "Who wrote Pride and Prejudice?", "answer": "Jane Austen"},
    {"prompt": "2+2 = ?", "answer": "4"},
]

PROMPTS = [
    "You are a terse expert. Answer with only the final fact, no sentences.",
    "You are a helpful, knowledgeable AI. Prefer concise, correct answers.",
    "Answer as a rigorous evaluator; return only the canonical fact.",
    "Be a friendly tutor. Give the one-word answer if obvious.",
]
```
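The benchmark is intentionally tiny. Since the task list is just a list of dicts, it is easy to append your own QA pairs to stress the prompts harder; in the sketch below, extra_tasks.json is a hypothetical file of {"prompt": ..., "answer": ...} records.

```python
import json

# Optionally extend the benchmark with your own QA pairs.
# "extra_tasks.json" is a hypothetical file: [{"prompt": "...", "answer": "..."}, ...]
try:
    with open("extra_tasks.json") as f:
        TASKS.extend(json.load(f))
except FileNotFoundError:
    pass

print(f"{len(TASKS)} tasks, {len(PROMPTS)} candidate prompts")
```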
Running the Server and Evaluating Prompts
We start the Agent-Lightning server and iterate through the candidate system prompts: for each one, we update the shared system-prompt resource, queue the benchmark tasks, poll for the completed rollouts, and average their rewards to rank the prompt.
```python
async def run_server_and_search():
    server = AgentLightningServer(host=HOST, port=PORT)
    await server.start()
    print("Server started")
    await asyncio.sleep(1.5)

    results = []
    for sp in PROMPTS:
        # Publish the candidate system prompt as a shared resource for the clients.
        await server.update_resources({"system_prompt": PromptTemplate(template=sp, engine="f-string")})
        scores = []
        for t in TASKS:
            tid = await server.queue_task(sample=t, mode="train")
            rollout = await server.poll_completed_rollout(tid, timeout=40)
            if rollout is None:
                print("Timeout waiting for rollout; continuing...")
                continue
            scores.append(float(getattr(rollout, "final_reward", 0.0)))
        avg = sum(scores) / len(scores) if scores else 0.0
        print(f"Prompt avg: {avg:.3f} | {sp}")
        results.append((sp, avg))

    best = max(results, key=lambda x: x[1]) if results else ("", 0)
    print("BEST PROMPT:", best[0], " | score:", f"{best[1]:.3f}")
    await server.stop()
```
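The function only prints the single best prompt. If you would rather see a full ranking, the (prompt, average-reward) pairs accumulated in results can be sorted into a small leaderboard; the sketch below assumes you return results from run_server_and_search (for example, by adding return results as the last line of the function), and the scores shown are purely illustrative.

```python
def print_leaderboard(results):
    # results: list of (system_prompt, average_reward) pairs, as built in the search loop.
    for rank, (sp, avg) in enumerate(sorted(results, key=lambda x: x[1], reverse=True), start=1):
        print(f"{rank}. {avg:.3f}  {sp}")

# Example call with the shape produced above (scores here are illustrative, not measured):
print_leaderboard([
    ("You are a terse expert. Answer with only the final fact, no sentences.", 0.93),
    ("Be a friendly tutor. Give the one-word answer if obvious.", 0.81),
])
```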
Launching the Client
We launch the client in a background thread with two parallel workers, allowing it to process the tasks the server queues. While the client workers handle tasks concurrently, the server loop evaluates one candidate prompt at a time and collects the resulting rewards.
```python
def run_client_in_thread():
    # The Trainer connects the QAAgent to the server and pulls queued tasks with two workers.
    agent = QAAgent()
    trainer = Trainer(n_workers=2)
    trainer.fit(agent, backend=f"http://{HOST}:{PORT}")

client_thr = threading.Thread(target=run_client_in_thread, daemon=True)
client_thr.start()

# Run the server-side prompt search (nest_asyncio.apply() above lets this work inside Colab).
asyncio.run(run_server_and_search())
```
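Once the search finishes, the winning system prompt can be reused outside the server/client loop for ad-hoc questions. A minimal sketch, where BEST_PROMPT is a placeholder for the string printed as "BEST PROMPT:" above:

```python
# Use the best prompt found by the search for a one-off question.
# BEST_PROMPT is a placeholder: paste the string printed as "BEST PROMPT:" above.
BEST_PROMPT = "You are a terse expert. Answer with only the final fact, no sentences."

resp = openai.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": BEST_PROMPT},
        {"role": "user", "content": "Capital of Japan?"},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```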
Conclusion
In conclusion, Agent-Lightning facilitates the creation of a flexible agent pipeline with minimal code. By starting a server, running parallel client workers, evaluating various system prompts, and automatically measuring performance, developers can efficiently build, test, and optimize AI agents within a single Colab environment.