Step-by-Step Guide to AI Agent Development Using Microsoft Agent-Lightning
This tutorial walks through setting up an AI agent with Microsoft’s Agent-Lightning framework. Everything runs directly in Google Colab, so both the server and client components can be experimented with in a single environment.
Understanding the Target Audience
The target audience for this guide includes:
- Business managers and decision-makers interested in AI implementation.
- Developers and data scientists seeking to enhance their AI capabilities.
- Researchers focused on AI applications in business contexts.
Pain Points: Limited understanding of AI frameworks, challenges in integrating AI solutions, and the need for practical, hands-on guidance.
Goals: To develop effective AI agents, streamline business processes, and enhance decision-making through AI technologies.
Interests: AI technology trends, practical applications of AI in business, and tutorials that provide clear, actionable steps.
Communication Preferences: Clear, concise instructions with an emphasis on practical applications and real-world examples.
Setting Up the Environment
To begin, we install the required libraries and import the necessary modules for Agent-Lightning. We also set the OpenAI API key securely, choose the model used throughout the tutorial, and define the host and port for the local server.
```python
!pip -q install agentlightning openai nest_asyncio python-dotenv > /dev/null

import os, threading, time, asyncio, nest_asyncio, random
from getpass import getpass

from agentlightning.litagent import LitAgent
from agentlightning.trainer import Trainer
from agentlightning.server import AgentLightningServer
from agentlightning.types import PromptTemplate
import openai

# Allow asyncio.run() inside Colab's already-running event loop.
nest_asyncio.apply()

# Prompt for the API key if it is not already set in the environment.
if not os.getenv("OPENAI_API_KEY"):
    try:
        os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (leave blank if using a local/proxy base): ") or ""
    except Exception:
        pass

MODEL = os.getenv("MODEL", "gpt-4o-mini")

# Host and port for the local Agent-Lightning server (assumed local defaults; adjust if needed).
HOST, PORT = "127.0.0.1", 9997
```
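Before building the agent, it can help to confirm that the key and model are reachable. Below is an optional sanity check, assuming OPENAI_API_KEY is set and MODEL points at a chat-capable model; it simply requests one throwaway completion.

```python
# Optional connectivity check: one throwaway completion.
# Assumes OPENAI_API_KEY is set and MODEL is a chat-capable model.
r = openai.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    temperature=0,
)
print(r.choices[0].message.content)
```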
Defining the QA Agent
Next, we define a simple QA agent by extending LitAgent. On each training rollout, the agent sends the user prompt to the LLM along with the current system prompt and scores the response against the predefined answer, combining an exact-match check, token overlap, and a small brevity bonus.
```python
class QAAgent(LitAgent):
    def training_rollout(self, task, rollout_id, resources):
        # Pull the current system prompt from the server-provided resources.
        sys_prompt = resources["system_prompt"].template
        user = task["prompt"]
        gold = task.get("answer", "").strip().lower()
        try:
            r = openai.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": sys_prompt},
                    {"role": "user", "content": user},
                ],
                temperature=0.2,
            )
            pred = r.choices[0].message.content.strip()
        except Exception as e:
            pred = f"[error]{e}"

        def score(pred, gold):
            # Reward = exact-match base + token-overlap bonus + brevity bonus, clamped to [0, 1].
            P = pred.lower()
            base = 1.0 if gold and gold in P else 0.0
            gt = set(gold.split())
            pr = set(P.split())
            inter = len(gt & pr)
            denom = (len(gt) + len(pr)) or 1
            overlap = 2 * inter / denom
            brevity = 0.2 if base == 1.0 and len(P.split()) <= 8 else 0.0
            return max(0.0, min(1.0, 0.7 * base + 0.25 * overlap + brevity))

        return float(score(pred, gold))
```
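To get a feel for how the reward behaves, here is the same scoring heuristic pulled out as a standalone function with a few illustrative inputs; the printed values follow directly from the formula above.

```python
def score(pred: str, gold: str) -> float:
    # Mirror of QAAgent's reward: exact-match base, token-overlap bonus, brevity bonus.
    P = pred.lower()
    base = 1.0 if gold and gold in P else 0.0
    gt, pr = set(gold.split()), set(P.split())
    overlap = 2 * len(gt & pr) / ((len(gt) + len(pr)) or 1)
    brevity = 0.2 if base == 1.0 and len(P.split()) <= 8 else 0.0
    return max(0.0, min(1.0, 0.7 * base + 0.25 * overlap + brevity))

print(score("Paris", "paris"))                           # 1.0  (correct and terse)
print(score("The capital of France is Paris", "paris"))  # ~0.97 (correct but wordier)
print(score("Lyon", "paris"))                            # 0.0  (wrong)
```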
Creating Tasks and Prompts
We create a benchmark with three QA tasks and curate several candidate system prompts to optimize our agent's performance.
```python
TASKS = [
    {"prompt": "Capital of France?", "answer": "Paris"},
    {"prompt": "Who wrote Pride and Prejudice?", "answer": "Jane Austen"},
    {"prompt": "2+2 = ?", "answer": "4"},
]

PROMPTS = [
    "You are a terse expert. Answer with only the final fact, no sentences.",
    "You are a helpful, knowledgeable AI. Prefer concise, correct answers.",
    "Answer as a rigorous evaluator; return only the canonical fact.",
    "Be a friendly tutor. Give the one-word answer if obvious.",
]
```
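The benchmark is intentionally tiny. Since the task list is just a list of dicts, it is easy to append your own QA pairs to stress the prompts harder; in the sketch below, extra_tasks.json is a hypothetical file of {"prompt": ..., "answer": ...} records.

```python
import json

# Optionally extend the benchmark with your own QA pairs.
# "extra_tasks.json" is a hypothetical file: [{"prompt": "...", "answer": "..."}, ...]
try:
    with open("extra_tasks.json") as f:
        TASKS.extend(json.load(f))
except FileNotFoundError:
    pass

print(f"{len(TASKS)} tasks, {len(PROMPTS)} candidate prompts")
```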
Running the Server and Evaluating Prompts
We start the Agent-Lightning server and iterate through the candidate system prompts: for each one, we update the shared system-prompt resource, queue the benchmark tasks, poll for the completed rollouts, and average their rewards to rank the prompt.
```python
async def run_server_and_search():
    server = AgentLightningServer(host=HOST, port=PORT)
    await server.start()
    print("Server started")
    await asyncio.sleep(1.5)

    results = []
    for sp in PROMPTS:
        # Publish the candidate system prompt as a shared resource for the clients.
        await server.update_resources({"system_prompt": PromptTemplate(template=sp, engine="f-string")})
        scores = []
        for t in TASKS:
            tid = await server.queue_task(sample=t, mode="train")
            rollout = await server.poll_completed_rollout(tid, timeout=40)
            if rollout is None:
                print("Timeout waiting for rollout; continuing...")
                continue
            scores.append(float(getattr(rollout, "final_reward", 0.0)))
        avg = sum(scores) / len(scores) if scores else 0.0
        print(f"Prompt avg: {avg:.3f} | {sp}")
        results.append((sp, avg))

    best = max(results, key=lambda x: x[1]) if results else ("", 0)
    print("BEST PROMPT:", best[0], " | score:", f"{best[1]:.3f}")
    await server.stop()
```
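The function only prints the single best prompt. If you would rather see a full ranking, the (prompt, average-reward) pairs accumulated in results can be sorted into a small leaderboard; the sketch below assumes you return results from run_server_and_search (for example, by adding return results as the last line of the function), and the scores shown are purely illustrative.

```python
def print_leaderboard(results):
    # results: list of (system_prompt, average_reward) pairs, as built in the search loop.
    for rank, (sp, avg) in enumerate(sorted(results, key=lambda x: x[1], reverse=True), start=1):
        print(f"{rank}. {avg:.3f}  {sp}")

# Example call with the shape produced above (scores here are illustrative, not measured):
print_leaderboard([
    ("You are a terse expert. Answer with only the final fact, no sentences.", 0.93),
    ("Be a friendly tutor. Give the one-word answer if obvious.", 0.81),
])
```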
Launching the Client
We launch the client in a background thread with two parallel workers, allowing it to process the tasks the server queues. While the client workers handle tasks concurrently, the server loop evaluates one candidate prompt at a time and collects the resulting rewards.
```python
def run_client_in_thread():
    # The Trainer connects the QAAgent to the server and pulls queued tasks with two workers.
    agent = QAAgent()
    trainer = Trainer(n_workers=2)
    trainer.fit(agent, backend=f"http://{HOST}:{PORT}")

client_thr = threading.Thread(target=run_client_in_thread, daemon=True)
client_thr.start()

# Run the server-side prompt search (nest_asyncio.apply() above lets this work inside Colab).
asyncio.run(run_server_and_search())
```
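Once the search finishes, the winning system prompt can be reused outside the server/client loop for ad-hoc questions. A minimal sketch, where BEST_PROMPT is a placeholder for the string printed as "BEST PROMPT:" above:

```python
# Use the best prompt found by the search for a one-off question.
# BEST_PROMPT is a placeholder: paste the string printed as "BEST PROMPT:" above.
BEST_PROMPT = "You are a terse expert. Answer with only the final fact, no sentences."

resp = openai.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": BEST_PROMPT},
        {"role": "user", "content": "Capital of Japan?"},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```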
Conclusion
In conclusion, Agent-Lightning facilitates the creation of a flexible agent pipeline with minimal code. By starting a server, running parallel client workers, evaluating various system prompts, and automatically measuring performance, developers can efficiently build, test, and optimize AI agents within a single Colab environment.