Microsoft AI Introduces Magentic-UI: An Open-Source Agent Prototype
Modern web use spans many digital interactions: filling out forms, managing accounts, running data queries, and navigating complex dashboards. Although the web is woven into productivity and work processes, many actions still require repetitive human input, particularly in settings that demand detailed instructions or complex decision-making beyond simple searches. While AI agents have emerged for task automation, many prioritize complete autonomy, which can lead to outcomes that diverge from user expectations. The next step in AI-driven productivity is agents designed for collaboration, combining automation with real-time human input for more accurate and trusted results.
A major challenge in deploying AI agents for web tasks is the lack of user visibility and intervention. Users frequently cannot see what steps the agent is planning, how it intends to execute them, or when it might deviate from the intended path. In scenarios involving complex decisions—such as entering payment information or interpreting dynamic content—users require mechanisms to intervene and redirect the process. Without these capabilities, systems risk making irreversible mistakes or misaligning with user goals, highlighting a significant limitation in current AI automation: the absence of a structured human-in-the-loop design.
Traditional solutions have approached web automation through rule-based scripts or general-purpose AI agents driven by language models. These systems interpret user commands and attempt to execute them autonomously but often do so without revealing intermediate decisions or allowing meaningful user feedback. Some offer command-line-like interactions, which are generally inaccessible to the average user and lack layered safety mechanisms. Furthermore, limited support for task reuse or performance learning across sessions restricts their long-term value. These systems typically lack adaptability when context changes mid-task or when errors require collaborative correction.
Researchers at Microsoft have introduced Magentic-UI, an open-source prototype that emphasizes collaborative human-AI interaction for web-based tasks. Unlike previous systems striving for full independence, Magentic-UI promotes real-time co-planning, shared execution, and step-by-step user oversight. Built on Microsoft’s AutoGen framework and integrated with Azure AI Foundry Labs, Magentic-UI represents an evolution from the earlier Magentic-One system. Its launch aims to explore fundamental questions regarding human oversight, safety mechanisms, and learning in agentic systems by offering an experimental platform for researchers and developers.
Magentic-UI features four core interactive capabilities: co-planning, co-tasking, action guards, and plan learning. Co-planning allows users to view and adjust the agent’s proposed steps before execution, providing full control over the AI’s actions. Co-tasking enables real-time visibility during operation, allowing users to pause, edit, or take over specific actions. Action guards serve as customizable confirmations for high-risk activities, such as closing browser tabs or submitting forms, which could have unintended consequences. Plan learning allows Magentic-UI to remember and refine steps for future tasks, improving performance over time.
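To make the action-guard idea concrete, here is a minimal sketch of gating high-risk steps behind a user confirmation. The function names, the action set, and the callback interface are illustrative assumptions, not the actual Magentic-UI API:

```python
# Illustrative action guard: high-risk actions require explicit user
# approval before they run. All names here are hypothetical.
HIGH_RISK_ACTIONS = {"submit_form", "close_tab", "enter_payment_info"}

def run_with_guard(action: str, execute, confirm) -> str:
    """Execute `action`, asking the user first if it is high-risk."""
    if action in HIGH_RISK_ACTIONS:
        if not confirm(f"Agent wants to perform '{action}'. Allow?"):
            return "skipped"  # user vetoed the step
    execute(action)
    return "done"

# Usage: the simulated user declines a form submission but a plain
# navigation click goes through without any prompt.
result = run_with_guard(
    "submit_form",
    execute=lambda a: print(f"executing {a}"),
    confirm=lambda prompt: False,  # user says no
)
print(result)  # skipped
```

The key design point is that the guard sits between planning and execution, so the agent's proposal is visible before any irreversible effect occurs.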
Technically, when a user submits a request, the Orchestrator agent generates a step-by-step plan. Users can modify it through a graphical interface by editing, deleting, or regenerating steps. Once finalized, the plan is distributed among specialized agents, including the Orchestrator, WebSurfer, Coder, and FileSurfer. Each agent reports back after completing its task, and the Orchestrator determines whether to proceed, repeat, or solicit user feedback. This architecture ensures transparency and allows for adaptive task flows. For instance, if a step fails due to a broken link, the Orchestrator can dynamically adjust the plan with user consent.
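The execute-report-decide loop described above can be sketched in a few lines. This is a simplified model under assumed interfaces (a `Step` record and two callbacks), not the real Orchestrator implementation:

```python
# Simplified sketch of the Orchestrator's control loop: run each step,
# retry on failure, then solicit user feedback before giving up.
# The agent names mirror the article; everything else is hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    agent: str   # e.g. "WebSurfer", "Coder", "FileSurfer"
    action: str

def orchestrate(plan: list[Step],
                run_step: Callable[[Step], bool],
                ask_user: Callable[[Step], bool],
                max_retries: int = 1) -> bool:
    for step in plan:
        attempts = 0
        while not run_step(step):          # agent reports failure
            attempts += 1
            if attempts > max_retries:
                if not ask_user(step):     # user aborts the task
                    return False
                break                      # user chose to continue past it
    return True

# Usage: the second step fails (e.g. a broken link); the simulated
# user consents to continuing, so the task still finishes.
plan = [Step("WebSurfer", "open page"), Step("WebSurfer", "follow link")]
ok = orchestrate(plan,
                 run_step=lambda s: s.action != "follow link",
                 ask_user=lambda s: True)
print(ok)  # True
```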
In controlled evaluations on the GAIA benchmark, which comprises 162 complex tasks requiring multimodal understanding, such as web navigation and document interpretation, Magentic-UI was rigorously tested. Operating autonomously, it completed 30.3% of tasks. With support from a simulated user possessing additional task information, success rose to 51.9%, a 71% relative improvement. Another configuration, using a more capable simulated user, reached 42.6%. Notably, Magentic-UI requested help in only 10% of the enhanced tasks and sought final answers in 18%, averaging just 1.1 help requests per task. This indicates that minimal but timely human intervention significantly improves task completion without incurring high oversight costs.
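The 71% figure is the relative improvement of the human-assisted score over the autonomous baseline, which a quick check confirms:

```python
# Relative improvement: (51.9 - 30.3) / 30.3 ≈ 0.71, i.e. 71%.
baseline, with_user = 30.3, 51.9
relative_gain = (with_user - baseline) / baseline
print(f"{relative_gain:.0%}")  # 71%
```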
Magentic-UI also includes a “Saved Plans” gallery that surfaces strategies reused from past tasks; retrieving a plan from the gallery is approximately three times faster than generating a new one. A predictive mechanism suggests these plans while users type, streamlining repetitive tasks such as flight searches or form submissions. Safety mechanisms are layered: every browser or code action runs inside a Docker container so that user credentials are not exposed, users can define allow-lists for site access, and every action can be gated behind an approval prompt. A red-team evaluation against phishing attacks and prompt injections found that the system either sought user clarification or blocked execution, reinforcing its layered defense model.
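As a rough illustration of the allow-list layer, a minimal sketch follows. The function, hosts, and matching rule (exact host or subdomain) are assumptions for illustration, not Magentic-UI's actual policy code:

```python
# Hypothetical site allow-list check: permit navigation only to
# user-approved hosts and their subdomains.
from urllib.parse import urlparse

def is_allowed(url: str, allowed_hosts: set[str]) -> bool:
    """Return True if the URL's host is on the allow-list."""
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)

# Usage with an assumed allow-list of a single trusted domain.
print(is_allowed("https://docs.example.com/page", {"example.com"}))  # True
print(is_allowed("https://evil.test/login", {"example.com"}))        # False
```

A check like this would run before every navigation action, with blocked URLs escalated to the user rather than silently followed.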
Key Takeaways from Magentic-UI Research:
- Magentic-UI boosts task completion by 71% (from 30.3% to 51.9%) with simple human input.
- Requests user help in only 10% of enhanced tasks, averaging 1.1 help requests per task.
- Features a co-planning UI that allows full user control before execution.
- Executes tasks via four modular agents: Orchestrator, WebSurfer, Coder, and FileSurfer.
- Stores and reuses plans, reducing repeat task latency by up to 3x.
- All actions are sandboxed via Docker containers; no user credentials are exposed.
- Passed red-team evaluations against phishing and injection threats.
- Supports fully user-configurable “action guards” for high-risk steps.
- Fully open-source and integrated with Azure AI Foundry Labs.
In conclusion, Magentic-UI addresses a long-standing problem in AI automation: the lack of transparency and controllability. Rather than replacing users, it allows them to remain central to the process. The system performs well even with minimal help and learns to improve with each interaction. The modular design, robust safeguards, and detailed interaction model create a strong foundation for future intelligent assistants.
Check out the technical details and GitHub page. All credit for this research goes to the researchers of this project.