Building AI agents is 5% AI and 100% software engineering

The development of production-grade AI agents hinges more on software engineering than on the AI models themselves. Key aspects such as data plumbing, controls, and observability are critical for success. This article explores the essential components of a doc-to-chat pipeline and how to integrate AI agents into existing software stacks effectively.

Understanding the Doc-to-Chat Pipeline

A doc-to-chat pipeline is designed to process enterprise documents by ingesting, standardizing, enforcing governance, indexing embeddings alongside relational features, and serving retrieval and generation through authenticated APIs. This architecture is crucial for applications like agentic Q&A, copilots, and workflow automation, ensuring that responses adhere to permissions and are audit-ready.

Integration with Existing Stacks

To integrate AI agents seamlessly, utilize standard service boundaries such as REST/JSON or gRPC over a trusted storage layer. For tables, Iceberg provides ACID compliance, schema evolution, and snapshots, which are essential for reproducible retrieval. For vector data, consider using pgvector for embedding management alongside SQL filters or dedicated engines like Milvus for high-query-per-second (QPS) approximate nearest neighbor (ANN) searches.

Key Properties of Data Management

  • Iceberg Tables: ACID compliance, hidden partitioning, and snapshot isolation.
  • pgvector: Combines SQL and vector similarity in a single query plan.
  • Milvus: Scalable architecture for large-scale similarity searches.
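To illustrate what "SQL and vector similarity in a single query plan" means, here is a framework-free sketch: a relational predicate prunes rows first, then the survivors are ranked by cosine distance, mirroring a pgvector-style `WHERE tenant = … ORDER BY embedding <-> … LIMIT k` query. The `rows` table, the `tenant` column, and the vectors are all invented for illustration:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Rows mimic a table with relational columns plus an embedding column.
rows = [
    {"id": 1, "tenant": "acme",   "embedding": [1.0, 0.0]},
    {"id": 2, "tenant": "acme",   "embedding": [0.6, 0.8]},
    {"id": 3, "tenant": "globex", "embedding": [1.0, 0.1]},
]

def filtered_ann(rows, tenant, query_vec, k=2):
    # Relational predicate first, then rank survivors by vector distance.
    candidates = [r for r in rows if r["tenant"] == tenant]
    candidates.sort(key=lambda r: cosine_distance(r["embedding"], query_vec))
    return [r["id"] for r in candidates[:k]]
```

Doing the filter and the similarity ranking in one plan is what keeps tenant isolation correct: the nearest neighbor for one tenant never leaks in from another tenant's rows.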

Coordinating Agents, Humans, and Workflows

Effective production agents require explicit coordination points for human intervention. Tools like AWS A2I provide managed human-in-the-loop (HITL) processes, ensuring low-confidence outputs are reviewed. Frameworks such as LangGraph can model these checkpoints within agent workflows, making approvals integral to the process.
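The checkpoint idea can be sketched without any framework: route outputs below a confidence threshold to a review step before they reach the user. The threshold value, the `AgentOutput` shape, and the `human_review` stub are assumptions for illustration; A2I or LangGraph would supply the real review queue and workflow state machine:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # assumption: tune per workload

@dataclass
class AgentOutput:
    answer: str
    confidence: float

def human_review(output: AgentOutput) -> AgentOutput:
    # Placeholder for a managed review queue (the role A2I plays).
    # Here a reviewer simply stamps the answer as approved.
    return AgentOutput(answer=output.answer + " [human-approved]",
                       confidence=1.0)

def checkpoint(output: AgentOutput) -> AgentOutput:
    # Route low-confidence outputs to a human before they reach the user.
    if output.confidence < CONFIDENCE_THRESHOLD:
        return human_review(output)
    return output
```

The design point is that the gate is part of the workflow graph itself, not an afterthought in the UI: an approval is just another node the state passes through.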

Ensuring Reliability Before Model Deployment

Reliability in AI systems should be treated as a layered defense strategy:

  • Language and Content Guardrails: Pre-validate inputs and outputs for safety.
  • PII Detection and Redaction: Use tools like Microsoft Presidio for identifying and masking personally identifiable information.
  • Access Control and Lineage: Implement row- and column-level access controls to ensure compliance.
  • Retrieval Quality Gates: Evaluate retrieval-augmented generation (RAG) using metrics like faithfulness and context precision.
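As a toy illustration of the detect-and-redact layer (not a substitute for Presidio's recognizers and context scoring), the sketch below masks two invented patterns with typed placeholders:

```python
import re

# Toy patterns for illustration; production systems use analyzers such as
# Microsoft Presidio with many more recognizers and contextual scoring.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    # Replace each match with a typed placeholder so downstream steps
    # know a value was removed rather than merely absent.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders matter for the later guardrail layers: a model can still reason about "an email address was here" without ever seeing the value.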

Scaling Indexing and Retrieval

To handle real traffic effectively, focus on two main axes: ingest throughput and query concurrency. Normalize data at the lakehouse edge and write to Iceberg for versioned snapshots. For vector serving, leverage Milvus’s architecture to support horizontal scaling and independent failure domains.
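The versioned-snapshot idea on the ingest axis can be sketched in a few lines: each batch commit yields an immutable, numbered snapshot that readers can pin, which is roughly the property Iceberg provides at lakehouse scale. Everything here (the class name, the dict-backed storage) is illustrative:

```python
import copy

class SnapshotStore:
    """Minimal stand-in for Iceberg-style snapshot isolation: each batch
    commit produces an immutable, numbered snapshot readers can pin."""

    def __init__(self):
        self._snapshots = [{}]  # snapshot 0 is empty

    def commit_batch(self, batch: dict) -> int:
        # Writers append whole batches; readers never see partial writes.
        new = copy.deepcopy(self._snapshots[-1])
        new.update(batch)
        self._snapshots.append(new)
        return len(self._snapshots) - 1  # snapshot id

    def read(self, snapshot_id: int) -> dict:
        # Pinning a snapshot id makes retrieval reproducible across reruns.
        return self._snapshots[snapshot_id]
```

Pinning a snapshot id is what makes an evaluation or an incident investigation reproducible: the same query against the same snapshot returns the same rows, regardless of later ingest.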

Monitoring Beyond Logs

Effective monitoring requires a combination of traces, metrics, and evaluations:

  • Distributed Tracing: Use OpenTelemetry for end-to-end visibility.
  • LLM Observability Platforms: Compare options like LangSmith and Arize Phoenix.
  • Continuous Evaluation: Schedule evaluations on canary sets to track performance over time.
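A scheduled canary evaluation can be as simple as replaying a fixed question set against the deployed agent and alerting when the pass rate dips. The canary pairs, the canned `answer` stub, and the threshold below are all invented for illustration; the real `answer` would call the serving API:

```python
# A canary set of (question, expected keyword) pairs lets a scheduled
# job detect retrieval or model drift over time.
CANARY_SET = [
    ("What format stores the tables?", "iceberg"),
    ("Which database extension adds vectors?", "pgvector"),
]

def answer(question: str) -> str:
    # Stand-in for the deployed agent; the real call hits the serving API.
    canned = {
        "What format stores the tables?": "Tables are stored in Iceberg.",
        "Which database extension adds vectors?": "pgvector adds vector columns.",
    }
    return canned.get(question, "")

def canary_pass_rate() -> float:
    hits = sum(1 for q, keyword in CANARY_SET
               if keyword in answer(q).lower())
    return hits / len(CANARY_SET)

def evaluate(threshold: float = 0.9) -> bool:
    # Alert (here: return False) when quality drops below the threshold.
    return canary_pass_rate() >= threshold
```

Because the canary set is frozen, a falling pass rate isolates drift in the system (retrieval decay, index staleness, model swap) rather than drift in the test.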

Conclusion: The Importance of Software Engineering in AI

The assertion that building AI agents is 5% AI and 100% software engineering reflects the reality that most failures in agent systems stem from issues related to data quality, permissioning, and retrieval decay rather than model performance. By prioritizing robust data management and observability practices, organizations can ensure their AI systems are reliable and effective.
