Microsoft Introduces a DiskANN-Integrated System for Cost-Effective Vector Search Using Azure Cosmos DB
The ability to search high-dimensional vector representations has become essential for modern data systems. These vector representations, generated by deep learning models, encapsulate data’s semantic and contextual meanings, enabling systems to retrieve results based on relevance and similarity rather than exact matches. Such capabilities are crucial in large-scale applications like web search, AI-powered assistants, and content recommendations, where meaningful access to information is required.
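To make similarity-based retrieval concrete, here is a minimal sketch in Python. The toy four-dimensional embeddings and document names are invented for illustration and stand in for the high-dimensional vectors a real model would produce:

```python
import numpy as np

# Toy 4-dimensional "embeddings" standing in for the 768-dimensional
# vectors a deep learning model would produce (values are invented
# purely for illustration).
documents = {
    "intro to databases":     np.array([0.9, 0.1, 0.0, 0.2]),
    "vector search tutorial": np.array([0.2, 0.8, 0.5, 0.1]),
    "cooking pasta at home":  np.array([0.0, 0.1, 0.1, 0.9]),
}
query = np.array([0.3, 0.9, 0.4, 0.0])  # e.g. an embedded search phrase

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by semantic similarity to the query rather than by
# exact keyword matching.
ranked = sorted(documents.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)
for name, _ in ranked:
    print(name)  # "vector search tutorial" ranks first
```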
Challenges in Vector-Based Retrieval
One significant challenge in vector-based retrieval is the high cost and complexity of operating separate systems for transactional data and vector indexes. Traditionally, vector databases are optimized solely for semantic search performance, necessitating data duplication from primary databases. This duplication introduces latency, storage overhead, and risks of inconsistencies. Developers face the burden of synchronizing two distinct systems, which can limit scalability and flexibility and compromise data integrity during rapid updates.
Popular tools for vector search, such as Zilliz and Pinecone, operate as standalone services that provide efficient similarity search. However, these platforms often rely on segment-based or fully in-memory architectures, requiring repeated rebuilding of indices and suffering from latency spikes and significant memory usage. This inefficiency is exacerbated in scenarios involving large-scale or constantly changing data, particularly when managing updates, filtering queries, or handling multiple tenants.
Microsoft’s Integrated Approach
Researchers at Microsoft have introduced an innovative approach that integrates vector indexing directly into Azure Cosmos DB’s NoSQL engine. By utilizing DiskANN, a graph-based indexing library known for its performance in large-scale semantic search, they have re-engineered it to function within Cosmos DB’s infrastructure. This integration eliminates the need for a separate vector database, fully leveraging Cosmos DB’s built-in capabilities—such as high availability, elasticity, multi-tenancy, and automatic partitioning—resulting in a cost-efficient and scalable solution.
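As a rough illustration of what this looks like from the application side, the sketch below provisions a container with a DiskANN vector index using the azure-cosmos Python SDK (assuming version 4.7 or later); the account endpoint, key, and database, container, and property names are placeholders, and policy field names may differ across SDK versions:

```python
from azure.cosmos import CosmosClient, PartitionKey

# Sketch: provisioning a Cosmos DB container with a DiskANN vector index.
# Endpoint, key, and names below are placeholders, not real credentials.
client = CosmosClient(url="https://<account>.documents.azure.com:443/",
                      credential="<key>")
database = client.create_database_if_not_exists("vectors-demo")

vector_embedding_policy = {
    "vectorEmbeddings": [{
        "path": "/embedding",        # document property holding the vector
        "dataType": "float32",
        "distanceFunction": "cosine",
        "dimensions": 768,           # matches the 768-d vectors in the article
    }]
}
indexing_policy = {
    "vectorIndexes": [{"path": "/embedding", "type": "diskANN"}]
}

container = database.create_container_if_not_exists(
    id="documents",
    partition_key=PartitionKey(path="/tenantId"),  # one vector index per partition
    indexing_policy=indexing_policy,
    vector_embedding_policy=vector_embedding_policy,
)
```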
Each collection maintains a single vector index per partition, kept synchronized with the main document data through the existing Bw-Tree index structure. The rewritten DiskANN library, developed in Rust, exposes asynchronous operations so that it can run inside the database engine. This allows the database to retrieve or update only the vector components it needs, such as quantized vectors or neighbor lists, thereby reducing memory usage. Vector insertions and queries are managed using a hybrid approach, with most computations occurring in quantized space. This design supports paginated searches and filter-aware traversal, enabling efficient handling of complex predicates across billions of vectors. Additionally, a sharded indexing mode allows separate indices to be built over defined keys, such as tenant ID or time period.
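Continuing the hypothetical setup above, the sketch below shows how a filter-aware top-k query might look through Cosmos DB's built-in VectorDistance function; the category predicate and query_vector are assumptions made for illustration:

```python
# Sketch: a filtered top-k vector query. The WHERE clause illustrates the
# filter-aware traversal described above; query_vector is assumed to be a
# list of 768 floats produced by the same embedding model as the documents.
query = """
SELECT TOP 10 c.id, VectorDistance(c.embedding, @queryVector) AS score
FROM c
WHERE c.category = @category
ORDER BY VectorDistance(c.embedding, @queryVector)
"""
results = container.query_items(
    query=query,
    parameters=[
        {"name": "@queryVector", "value": query_vector},
        {"name": "@category", "value": "news"},  # hypothetical filter value
    ],
    enable_cross_partition_query=True,
)
for item in results:
    print(item["id"], item["score"])
```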
Performance and Cost Efficiency
In experiments, the system demonstrated strong performance. For a dataset of 10 million 768-dimensional vectors, query latency remained below 20 milliseconds (p50), achieving a recall@10 of 94.64%. Compared to enterprise-tier offerings, Azure Cosmos DB provided query costs that were 15× lower than Zilliz and 41× lower than Pinecone. Cost efficiency was maintained even as the index grew from 100,000 to 10 million vectors, with less than a 2× rise in latency or Request Units (RUs). On ingestion, Cosmos DB charged approximately $162.50 for 10 million vector inserts, which was lower than Pinecone and DataStax, though higher than Zilliz. Furthermore, recall remained stable even during heavy update cycles, with in-place deletions significantly improving accuracy under shifting data distributions.
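For a back-of-envelope sense of scale, the reported ingestion figure works out to a small fraction of a cent per vector; the division below is simply arithmetic on the article's numbers:

```python
# Back-of-envelope check of the reported ingestion cost. The totals come
# from the article; the per-vector breakdown is derived arithmetic.
total_cost_usd = 162.50
num_vectors = 10_000_000
per_vector = total_cost_usd / num_vectors
print(f"~${per_vector:.8f} per insert")  # ~$0.00001625, i.e. ~1.6 cents per 1,000 vectors
```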
Conclusion
This study presents a compelling solution for unifying vector search with transactional databases. The research team from Microsoft has designed a system that simplifies operations while delivering strong results on cost, latency, and scalability. By embedding vector search within Cosmos DB, they offer a practical template for integrating semantic capabilities directly into operational workloads.