Researchers from UC Berkeley and Anyscale Introduce RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing

Large Language Models (LLMs) have showcased impressive capabilities across various tasks, but they vary widely in cost and capability. Deploying these models in real-world applications presents a significant challenge: routing all queries to the most capable models ensures high-quality responses but is expensive, while directing queries to smaller models saves money at the expense of response quality. To address this trade-off, researchers from UC Berkeley, Anyscale, and Canva propose RouteLLM, an open-source LLM routing framework that balances price and performance.

Challenges in LLM Routing

LLM routing aims to determine which model should handle each query to minimize costs while maintaining response quality. The routing system must infer the characteristics of incoming queries and the capabilities of different models, making the problem complex. RouteLLM addresses this by utilizing preference data to train its routers, allowing the system to learn which queries can be handled by weaker models and which require stronger models.
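
To make the routing decision concrete, here is a minimal sketch assuming a two-model setup and a cutoff on a predicted score. The names `ThresholdRouter`, `win_probability`, and `toy_score` are hypothetical and not part of RouteLLM's API, and the toy scorer is a placeholder for a router trained on preference data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdRouter:
    # Hypothetical scorer: estimated probability in [0, 1] that the strong
    # model's answer would be preferred over the weak model's for a prompt.
    win_probability: Callable[[str], float]
    # Raising the threshold sends fewer queries to the strong model.
    threshold: float

    def route(self, prompt: str) -> str:
        p = self.win_probability(prompt)
        return "strong" if p >= self.threshold else "weak"


def toy_score(prompt: str) -> float:
    # Toy stand-in, NOT a trained router: treats longer prompts as harder.
    return min(len(prompt.split()) / 20.0, 1.0)


router = ThresholdRouter(win_probability=toy_score, threshold=0.5)
print(router.route("Hi"))
```

In a sketch like this, the threshold is the knob that trades cost against quality: a lower value routes more traffic to the strong model.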

Framework and Methodology

RouteLLM formalizes the problem of LLM routing and explores augmentation techniques to improve router performance. The framework uses public data from Chatbot Arena and incorporates novel training methods. Four different routers were trained:

Similarity-weighted (SW) ranking router: Performs a “weighted Elo calculation” based on similarity.
Matrix factorization model: Learns a scoring function for how well a model can answer a prompt.
BERT classifier: Predicts which model can provide a better response.
Causal LLM classifier: Also predicts which model can provide a better response.
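
As a rough illustration of the first router in the list, the sketch below runs an Elo-style update in which each past comparison is weighted by how similar its prompt is to the incoming query. It is an assumption-laden toy, not RouteLLM's implementation: the embeddings, the K-factor of 32, the starting rating of 1000, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def elo_expected(r_a: float, r_b: float) -> float:
    # Standard Elo expected score of A against B.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def weighted_elo_win_prob(
    query_vec: Sequence[float],
    battles: List[Tuple[Sequence[float], str]],
    k: float = 32.0,  # assumed K-factor
) -> float:
    """Estimate P(strong model wins) for the query.

    battles: (prompt_embedding, winner) pairs, winner in {"strong", "weak"}.
    Comparisons on prompts similar to the query move the ratings more.
    """
    strong, weak = 1000.0, 1000.0  # assumed starting ratings
    for vec, winner in battles:
        weight = max(cosine(query_vec, vec), 0.0)  # ignore negative similarity
        actual = 1.0 if winner == "strong" else 0.0
        delta = k * weight * (actual - elo_expected(strong, weak))
        strong += delta
        weak -= delta
    return elo_expected(strong, weak)


# Toy 2-D "embeddings": the strong model won on one kind of prompt,
# the weak model won on another.
example_battles = [([1.0, 0.0], "strong"), ([0.0, 1.0], "weak")]
print(weighted_elo_win_prob([1.0, 0.0], example_battles))
```

A query that resembles prompts where the strong model won ends up with an estimate above 0.5, which a threshold rule could then use to pick a model.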

The training process leverages preference data, where each data point consists of a prompt and a comparison of response quality between two models. From these comparisons, the routers learn the strengths and weaknesses of different models on different kinds of queries.
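
One way to picture such a data point is sketched below, along with a helper that turns it into a binary label for a classifier-style router. The field names, the `"model_a"` / `"model_b"` / `"tie"` winner encoding, and the choice to discard ties and same-tier comparisons are assumptions made for illustration, not the framework's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class PreferencePair:
    prompt: str
    model_a: str
    model_b: str
    winner: str  # assumed encoding: "model_a", "model_b", or "tie"


def strong_win_label(pair: PreferencePair, strong_models: Set[str]) -> Optional[int]:
    """Return 1 if the strong-tier model won, 0 if the weak-tier model won.

    Returns None when the comparison carries no strong-vs-weak signal
    (a tie, or both models in the same tier).
    """
    if pair.winner == "tie":
        return None
    a_strong = pair.model_a in strong_models
    b_strong = pair.model_b in strong_models
    if a_strong == b_strong:
        return None
    winner_is_a = pair.winner == "model_a"
    return 1 if winner_is_a == a_strong else 0


def build_dataset(
    pairs: List[PreferencePair], strong_models: Set[str]
) -> List[Tuple[str, int]]:
    # Keep only the comparisons that yield a usable label.
    rows = []
    for pair in pairs:
        label = strong_win_label(pair, strong_models)
        if label is not None:
            rows.append((pair.prompt, label))
    return rows


# Illustrative records; the model names are only labels here.
pairs = [
    PreferencePair("Prove this lemma", "gpt-4", "mixtral-8x7b", "model_a"),
    PreferencePair("Say hello", "mixtral-8x7b", "gpt-4", "model_a"),
    PreferencePair("Summarize this", "gpt-4", "mixtral-8x7b", "tie"),
]
print(build_dataset(pairs, strong_models={"gpt-4"}))
```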

Performance and Cost Efficiency

The performance of these routers was evaluated on benchmarks like MT Bench, MMLU, and GSM8K. The results demonstrated that the routers could significantly reduce costs without compromising quality. For instance, on MT Bench, the matrix factorization router achieved 95% of GPT-4’s performance while routing only 26% of calls to GPT-4, a 48% cost reduction compared to the random baseline. Augmenting the training data using an LLM judge further improved the routers’ performance, cutting the share of calls routed to GPT-4 to just 14% while maintaining the same performance level.
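
The arithmetic behind the MT Bench figures can be sketched as follows, assuming (the article does not spell this out) that "cost reduction" means the relative drop in the share of GPT-4 calls compared with a random router reaching the same quality. The function names are hypothetical.

```python
def relative_reduction(router_fraction: float, baseline_fraction: float) -> float:
    # Relative drop in the share of calls sent to the strong model.
    return 1.0 - router_fraction / baseline_fraction


def implied_baseline_fraction(router_fraction: float, reduction: float) -> float:
    # Invert the formula above to recover the baseline's strong-model share.
    return router_fraction / (1.0 - reduction)


# MT Bench figures quoted above: 26% GPT-4 calls, 48% cheaper than random.
mt_bench_baseline = implied_baseline_fraction(0.26, 0.48)
print(f"implied random-baseline GPT-4 share: {mt_bench_baseline:.2f}")
```

Under that assumption, the quoted numbers imply that a random router would need to send roughly half of all queries to GPT-4 to reach the same quality.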

On MMLU, the routers initially performed poorly because most of the questions were out of distribution relative to their training data. However, augmenting the dataset with golden-label data from the MMLU validation split led to significant improvements. The best-performing causal LLM router routed only 54% of calls to GPT-4 to achieve 95% of GPT-4’s performance, offering a 14% cost reduction compared to the random baseline.

Comparison with Commercial Offerings

RouteLLM’s performance was compared against commercial routing systems like Martian and Unify AI. Using GPT-4 Turbo as the strong model and Llama 2 70B or Mixtral 8x7B as the weak model, RouteLLM achieved similar performance while being over 40% cheaper. This comparison underscores the cost-effectiveness and competitive edge of the RouteLLM framework.

Generalization to Other Models

To demonstrate its generalizability, RouteLLM was tested with different model pairs, such as Claude 3 Opus and Llama 3 8B. The routers maintained strong performance without retraining, which suggests that they learned general characteristics that distinguish strong models from weak ones and that carry over to new model pairs.

Conclusion

RouteLLM provides a scalable and cost-effective solution for deploying LLMs by balancing cost and performance. The framework’s use of preference data and data augmentation techniques helps maintain high-quality responses while significantly reducing costs. RouteLLM has been released as open source, along with its datasets and code.

The post Researchers from UC Berkeley and Anyscale Introduce RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing appeared first on MarkTechPost.