Automatic benchmarks like AlpacaEval 2.0, Arena-Hard-Auto, and MTBench have gained popularity for evaluating LLMs due to their affordability and scalability compared to human evaluation. These benchmarks use LLM-based auto-annotators, which align well with human preferences, to provide timely assessments of new models. However, high win rates on these benchmarks can be manipulated by altering output…
Large language models (LLMs) have demonstrated impressive capabilities in in-context learning (ICL), a form of supervised learning that doesn’t require parameter updates. However, researchers are now exploring whether this ability extends to reinforcement learning (RL), introducing the concept of in-context reinforcement learning (ICRL). The challenge lies in adapting the ICL approach, which relies on input-output…
Model merging is an advanced technique in machine learning aimed at combining the strengths of multiple expert models into a single, more powerful model. This process allows the system to benefit from the knowledge of various models while reducing the need for large-scale individual model training. Merging models cuts down computational and storage costs and…
Robotic task execution in open-world environments presents significant challenges due to the vast state-action spaces and the dynamic nature of unstructured settings. Traditional robots struggle with unexpected objects, varying environments, and task ambiguities. Existing systems, often designed for controlled or pre-scanned environments, lack the adaptability required to respond effectively to real-time changes or unfamiliar tasks.…
High latency in time-to-first-token (TTFT) is a significant challenge for retrieval-augmented generation (RAG) systems. Existing RAG systems, which concatenate and process multiple retrieved document chunks to create responses, require substantial computation, leading to delays. Repeated computation of key-value (KV) caches for retrieved documents further exacerbates this inefficiency. As a result, RAG systems struggle to meet…
Scaling state-of-the-art models for real-world deployment often requires training different model sizes to adapt to various computing environments. However, training multiple versions independently is computationally expensive and leads to inefficiencies in deployment when intermediate-sized models are optimal. Current solutions like model compression and distillation have limitations, often requiring additional data and retraining, which may degrade…
Large Language Models (LLMs) have gained significant attention for their versatility in various tasks, from natural language processing to complex reasoning. A promising application of these models is the development of autonomous multi-agent systems (MAS), which aim to utilize the collective intelligence of multiple LLM-based agents for collaborative problem-solving. However, LLM-based MAS faces two critical…
Retrieval-augmented generation (RAG) is a method that integrates external knowledge sources into large language models (LLMs) to provide accurate and contextually relevant responses. These systems enhance the ability of LLMs to offer detailed and specific answers to user queries by utilizing up-to-date information from various domains. The field is particularly important in applications such as…
Ego-centric searches are essential in many applications, from financial fraud detection to social network research, because they concentrate on a single vertex and its immediate neighbors. These queries offer insights into direct connections by analyzing interconnections around a key node. Enabling such searches without jeopardizing privacy becomes a major difficulty when graphs are dispersed over…
In the ever-evolving world of artificial intelligence (AI), large language models have proven instrumental in addressing a wide array of challenges, from automating complex tasks to enhancing decision-making processes. However, scaling these models has also introduced considerable complexities, such as high computational costs, reduced accessibility, and the environmental impact of extensive resource requirements. The enormous…