Large Multimodal Models (LMMs) excel in many vision-language tasks, but their effectiveness remains limited in cross-cultural contexts because bias in their training datasets and methodologies prevents a rich array of cultural elements from being properly represented in image captions. Overcoming this limitation will help to make artificial…
LLMs enable interactions with external tools and data sources, such as weather APIs or calculators, through function calls, unlocking diverse applications like autonomous AI agents and neurosymbolic reasoning systems. However, the current synchronous approach to function calling, where LLMs pause token generation until the execution of each call is complete, can be resource-intensive and…
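To make the synchronous pattern concrete, here is a minimal sketch of the generate-pause-resume loop. The `call_llm` stub and the toy tool registry are hypothetical stand-ins, not any specific provider's API:

```python
# A minimal sketch of synchronous function calling: generation pauses until
# each tool call finishes, then resumes with the result appended to the context.
import json

def get_weather(city: str) -> str:
    """Toy external tool standing in for a weather API."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def call_llm(messages):
    """Hypothetical model call: returns either a tool-call request or final text."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "Paris"}}}
    return {"text": "It is sunny and 21°C in Paris."}

def run_synchronously(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_llm(messages)
        if "tool_call" in reply:
            # Token generation is blocked here until the tool has executed.
            call = reply["tool_call"]
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["text"]

print(run_synchronously("What's the weather in Paris?"))
```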
Video generation has improved with models like Sora, which uses the Diffusion Transformer (DiT) architecture. While text-to-video (T2V) models have advanced, they often struggle to produce clear and consistent videos without additional references. Text-image-to-video (TI2V) models address this limitation by using an initial image frame as grounding to improve clarity. Reaching Sora-level performance…
Model merging makes it possible to combine the expertise of multiple task-specific fine-tuned models into a single powerful entity. The concept is straightforward: fine-tune variants of a base foundation model on independent tasks until they become experts, and then assemble these experts into one. However, new concepts, domains, and tasks are emerging at an ever-increasing rate, leaving…
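As a rough illustration of the assembly step, the sketch below uses one common baseline, uniform weight averaging of expert checkpoints that share a base architecture (not necessarily the method of the work summarized here); the tiny linear "experts" are placeholders for fine-tuned foundation models:

```python
# A minimal sketch of a model-merging baseline: uniform, key-by-key averaging
# of parameters across experts that share the same architecture.
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts):
    """Average parameters key-by-key across expert state dicts."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Two toy "experts" sharing one architecture (pretend each was tuned on its own task).
expert_a, expert_b = nn.Linear(4, 2), nn.Linear(4, 2)
merged = nn.Linear(4, 2)
merged.load_state_dict(merge_state_dicts([expert_a.state_dict(), expert_b.state_dict()]))
print(merged.weight)  # element-wise mean of the two experts' weights
```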
Large Language Models (LLMs), trained on extensive datasets and equipped with billions of parameters, demonstrate remarkable abilities to process and respond to diverse linguistic tasks. However, as tasks increase in complexity, the interpretability and adaptability of LLMs become critical challenges. Performing multi-step reasoning efficiently and delivering transparent solutions remains difficult, even…
Large Language Models (LLMs) have significantly advanced natural language processing, but tokenization-based architectures bring notable limitations. These models depend on fixed-vocabulary tokenizers like Byte Pair Encoding (BPE) to segment text into predefined tokens before training. While functional, tokenization can introduce inefficiencies and biases, particularly when dealing with multilingual data, noisy inputs, or long-tail distributions. Additionally,…
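To illustrate what a fixed vocabulary implies, here is a toy greedy segmenter over a made-up vocabulary (a simplification, not real BPE merge rules): text is forced into predefined tokens, and anything outside the vocabulary falls apart into characters:

```python
# Toy illustration of fixed-vocabulary segmentation (not actual BPE):
# greedy longest-match against a predefined vocabulary, with character fallback.
VOCAB = {"low", "est</w>", "new", "er</w>", "wid", "th</w>"}  # made-up vocabulary

def segment(word: str):
    word += "</w>"  # end-of-word marker, as in classic BPE formulations
    tokens, i = [], 0
    while i < len(word):
        # Take the longest vocabulary entry that matches at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fallback: emit a single character
            i += 1
    return tokens

print(segment("lowest"))  # ['low', 'est</w>']
print(segment("narrow"))  # no coverage: degrades to single characters
```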
Language model routing is a growing field focused on optimizing the use of large language models (LLMs) across diverse tasks. With capabilities spanning text generation, summarization, and reasoning, these models are increasingly applied to varied input data. Dynamically routing each task to the most suitable model has become a crucial challenge, aiming…
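As a toy stand-in for a learned router, the sketch below routes prompts with simple heuristics; the model names and the `call_model` helper are hypothetical placeholders rather than a real API:

```python
# A minimal rule-based router: cheap prompts go to a small model, math-flavored
# prompts to a reasoning model, everything else to a large general model.
MATH_HINTS = ("prove", "integral", "derivative", "solve for")

def route(prompt: str) -> str:
    if any(hint in prompt.lower() for hint in MATH_HINTS):
        return "reasoning-model"       # placeholder for a strong reasoning LLM
    if len(prompt.split()) < 30:
        return "small-fast-model"      # placeholder for a cheap general model
    return "large-general-model"       # placeholder default

def call_model(model: str, prompt: str) -> str:
    return f"[{model}] would answer: {prompt!r}"

print(call_model(route("Summarize this paragraph."), "Summarize this paragraph."))
print(call_model(route("Solve for x: 3x + 2 = 11"), "Solve for x: 3x + 2 = 11"))
```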
The rapid advancements in large language models (LLMs) have introduced significant opportunities for various industries. However, their deployment in real-world scenarios also presents challenges, such as generating harmful content, hallucinations, and potential ethical misuse. LLMs can produce socially biased, violent, or profane outputs, and adversarial actors often exploit vulnerabilities through jailbreaks to bypass safety measures.…
Sampling from complex probability distributions is important in many fields, including statistical modeling, machine learning, and physics. It involves generating representative data points from a target distribution to solve problems such as Bayesian inference, molecular simulations, and optimization in high-dimensional spaces. Unlike generative modeling, which learns from pre-existing data samples, sampling requires algorithms to explore high-probability…
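As one classic baseline for this kind of problem (not necessarily the approach taken in the work summarized here), below is a minimal random-walk Metropolis-Hastings sketch that draws samples from an illustrative unnormalized bimodal target:

```python
# Minimal random-walk Metropolis-Hastings sampler for a 1-D unnormalized target.
import math
import random

def log_target(x: float) -> float:
    """Unnormalized log-density of a bimodal target (two Gaussian modes at ±2)."""
    return math.log(math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2))

def metropolis_hastings(n_samples: int, step: float = 1.0, x0: float = 0.0):
    samples, x = [], x0
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)               # symmetric proposal
        log_accept = log_target(proposal) - log_target(x)
        if random.random() < math.exp(min(0.0, log_accept)):  # accept with prob min(1, ratio)
            x = proposal
        samples.append(x)
    return samples

draws = metropolis_hastings(5000)
# Should be near 0 by symmetry of the two modes, up to Monte Carlo error.
print(f"mean ≈ {sum(draws) / len(draws):.2f}")
```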