LLMs enable interactions with external tools and data sources, such as weather APIs or calculators, through function calls, unlocking diverse applications like autonomous AI agents and neurosymbolic reasoning systems. However, the current synchronous approach to function calling, where LLMs pause token generation until the execution of each call is complete, can be resource-intensive and…
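A minimal sketch of the synchronous pattern described here, where generation blocks until each tool call returns; the `model.generate` interface and the tool registry are illustrative assumptions, not any specific vendor API:

```python
# Sketch of a *synchronous* function-calling loop: token generation pauses
# until each tool call finishes. All names below are hypothetical placeholders.
import json

TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},  # stand-in tool
    "add":         lambda a, b: {"result": a + b},             # toy calculator
}

def run_turn(model, messages):
    while True:
        reply = model.generate(messages)       # token generation happens here
        if reply.get("tool_call") is None:     # plain text answer -> done
            return reply["content"]
        call = reply["tool_call"]
        # Generation is paused at this point: the model cannot emit further
        # tokens until the tool result is available and appended.
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
```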
Video generation has improved with models like Sora, which uses the Diffusion Transformer (DiT) architecture. While text-to-video (T2V) models have advanced, they often struggle to produce clear and consistent videos without additional references. Text-image-to-video (TI2V) models address this limitation by using an initial image frame as grounding to improve clarity. Reaching Sora-level performance…
Model merging allows one to combine the expertise of multiple task-specific fine-tuned models into a single powerful entity. The concept is straightforward: fine-tune variants of a base foundation model on independent tasks until each becomes an expert, then assemble these experts into one model. However, new concepts, domains, and tasks are emerging at an ever-increasing rate, leaving…
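One simple instance of this idea is parameter averaging across expert checkpoints ("model soup" style). The sketch below assumes experts share the base model's architecture; the checkpoint paths and equal weighting are illustrative, not a prescription from any particular merging method:

```python
# Merge fine-tuned experts of the same base architecture by averaging weights.
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of matching parameters across expert checkpoints."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float()
                           for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage:
# experts = [torch.load(p, map_location="cpu") for p in ("math.pt", "code.pt")]
# base_model.load_state_dict(merge_state_dicts(experts))
```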
Large Language Models (LLMs), trained on extensive datasets and equipped with billions of parameters, demonstrate remarkable abilities to process and respond to diverse linguistic tasks. However, as tasks increase in complexity, the interpretability and adaptability of LLMs become critical challenges. Performing multi-step reasoning efficiently while delivering transparent solutions remains a barrier, even…
Large Language Models (LLMs) have significantly advanced natural language processing, but tokenization-based architectures come with notable limitations. These models depend on fixed-vocabulary tokenizers like Byte Pair Encoding (BPE) to segment text into predefined tokens before training. While functional, tokenization can introduce inefficiencies and biases, particularly when dealing with multilingual data, noisy inputs, or long-tail distributions. Additionally,…
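For readers unfamiliar with the mechanism being criticized, here is a toy illustration of the BPE idea: repeatedly merge the most frequent adjacent symbol pair into a new vocabulary token. This is a simplified sketch on a tiny made-up corpus, not the tokenizer of any particular model:

```python
# Toy BPE training loop: learn merge rules from character-level words.
from collections import Counter

def bpe_train(words, num_merges):
    corpus = [list(w) for w in words]          # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # Replace every occurrence of the chosen pair with the merged symbol.
        for i, symbols in enumerate(corpus):
            out, j = [], 0
            while j < len(symbols):
                if j + 1 < len(symbols) and (symbols[j], symbols[j + 1]) == (a, b):
                    out.append(a + b)
                    j += 2
                else:
                    out.append(symbols[j])
                    j += 1
            corpus[i] = out
    return merges

print(bpe_train(["lower", "lowest", "low", "newer"], num_merges=5))
```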
Language model routing is a growing field focused on optimizing the utilization of large language models (LLMs) for diverse tasks. With capabilities spanning text generation, summarization, and reasoning, these models are increasingly applied to varied input data. Dynamically routing each task to the most suitable model has become a crucial challenge, aiming…
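A minimal routing sketch can make the idea concrete: score each candidate model for the incoming task and dispatch to the cheapest one that covers it. The model names, strength sets, and cost figures below are hypothetical placeholders for illustration:

```python
# Toy task router: pick the cheapest model whose strengths cover the task type.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    strengths: set            # task types the model handles well
    cost_per_1k_tokens: float

CANDIDATES = [
    Candidate("small-summarizer", {"summarization"}, 0.1),
    Candidate("general-chat",     {"generation", "summarization"}, 0.5),
    Candidate("reasoning-heavy",  {"reasoning", "generation"}, 2.0),
]

def route(task_type: str) -> Candidate:
    viable = [c for c in CANDIDATES if task_type in c.strengths]
    if not viable:
        viable = CANDIDATES                    # fall back to any model
    return min(viable, key=lambda c: c.cost_per_1k_tokens)

print(route("reasoning").name)   # -> reasoning-heavy
```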
The rapid advancements in large language models (LLMs) have introduced significant opportunities for various industries. However, their deployment in real-world scenarios also presents challenges, such as the generation of harmful content, hallucinations, and potential ethical misuse. LLMs can produce socially biased, violent, or profane outputs, and adversarial actors often exploit vulnerabilities through jailbreaks to bypass safety measures.…
Sampling from complex probability distributions is important in many fields, including statistical modeling, machine learning, and physics. This involves generating representative data points from a target distribution to solve problems such as Bayesian inference, molecular simulations, and optimization in high-dimensional spaces. Unlike generative modeling, which uses pre-existing data samples, sampling requires algorithms to explore high-probability…
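As a concrete instance of the sampling problem described here, a generic random-walk Metropolis-Hastings loop draws samples from an unnormalized target density without any pre-existing data. The bimodal target, step size, and sample count below are toy choices for illustration, not part of any specific method:

```python
# Random-walk Metropolis-Hastings on a toy 1-D bimodal target density.
import numpy as np

def log_target(x):
    # Unnormalized log-density: mixture of two Gaussians centered at +/- 2.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis_hastings(n_samples, step=1.0, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

draws = metropolis_hastings(5_000)
print(draws.mean(), draws.std())
```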
AI video generation has become increasingly popular in many industries due to its efficacy, cost-effectiveness, and ease of use. However, most state-of-the-art video generators rely on bidirectional models that consider both forward and backward temporal information to generate each part of the video. This approach yields high-quality videos but imposes a heavy computational load and is not…
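The contrast at issue can be shown with attention masks: a bidirectional temporal mask lets every frame attend to past and future frames, while a causal mask restricts attention to past frames only. The frame count and masking convention below are illustrative assumptions:

```python
# Bidirectional vs. causal temporal attention masks over video frames.
import torch

def temporal_mask(num_frames: int, causal: bool) -> torch.Tensor:
    # True means attention is allowed between (query frame, key frame).
    full = torch.ones(num_frames, num_frames, dtype=torch.bool)
    return torch.tril(full) if causal else full

print(temporal_mask(4, causal=False).int())  # all ones: forward + backward context
print(temporal_mask(4, causal=True).int())   # lower-triangular: past frames only
```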
The advancement of AI model capabilities raises significant concerns about potential misuse and security risks. As artificial intelligence systems become more sophisticated and support diverse input modalities, the need for robust safeguards has become paramount. Researchers have identified critical threats, including the potential for cybercrime, biological weapon development, and the spread of harmful misinformation. Multiple…