A significant challenge in visual question answering (VQA) is Multi-Image Visual Question Answering (MIQA): generating relevant and grounded responses to natural-language queries over a large set of images. Existing Large Multimodal Models (LMMs) excel at single-image visual question answering but face substantial difficulties when queries…
Multi-target multi-camera tracking (MTMCT) is essential for intelligent transportation systems, yet it remains difficult to deploy in real-world applications due to limited publicly available data and the labor-intensive process of manual annotation. Advances in computer vision have improved traffic management by enabling accurate prediction and analysis of traffic volumes. MTMCT involves tracking vehicles across multiple…
Before PILOT, fitting linear model trees was slow and prone to overfitting, especially on large datasets. Traditional regression trees, which predict a constant in each leaf, struggled to capture linear relationships effectively, while linear model trees faced interpretability challenges when incorporating linear models in their leaf nodes. The research emphasized the need for algorithms that combine decision-tree interpretability with accurate modeling of linear relationships.…
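To make the idea concrete, here is a minimal sketch of a linear model tree with a single split: each leaf fits a 1-D least-squares line instead of predicting a constant. The helper names and the fixed split threshold are illustrative assumptions, not the PILOT algorithm itself.

```python
# Toy linear model tree: one split, a least-squares line in each leaf.
# (Illustrative sketch only; real algorithms also search for the split.)
def fit_line(xs, ys):
    # Closed-form simple linear regression: returns (intercept, slope).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def fit_stump(xs, ys, threshold):
    # Partition the data at the threshold and fit a line per side.
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return fit_line(*zip(*left)), fit_line(*zip(*right))

def predict(model, threshold, x):
    (a_l, b_l), (a_r, b_r) = model
    a, b = (a_l, b_l) if x <= threshold else (a_r, b_r)
    return a + b * x

# Piecewise-linear data: slope +1 below 5, slope -1 above.
xs = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
ys = [x if x <= 5 else 10 - x for x in xs]
model = fit_stump(xs, ys, 5)
print(predict(model, 5, 2))  # ≈ 2.0
print(predict(model, 5, 8))  # ≈ 2.0
```

A constant-leaf regression tree would need many splits to approximate these two ramps, whereas two linear leaves recover them exactly, which is the trade-off the paragraph describes.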
Meta announced the release of Llama 3.1, the most capable model in the Llama series to date. This latest iteration, particularly the 405B model, represents a substantial advancement in open-source AI capabilities, positioning Meta at the forefront of AI innovation. Meta has long advocated for open-source AI, a stance underscored by Mark Zuckerberg’s…
Large Language Models (LLMs) have made significant leaps in recent years, but their inference process still faces challenges, particularly in the prefilling stage. The primary issue is time-to-first-token (TTFT), which can be high for long prompts because state-of-the-art transformer-based LLMs are deep and wide. This slowdown occurs because the cost…
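A rough back-of-the-envelope calculation shows why prefill dominates TTFT for long prompts: self-attention over the prompt scales quadratically with its length. The hyperparameters below (32 layers, hidden size 4096) are assumed for illustration and do not correspond to any specific model.

```python
# Illustrative FLOP estimate for the attention portion of prefill.
# Self-attention over an n-token prompt is O(n^2 * d) per layer.
def prefill_attention_flops(prompt_len, n_layers=32, d_model=4096):
    return n_layers * 2 * prompt_len**2 * d_model

short = prefill_attention_flops(1_000)
long_ = prefill_attention_flops(8_000)
ratio = long_ / short
print(ratio)  # 64.0 — an 8x longer prompt costs 64x the attention FLOPs
```

This quadratic growth is why TTFT degrades sharply on long prompts even when per-token decoding speed stays roughly constant.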
As large language models surpass human-level capabilities, providing accurate supervision becomes increasingly difficult. Weak-to-strong learning, which uses a less capable model to enhance a stronger one, offers potential benefits but needs testing on complex reasoning tasks. The method currently lacks efficient techniques to prevent the stronger model from imitating the weaker model’s errors. As AI…
General circulation models (GCMs) form the backbone of weather and climate prediction, leveraging numerical solvers for large-scale dynamics and parameterizations for smaller-scale processes like cloud formation. Despite continuous improvements, GCMs face significant challenges, including persistent errors, biases, and uncertainties in long-term climate projections and extreme weather events. Recent machine-learning (ML) models have achieved remarkable success…
Research on tabular machine learning has grown rapidly in recent years, yet it still poses significant challenges for researchers and practitioners. Traditionally, academic benchmarks for tabular ML have not fully represented the complexities encountered in real-world industrial applications. Most available datasets either lack the temporal metadata necessary for time-based splits or come from less extensive…
Large Language Models (LLMs) excel in various tasks, including text generation, translation, and summarization. However, a growing challenge within NLP is how these models can effectively interact with external tools to perform tasks beyond their inherent capabilities. This challenge is particularly relevant in real-world applications where LLMs must fetch real-time data, perform complex calculations, or…
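One common pattern for such tool interaction is for the model to emit a structured tool call that a host runtime executes. The JSON schema and tool names below are hypothetical, a minimal sketch of the dispatch loop rather than any particular framework's API.

```python
# Toy tool-dispatch sketch: the model's output is assumed to be a JSON
# object naming a tool and its arguments; the host looks it up and runs it.
import json

TOOLS = {
    "add": lambda a, b: a + b,  # stand-in for a calculator tool
}

def dispatch(model_output: str):
    call = json.loads(model_output)  # e.g. {"tool": "add", "args": [2, 3]}
    return TOOLS[call["tool"]](*call["args"])

print(dispatch('{"tool": "add", "args": [2, 3]}'))  # 5
```

Real systems add validation, error handling, and a loop that feeds the tool result back to the model, but the core contract — structured call out, result in — is the same.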