Large Language Models (LLMs) have made a significant leap in recent years, but their inference process faces challenges, particularly in the prefilling stage. The primary issue lies in the time-to-first-token (TTFT), which can be slow for long prompts due to the deep and wide architecture of state-of-the-art transformer-based LLMs. This slowdown occurs because the cost…
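The prefill slowdown described above can be illustrated with a back-of-the-envelope FLOP estimate. This is a hedged sketch, not taken from the paper: the function name and model dimensions are invented, and it counts only the self-attention term, whose cost grows quadratically in prompt length.

```python
# Illustrative sketch (not from the paper): estimate how prefill
# self-attention cost grows with prompt length for a hypothetical
# transformer. Layer count and hidden size are made-up examples.

def prefill_attention_flops(n_tokens: int, n_layers: int, d_model: int) -> int:
    """Rough FLOP count for self-attention during prefill.

    Per layer, forming the n x n attention map and applying it to the
    values each costs about 2 * n^2 * d multiply-adds.
    """
    per_layer = 2 * 2 * (n_tokens ** 2) * d_model
    return n_layers * per_layer

# Doubling the prompt length roughly quadruples the attention cost,
# which is one reason time-to-first-token degrades on long prompts.
short = prefill_attention_flops(2_048, n_layers=32, d_model=4_096)
long = prefill_attention_flops(4_096, n_layers=32, d_model=4_096)
print(long / short)  # → 4.0
```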
As large language models surpass human-level capabilities, providing accurate supervision becomes increasingly difficult. Weak-to-strong learning, which uses a less capable model to enhance a stronger one, offers potential benefits but needs testing for complex reasoning tasks. This method currently lacks efficient techniques to prevent the stronger model from imitating the weaker model’s errors. As AI…
General circulation models (GCMs) form the backbone of weather and climate prediction, leveraging numerical solvers for large-scale dynamics and parameterizations for smaller-scale processes like cloud formation. Despite continuous improvements, GCMs face significant challenges, including persistent errors, biases, and uncertainties in long-term climate projections and extreme weather events. Recent machine-learning (ML) models have succeeded remarkably…
In recent years, research on tabular machine learning has grown rapidly, yet the field still poses significant challenges for researchers and practitioners. Traditionally, academic benchmarks for tabular ML have not fully represented the complexities encountered in real-world industrial applications. Most available datasets either lack the temporal metadata necessary for time-based splits or come from less extensive…
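The time-based splits mentioned above can be sketched in a few lines. This is a hypothetical illustration with invented field names, not a dataset from any benchmark: records carrying a timestamp are partitioned at a cutoff date so the model trains only on the past and is evaluated only on strictly later data.

```python
# Hypothetical sketch of a time-based train/test split, the kind of
# evaluation that temporal metadata makes possible. The rows and the
# "ts" field name are invented for illustration.

from datetime import date

rows = [
    {"ts": date(2021, 1, 5), "x": 0.1, "y": 0},
    {"ts": date(2021, 6, 1), "x": 0.7, "y": 1},
    {"ts": date(2022, 2, 9), "x": 0.3, "y": 0},
    {"ts": date(2022, 8, 20), "x": 0.9, "y": 1},
]

cutoff = date(2022, 1, 1)
train = [r for r in rows if r["ts"] < cutoff]    # past only
test = [r for r in rows if r["ts"] >= cutoff]    # strictly later

print(len(train), len(test))  # → 2 2
```

Unlike a random split, this layout prevents information from the future leaking into training, which is closer to how models are deployed in industry.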
Large Language Models (LLMs) excel in various tasks, including text generation, translation, and summarization. However, a growing challenge within NLP is how these models can effectively interact with external tools to perform tasks beyond their inherent capabilities. This challenge is particularly relevant in real-world applications where LLMs must fetch real-time data, perform complex calculations, or…
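The tool-interaction pattern described above can be sketched with a toy dispatch loop. This is a minimal, generic illustration: the stand-in "model", the JSON call format, and the `calculator` tool are all invented for this sketch and do not correspond to any specific LLM API.

```python
# Minimal sketch of LLM tool use: the model emits a structured tool
# call, the runtime dispatches it, and the result comes back. The
# fake_model stub and the call format are invented for illustration.

import json

def calculator(expression: str) -> str:
    # Toy tool: evaluate a simple arithmetic expression safely-ish.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM that decides a tool is needed and emits
    # a structured call instead of a free-text answer.
    return json.dumps({"tool": "calculator",
                       "arguments": {"expression": "2 * 21"}})

def run_agent(prompt: str) -> str:
    call = json.loads(fake_model(prompt))
    tool = TOOLS[call["tool"]]
    return tool(**call["arguments"])

print(run_agent("What is 2 * 21?"))  # → 42
```

Real systems replace `fake_model` with an actual LLM call and validate the tool name and arguments before dispatching, but the loop structure is the same.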
Document Visual Question Answering (DocVQA) is a branch of visual question answering that focuses on answering queries about the contents of documents. These documents can take several forms, including scanned photographs, PDFs, and digital documents with text and visual features. However, there are few datasets for DocVQA because collecting and annotating the data is complicated…
Recent advances in large language models (LLMs) have made it possible to use LLM agents in many areas, including safety-critical ones like finance, healthcare, and self-driving cars. Typically, these agents use an LLM to interpret tasks and make plans, and they can call external tools, such as third-party APIs, to carry out those plans. However, their…
Effectively evaluating document instruction data for training large language models (LLMs) and multimodal large language models (MLLMs) in document visual question answering (VQA) presents a significant challenge. Existing methods are primarily text-oriented, focusing on the textual content of instructions rather than the execution process, which limits their ability to comprehensively assess the quality and efficacy…
Securing software products is a challenge for businesses. Teams are inundated with false positives from current Static Application Security Testing (SAST) tools, and many of the vulnerabilities that are identified go unfixed. Meet ZeroPath, a GitHub app that detects, verifies, and issues pull requests for security vulnerabilities in your code. The ZeroPath tool not only automatically identifies vulnerabilities…
On-call shifts can be very stressful for engineers. When something goes wrong in a system, the person on call has to figure out the problem and fix it quickly. This often means going through lots of logs and data, which takes time and can be challenging, especially during off-hours. Getting to the root cause of…