Human beings possess extraordinary innate perceptual judgment, and aligning computer vision models with it can improve model performance manifold. Attributes such as scene layout, subject location, camera pose, color, perspective, and semantics help us form a clear picture of the world and the objects within it. The alignment of vision models with visual…
Visual and action data are interconnected in robotic tasks, forming a perception-action loop. Robots rely on control parameters for movement, while vision foundation models (VFMs) excel at processing visual data. However, a modality gap exists between visual and action data, arising from fundamental differences in their sensory modalities, abstraction levels, temporal dynamics, contextual dependence, and susceptibility to…
Large language models (LLMs) like GPT-4, Gemini, and Llama 3 have revolutionized natural language processing through extensive pre-training and supervised fine-tuning (SFT). However, these models come with high computational costs for training and inference. Structured pruning has emerged as a promising method to improve LLM efficiency by selectively removing less critical components. Despite its potential,…
The problem of over-optimization of likelihood in Direct Alignment Algorithms (DAAs), such as Direct Preference Optimisation (DPO) and Identity Preference Optimisation (IPO), arises when these methods fail to improve model performance despite increasing the likelihood of preferred outcomes. These algorithms, which are alternatives to Reinforcement Learning from Human Feedback (RLHF), aim to align language models…
Large language models (LLMs) have long been trained to process vast amounts of data to generate responses that align with patterns seen during training. However, researchers are exploring a more profound concept: introspection, the ability of LLMs to reflect on their behavior and gain knowledge that isn’t directly derived from their training data. This new…
Point tracking is central to video work: from 3D reconstruction to editing tasks, precise estimates of point positions are necessary to achieve quality results. Over time, trackers have incorporated transformer and other neural network-based designs to track points individually or many at once. However, these neural networks can be fully exploited only with high-quality training data. Now, while…
The rise of Transformer-based models has significantly advanced the field of natural language processing. However, the training of these models is often computationally intensive, requiring substantial resources and time. This research addresses the issue of improving the training efficiency of Transformer models without compromising their performance. Specifically, it seeks to explore whether the benefits of…
Bayesian Optimization, widely used in experimental design and black-box optimization, traditionally relies on regression models for predicting the performance of solutions within fixed search spaces. However, many regression methods are task-specific due to modeling assumptions and input constraints. This issue is especially prevalent in learning-based regression, which depends on fixed-length tensor inputs. Recent advancements in…
AI has significantly impacted healthcare, particularly in disease diagnosis and treatment planning. One area gaining attention is the development of Medical Large Vision-Language Models (Med-LVLMs), which combine visual and textual data for advanced diagnostic tools. These models have shown great potential for improving the analysis of complex medical images, offering interactive and intelligent responses that…
Dynamical systems are mathematical models that explain how a system evolves due to physical interactions or forces. These systems are fundamental to understanding various phenomena across scientific fields like physics, biology, and engineering. For example, they model fluid dynamics, celestial mechanics, and robotic movements. The core challenge in modeling these systems lies in their complexity,…