Large Language Models (LLMs) like GPT-4 exhibit impressive capabilities in text generation tasks such as summarization and question answering. However, they often produce “hallucinations,” generating content that is factually incorrect or contextually irrelevant. The problem is particularly acute when the LLMs are provided with correct facts but still produce inaccurate outputs, termed “contextual hallucinations.” These… →
The Retrieval-Augmented Generation (RAG) pipeline includes four major steps: generating embeddings for queries and documents, retrieving relevant documents, analyzing the retrieved data, and generating the final response. Each of these steps requires separate queries and tools, resulting in a cumbersome, time-consuming, and potentially error-prone process. For example, generating embeddings might involve using a machine learning… →
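The four steps above can be sketched end to end. This is a minimal, self-contained illustration only: the `embed` function uses a bag-of-words term-frequency vector as a stand-in for a learned embedding model, and the final "generation" is a template string rather than an LLM call; the function names (`embed`, `retrieve`, `answer`) are hypothetical, not part of any specific RAG framework.

```python
from collections import Counter
import math

def embed(text):
    # Step 1 (sketch): "embedding" as a term-frequency vector.
    # A real pipeline would call an embedding model/API here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, k=1):
    # Step 2: rank documents by similarity to the query, keep top-k.
    ranked = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

def answer(query, docs):
    # Steps 3-4: analyze the retrieved text and generate a response.
    # Here "generation" is a template; a real pipeline prompts an LLM
    # with the retrieved passages as context.
    q_vec = embed(query)
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    top = retrieve(q_vec, doc_vecs, k=1)[0]
    return f"Based on {top}: {docs[top]}"

docs = {
    "doc1": "RAG retrieves relevant documents before generating",
    "doc2": "Stereo matching computes dense disparity maps",
}
print(answer("how does RAG generate answers", docs))
```

Even in this toy form, the separation into four functions mirrors why the pipeline is cumbersome in practice: each stage typically lives behind a different tool or service.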
Large Language Models (LLMs) have become critical tools in various domains due to their exceptional ability to understand and generate human language. These models, which often contain billions of parameters, require extensive computational resources for training and fine-tuning. The primary challenge lies in efficiently managing the memory and computational demands to make these models accessible… →
Stereo matching, a fundamental topic in computer vision for nearly half a century, involves computing dense disparity maps from two rectified images. It plays a critical role in many applications, including autonomous driving, robotics, and augmented reality, among many others. Existing surveys categorize end-to-end architectures, according to their cost-volume computation and optimization methodologies, into 2D… →
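The core idea behind a disparity map can be shown on a single rectified scanline: a scene point visible at column x in the left image appears at column x - d in the right image, and d is recovered by minimizing a matching cost. This is a toy sum-of-absolute-differences sketch for illustration, not any of the end-to-end architectures the survey categorizes; `disparity_1d` and its parameters are hypothetical names.

```python
def disparity_1d(left, right, max_disp=3, win=1):
    # left, right: intensity lists for one rectified scanline.
    # For each left pixel, try shifts d = 0..max_disp and keep the
    # shift whose windowed sum of absolute differences is smallest.
    n = len(left)
    disp = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(min(max_disp, x) + 1):
            cost = 0
            for w in range(-win, win + 1):
                xl, xr = x + w, x - d + w
                if 0 <= xl < n and 0 <= xr < n:
                    cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp.append(best_d)
    return disp

# Right view simulated as the left scanline shifted by 2 pixels:
left  = [0, 0, 10, 20, 10, 0, 0, 0]
right = [10, 20, 10, 0, 0, 0, 0, 0]
print(disparity_1d(left, right))
```

In the textured region (the 10-20-10 bump) the recovered disparity is 2, matching the simulated shift; the flat zero regions are ambiguous, which is exactly the kind of failure mode that cost-volume optimization in learned architectures is designed to resolve.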
In an effort to track its advancement towards creating Artificial Intelligence (AI) that can surpass human performance, OpenAI has launched a new classification system. According to a Bloomberg article, OpenAI has recently discussed a five-level framework to clarify its goals for AI safety and future improvements. Level 1 (Conversational AI): AI programs such as ChatGPT… →
Computer vision enables machines to interpret and understand visual information from the world. This encompasses a variety of tasks, such as image classification, object detection, and semantic segmentation. Innovations in this area have been propelled by the development of advanced neural network architectures, particularly Convolutional Neural Networks (CNNs) and, more recently, Transformers. These models have demonstrated significant… →
Recent progress in Large Multimodal Models (LMMs) has demonstrated remarkable capabilities in various multimodal settings, moving closer to the goal of artificial general intelligence. By aligning vision encoders with LLMs using large amounts of vision-language data, they endow LLMs with visual abilities. However, most open-source LMMs have focused mainly on single-image scenarios, leaving the more complex… →
Large Language Models (LLMs) have made significant strides in recent years, prompting researchers to explore the development of Large Vision Language Models (LVLMs). These models aim to integrate visual and textual information processing capabilities. However, current open-source LVLMs face challenges in matching the versatility of proprietary models like GPT-4, Gemini Pro, and Claude 3. The… →
Obesity is a risk factor for postmenopausal breast cancer (BC), and evidence suggests a role for adiponectin in the relationship between obesity and BC. We investigated whether adiponectin or other biomarkers mediate the effect of body mass index (BMI) on postmenopausal BC risk in a cohort study nested in the IBIS-II Prevention Trial. We measured… →
BACKGROUND: Participants in research trials often disclose severe depression symptoms, including thoughts of self-harm and suicidal ideation, in validated self-administered questionnaires such as the Patient Health Questionnaire (PHQ-9). However, there is no standard protocol for responding to such disclosure, and the opportunity to support people at risk is potentially missed. We developed and evaluated a… →