Researchers have raised concerns about hallucinations in LLMs, which generate plausible but inaccurate or unrelated content. However, these hallucinations may hold potential in creativity-driven fields like drug discovery, where innovation is essential. LLMs have been widely applied in scientific domains such as materials science, biology, and chemistry, aiding tasks like molecular description and…
Large Language Models (LLMs) have become an indispensable part of contemporary life, shaping the future of nearly every conceivable domain. They are widely acknowledged for their impressive performance across tasks of varying complexity. However, LLMs have also been criticized for generating unexpected and unsafe responses. Consequently, ongoing research aims to align LLMs…
Knowledge distillation, a crucial technique for transferring knowledge from large language models (LLMs) to smaller, resource-efficient ones, faces several significant challenges that limit its utility. Over-distillation tends to cause homogenization, in which student models over-imitate teacher models and lose diversity and the capacity to solve novel or challenging tasks. In addition, the non-transparent…
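To make the idea concrete, a minimal sketch of the standard soft-label distillation loss (temperature-scaled KL divergence between teacher and student distributions, as in Hinton-style distillation) is shown below; the function names and the NumPy-only formulation are illustrative, not taken from any specific system discussed in the article:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax with the usual max-shift for numerical stability."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label KD loss: KL(teacher || student) at temperature T, scaled by T**2.

    The T**2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)  # softened teacher targets
    q = softmax(student_logits, T)  # softened student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return (T ** 2) * kl.mean()
```

A higher temperature flattens the teacher distribution, exposing the "dark knowledge" in the relative probabilities of wrong classes; it is exactly this pressure to match the teacher everywhere that, taken too far, produces the homogenization the article describes.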
Multimodal AI integrates diverse data formats, such as text and images, to create systems capable of accurately understanding and generating content. By bridging textual and visual data, these models address real-world problems like visual question answering, instruction-following, and creative content generation. They rely on advanced architectures and large-scale datasets to enhance performance, focusing on overcoming…
Self-supervised learning (SSL) is a powerful technique for extracting meaningful patterns from large, unlabelled datasets, proving transformative in fields like computer vision and NLP. In single-cell genomics (SCG), SSL offers significant potential for analyzing complex biological data, especially with the advent of foundation models. SCG, fueled by advances in single-cell RNA sequencing, has evolved into a data-intensive…
With the release of DeepSeek R1, there is a buzz in the AI community. The open-source model offers best-in-class performance across many metrics, on par with state-of-the-art proprietary models in many cases. Such success invites attention and curiosity to learn more about it. In this article, we will look into implementing a …
Artificial intelligence has grown significantly with the integration of vision and language, allowing systems to interpret and generate information across multiple data modalities. This capability enhances applications such as natural language processing, computer vision, and human-computer interaction by allowing AI models to process textual, visual, and video inputs seamlessly. However, challenges remain in ensuring that…
Large language models (LLMs) have shown remarkable abilities in language tasks and reasoning, but their capacity for autonomous planning, especially in complex, multi-step scenarios, remains limited. Traditional approaches often rely on external verification tools or linear prompting methods, which struggle with error correction, state tracking, and computational efficiency. This gap becomes evident in benchmarks like Blocksworld, where…
Novel view synthesis has witnessed significant advancements recently, with Neural Radiance Fields (NeRF) pioneering 3D representation techniques through neural rendering. While NeRF introduced innovative methods for reconstructing scenes by accumulating RGB values along sampling rays using multilayer perceptrons (MLPs), it encountered substantial computational challenges. The dense per-ray point sampling and the large size of the neural networks created…
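The accumulation step mentioned above can be sketched as the standard volume-rendering quadrature NeRF uses: each sample along a ray contributes its color weighted by its local opacity and the transmittance remaining after earlier samples. This is a minimal NumPy illustration of that compositing rule (the function name and array layout are assumptions for the example, not NeRF's actual implementation):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite N samples along one ray into a single RGB value.

    sigmas: (N,) volume densities predicted at each sample
    colors: (N, 3) RGB values predicted at each sample
    deltas: (N,) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-segment opacity
    # Transmittance T_i: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                 # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)
```

Because the MLP must be queried once per sample and rays typically carry dozens to hundreds of samples, this inner loop is exactly where NeRF's computational cost concentrates.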
The advancements in large language models (LLMs) have significantly enhanced natural language processing (NLP), enabling capabilities like contextual understanding, code generation, and reasoning. However, a key limitation persists: the restricted context window size. Most LLMs can only process a fixed amount of text, typically up to 128K tokens, which limits their ability to handle tasks…