LLMs, characterized by their massive parameter counts, are often inefficient to deploy due to high memory and computational demands. One practical solution is semi-structured pruning, particularly the N:M sparsity pattern, which improves efficiency by keeping only N non-zero values within every group of M consecutive parameters. While hardware-friendly on accelerators such as GPUs, this approach faces challenges due to…
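To make the pattern concrete, the sketch below (a generic illustration, not this paper's pruning method) applies 2:4 sparsity: in every group of four consecutive weights, only the two with the largest magnitude are kept.

```python
import numpy as np

def apply_nm_sparsity(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n largest-magnitude values in every group of m consecutive
    weights (e.g., 2:4 sparsity) and zero out the rest.
    Assumes the number of weights is divisible by m."""
    orig_shape = weights.shape
    groups = weights.reshape(-1, m)                    # one row per group of m weights
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop_idx = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop_idx, False, axis=1)   # zero out the dropped entries
    return (groups * mask).reshape(orig_shape)

# Example: one row of eight weights pruned to 2:4 sparsity.
w = np.array([[0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.6, 0.01]])
print(apply_nm_sparsity(w))   # exactly two non-zeros survive in each group of four
```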
Large language models (LLMs) have garnered significant attention for their ability to understand and generate human-like text. These models possess the unique capability to encode factual knowledge effectively, thanks to the vast amount of data they are trained on. This ability is crucial in various applications, ranging from natural language processing (NLP) tasks to more…
Large language models (LLMs) have advanced significantly in recent years. However, their real-world applications are restricted by substantial processing power and memory requirements. The need to make LLMs more accessible on smaller, resource-limited devices drives the development of more efficient frameworks for model inference and deployment. Existing methods for running LLMs include hardware…
Large Language Models (LLMs) have made significant strides in various natural language processing tasks, yet they still struggle with mathematics and complex logical reasoning. Chain-of-Thought (CoT) prompting has emerged as a promising approach to enhance reasoning capabilities by incorporating intermediate steps. However, LLMs often exhibit unfaithful reasoning, where conclusions don’t align with the generated reasoning…
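As a minimal illustration of the technique (generic, not this paper's specific setup), a chain-of-thought prompt includes a worked example whose answer spells out the intermediate steps, nudging the model to reason before stating its answer.

```python
# A minimal chain-of-thought (CoT) prompt: a worked example with explicit
# intermediate steps, followed by the new question.
cot_prompt = """Q: A cafeteria had 23 apples. It used 20 for lunch and bought 6 more. How many apples does it have?
A: The cafeteria started with 23 apples. After using 20, it had 23 - 20 = 3. After buying 6 more, it had 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?
A:"""

# `complete` stands in for whatever LLM completion call is available; CoT
# elicits a reasoning chain like "5 + 2 * 3 = 11 ... The answer is 11."
# response = complete(cot_prompt)
```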
Instruction-tuned LMs have shown remarkable zero-shot generalization but often fail on tasks outside their training data. These LMs, built on large datasets and billions of parameters, excel at in-context learning (ICL), generating responses from a few examples without re-training. However, the scope of the training data limits their effectiveness on unfamiliar tasks. Techniques like prompt engineering…
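For context, in-context learning specifies a task entirely through a few labelled examples placed in the prompt, with no weight updates; the sketch below is a generic illustration rather than this paper's setup.

```python
# In-context learning (ICL): the task is defined only by the in-prompt
# examples; the model's weights are never updated.
icl_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: Stopped working after a week and support never replied.
Sentiment: Negative

Review: Exactly what I needed, fast shipping and great build quality.
Sentiment:"""

# `complete` stands in for any LLM completion call; a capable model is
# expected to continue with "Positive" based solely on the in-prompt examples.
# response = complete(icl_prompt)
```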
Large language models (LLMs) have gained significant attention due to their advanced capabilities in processing and generating text. However, the increasing demand for multimodal input processing has led to the development of vision language models. These models combine the strengths of LLMs with image encoders to create large vision language models (LVLMs). Despite their promising…
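At a high level, most LVLMs follow the recipe sketched below: an image encoder produces patch features, a small projection maps them into the LLM's embedding space, and the resulting visual tokens are prepended to the text tokens. The module and dimensions here are illustrative placeholders, not the specific architecture discussed in this work.

```python
import torch
import torch.nn as nn

class MinimalLVLM(nn.Module):
    """Schematic LVLM: image patch features are projected into the LLM's
    embedding space and prepended to the text embeddings."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.projector = nn.Linear(vision_dim, llm_dim)  # vision features -> LLM space

    def forward(self, image_features: torch.Tensor, text_embeddings: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from a frozen image encoder
        # text_embeddings: (batch, seq_len, llm_dim) from the LLM's embedding layer
        visual_tokens = self.projector(image_features)
        # The combined sequence is what the LLM's transformer layers would consume.
        return torch.cat([visual_tokens, text_embeddings], dim=1)

fused = MinimalLVLM()(torch.randn(1, 256, 1024), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 288, 4096])
```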
Retrieval-augmented generation (RAG) has emerged as a transformative approach in natural language processing, combining retrieval mechanisms with generative models to enhance factual accuracy and reasoning capabilities. RAG systems excel at generating complex responses by leveraging external sources and synthesizing the retrieved information into coherent narratives. Unlike traditional models that rely solely on pre-existing knowledge, RAG systems…
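The retrieve-then-generate loop behind RAG can be illustrated in a few lines. The sketch below uses a toy lexical-overlap retriever over a tiny in-memory document store and leaves the generation call abstract; real systems would swap in dense embeddings or BM25 and an actual LLM.

```python
import re

DOCUMENTS = [
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "Python was created by Guido van Rossum.",
]

def _terms(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy lexical-overlap retriever; a real RAG system would use a dense
    encoder or BM25 over a much larger corpus."""
    ranked = sorted(DOCUMENTS, key=lambda d: len(_terms(query) & _terms(d)), reverse=True)
    return ranked[:k]

def rag_prompt(query: str) -> str:
    """Assemble the retrieved passages and the question into a grounded prompt."""
    context = "\n".join(retrieve(query))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# The prompt is then handed to any generative model; conditioning on the
# retrieved passages is what grounds the answer in external knowledge.
print(rag_prompt("Who created Python?"))
```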
Generating versatile and high-quality text embeddings across various tasks is a significant challenge in natural language processing (NLP). Current embedding models, despite advancements, often struggle to handle unseen tasks and complex retrieval operations effectively. These limitations hinder their ability to adapt dynamically to diverse contexts, a critical requirement for real-world applications. Addressing this challenge is…