The rapid evolution of AI has brought notable advances in natural language understanding and generation. However, the resulting models often fall short when faced with complex reasoning, long-term planning, or optimization tasks that require deeper contextual understanding. While models like OpenAI’s GPT-4 and Meta’s Llama excel at language modeling, their capabilities in advanced planning and reasoning remain…
Text generation is a foundational component of modern natural language processing (NLP), enabling applications ranging from chatbots to automated content creation. However, handling long prompts and dynamic contexts presents significant challenges. Existing systems often face limitations in latency, memory efficiency, and scalability. These constraints are especially problematic for applications requiring extensive context, where bottlenecks in…
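To see where the memory bottleneck comes from, here is a back-of-the-envelope sketch (illustrative only, not from the article; the layer, head, and precision figures are assumptions roughly matching a 7B-class transformer) of how a decoder's key/value cache grows linearly with context length:

```python
# Illustrative sketch: the key/value cache of a transformer decoder
# grows linearly with context length, one source of the memory
# bottleneck for long-context applications. Numbers are assumptions.
def kv_cache_bytes(context_len, layers=32, heads=32, head_dim=128,
                   bytes_per_elem=2):  # fp16
    # Two tensors (K and V) per layer, each (context_len, heads, head_dim).
    return 2 * layers * context_len * heads * head_dim * bytes_per_elem

for n in (2_000, 32_000, 128_000):
    print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 1e9:.1f} GB per sequence")
```

Under these assumed dimensions, a single 128K-token sequence already needs tens of gigabytes of cache, which is why latency and memory scale so poorly with extensive context.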
Open-source multimodal large language models (MLLMs) show considerable promise across diverse tasks by integrating visual encoders with language models. However, their reasoning abilities remain limited, largely because existing instruction-tuning datasets are often repurposed from academic resources such as VQA and AI2D. These datasets focus on simple tasks with phrase-based answers and lack the complexity needed for advanced reasoning. CoT reasoning,…
DeepSeek AI has made significant strides in artificial intelligence, particularly in areas like reasoning, mathematics, and coding. Earlier versions of its models achieved notable success on mathematical and reasoning tasks, but left room to improve consistency across a broader range of applications, such as live coding and nuanced writing. These gaps…
Neural networks (NNs) are remarkably effective at transforming high-dimensional data into compact, lower-dimensional latent spaces. While researchers have traditionally focused on model outputs such as classification or generation, the geometry of these internal representations has emerged as a critical area of investigation. Internal representations offer profound insights into how neural networks function, enabling researchers to repurpose learned features for downstream tasks…
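As a concrete illustration of the idea (a minimal sketch, not code from any work cited here; `TinyAutoencoder` and all dimensions are hypothetical), the encoder of a small PyTorch autoencoder maps 784-dimensional inputs to a 16-dimensional latent space, and the frozen latent codes can then be reused by a lightweight downstream head:

```python
# Hypothetical sketch: an encoder compresses high-dimensional data
# into a compact latent space, and the learned features are reused
# for a downstream task via a small probe.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),          # compact latent space
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = TinyAutoencoder()
x = torch.randn(32, 784)        # batch of high-dimensional inputs
recon, z = model(x)
print(z.shape)                  # torch.Size([32, 16])

# Repurpose the learned features: train a lightweight classifier
# head on the (detached, i.e. frozen) latent codes.
probe = nn.Linear(16, 10)
logits = probe(z.detach())
print(logits.shape)             # torch.Size([32, 10])
```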
Transformers have been the foundation of large language models (LLMs), and recently their application has expanded to search problems on graphs, a core domain in computational logic, planning, and AI. Graph search is integral to tasks that require systematically exploring nodes and edges to find connections or paths. Despite transformers’ apparent adaptability, their ability to…
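For reference, graph search in the classical sense looks like the following minimal breadth-first-search sketch (illustrative code, not from the work discussed; the toy graph and the `bfs_path` helper are assumptions):

```python
# Minimal sketch of classical graph search (BFS): systematically
# explore nodes and edges to find a shortest path between two nodes.
from collections import deque

def bfs_path(graph, start, goal):
    """Return a shortest path from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Toy adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

This kind of systematic exploration, trivial for an explicit algorithm, is exactly what researchers probe transformers for.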
Wireless communication is the foundation of modern systems, enabling critical applications across military, commercial, and civilian domains. Its increasing prevalence has changed daily life and operations worldwide while also introducing serious security vulnerabilities. Attackers exploit these vulnerabilities to intercept sensitive data, disrupt communications, or mount targeted attacks, compromising both confidentiality and functionality. While encryption is a critical…
Large language models (LLMs) have driven important advances in artificial intelligence, with performance on a wide range of tasks improving as parameter counts and training data grow. Models such as GPT-3, PaLM, and Llama-3.1, each with billions of parameters, perform well across many applications. However, deploying LLMs on low-power platforms poses severe difficulties for both training and inference. While it…
Sequential recommendation systems have crucial applications in industries such as e-commerce and streaming services. These systems analyze user interaction data over time to predict each user's preferences. However, the ID-based representations of users and items that these systems rely on have a critical drawback when the same model is transferred to a new system. The new system…
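A minimal sketch of what "ID-based" means in practice (hypothetical PyTorch code, not from the article; `IDSeqRecommender` and all sizes are illustrative): item IDs index a learned embedding table, so both the embeddings and the output head are tied to one catalog's IDs:

```python
# Hypothetical sketch of an ID-based sequential recommender: each
# item ID indexes a learned embedding table, so the model cannot
# transfer to a new system whose item IDs differ.
import torch
import torch.nn as nn

class IDSeqRecommender(nn.Module):
    def __init__(self, num_items=1000, dim=32):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)  # tied to this catalog's IDs
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, num_items)         # scores over the same IDs

    def forward(self, item_ids):                      # (batch, seq_len) of item IDs
        h, _ = self.gru(self.item_emb(item_ids))
        return self.head(h[:, -1])                    # next-item scores

model = IDSeqRecommender()
history = torch.randint(0, 1000, (4, 10))             # 4 users, 10 interactions each
scores = model(history)
print(scores.shape)                                   # torch.Size([4, 1000])
```

A new system with a different catalog would require a new embedding table and output head, which is precisely the transfer drawback described above.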
LLMs like GPT-4 and LLaMA have gained significant attention for their exceptional capabilities in natural language inference, summarization, and question answering. However, these models often generate outputs that appear credible yet contain inaccuracies, fabricated details, or misleading information, a phenomenon termed hallucination. This issue presents a critical challenge for deploying LLMs in applications where precision…