LLMs are gaining traction as the workforce across domains explores artificial intelligence and automation to plan operations and make crucial decisions. Generative and foundation models are thus relied on for multi-step reasoning tasks, with the goal of planning and execution on par with humans. Although this aspiration has yet to be achieved, we require extensive…
Large Language Models (LLMs) have gained significant attention in recent years, but improving their performance remains a challenging task. Researchers are striving to enhance already-trained models by creating additional, targeted training data that addresses specific weaknesses. This process, known as instruction tuning and alignment, has shown promise in enhancing model capabilities across various tasks. However,…
Multimodal Large Language Models (MLLMs) have made significant progress in various applications using the power of Transformer models and their attention mechanisms. However, these models face a critical challenge: inherent biases in their initial parameters, known as modality priors, can negatively impact output quality. The attention mechanism, which determines how input information is…
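The attention mechanism referenced above can be sketched as scaled dot-product attention over a single query. This minimal pure-Python version is illustrative only (the function names are not from any specific MLLM): it shows how the weights placed on each input are computed and then used to mix the value vectors.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Scores each key against the query, normalizes the scores with
    softmax, and returns the weighted sum of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A skewed prior over these weights, independent of the actual input, is exactly the kind of modality bias the snippet describes.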
Graphical User Interface (GUI) agents are crucial in automating interactions within digital environments, similar to how humans operate software using keyboards, mice, or touchscreens. GUI agents can simplify complex processes such as software testing, web automation, and digital assistance by autonomously navigating and manipulating GUI elements. These agents are designed to perceive their surroundings through…
Addressing the Challenges in AI Development: The journey to building open-source and collaborative AI has faced numerous challenges. One major problem is the centralization of AI model development, which has largely been controlled by a few big AI players with vast resources. This concentration of power limits opportunities for broader participation in the AI development…
In the rapidly evolving world of artificial intelligence, one pressing challenge that developers face is orchestrating complex multi-agent systems. These systems, involving multiple AI agents working collaboratively, often present significant difficulties in coordination, control, and scalability. Current solutions tend to be heavyweight, requiring extensive resource allocation, which complicates deployment and testing. OpenAI introduces the Swarm…
Text-to-Audio (TTA) and Text-to-Music (TTM) generation have seen significant advancements in recent years, driven by audio-domain diffusion models. These models have demonstrated superior audio modeling capabilities compared to generative adversarial networks (GANs) and variational autoencoders (VAEs). However, diffusion models face the challenge of long inference times due to their iterative denoising process. This results in…
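The iterative-denoising bottleneck described above can be seen in a toy sketch: reverse diffusion makes one sequential model call per timestep, so latency scales directly with the number of steps. This is a simplified illustration under assumed names (`sample`, `denoiser`), not any real TTA/TTM system's sampler.

```python
import random

def sample(denoiser, steps=1000, dim=4):
    """Reverse diffusion, heavily simplified: start from Gaussian
    noise and apply the denoiser once per timestep. The `steps`
    sequential model calls are what make diffusion inference slow."""
    x = [random.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):
        x = denoiser(x, t)  # one full network forward pass per step
    return x
```

Fast TTA/TTM methods attack exactly this loop, e.g. by distilling the model so that `steps` drops from hundreds to a handful.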
Retrieval-augmented generation (RAG) has become a key technique in enhancing the capabilities of LLMs by incorporating external knowledge into their outputs. RAG methods enable LLMs to access additional information from external sources, such as web-based databases, scientific literature, or domain-specific corpora, which improves their performance in knowledge-intensive tasks. RAG systems can generate more contextually accurate…
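As a minimal sketch of the retrieve-then-generate pattern described above (the function names are illustrative and the keyword-overlap retriever is a toy stand-in; real RAG systems use dense embeddings or BM25):

```python
def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, corpus, k=2):
    """Prepend retrieved passages as context before calling an LLM."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The LLM then answers conditioned on the retrieved passages rather than on its parametric knowledge alone, which is what improves performance on knowledge-intensive tasks.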
Multimodal Attributed Graphs (MMAGs) have received little attention despite their versatility in image generation. MMAGs represent relationships between entities of combinatorial complexity in a graph-structured manner, with nodes carrying both image and text information. Compared with text-only or image-only conditioning, conditioning on graphs could yield better, more informative generated images. Graph2Image is…
The problem that this research seeks to address lies in the inherent limitations of existing large language models (LLMs) when applied to formal theorem proving. Current models are often trained or fine-tuned on specific datasets, such as those focused on undergraduate-level mathematics, but struggle to generalize to more advanced mathematical domains. These limitations become more…