The field of natural language processing (NLP) is evolving rapidly, with small language models gaining prominence. These models, designed for efficient inference on consumer hardware and edge devices, are increasingly important. They enable fully offline applications and have shown significant utility when fine-tuned for tasks such as sequence classification, question answering, or token classification, often outperforming…
Robustness is crucial for deploying deep learning models in real-world applications. Vision Transformers (ViTs) have shown strong robustness and state-of-the-art performance in various vision tasks since their introduction in 2020, outperforming traditional CNNs. Recent advancements in large kernel convolutions have revived interest in CNNs, showing they can match or exceed ViT performance. However, the…
Large language models (LLMs) have gained significant attention for solving planning problems, but current methodologies fall short. Direct plan generation using LLMs has shown limited success, with GPT-4 achieving only 35% accuracy on simple planning tasks. This low accuracy highlights the need for more effective approaches. Another significant challenge lies in the lack of…
Whole-body pose estimation is a key component for improving the capabilities of human-centric AI systems. It is useful in human-computer interaction, virtual avatar animation, and the film industry. Early research in this field was challenging due to the task’s complexity and limited computational power and data, so researchers focused on estimating the pose of separate…
CAMEL-AI has recently announced the release of CAMEL, a communicative agent framework designed to enhance scalability and autonomous cooperation among language model agents. The rapid progression of conversational and chat-based language models has ushered in an era of complex problem-solving capabilities. However, these advancements have predominantly depended on substantial human input to guide…
Document retrieval, a subfield of information retrieval, focuses on matching user queries with relevant documents within a corpus. It is crucial in various industrial applications, such as search engines and information extraction systems. Effective document retrieval systems must handle textual content and visual elements like images, tables, and figures to convey information to users efficiently.…
Technological advancements in sensors, AI, and processing power have propelled robot navigation to new heights over the last several decades. To take robotics to the next level and make robots a regular part of our lives, many studies suggest transferring the natural language space of ObjNav and VLN to the multimodal space so the robot…
Recent developments in neural information retrieval (IR) models have greatly improved their effectiveness across various IR tasks. These advancements have made neural IR models more capable of understanding and retrieving relevant information in response to user queries. However, ensuring the reliability of these models in practical applications requires a focus on their robustness, which has…
Research in human-computer interaction (HCI) has significantly improved how humans and computers communicate. Researchers focus on improving various aspects, such as social dialogue, writing assistance, and multimodal interactions, to make these exchanges more engaging and satisfying. These advancements aim to integrate multiple perspectives and social skills into interactions, thus making them more realistic and effective. One major…
Researchers at IBM address the difficulty of extracting valuable insights from large databases, especially in businesses. The massive volume and variety of data make it difficult for employees to locate the information they need. Writing the SQL code required to retrieve data across multiple schemas and tables can be complex. This limitation hampers the ability of businesses…