Reconstructing high-fidelity surfaces from multi-view images, especially with sparse inputs, is a critical challenge in computer vision. This task is essential for various applications, including autonomous driving, robotics, and virtual reality, where accurate 3D models are necessary for effective decision-making and interaction with real-world environments. However, achieving this level of detail and accuracy is difficult…
IBM’s release of PowerLM-3B and PowerMoE-3B signifies a significant leap in effort to improve the efficiency and scalability of language model training. IBM has introduced these models based on innovative methodologies that address some of the key challenges researchers and developers face in training large-scale models. These models, built on top of IBM’s Power scheduler,…
End-to-end (E2E) neural networks have emerged as flexible and accurate models for multilingual automatic speech recognition (ASR). However, as the number of supported languages increases, particularly those with large character sets like Chinese, Japanese, and Korean (CJK), the output layer size grows substantially. This expansion negatively impacts compute resources, memory usage, and asset size. The…
Large Language Models (LLMs) have revolutionized software engineering, demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific Software Engineering (SE) tasks. Researchers from FPT Software AI Center, Viet Nam, introduce HyperAgent, a novel generalist multi-agent…
Understanding multi-page documents and news videos is a common task in human daily life. To tackle such scenarios, Multimodal Large Language Models (MLLMs) should be equipped with the ability to understand multiple images with rich visually-situated text information. However, comprehending document images is more challenging than natural images, as it requires a more fine-grained perception…
AI has seen significant progress in coding, mathematics, and reasoning tasks. These advancements are driven largely by the increased use of large language models (LLMs), essential for automating complex problem-solving tasks. These models are increasingly used to handle highly specialized and structured problems in competitive programming, mathematical proofs, and real-world coding issues. This rapid evolution…
Recent advancements in medical multimodal large language models (MLLMs) have shown significant progress in medical decision-making. However, many models, such as Med-Flamingo and LLaVA-Med, are designed for specific tasks and require large datasets and high computational resources, limiting their practicality in clinical settings. While the Mixture-of-Expert (MoE) strategy offers a solution using smaller, task-specific modules…
AI models, such as language models, need to maintain a long-term memory of their interactions to generate relevant and contextually appropriate content. One of the primary challenges in maintaining a long-term memory of their interactions is data storage and retrieval efficiency. Current language models, such as Claude, need more effective memory systems, leading to repetitive…