Electronic Health Records (EHRs) present a wealth of information, combining structured tabular data and unstructured clinical notes. This valuable resource forms the foundation for training clinical decision support systems and automating diagnosis and treatment planning processes. While large language models (LLMs) can utilize unstructured text, they lack interpretability, an important factor in high-risk clinical applications.…
The field of spoken dialogue systems has evolved significantly over the years, moving beyond simple voice-based interfaces to complex models capable of sustaining real-time conversations. Early systems such as Siri, Alexa, and Google Assistant pioneered voice-activated interactions, allowing users to trigger specific actions through voice commands. These systems, while groundbreaking, were limited to basic tasks…
Data-Free Knowledge Distillation (DFKD) methods transfer knowledge from teacher to student models without real data, using synthetic data generation. Non-adversarial approaches employ heuristics to create data resembling the original, while adversarial methods utilize adversarial learning to explore distribution spaces. One-Shot Federated Learning (FL) addresses communication and security challenges in standard FL setups, enabling collaborative model…
Collaborative perception has become a critical area of research in autonomous driving and robotics. In these fields, agents—such as vehicles or robots—must work together to understand their environment more accurately and efficiently. By sharing sensory data among multiple agents, the accuracy and depth of environmental perception are enhanced, leading to safer and more reliable systems.…
Large Language Models (LLMs) have demonstrated impressive performance on natural language processing tasks such as text generation and synthesis. However, they still struggle in more complicated settings: tasks that call for using tools to solve problems, handling structured data, or carrying out complex multi-step reasoning. For instance, although LLMs are adept…
Mistral AI recently announced the release of Mistral-Small-Instruct-2409, a new open-weight large language model (LLM) designed to address critical challenges in artificial intelligence research and application. This development has generated significant excitement in the AI community, as it promises to enhance the performance of AI systems, improve accessibility to cutting-edge models, and offer new possibilities…
Artificial intelligence (AI) and natural language processing (NLP) have seen significant advancements in recent years, particularly in the development and deployment of large language models (LLMs). These models are essential for various tasks, such as text generation, question answering, and document summarization. However, while LLMs have demonstrated remarkable capabilities, they encounter limitations when processing long…
Early attempts in 3D generation focused on single-view reconstruction using category-specific models. Recent advancements utilize pre-trained image and video generators, particularly diffusion models, to enable open-domain generation. Fine-tuning on multi-view datasets improved results, but challenges persisted in generating complex compositions and interactions. Efforts to enhance compositionality in image generative models faced difficulties in transferring techniques…
Microscopic imaging is an indispensable tool for researchers and clinicians in modern medicine. This imaging technology allows detailed examination of biological structures at the cellular and molecular levels, enabling the study of tissue samples in disease diagnosis and pathology. By capturing these microscopic images, medical professionals can better understand disease mechanisms and progression,…
A major challenge in the field of Speech-Language Models (SLMs) is the lack of comprehensive evaluation metrics that go beyond basic textual content modeling. While SLMs have shown significant progress in generating coherent and grammatically correct speech, their ability to model acoustic features such as emotion, background noise, and speaker identity remains underexplored. Evaluating these…