Using LLMs in clinical diagnostics offers a promising way to improve doctor-patient interactions. Patient history-taking is central to medical diagnosis. However, factors such as increasing patient loads, limited access to care, brief consultations, and the rapid adoption of telemedicine—accelerated by the COVID-19 pandemic—have strained this traditional practice. These challenges threaten diagnostic accuracy, underscoring the need… →
The development of multimodal large language models (MLLMs) has brought new opportunities in artificial intelligence. However, significant challenges persist in integrating visual, linguistic, and speech modalities. While many MLLMs perform well with vision and text, incorporating speech remains a hurdle. Speech, a natural medium for human interaction, plays an essential role in dialogue systems, yet… →
Recommendation systems are an effective, ever-evolving strategy for enhancing user experience and boosting retention, used across industries such as e-commerce, streaming services, and social media. These systems must analyze complex relationships between users, items, and contextual factors to suggest precisely what the user might want. However, existing recommendation systems are static, relying… →
Conversational AI has come a long way, but one challenge persists: getting systems to engage proactively in a way that feels natural. Many AI tools either wait passively for direct prompts or overwhelm users by jumping into conversations unnecessarily. This is especially tricky in multi-party settings, where timing and relevance are everything. Striking the right… →
Artificial intelligence has come a long way, transforming the way we work, live, and interact. Yet, challenges remain. Many AI systems rely heavily on cloud-based infrastructure, which raises valid privacy concerns. Others offer limited user control, making customization a difficult task. On top of that, aligning AI behavior with specific needs is often more complicated… →
Graph generation is an important task across various fields, including molecular design and social network analysis, due to its ability to model complex relationships and structured data. Despite recent advancements, many graph generative models still rely heavily on adjacency matrix representations. While effective, these methods can be computationally demanding and often lack flexibility. This can… →
Latent diffusion models generate high-resolution images by first compressing visual data into a latent space using visual tokenizers. These tokenizers reduce computational demands while retaining essential details. However, such models face a critical trade-off: increasing the token feature dimension improves reconstruction quality but degrades image generation quality. It thus… →
GUI agents face three critical challenges in professional environments: (1) the greater complexity of professional applications compared to general-use software, requiring detailed comprehension of intricate layouts; (2) the higher resolution of professional tools, resulting in smaller target sizes and reduced grounding accuracy; and (3) the reliance on additional tools and documents, adding complexity to workflows.… →
Protein docking, the process of predicting the structure of protein-protein complexes, remains a complex challenge in computational biology. While advances like AlphaFold have transformed sequence-to-structure prediction, accurately modeling protein interactions is often complicated by conformational flexibility, where proteins undergo structural changes upon binding. For example, AlphaFold-multimer (AFm), an extension of AlphaFold, achieves a success rate… →