Traditional psychological counseling, often conducted in person, remains limited to individuals actively seeking help for psychological concerns. In contrast, online automated counseling presents a viable option for those hesitant to pursue therapy due to stigma or shame. Cognitive Behavioral Therapy (CBT), a widely practiced approach in psychological counseling, aims to help individuals identify and correct…
The automation of radiology report generation has become a significant area of focus in biomedical natural language processing. This is driven by the vast and exponentially growing volume of medical imaging data and the dependence of modern health care on highly accurate diagnostic interpretation. Advancements in artificial intelligence make image analysis combined with natural language…
Reconstructing unmeasured causal drivers of complex time series from observed response data represents a fundamental challenge across diverse scientific domains. Latent variables, including genetic regulators or environmental factors, are essential to determining a system’s dynamics but are rarely measured. Challenges with current approaches arise from data noise, the systems’ high dimensionality, and existing algorithms’ capacities…
Generative models have revolutionized fields like language, vision, and biology through their ability to learn and sample from complex data distributions. While these models benefit from scaling up during training through increased data, computational resources, and model sizes, their inference-time scaling capabilities face significant challenges. Specifically, diffusion models, which excel in generating continuous data like…
Swarm is an innovative open-source framework designed to explore the orchestration and coordination of multi-agent systems. It is developed and managed by the OpenAI Solutions team, and it provides a lightweight, ergonomic, and educational environment for developers to learn and experiment with agent-based systems. At its core, Swarm is built to facilitate the interaction of…
Vision-language models (VLMs) play a crucial role in multimodal tasks like image retrieval, captioning, and medical diagnostics by aligning visual and linguistic data. However, understanding negation in these models remains one of the main challenges. Negation is critical for nuanced applications, such as distinguishing “a room without windows” from “a room with windows.” Despite their…
One of the most significant and advanced capabilities of a multimodal large language model is long-context video modeling, which allows models to handle movies, documentaries, and live streams spanning multiple hours. However, despite the commendable advancements made in video comprehension in LLMs, including caption generation and question answering, many obstacles remain in processing extremely long…
LLMs have made significant strides in automated writing, particularly in tasks like open-domain long-form generation and topic-specific reports. Many approaches rely on Retrieval-Augmented Generation (RAG) to incorporate external information into the writing process. However, these methods often fall short due to fixed retrieval strategies, limiting the generated content’s depth, diversity, and utility. This lack of nuanced…
Scaling the size of large language models (LLMs) and their training data has opened up emergent capabilities that allow these models to perform highly structured reasoning, logical deduction, and abstract thought. These are not incremental improvements over previous tools but mark the journey toward reaching artificial general intelligence (AGI). Training LLMs to reason well…
Video diffusion models have emerged as powerful tools for video generation and physics simulation, showing promise in developing game engines. These generative game engines function as video generation models with action controllability, allowing them to respond to user inputs like keyboard and mouse interactions. A critical challenge in this field is scene generalization – the…