As the digital world moves toward increasingly personalized experiences, recommendation systems have become essential, from e-commerce to media streaming, yet they often fail to model users’ preferences well enough to make better recommendations. Conventional models do not capture the subtle reasons behind user-item interactions, and so they present generic recommendations. Constrained by this limited rationale, large…
The rise of large language models has been accompanied by significant challenges, particularly around ensuring the factuality of generated responses. One persistent issue is that these models can produce outputs that are factually incorrect or even misleading, a phenomenon often called “hallucination.” These hallucinations occur when models generate confident-sounding but incorrect or unverifiable information. Given…
Transformer-based architectures have revolutionized natural language processing, delivering exceptional performance across diverse language modeling tasks. However, they still face major challenges when handling long-context sequences. The self-attention mechanism in Transformers suffers from quadratic computational complexity, and their memory requirement grows linearly with context length during inference. These factors impose practical constraints on sequence length due…
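To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch of a single attention layer. The model width, head count, and fp16 byte size are illustrative assumptions, not values from the text: the score matrix grows quadratically with sequence length, while the key-value cache used during inference grows linearly.

```python
# Illustrative scaling estimate for one Transformer attention layer (hypothetical sizes).
def attention_costs(seq_len, d_model=4096, n_heads=32, bytes_per_elem=2):
    # Self-attention forms a (seq_len x seq_len) score matrix per head,
    # so compute and activation memory for the scores grow quadratically.
    score_elements = n_heads * seq_len * seq_len
    # During autoregressive inference, the KV cache stores keys and values
    # for every past position, so its size grows linearly with context length.
    kv_cache_bytes = 2 * seq_len * d_model * bytes_per_elem  # keys + values
    return score_elements, kv_cache_bytes

for n in (1_024, 8_192, 65_536):
    scores, kv = attention_costs(n)
    print(f"len={n:>6}: score entries per layer ~ {scores:.2e}, "
          f"KV cache per layer ~ {kv / 2**20:.1f} MiB")
```

Doubling the context quadruples the attention-score work but only doubles the cache, which is why both terms become binding constraints at long context lengths.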
Understanding and analyzing long videos has been a significant challenge in AI, primarily due to the vast amount of data and computational resources required. Traditional Multimodal Large Language Models (MLLMs) struggle to process extensive video content because of limited context length. This challenge is especially evident with hour-long videos, which need hundreds of thousands of…
In recent years, text-to-speech (TTS) technology has made significant strides, yet numerous challenges remain. Autoregressive (AR) systems, while offering diverse prosody, tend to suffer from robustness issues and slow inference speeds. Non-autoregressive (NAR) models, on the other hand, require explicit alignment between text and speech during training, which can lead to unnatural results. The…
Machine learning for predictive modeling aims to accurately forecast outcomes from input data. One of the primary challenges in this field is “domain adaptation,” which addresses differences between training and application scenarios, especially when models face new, varied conditions after training. This challenge is particularly significant for tabular datasets in finance, healthcare, and the social sciences, where…
Messenger RNA (mRNA) plays a crucial role in protein synthesis, translating genetic information into proteins via a process that reads the sequence in nucleotide triplets called codons. However, current language models used for biological sequences, especially mRNA, fail to capture the hierarchical structure that codons impose on mRNA. This limitation leads to suboptimal performance when predicting properties or generating…
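As a simple illustration of the hierarchy referred to above, the sketch below contrasts a character-level (nucleotide) view of an mRNA sequence with a codon-level view that groups every three bases. The example sequence is made up for illustration only.

```python
# Minimal illustration of the nucleotide -> codon hierarchy in an mRNA sequence.
# The sequence below is a hypothetical example, not taken from the text.
mrna = "AUGGCCUUUAAAUGA"

# A codon is a triplet of consecutive nucleotides; translation reads the
# sequence codon by codon, so a codon-level view groups every 3 bases.
nucleotide_tokens = list(mrna)
codon_tokens = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]

print(nucleotide_tokens)  # ['A', 'U', 'G', 'G', ...]              -- nucleotide-level view
print(codon_tokens)       # ['AUG', 'GCC', 'UUU', 'AAA', 'UGA']    -- codon-level view
```

A model that tokenizes at the nucleotide level sees only the flat character sequence and must learn the triplet grouping implicitly, which is the gap the paragraph points to.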
Theory of Mind (ToM) capabilities – the ability to attribute mental states and predict behaviors of others – have become increasingly critical as Large Language Models (LLMs) become more integrated into human interactions and decision-making processes. While humans naturally infer others’ knowledge, anticipate actions, and expect rational behaviors, replicating these sophisticated social reasoning abilities in…