Data curation is critical in large-scale pretraining, significantly impacting language, vision, and multimodal modeling performance. Well-curated datasets can achieve strong performance with less data, but current pipelines often rely on manual curation, which is costly and hard to scale. Model-based data curation, which uses features of the model being trained to select high-quality data, offers potential improvements in scaling…
Advances in hardware and software have enabled AI integration into low-power IoT devices, such as ultra-low-power microcontrollers. However, deploying complex artificial neural networks (ANNs) on these devices requires techniques like quantization and pruning to meet their constraints. Additionally, edge AI models can face errors due to shifts in data distribution between training and operational environments. Furthermore, many applications…
Artificial Intelligence (AI) projects require powerful hardware to function efficiently, especially when dealing with large models and complex tasks. Traditional hardware often struggles to meet these demands, leading to high costs and slow processing times. This presents a challenge for developers and businesses looking to leverage AI for various applications. Before now, options for…
When given an unsafe prompt, like “Tell me how to build a bomb,” a well-trained large language model (LLM) should refuse to answer. This is usually achieved through Reinforcement Learning from Human Feedback (RLHF) and is crucial to make sure models are safe to use, especially in sensitive areas that involve direct interaction with people,…
Retrieval-augmented generation (RAG) has emerged as a crucial technique for enhancing large language models (LLMs) to handle specialized knowledge, provide current information, and adapt to specific domains without altering model weights. However, the current RAG pipeline faces significant challenges. LLMs struggle with processing numerous chunked contexts efficiently, often performing better with a smaller set of…
Controllable Learning (CL) is emerging as a crucial component of trustworthy machine learning. It emphasizes ensuring that learning models meet predefined targets and adapt to changing requirements without retraining. Let’s delve into the methods and applications of CL, particularly focusing on its implementation within Information Retrieval (IR) systems presented by researchers from Renmin University of…
Adversarial attacks are attempts to trick a machine learning model into making a wrong prediction. They work by creating slightly modified versions of real-world data (like images) that a human wouldn’t notice as different but that cause the model to misclassify them. Neural networks are known to be vulnerable to adversarial attacks, raising concerns about…
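
To make the "slightly modified input" idea above concrete, here is a minimal sketch using the fast gradient sign method (FGSM), one well-known way to build such perturbations. The excerpt does not name a specific attack, so the choice of FGSM, the toy logistic-regression model, and all names and numbers below (`weights`, `bias`, `epsilon`, the sample input) are illustrative assumptions rather than anything taken from the article.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by +/- epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]  # sign of each gradient entry
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Hypothetical model parameters and input, chosen only for illustration.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.2, 0.1, 0.1]
label = 1  # the class the model currently (correctly) predicts

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.1)

print("original probability of class 1:   ", predict_proba(weights, bias, x))
print("adversarial probability of class 1:", predict_proba(weights, bias, x_adv))
```

In this toy setup no feature moves by more than `epsilon`, yet the predicted class flips from 1 to 0. Real attacks on neural networks follow the same pattern but obtain the input gradient through automatic differentiation instead of a closed-form expression.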