AI News — Страница 81

Agent-Based Debugging Gets a Cost-Effective Alternative: Salesforce AI Presents SWERank for Accurate and Scalable Software Issue Localization

14 мая, 2025

Identifying the exact location of a software issue—such as a bug or feature request—remains one of the most labor-intensive tasks in the development lifecycle. Despite advances in automated patch generation and code assistants, the process of pinpointing where in the codebase a change is needed often consumes more time than determining how to fix it.…

Read more →

This AI Paper Investigates Test-Time Scaling of English-Centric RLMs for Enhanced Multilingual Reasoning and Domain Generalization

14 мая, 2025

Reasoning language models, or RLMs, are increasingly used to simulate step-by-step problem-solving by generating long, structured reasoning chains. These models break down complex questions into simpler parts and build logical steps to reach answers. This chain-of-thought (CoT) approach has proven effective in improving output quality, especially in mathematical and logical tasks. Despite multilingual capabilities in…

Read more →

Rethinking Toxic Data in LLM Pretraining: A Co-Design Approach for Improved Steerability and Detoxification

14 мая, 2025

In the pretraining of LLMs, the quality of training data is crucial in determining model performance. A common strategy involves filtering out toxic content from the training corpus to minimize harmful outputs. While this approach aligns with the principle that neural networks reflect their training data, it introduces a tradeoff. Removing toxic content can reduce…

Read more →

PwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise

14 мая, 2025

In its latest executive guide, “Agentic AI – The New Frontier in GenAI,” PwC presents a strategic approach for what it defines as the next pivotal evolution in enterprise automation: Agentic Artificial Intelligence. These systems, capable of autonomous decision-making and context-aware interactions, are poised to reconfigure how organizations operate—shifting from traditional software models to orchestrated…

Read more →

Reinforcement Learning, Not Fine-Tuning: Nemotron-Tool-N1 Trains LLMs to Use Tools with Minimal Supervision and Maximum Generalization

13 мая, 2025

Equipping LLMs with external tools or functions has become popular, showing great performance across diverse domains. Existing research depends on synthesizing large volumes of tool-use trajectories through advanced language models and SFT to enhance LLMs’ tool-calling capability. The critical limitation lies in the synthetic datasets’ inability to capture explicit reasoning steps, resulting in superficial tool…

Read more →

A Step-by-Step Guide to Deploy a Fully Integrated Firecrawl-Powered MCP Server on Claude Desktop with Smithery and VeryaX

13 мая, 2025

In this tutorial, we will learn how to deploy a fully functional Model Context Protocol (MCP) server using smithery as the configuration framework and VeryaX as the runtime orchestrator. We’ll walk through installing and configuring smithery to define your MCP endpoints, then leverage VeryaX to spin up and manage the server processes. Finally, we’ll integrate…

Read more →

Implementing an LLM Agent with Tool Access Using MCP-Use

13 мая, 2025

MCP-Use is an open-source library that lets you connect any LLM to any MCP server, giving your agents tool access like web browsing, file operations, and more — all without relying on closed-source clients. In this tutorial, we’ll use langchain-groq and MCP-Use’s built-in conversation memory to build a simple chatbot that can interact with tools…

Read more →

RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning

13 мая, 2025

LLMs have gained outstanding reasoning capabilities through reinforcement learning (RL) on correctness rewards. Modern RL algorithms for LLMs, including GRPO, VinePPO, and Leave-one-out PPO, have moved away from traditional PPO approaches by eliminating the learned value function network in favor of empirically estimated returns. This reduces computational demands and GPU memory consumption, making RL training…

Read more →

OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare

13 мая, 2025

OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) in realistic healthcare scenarios. Developed in collaboration with 262 physicians across 60 countries and 26 medical specialties, HealthBench addresses the limitations of existing benchmarks by focusing on real-world applicability, expert validation, and diagnostic coverage. Addressing…

Read more →

Offline Video-LLMs Can Now Understand Real-Time Streams: Apple Researchers Introduce StreamBridge to Enable Multi-Turn and Proactive Video Understanding

13 мая, 2025

Video-LLMs process whole pre-recorded videos at once. However, applications like robotics and autonomous driving need causal perception and interpretation of visual information online. This fundamental mismatch shows a limitation of current Video-LLMs, as they are not naturally designed to operate in streaming scenarios where timely understanding and responsiveness are paramount. The transition from offline to…

Read more →