
Anyscale and NovaSky Team Release SkyRL tx v0.1.0: Bringing a Tinker-Compatible Reinforcement Learning (RL) Engine to Local GPU Clusters


Understanding the Target Audience

The SkyRL tx v0.1.0 release is aimed primarily at AI developers, data scientists, and machine learning engineers who work on reinforcement learning (RL) applications, often in enterprise environments where performance, scalability, and control over infrastructure are critical.

Pain Points:
  • Limited access to scalable RL solutions that can be deployed on local infrastructure
  • Challenges in integrating existing tools with proprietary systems
  • Need for faster sampling and training processes to enhance productivity
  • Desire for a unified API that simplifies the development workflow

Goals:
  • Implement efficient RL algorithms on large language models (LLMs)
  • Optimize resource utilization on local GPU clusters
  • Maintain flexibility in model training and inference processes
  • Achieve faster experimentation cycles with reliable performance metrics

Interests:
  • Innovations in AI and machine learning frameworks
  • Best practices for deploying RL solutions in production environments
  • Community-driven resources and tutorials for practical implementation

Communication Preferences:
  • Technical documentation and detailed release notes
  • Community forums and discussion groups for peer support
  • Tutorials and code examples for hands-on learning

Overview of SkyRL tx v0.1.0

SkyRL tx v0.1.0 lets AI teams run Tinker-style reinforcement learning on large language models on their own infrastructure through a unified engine. Developers can run a Tinker-compatible training and inference engine directly on their hardware while keeping the same minimal API that Tinker exposes in its managed service.

Key Features of SkyRL tx

The research team describes SkyRL tx as a unified training and inference engine that implements the Tinker API, allowing users to run a Tinker-like service on their own infrastructure. Version 0.1.0 is the first release in the series to support reinforcement learning end to end, and it makes sampling significantly faster.

Tinker API Overview

The Tinker API from Thinking Machines is built around four core functions:

  • forward_backward: Performs a forward pass and a backward pass, accumulating gradients.
  • optim_step: Updates model weights based on accumulated gradients.
  • sample: Generates tokens for interaction, evaluation, or RL actions.
  • save_state: Writes checkpoints for resuming training.

SkyRL tx targets this API, implementing an open backend that users can deploy locally, thus eliminating reliance on a hosted environment.
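
To make that surface concrete, the sketch below shows how the four primitives compose into a single RL step. It is a toy illustration only: the client class, its method signatures, and the reward function are invented stand-ins for a Tinker-compatible backend such as SkyRL tx, not the actual Tinker SDK.

# Toy sketch of an RL loop over the four Tinker primitives.
# FakeTinkerClient and its method signatures are illustrative placeholders,
# not the real Tinker SDK surface.
import random

class FakeTinkerClient:
    def __init__(self):
        self.pending_grads = 0   # gradients accumulated since the last optim_step
        self.step = 0

    def sample(self, prompt, max_tokens=32, seed=None, stop=None):
        # Generate tokens for interaction, evaluation, or an RL action.
        random.seed(seed)
        return prompt + " -> sampled completion"

    def forward_backward(self, batch):
        # Forward pass + backward pass; gradients accumulate until optim_step.
        self.pending_grads += len(batch)

    def optim_step(self):
        # Apply the accumulated gradients to the (LoRA) weights.
        self.step += 1
        self.pending_grads = 0

    def save_state(self, path):
        # Write a checkpoint so training can resume later.
        print("checkpoint written to", path, "at step", self.step)

def reward(completion):
    # Placeholder scoring; a real RL loop would grade the sampled action.
    return len(completion) / 100.0

client = FakeTinkerClient()
prompts = ["What is 2 + 2?", "Name a prime number."]

for step in range(3):
    completions = [client.sample(p, seed=step) for p in prompts]   # 1) act
    batch = [{"prompt": p, "completion": c, "reward": reward(c)}
             for p, c in zip(prompts, completions)]                # 2) score
    client.forward_backward(batch)                                 # 3) accumulate gradients
    client.optim_step()                                            # 4) update weights
client.save_state("/tmp/final_checkpoint")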

Architecture of SkyRL tx

The architecture of SkyRL tx is designed as an inference engine that also supports backward passes. It consists of four main components:

  • REST API server: Processes incoming requests from different users.
  • Database: Tracks metadata about models, checkpoints, requests, and futures, and acts as a job queue (see the sketch after this list). The current implementation uses SQLite, with support for other SQL databases such as Postgres.
  • Engine: Schedules and batches requests across users, with each instance serving a single base model and multiple LoRA adapters.
  • Worker: Executes forward and backward passes, holding model definitions and optimizer states. Future versions will enable advanced multi-node sharding.
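
As a rough illustration of how the database doubles as a job queue, the snippet below records a request as a pending future in SQLite, lets an engine-like consumer resolve it, and then polls the result. The table name, columns, and status values are assumptions chosen for this example; the actual SkyRL tx schema is defined in the repository.

# Illustrative database-as-job-queue pattern using the standard library.
# Table layout and field names are assumptions, not the SkyRL tx schema.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE futures (
        id INTEGER PRIMARY KEY,
        request_type TEXT,              -- e.g. 'sample' or 'forward_backward'
        payload TEXT,                   -- JSON-encoded request body
        status TEXT DEFAULT 'pending',
        result TEXT
    )
""")

# REST API server side: an incoming request becomes a pending future.
cur = conn.execute(
    "INSERT INTO futures (request_type, payload) VALUES (?, ?)",
    ("sample", json.dumps({"prompt": "hello", "max_tokens": 16})),
)
future_id = cur.lastrowid

# Engine side: pick up pending work (batched in practice) and write the result back.
row = conn.execute(
    "SELECT id, payload FROM futures WHERE status = 'pending' LIMIT 1"
).fetchone()
conn.execute(
    "UPDATE futures SET status = 'done', result = ? WHERE id = ?",
    (json.dumps({"tokens": ["hi", "there"]}), row[0]),
)

# Client side: poll the future until the engine resolves it.
status, result = conn.execute(
    "SELECT status, result FROM futures WHERE id = ?", (future_id,)
).fetchone()
print(status, result)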

What’s New in v0.1.0?

The v0.1.0 release focuses on reinforcement learning support and performance improvements. Key updates include:

  • Faster sampling thanks to JIT compilation and proper batching and sharding.
  • Support for different sampling parameters per request, including seeds and stop tokens, which is useful when multiple experiments share a base model (see the sketch after this list).
  • Fixes to ensure the RL loop runs correctly through the engine.
  • Implementation of gradient checkpointing and micro-batching for sampling.
  • Support for Postgres as a database backend alongside SQLite.
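
The per-request sampling parameters are easiest to see with a small example. The field names below (lora_adapter, seed, stop, temperature) are illustrative, not the exact SkyRL tx request schema; the point is that requests served by one base model can each carry their own sampling configuration.

# Two experiments share one base model and engine instance, but each request
# carries its own seed, stop tokens, and temperature, so the engine can batch
# them together while keeping every run reproducible. Field names are illustrative.
requests = [
    {"lora_adapter": "experiment-a", "prompt": "Solve: 17 * 3 =",
     "seed": 0, "stop": ["\n"], "temperature": 0.7},
    {"lora_adapter": "experiment-b", "prompt": "Solve: 17 * 3 =",
     "seed": 42, "stop": ["</answer>"], "temperature": 1.0},
]

for r in requests:
    print(r["lora_adapter"], "-> seed:", r["seed"], "stop:", r["stop"])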

Running RL End-to-End on 8 H100 GPUs

The official release includes a specific code recipe for running reinforcement learning end-to-end on a cluster with 8 H100 GPUs. Users can clone the SkyRL repository and start the engine with the following command:

uv run --extra gpu --extra tinker -m tx.tinker.api \
  --base-model Qwen/Qwen3-4B \
  --max-lora-adapters 3 \
  --max-lora-rank 1 \
  --tensor-parallel-size 8 \
  --train-micro-batch-size 8 > out.log

Next, users can clone the Tinker Cookbook and run the RL loop with:

export TINKER_API_KEY=dummy
export WANDB_API_KEY=
uv run --with wandb --with tinker rl_loop.py \
  base_url=http://localhost:8000 \
  model_name="Qwen/Qwen3-4B" \
  lora_rank=1 \
  max_length=1024 \
  save_every=100

This process produces a reward curve confirming the RL loop runs correctly through the local SkyRL tx backend.

Conclusion

SkyRL tx v0.1.0 provides a practical solution for development teams seeking Tinker-style reinforcement learning on their own clusters, maintaining a consistent Tinker API surface. The architecture effectively integrates inference and training capabilities, reducing stack divergence. With support for LoRA, gradient checkpointing, micro-batching, and Postgres, this release transforms Tinker compatibility into an actionable local RL backend for large language models.

For more information, check out the Repo and the Official Release.