{"id":"b50de1d8-b96e-485c-8405-c7f4a541d152","shortId":"sZ5b4Q","kind":"skill","title":"Train agent policies with rLLM reinforcement learning","tagline":"Use rLLM to evaluate, trace, reward, and train LLM agents with reinforcement learning across common agent frameworks.","description":"# Train agent policies with rLLM reinforcement learning\n\nUse rLLM to evaluate, trace, reward, and train LLM agents with reinforcement learning across common agent frameworks.\n\n## Prerequisites\n\nPython 3.11 or newer, rLLM, agent code or benchmark task, reward/evaluator function, optional Tinker or verl training backend\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- uv pip install \"rllm @ git+https://github.com/rllm-org/rllm.git\"\n- uv pip install rllm[verl] @ git+https://github.com/rllm-org/rllm.git\n\nRequirements and caveats from upstream:\n- rLLM requires Python >= 3.11. You can install it either directly via pip or build from source.\n- For building from source or Docker, see the [installation guide](https://docs.rllm-project.com/installation).\n- ### Option B: Python API\n\nBasic usage or getting-started notes:\n- bash\n- this installs dependencies for running rllm cli, which uses Tinker as the training backend.\n- To use verl as the training backend (GPU machine required), install via\n\n- Source: https://github.com/rllm-org/rllm\n- Extracted from upstream docs: https://raw.githubusercontent.com/rllm-org/rllm/HEAD/README.md\n\n## Documentation\n\n- https://docs.rllm-project.com\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/train-agent-policies-with-rllm-reinforcement-learning/)","tags":["train","agent","policies","with","rllm","reinforcement","learning","skills","agentskillexchange","agent-skills","ai-agents","ai-tools"],"capabilities":["skill","source-agentskillexchange","skill-train-agent-policies-with-rllm-reinforcement-learning","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/train-agent-policies-with-rllm-reinforcement-learning","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,357 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:53.210Z","embedding":null,"createdAt":"2026-05-18T13:20:00.151Z","updatedAt":"2026-05-18T19:12:53.210Z","lastSeenAt":"2026-05-18T19:12:53.210Z","tsv":"'/installation).':130 '/rllm-org/rllm':172 '/rllm-org/rllm.git':87,96 '/rllm-org/rllm/head/readme.md':179 '/skills/train-agent-policies-with-rllm-reinforcement-learning/)':188 '3.11':51,105 'across':21,45 'agent':2,17,23,26,41,47,55,183 'agentskillexchange.com':187 'agentskillexchange.com/skills/train-agent-policies-with-rllm-reinforcement-learning/)':186 'api':134 'b':132 'backend':67,156,163 'bash':142 'basic':135 'benchmark':58 'build':115,119 'caveat':99 'cli':149 'code':56 'common':22,46 'depend':145 'direct':111 'doc':176 'docker':123 'docs.rllm-project.com':129,181 'docs.rllm-project.com/installation).':128 'document':180 'either':110 'environ':79 'evalu':11,35 'exchang':185 'extract':173 'framework':24,48 'function':61 'get':139 'getting-start':138 'git':84,93 'github.com':86,95,171 'github.com/rllm-org/rllm':170 'github.com/rllm-org/rllm.git':85,94 'gpu':164 'guid':127 'instal':68,72,82,90,108,126,144,167 'learn':7,20,31,44 'llm':16,40 'machin':165 'match':77 'newer':53 'note':141 'option':62,131 'path':75 'pip':81,89,113 'polici':3,27 'prerequisit':49 'python':50,104,133 'raw.githubusercontent.com':178 'raw.githubusercontent.com/rllm-org/rllm/head/readme.md':177 'reinforc':6,19,30,43 'requir':97,103,166 'reward':13,37 'reward/evaluator':60 'rllm':5,9,29,33,54,83,91,102,148 'run':147 'see':124 'setup':74 'skill':184 'skill-train-agent-policies-with-rllm-reinforcement-learning' 'sourc':117,121,169,182 'source-agentskillexchange' 'start':140 'task':59 'tinker':63,152 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'trace':12,36 'train':1,15,25,39,66,155,162 'upstream':71,101,175 'usag':136 'use':8,32,69,151,158 'uv':80,88 'verl':65,92,159 'via':112,168","prices":[{"id":"23379f77-22f5-415b-9347-b58812b0a4ab","listingId":"b50de1d8-b96e-485c-8405-c7f4a541d152","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:00.151Z"}],"sources":[{"listingId":"b50de1d8-b96e-485c-8405-c7f4a541d152","source":"github","sourceId":"agentskillexchange/skills/train-agent-policies-with-rllm-reinforcement-learning","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/train-agent-policies-with-rllm-reinforcement-learning","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:00.151Z","lastSeenAt":"2026-05-18T19:12:53.210Z"}],"details":{"listingId":"b50de1d8-b96e-485c-8405-c7f4a541d152","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"train-agent-policies-with-rllm-reinforcement-learning","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"60d858dcafb9ef49b0d88cea5ef2dc88d2d9dd0f","skill_md_path":"skills/train-agent-policies-with-rllm-reinforcement-learning/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/train-agent-policies-with-rllm-reinforcement-learning"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Train agent policies with rLLM reinforcement learning","description":"Use rLLM to evaluate, trace, reward, and train LLM agents with reinforcement learning across common agent frameworks."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/train-agent-policies-with-rllm-reinforcement-learning"},"updatedAt":"2026-05-18T19:12:53.210Z"}}