Together AI Drops Largest Open Dataset for Training Coding Agents

2026/02/05 00:23

Rongchai Wang Feb 04, 2026 16:23

TogetherCoder-Preview releases 161K verified coding trajectories achieving 59.4% on SWE-Bench, giving developers unprecedented training data for AI agents.

Together AI has released TogetherCoder-Preview, a dataset containing 161,703 test-verified coding agent trajectories that scored 59.4% on SWE-Bench Verified. The company claims it's the largest open dataset available for training coding agents.

The numbers here are substantial. Across 54,110 tasks spanning 1,639 repositories, the dataset includes 274,548 total trajectories—with roughly 59% successfully completing their assigned coding challenges.

Three sources feed into the combined dataset. SWE-Smith contributes the bulk with 158,252 trajectories across 39,465 tasks from 130 repositories, of which 92,847 proved successful. SWE-Rebench adds 79,557 trajectories covering 10,070 tasks across 1,499 repos, with 45,888 successes. R2E-Gym rounds things out with 36,851 trajectories from 4,575 tasks, showing 22,968 successful completions across just 10 repositories.
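As a sanity check on the per-source figures above, a short script (using only the numbers quoted in this article, not the dataset itself) confirms that the verified totals add up and shows that success rates are fairly consistent across the three sources:

```python
# Per-source trajectory counts for TogetherCoder-Preview,
# copied from the figures reported above.
sources = {
    "SWE-Smith":   {"trajectories": 158_252, "successes": 92_847},
    "SWE-Rebench": {"trajectories": 79_557,  "successes": 45_888},
    "R2E-Gym":     {"trajectories": 36_851,  "successes": 22_968},
}

# Sum of per-source successes matches the headline 161,703 figure.
total_successes = sum(s["successes"] for s in sources.values())
print(f"verified successes: {total_successes}")

# Each source lands in a similar band, roughly 58-62% successful.
for name, s in sources.items():
    rate = s["successes"] / s["trajectories"]
    print(f"{name}: {rate:.1%} success rate")
```

The spread is narrow: the largest source (SWE-Smith) and the smallest (R2E-Gym) differ by only a few percentage points in success rate, which suggests the roughly 59% overall figure is not driven by a single outlier source.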

Why does this matter? Training effective coding agents has been bottlenecked by data quality and availability. Most teams either generate synthetic data, with hit-or-miss quality, or rely on proprietary datasets they can't share. Having 161K verified working examples changes the calculus for smaller teams and researchers who couldn't otherwise compete.

The timing aligns with accelerating competition in AI coding tools. Just last week, the ACP Agent Registry went live, enabling developers to find and connect AI coding agents directly within JetBrains IDEs. Together AI itself has been building infrastructure around this space, including sandboxed code execution environments that let agents safely iterate on solutions.

The 59.4% SWE-Bench Verified score provides a concrete benchmark. SWE-Bench tests whether AI systems can resolve real GitHub issues from popular Python repositories—not toy problems, but actual bugs and feature requests that human developers tackled. Breaking 50% on the verified subset represents meaningful capability.
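The mechanics behind a score like this can be sketched simply: the agent's patch is applied to a checkout of the repository, the issue's associated tests are run, and the task counts as resolved only if they pass. The helper below is a rough illustration of that idea, not SWE-Bench's actual harness (the real evaluation runs inside pinned, containerized environments):

```python
import subprocess

def resolved(patch_applied: bool, tests_exit_code: int) -> bool:
    """A task counts as resolved only if the patch applied cleanly
    AND the issue's tests then exited successfully."""
    return patch_applied and tests_exit_code == 0

def verify(repo_dir: str, patch_file: str, test_cmd: list) -> bool:
    """Hypothetical sketch of test-based verification: apply the
    agent-produced patch, then run the repository's test command."""
    applied = subprocess.run(
        ["git", "-C", repo_dir, "apply", patch_file]
    ).returncode == 0
    if not applied:
        return False  # patch does not apply cleanly
    tests = subprocess.run(test_cmd, cwd=repo_dir)
    return resolved(True, tests.returncode)
```

The same pass/fail signal is what makes the trajectories in this dataset "test-verified": a trajectory is kept as a success only when the resulting code actually satisfies the task's tests, rather than merely looking plausible.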

For teams building coding assistants or autonomous development tools, this dataset offers a shortcut past months of data collection. The repository diversity—from 10 repos in R2E-Gym to nearly 1,500 in SWE-Rebench—should help models generalize across different codebases rather than overfitting to specific project patterns.
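For that use case, the typical first step is filtering down to verified trajectories before fine-tuning, while keeping an eye on repository diversity. A minimal sketch follows; the record fields (`tests_passed`, `repo`) are assumptions for illustration, not the dataset's published schema:

```python
def select_verified(trajectories):
    """Keep only trajectories whose repository tests passed.
    The 'tests_passed' field is an assumed schema for illustration."""
    return [t for t in trajectories if t.get("tests_passed")]

def repo_coverage(trajectories):
    """Count distinct repositories, a rough proxy for how broadly
    the filtered training set will generalize."""
    return len({t["repo"] for t in trajectories})

# Toy records standing in for real trajectory entries.
sample = [
    {"repo": "repo-a", "tests_passed": True},
    {"repo": "repo-a", "tests_passed": False},
    {"repo": "repo-b", "tests_passed": True},
]
verified = select_verified(sample)
print(len(verified), repo_coverage(verified))  # 2 2
```

Checking coverage after filtering matters because dropping failed trajectories can silently concentrate the data in a handful of repositories, reintroducing exactly the overfitting risk the dataset's breadth is meant to avoid.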

Together AI hasn't announced pricing changes or new products alongside the release. The dataset appears positioned as a community resource, though it obviously showcases the company's capabilities in generating high-quality training data at scale.

Image source: Shutterstock
  • together ai
  • ai coding agents
  • machine learning
  • open source
  • swe-bench