Harmony, unified training and inference
Harmony is Adaptive Engine's unified compute backend for reinforcement learning: one engine for the whole loop, no separate stacks.
The RL training loop
A single step runs three phases: rollout, train, and optim.
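The three phases can be sketched on a toy problem. This is a minimal illustration, not Harmony's API: the function names, the three-action softmax policy, and the reward table are all hypothetical, and the train phase uses a plain REINFORCE-style gradient with a mean baseline as a stand-in for whatever scoring method a real run would use.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(logits, n_samples, rng):
    """Phase 1 (rollout): sample actions from the current policy. Inference only."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=n_samples)


def train(logits, actions, rewards):
    """Phase 2 (train): compute a policy gradient from the scored samples."""
    probs = softmax(logits)
    baseline = sum(rewards) / len(rewards)
    grad = [0.0] * len(logits)
    for a, r in zip(actions, rewards):
        for i in range(len(logits)):
            # d log pi(a) / d logit_i = 1[i == a] - pi(i)
            grad[i] += (r - baseline) * ((1.0 if i == a else 0.0) - probs[i])
    return [g / len(actions) for g in grad]


def optim(logits, grad, lr):
    """Phase 3 (optim): apply the gradient (ascent, since we maximize reward)."""
    return [w + lr * g for w, g in zip(logits, grad)]


def run(steps=200, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    reward_of = [0.0, 1.0, 0.0]  # hypothetical scorer: only action 1 is rewarded
    for _ in range(steps):
        actions = rollout(logits, 16, rng)
        rewards = [reward_of[a] for a in actions]
        grad = train(logits, actions, rewards)
        logits = optim(logits, grad, lr=0.5)
    return softmax(logits)
```

Calling `run()` returns the final action probabilities. After enough steps, most of the probability mass should sit on the rewarded action.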
Rollout
Assigning a score
Several methods turn a score into a training signal. Here we illustrate GRPO.
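As a rough sketch of the GRPO idea: several completions are sampled for the same prompt, and each one's score is normalized against the others in its group, giving an advantage that says how much better or worse it did than its siblings. The function name, the population standard deviation, and the `eps` guard below are choices made for this example, not details taken from this page.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for completions of a single prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation. eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions of one prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one. If every completion in a group scores the same, all advantages are zero and that group contributes no signal.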
Train
Optim
Parallel training
Scaling further
Production runs combine more than data parallelism. Tensor and pipeline parallelism split a model too large for one GPU across many; asynchronous rollout drops the barrier between generating and training. The Ultra-Scale Playbook covers these in depth.
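To make the tensor-parallel idea concrete, here is a toy sketch in pure Python with no GPUs involved. It assumes a column-wise split of one weight matrix, which is one common way to shard a linear layer. Each shard stands in for the slice a single device would hold, and the helper names are hypothetical.

```python
def matmul(x, w):
    """Plain matrix product of x (n x k) and w (k x m), as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards, one per simulated device."""
    n_cols = len(w[0])
    if n_cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    size = n_cols // parts
    return [[row[i * size:(i + 1) * size] for row in w] for i in range(parts)]


def column_parallel_matmul(x, w, parts):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # one per device in practice
    # Joining the column blocks plays the role of an all-gather.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
sharded = column_parallel_matmul(x, w, parts=2)
full = matmul(x, w)
```

The sharded result matches the full product, yet no simulated device ever holds more than half of `w`. That is the property that lets a model too large for one GPU run across several.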
Harmony
Rollout is inference: forward passes that generate completions. Train and optim are gradient work: a backward pass, then a weight update. Different jobs, historically different tools.
Running them as separate systems means two copies of the model, wired together and kept in sync as the weights change. That orchestration is most of the work.
Harmony removes the boundary.
The same weights serve both rollout and training, which opens optimizations a split pipeline can't reach.
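The difference can be shown with a deliberately tiny model. Both classes below are hypothetical and exist only for this sketch; neither reflects Harmony's actual interface. The split pipeline keeps a trainer copy and a sampler copy that must be synced explicitly, while the unified engine holds a single set of weights.

```python
class SplitPipeline:
    """Two systems, two copies of the weights, plus an explicit sync step."""

    def __init__(self, weights):
        self.trainer_weights = list(weights)
        self.sampler_weights = list(weights)  # second copy, held by the inference side

    def train_step(self, grad, lr):
        self.trainer_weights = [w - lr * g for w, g in zip(self.trainer_weights, grad)]

    def sync(self):
        # In a real deployment this is a weight transfer between systems.
        self.sampler_weights = list(self.trainer_weights)

    def rollout_weights(self):
        return self.sampler_weights


class UnifiedEngine:
    """One set of weights serves both rollout and training."""

    def __init__(self, weights):
        self.weights = list(weights)

    def train_step(self, grad, lr):
        self.weights = [w - lr * g for w, g in zip(self.weights, grad)]

    def rollout_weights(self):
        return self.weights


split = SplitPipeline([1.0, 2.0])
split.train_step([2.0, 2.0], lr=0.5)
stale = split.rollout_weights()      # still the old weights until sync() runs
split.sync()
fresh = split.rollout_weights()

unified = UnifiedEngine([1.0, 2.0])
unified.train_step([2.0, 2.0], lr=0.5)
shared = unified.rollout_weights()   # already up to date, nothing to sync
```

In the split version, a forgotten or delayed `sync()` means rollouts come from an outdated policy. In the unified version that failure mode has nowhere to occur.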
It all fits in one file. A recipe is that loop, written out as a single file.
Further reading
- Recipes: the whole loop as a single file.
- GRPO, simply explained: the scoring method shown above.
- RL Glossary: every term here, in depth.





