June 9, 2026

Harmony, unified training and inference

Research
Authors
Dylan Ebert

Harmony is Adaptive Engine's unified compute backend for reinforcement learning: one engine for the whole loop, no separate stacks.

The RL training loop

A single step runs three phases: rollout, train, and optim.
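To make the three phases concrete, here is a minimal sketch on a toy problem: a softmax policy over four actions, where only one action earns a reward. It is an illustration under stated assumptions, not Harmony's API. The function names `rollout`, `train`, and `optim` simply mirror the phase names, and `TARGET`, the learning rate, and the sample count are arbitrary choices.

```python
import math
import random

TARGET = 2  # hypothetical "correct" action; chosen only for this toy example


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(logits, n_samples, rng):
    """Rollout: sample from the current policy and score each sample."""
    probs = softmax(logits)
    actions = rng.choices(range(len(logits)), weights=probs, k=n_samples)
    rewards = [1.0 if a == TARGET else 0.0 for a in actions]
    return actions, rewards


def train(logits, actions, rewards):
    """Train: compute a policy-gradient estimate. Weights are not changed here."""
    probs = softmax(logits)
    baseline = sum(rewards) / len(rewards)
    grads = [0.0] * len(logits)
    for a, r in zip(actions, rewards):
        advantage = r - baseline
        for k in range(len(logits)):
            # d log pi(a) / d logit_k = 1[k == a] - pi(k)
            indicator = 1.0 if k == a else 0.0
            grads[k] += advantage * (indicator - probs[k]) / len(actions)
    return grads


def optim(logits, grads, lr):
    """Optim: apply the gradient to the weights (plain gradient ascent)."""
    return [w + lr * g for w, g in zip(logits, grads)]


rng = random.Random(0)
logits = [0.0, 0.0, 0.0, 0.0]
for step in range(200):
    actions, rewards = rollout(logits, n_samples=16, rng=rng)
    grads = train(logits, actions, rewards)
    logits = optim(logits, grads, lr=0.5)

final_probs = softmax(logits)
```

After enough steps the policy should put most of its probability on the rewarded action. A real run swaps the four logits for a language model and the reward rule for a grader, but the shape of the step is the same.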

Rollout

Assigning a score

Several methods turn a score into a training signal. Here we illustrate GRPO (Group Relative Policy Optimization).
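The core idea of GRPO is to compare each sample against the other samples drawn for the same prompt. A minimal sketch of that group-relative step, assuming the common form that subtracts the group mean and divides by the group standard deviation (the population standard deviation and the `eps` value are assumptions here, and the clipped policy objective is left out):

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 (pass) or 0.0 (fail).
scores = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(scores)

# If every completion gets the same score, there is nothing to learn from.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that beat their group get a positive advantage, the rest get a negative one, and a group with identical scores contributes no signal.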

Train

Optim

Parallel training

Scaling further

Production runs combine more than data parallelism. Tensor and pipeline parallelism split a model too large for one GPU across many; asynchronous rollout drops the barrier between generating and training. The Ultra-Scale Playbook covers these in depth.

Harmony

Rollout is inference: the model generates. Train and optim are the other half: computing gradients and updating weights. Different jobs, historically handled by different tools.

Running them as separate systems means two copies of the model, wired together and kept in sync as the weights change. That orchestration is most of the work.

Harmony removes the boundary.

The same weights serve both rollout and training, which opens optimizations a split pipeline can't reach.
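The contrast between the two setups can be sketched in a few lines. This is a toy illustration, not Harmony's implementation: `Weights` and every variable name below are hypothetical, and a deep copy stands in for whatever transfer a real split pipeline would use.

```python
import copy


class Weights:
    """Toy stand-in for model parameters, with a version counter."""

    def __init__(self, values):
        self.values = list(values)
        self.version = 0

    def apply_update(self, deltas):
        self.values = [v + d for v, d in zip(self.values, deltas)]
        self.version += 1


# Split pipeline: the sampler holds its own copy of the weights.
trainer_weights = Weights([0.0, 0.0])
sampler_weights = copy.deepcopy(trainer_weights)
trainer_weights.apply_update([0.1, -0.2])
stale_before_sync = sampler_weights.version != trainer_weights.version
sampler_weights = copy.deepcopy(trainer_weights)  # explicit sync after every update

# Unified engine: rollout and training reference the same object.
shared = Weights([0.0, 0.0])
sampler_view = shared
shared.apply_update([0.1, -0.2])
in_sync_without_copy = sampler_view.version == shared.version
```

In the split case, the sampler generates from stale weights until something copies the update across. In the shared case, there is no second copy to fall behind.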

It all fits in one file. A recipe is that loop:

Copyright © 2026

Adaptive ML, Inc.
All rights reserved