Introducing Recipes
A recipe is a Python file that defines an AI workflow on Harmony, Adaptive Engine's compute backend.
Reinforcement learning has many moving parts: training, inference, grading. These typically run in separate systems, and most of the work in an RL pipeline is coordinating between them.
In Harmony, all of this can live in one file.
A recipe has three pieces:
- InputConfig: declares the recipe's inputs
- @recipe_main: marks the entrypoint function
- RecipeContext: handles Harmony calls
class MyConfig(InputConfig):
    ...

@recipe_main
async def main(config: MyConfig, ctx: RecipeContext):
    # load resources, run the workflow, save
    ...
Built-in recipes ship with Adaptive Engine: supervised fine-tuning, RL with a grader, RL on preferences, evaluation, speculative decoding draft alignment. Each wraps training, data loading, and monitoring into one workflow. Custom recipes use the same primitives.
InputConfig
The recipe's inputs, declared as a Pydantic model.
class SummarizationGRPO(InputConfig):
    model: Annotated[Model[model_kinds.Trainable], Field(description="Policy")]
    dataset: Annotated[Dataset[dataset_kinds.Prompt], Field(description="Prompts")]
    grader: Annotated[Grader, Field(description="Reward")]
    learning_rate: float = 7.5e-7
Model, Dataset, and Grader are engine-aware: they reference models, datasets, and graders registered on Adaptive Engine.
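The config itself is standard Pydantic, so defaults and validation behave as Pydantic v2 defines them. A minimal, engine-free sketch of the same pattern (hypothetical field names, no Harmony imports):

from typing import Annotated
from pydantic import BaseModel, Field

class TrainingKnobs(BaseModel):
    learning_rate: Annotated[float, Field(gt=0, description="Optimizer step size")] = 7.5e-7
    group_size: Annotated[int, Field(ge=2, description="Samples per prompt")] = 8

knobs = TrainingKnobs.model_validate({"learning_rate": 1e-6})
assert knobs.group_size == 8  # omitted fields fall back to their defaults
# model_validate({"learning_rate": -1}) would raise a ValidationError (gt=0)

The engine-aware fields add one thing on top: their values are references that Harmony resolves against what's registered on Adaptive Engine.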
@recipe_main
Marks the recipe's async entrypoint.
@recipe_main
async def main(config: SummarizationGRPO, ctx: RecipeContext):
    ...
Harmony parses input arguments, instantiates the typed config, and calls the function.
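That dispatch is straightforward to picture. A rough, hypothetical sketch of what such a decorator does (not Harmony's actual implementation; the input transport and context construction are stand-ins):

import asyncio
import json
import sys

def recipe_main_sketch(fn):
    # Hypothetical stand-in for @recipe_main: validate raw inputs into the
    # typed config, obtain a context, and drive the async entrypoint.
    def run():
        config_cls = fn.__annotations__["config"]  # e.g. SummarizationGRPO
        config = config_cls.model_validate(json.loads(sys.argv[1]))
        ctx = ...  # Harmony supplies the real RecipeContext
        asyncio.run(fn(config, ctx))
    return run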
RecipeContext
The recipe's connection to Harmony.
# Assumes `import numpy as np` at module scope; `async_map` maps an async
# function over a list concurrently (helper assumed in scope).
dataset = await config.dataset.load(ctx)
grader = await config.grader.load(ctx)

# Spawn the trainable policy, then clone a frozen inference copy as the KL reference.
policy = await config.model.spawn_train("policy", ctx, max_batch_size=10000, tp=4)
reference = await policy.clone_inf()

for prompt in dataset:
    # Sample a group of 8 completions per prompt and grade them.
    samples = await async_map(policy.generate_tokens, [prompt for _ in range(8)])
    texts = await async_map(policy.detokenize_thread, samples)
    grades = await async_map(grader.grade, texts)

    # Group-relative advantages: normalize scores within the group.
    scores = np.array([g.value for g in grades])
    advantages = (scores - scores.mean()) / (scores.std() + 1e-8)

    for sample, adv in zip(samples, advantages):
        lp = await policy.logprobs_per_token(sample)
        ref_lp = await reference.logprobs_per_token(sample)
        # Clipped GRPO update with a KL penalty toward the reference.
        await policy.train_grpo(sample, lp, ref_lp, [adv] * len(lp), clip_range=0.1, kl_beta=0.01)

    await policy.optim_step(config.learning_rate, wd=0.0, max_grad_norm=1.0)

await policy.save(model_name="policy-v1", ctx=ctx)
RecipeContext flows into every Harmony call. The policy, its reference (a one-line clone_inf), and the grader (possibly a large judge LLM) all live on the same GPUs in Harmony's unified architecture. The full GRPO loop, in one file.
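The group of 8 samples per prompt is what makes those advantages meaningful: each sample is scored relative to its own group's mean, so no separate value model is needed. A toy run of the normalization line (plain numpy, nothing Harmony-specific):

import numpy as np

scores = np.array([0.2, 0.9, 0.5, 0.4])  # grader scores for one prompt's group
advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
print(advantages.round(2))  # [-1.18  1.57  0.   -0.39]

Samples above the group mean get positive advantages and are reinforced; samples below get negative ones and are pushed down.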
That's a recipe. Built-in or custom, every recipe runs on Harmony. See the docs for more.