
May 11, 2026

Introducing Recipes

By Dylan Ebert

A recipe is a Python file that defines an AI workflow on Harmony, Adaptive Engine's compute backend.

Reinforcement learning has many moving parts: training, inference, and grading. These typically run in separate systems, and most of the work in an RL pipeline is coordinating between them.

In Harmony, all of this can live in one file.

A recipe has three pieces:

class MyConfig(InputConfig):
    ...

@recipe_main
async def main(config: MyConfig, ctx: RecipeContext):
    # load resources, run the workflow, save
    ...

Built-in recipes ship with Adaptive Engine: supervised fine-tuning, RL with a grader, RL on preferences, evaluation, speculative decoding draft alignment. Each wraps training, data loading, and monitoring into one workflow. Custom recipes use the same primitives.

InputConfig

The recipe's inputs, declared as a Pydantic model.

class SummarizationGRPO(InputConfig):
    model: Annotated[Model[model_kinds.Trainable], Field(description="Policy")]
    dataset: Annotated[Dataset[dataset_kinds.Prompt], Field(description="Prompts")]
    grader: Annotated[Grader, Field(description="Reward")]
    learning_rate: float = 7.5e-7

Model, Dataset, and Grader are engine-aware: they reference models, datasets, and graders registered on Adaptive Engine.
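As a rough, self-contained sketch of the pattern (plain Pydantic, with string IDs standing in for the engine-aware Model, Dataset, and Grader reference types, which are part of Adaptive Engine and not reproduced here):

```python
from typing import Annotated

from pydantic import BaseModel, Field


# Hypothetical stand-in: the real InputConfig comes from Adaptive Engine.
class InputConfig(BaseModel):
    pass


class SummarizationGRPO(InputConfig):
    # String IDs stand in for the engine-aware references in this sketch.
    model: Annotated[str, Field(description="Policy")]
    dataset: Annotated[str, Field(description="Prompts")]
    grader: Annotated[str, Field(description="Reward")]
    learning_rate: float = 7.5e-7


# Pydantic validates plain dicts (e.g. parsed CLI or JSON input) and
# fills in defaults for anything omitted.
cfg = SummarizationGRPO(model="llama-3-8b", dataset="prompts-v2", grader="rm-judge")
print(cfg.learning_rate)  # 7.5e-07
```

Because the config is a Pydantic model, a malformed input fails validation before the workflow ever touches a GPU.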

@recipe_main

Marks the recipe's async entrypoint.

@recipe_main
async def main(config: SummarizationGRPO, ctx: RecipeContext):
    ...

Harmony parses input arguments, instantiates the typed config, and calls the function.
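Conceptually, the decorator's job is: read raw inputs, validate them into the typed config declared on the entrypoint, and drive the coroutine. A hypothetical sketch of that dispatch (not Harmony's actual implementation; the config class here is a plain stand-in):

```python
import asyncio
import inspect
import json


class RecipeContext:
    """Hypothetical stand-in for Harmony's context object."""


def recipe_main(fn):
    # Sketch: resolve the config type from the entrypoint's annotation,
    # build it from raw JSON, and run the async function to completion.
    def run(raw_json: str):
        config_cls = inspect.signature(fn).parameters["config"].annotation
        config = config_cls(**json.loads(raw_json))
        return asyncio.run(fn(config, RecipeContext()))

    fn.run = run
    return fn


class MyConfig:
    def __init__(self, learning_rate: float = 7.5e-7):
        self.learning_rate = learning_rate


@recipe_main
async def main(config: MyConfig, ctx: RecipeContext):
    return config.learning_rate


print(main.run('{"learning_rate": 1e-06}'))  # 1e-06
```

The point of the pattern is that the entrypoint's type annotation is the single source of truth: the same declaration that documents the inputs also validates them.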

RecipeContext

The recipe's connection to Harmony.

dataset = await config.dataset.load(ctx)
grader = await config.grader.load(ctx)
policy = await config.model.spawn_train("policy", ctx, max_batch_size=10000, tp=4)
reference = await policy.clone_inf()

for prompt in dataset:
    samples = await async_map(policy.generate_tokens, [prompt for _ in range(8)])
    texts = await async_map(policy.detokenize_thread, samples)
    grades = await async_map(grader.grade, texts)
    scores = np.array([g.value for g in grades])
    advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
    for sample, adv in zip(samples, advantages):
        lp = await policy.logprobs_per_token(sample)
        ref_lp = await reference.logprobs_per_token(sample)
        await policy.train_grpo(sample, lp, ref_lp, [adv] * len(lp), clip_range=0.1, kl_beta=0.01)
    await policy.optim_step(config.learning_rate, wd=0.0, max_grad_norm=1.0)

await policy.save(model_name="policy-v1", ctx=ctx)

RecipeContext flows into every Harmony call. The policy, the reference model (a one-line clone_inf), and the grader (possibly a large judge LLM) all live on the same GPUs in Harmony's unified architecture. The full GRPO loop, in one file.
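The advantage step in the loop is just group normalization: each sampled completion is scored against the mean and standard deviation of its own group of samples. A tiny standalone example with illustrative grader scores:

```python
import numpy as np

# Grades for 8 samples of one prompt (made-up values for illustration).
scores = np.array([0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 1.0, 0.6])

# Same normalization as in the loop: center on the group mean, scale by
# the group std; the epsilon guards against a zero-variance group.
advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
# roughly [-1.63, -0.82, -0.82, 0.0, 0.82, 0.82, 1.63, 0.0]
```

Above-average samples get positive advantage and are reinforced; below-average samples get negative advantage and are pushed away, with no learned value function involved.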

That's a recipe. Built-in or custom, all run on Harmony. See the docs for more.

Copyright © 2026 Adaptive ML, Inc. All rights reserved.