Ensemble Inference

See Workflow Step 8 for the base command. This guide covers how to assemble folds and models into an ensemble and how to control aggregation when you need more accuracy than a single checkpoint provides.


Basic Usage

ml-inference --ensemble \
  runs/my_dataset_fold_0/weights/best.pt \
  runs/my_dataset_fold_1/weights/best.pt \
  runs/my_dataset_fold_2/weights/best.pt

Predictions from each checkpoint are combined according to the aggregation strategy defined in the config or CLI.


Aggregation Strategies

| Strategy    | Description                       | When to use                          |
|-------------|-----------------------------------|--------------------------------------|
| soft_voting | Average probabilities/logits      | Default; smooth and stable           |
| hard_voting | Majority vote on predicted labels | Use when models have similar accuracy |
| weighted    | Weighted average                  | Give stronger folds higher influence |
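As a rough illustration of how the three strategies differ, the sketch below applies them to a single sample in plain Python. It is not the tool's actual implementation: the function names (soft_voting, hard_voting, argmax) are made up for this example, and it assumes each checkpoint outputs class probabilities rather than logits.

```python
from collections import Counter
from typing import List, Optional, Sequence

# member_probs[m][c] is member m's probability for class c on ONE sample.
MemberProbs = Sequence[Sequence[float]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def soft_voting(member_probs: MemberProbs,
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Average class probabilities; passing weights gives the 'weighted' variant."""
    n_models = len(member_probs)
    if weights is None:
        weights = [1.0] * n_models  # plain (unweighted) average
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(member_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, member_probs)) / total
        for c in range(n_classes)
    ]


def hard_voting(member_probs: MemberProbs) -> int:
    """Each member votes for its top class; the most common label wins."""
    votes = [argmax(probs) for probs in member_probs]
    # On a tie, most_common returns the label that was voted for first.
    return Counter(votes).most_common(1)[0][0]


# Made-up probabilities: one confident member, two lukewarm ones.
probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
print(hard_voting(probs))                                 # 1
print(argmax(soft_voting(probs)))                         # 0
print(argmax(soft_voting(probs, [0.4, 0.35, 0.25])))      # 0
```

Here hard voting picks class 1 because two of three members prefer it, while soft voting picks class 0 because the first member is far more confident. The choice of strategy decides which of these behaviours you get.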

To supply weights, set the aggregation to 'weighted' and provide one weight per checkpoint:

inference:
  strategy: 'ensemble'
  ensemble:
    checkpoints:
      - 'runs/my_dataset_fold_0/weights/best.pt'
      - 'runs/my_dataset_fold_1/weights/best.pt'
      - 'runs/my_dataset_fold_2/weights/best.pt'
    aggregation: 'weighted'
    weights: [0.4, 0.35, 0.25]


Selecting Members

  • Prefer checkpoints trained on different folds or architectures to maximise diversity.
  • Ensure all checkpoints were trained with the same class ordering and preprocessing.
  • Keep a record of each model’s validation metrics so weights reflect performance.
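One simple way to turn recorded validation metrics into ensemble weights is to normalise them so they sum to 1. The sketch below assumes a higher-is-better metric such as accuracy. The helper name weights_from_metrics and the scores are placeholders for illustration, not values from a real run.

```python
from typing import Dict


def weights_from_metrics(val_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale higher-is-better validation scores so they sum to 1."""
    if not val_scores:
        raise ValueError("no checkpoints given")
    if any(score <= 0 for score in val_scores.values()):
        raise ValueError("scores must be positive")
    total = sum(val_scores.values())
    return {ckpt: score / total for ckpt, score in val_scores.items()}


# Placeholder validation accuracies, for illustration only.
val_scores = {
    "runs/my_dataset_fold_0/weights/best.pt": 0.80,
    "runs/my_dataset_fold_1/weights/best.pt": 0.70,
    "runs/my_dataset_fold_2/weights/best.pt": 0.50,
}
weights = weights_from_metrics(val_scores)

# Dicts keep insertion order, so the weights line up with the checkpoint list.
print([round(w, 2) for w in weights.values()])
```

When fold scores are close together (as they usually are), this normalisation yields nearly uniform weights, so the result will behave much like plain soft voting.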

Performance Considerations

  • Runtime scales roughly linearly with the number of checkpoints (5 models ≈ 5× slower).
  • To manage compute, start with the top-2 folds and expand only if accuracy gains justify the cost.
  • Combine with test-time augmentation (TTA) only for final evaluations (ml-inference --ensemble ... --tta).
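The "start with the top-2 folds and expand only if it pays off" advice can be sketched as a small greedy loop. Everything here is an assumption for illustration: grow_ensemble is a hypothetical helper, evaluate stands in for whatever you use to score an ensemble on validation data, and min_gain is an arbitrary threshold you would pick yourself.

```python
from typing import Callable, List, Sequence


def grow_ensemble(ranked: Sequence[str],
                  evaluate: Callable[[List[str]], float],
                  min_gain: float = 0.002,
                  start_size: int = 2) -> List[str]:
    """Greedily extend an ensemble while each addition improves the score.

    ranked:   checkpoints sorted from best to worst validation score.
    evaluate: hypothetical callback returning ensemble accuracy (higher is better).
    min_gain: smallest improvement that justifies one more model's runtime.
    """
    members = list(ranked[:start_size])
    best = evaluate(members)
    for candidate in ranked[start_size:]:
        score = evaluate(members + [candidate])
        if score - best >= min_gain:
            # Worth the extra inference cost: keep the candidate.
            members.append(candidate)
            best = score
        # Otherwise skip it and try the next-ranked checkpoint.
    return members
```

Each call to evaluate runs a full ensemble pass, so the search itself costs several validation runs. It is best done once, before the final evaluation.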

Troubleshooting

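| Issue              | Fix                                                                 |
|--------------------|---------------------------------------------------------------------|
| Shape mismatch     | One checkpoint was trained with a different num_classes; exclude it |
| Accuracy decreases | Remove underperforming models or adjust weights                     |
| Memory pressure    | Run ensemble in mixed precision or evaluate in batches              |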
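To track down a shape mismatch before launching a long run, it can help to compare the class count of every member up front. The sketch below is a minimal, hypothetical pre-flight check. How you read num_classes out of a checkpoint depends on the checkpoint format, which this guide does not specify, so the function simply takes an already-collected mapping.

```python
from collections import Counter
from typing import Dict, List


def find_mismatched(num_classes: Dict[str, int]) -> List[str]:
    """Return checkpoints whose class count differs from the most common one."""
    if not num_classes:
        return []
    # The class count shared by most members is treated as the expected one.
    majority, _ = Counter(num_classes.values()).most_common(1)[0]
    return [ckpt for ckpt, n in num_classes.items() if n != majority]


# Placeholder values: fold 2 was (hypothetically) trained with an extra class.
print(find_mismatched({
    "runs/my_dataset_fold_0/weights/best.pt": 10,
    "runs/my_dataset_fold_1/weights/best.pt": 10,
    "runs/my_dataset_fold_2/weights/best.pt": 11,
}))
```

Any checkpoint the function returns is a candidate to drop from the --ensemble list.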

After validating ensemble performance, proceed to export the strongest checkpoint(s) via Model Export.