Ensemble Inference¶
Reference Workflow Step 8 for the base command. This guide focuses on how to assemble folds/models and control aggregation once you want more accuracy than a single checkpoint provides.
Basic Usage¶
ml-inference --ensemble runs/my_dataset_fold_0/weights/best.pt runs/my_dataset_fold_1/weights/best.pt runs/my_dataset_fold_2/weights/best.pt
Predictions from each checkpoint are combined according to the aggregation strategy defined in the config or CLI.
Aggregation Strategies¶
| Strategy | Description | When to use |
|---|---|---|
soft_voting |
Average probabilities/logits | Default; smooth and stable |
hard_voting |
Majority vote on predicted labels | Use when models have similar accuracy |
weighted |
Weighted average | Give stronger folds higher influence |
To supply weights:
inference:
strategy: 'ensemble'
ensemble:
checkpoints:
- 'runs/my_dataset_fold_0/weights/best.pt'
- 'runs/my_dataset_fold_1/weights/best.pt'
- 'runs/my_dataset_fold_2/weights/best.pt'
aggregation: 'weighted'
weights: [0.4, 0.35, 0.25]
Selecting Members¶
- Prefer checkpoints trained on different folds or architectures to maximise diversity.
- Ensure all checkpoints were trained with the same class ordering and preprocessing.
- Keep a record of each model’s validation metrics so weights reflect performance.
Performance Considerations¶
- Runtime scales roughly with the number of checkpoints (5 models ≈ 5× slower).
- To manage compute, start with the top-2 folds and expand only if accuracy gains justify the cost.
- Combine with TTA only for final evaluations (
ml-inference --ensemble ... --tta).
Troubleshooting¶
| Issue | Fix |
|---|---|
| Shape mismatch | One checkpoint was trained with different num_classes; exclude it |
| Accuracy decreases | Remove underperforming models or adjust weights |
| Memory pressure | Run ensemble in mixed precision or evaluate in batches |
After validating ensemble performance, proceed to export the strongest checkpoint(s) via Model Export.