Hyperparameter Tuning Guide¶
Refer to Workflow Step 5 for the end-to-end flow. This page captures tuning patterns, search-space design tips, and troubleshooting once you move beyond the basics.
Manual Exploration (Quick Iterations)¶
- Override a single parameter per run to isolate impact:
- Record run metadata (command, git commit, seed) alongside results.
- Compare runs with TensorBoard (
tensorboard --logdir runs/) or by inspectingsummary.txtin each run directory.
When manual sweeps become tedious, switch to automated search.
Automated Search with Optuna¶
- Install extras (once):
uv pip install -e "[optuna]". - Generate a config that includes a
searchblock (ml-init-config data/<dataset> --optuna --yes). - Launch trials:
- Review results:
- Train with the exported best configuration:
Resume an existing study with ml-search --config ... --resume.
Designing the Search Space¶
| Field | Example | Notes |
|---|---|---|
type: categorical |
Architectures, optimiser choices | Explicit list of options |
type: uniform |
Continuous range (linear) | Good for momentum, dropout |
type: loguniform |
Exponential range | Ideal for learning rates, weight decay |
type: int |
Discrete integers | Epochs, scheduler steps |
Keep the space focused—start with the parameters that historically move the needle (LR, batch size, scheduler) before adding architecture choices.
Sampler/Pruner defaults: TPESampler + MedianPruner cover most cases. Adjust n_startup_trials and n_warmup_steps if trials are pruned too aggressively.
Best Practices¶
- Start small: run 10–20 trials first; scale up once you see promising regions.
- Use pruning: saves compute by terminating weak trials early.
- Log trial context: Optuna stores trial params; export
best_config.yamlto freeze winning settings. - Parallelise thoughtfully: point
search.storageto a shared SQLite/PostgreSQL DB before running multiple workers. - Cross-validation search: Enable
search.cross_validationfor small datasets when variance across folds is high (slower but robust).
Troubleshooting¶
| Issue | Resolution |
|---|---|
| All trials pruned early | Reduce pruner strictness (n_warmup_steps), widen search space, or increase max epochs per trial |
| Best trial worse than baseline | Increase n_trials, ensure baseline config lies within the search space, or refine sampler/pruner settings |
| Trials crash with CUDA OOM | Restrict batch-size choices; consider mixed precision during search |
| Search too slow | Lower trial epochs, use pruning, or run workers in parallel |
Still stuck? Consult the Troubleshooting reference or Optuna’s documentation for sampler-specific guidance.