Best Practices
Tips and conventions for effective use of the framework.
Configuration
- Start with defaults - Modify incrementally
- Version control configs - Track experiment settings
- Document changes - Note why you changed defaults
- Use meaningful names - For custom configs
- Check saved config - Verify what was actually used
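One way to act on the "Document changes" and "Check saved config" tips is to diff the config saved with a run against the defaults, so every changed setting is explicit. The following is a minimal sketch, assuming configs can be loaded as plain nested dictionaries. `diff_config` and the example keys are hypothetical and are not part of the framework.

```python
def diff_config(defaults, used, prefix=""):
    """Return {dotted_key: (default_value, used_value)} for every setting that differs."""
    changes = {}
    for key in sorted(set(defaults) | set(used)):
        path = f"{prefix}{key}"
        old = defaults.get(key)
        new = used.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested sections and keep the dotted path.
            changes.update(diff_config(old, new, prefix=path + "."))
        elif old != new:
            changes[path] = (old, new)
    return changes


# Hypothetical example values, for illustration only.
DEFAULTS = {"training": {"lr": 0.001, "epochs": 50}, "data": {"batch_size": 32}}
USED = {"training": {"lr": 0.0003, "epochs": 50}, "data": {"batch_size": 64}}

for name, (old, new) in diff_config(DEFAULTS, USED).items():
    print(f"{name}: {old} -> {new}")
```

Storing this diff next to the saved config gives a short record of what was changed, to which you can add a note on why.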
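Training
- Test quickly first - Run a few epochs to verify the pipeline end to end
- Monitor with TensorBoard - Watch training live
- Check GPU utilization - Should stay near 100%; sustained low utilization usually points to a data-loading or I/O bottleneck
- Save checkpoints - Always enable checkpointing so an interrupted run can resume
- Track experiments - Document what works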
Hyperparameter Tuning
- Change one at a time - Isolate the effect of each hyperparameter
- Start coarse - Search a wide range, then narrow around the best values
- Use TensorBoard - Compare runs visually
- Be systematic - Use grid or random search
- Document results - Track all experiments
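The "start coarse, then narrow" advice can be sketched as a two-pass random search over a single hyperparameter. This is only an illustration: `toy_score` is a hypothetical stand-in for a short training run, and the learning-rate ranges are assumptions rather than framework defaults.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value uniformly in log space, which suits scale-type parameters such as learning rate."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(evaluate, low, high, trials, rng):
    """Try `trials` random values in [low, high]; return (best_value, best_score, all_results)."""
    results = []
    for _ in range(trials):
        value = sample_log_uniform(low, high, rng)
        results.append((value, evaluate(value)))
    best_value, best_score = max(results, key=lambda r: r[1])
    return best_value, best_score, results


def toy_score(lr):
    """Hypothetical stand-in for a short training run; peaks at lr = 1e-3."""
    return -abs(math.log10(lr) + 3.0)


rng = random.Random(0)  # fixed seed so the search itself is repeatable

# Coarse pass: four orders of magnitude.
coarse_best, coarse_score, _ = random_search(toy_score, 1e-5, 1e-1, trials=20, rng=rng)

# Fine pass: roughly one order of magnitude centred on the coarse winner.
fine_low, fine_high = coarse_best / 3, coarse_best * 3
fine_best, fine_score, _ = random_search(toy_score, fine_low, fine_high, trials=20, rng=rng)

print(f"coarse best lr={coarse_best:.2e}, fine best lr={fine_best:.2e}")
```

Keeping the full `results` list from each pass, rather than only the winner, makes it easier to document every trial.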
Data
- Verify structure - Use the verification script
- Check class balance - Ensure a fair distribution across classes
- Use a validation set - Tune on it and keep the test set held out, so you don't overfit to the test set
- Augment appropriately - Match your domain
- Inspect samples - Verify preprocessing
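A class-balance check can be as simple as counting labels. The sketch below assumes you can collect the labels into a list; the example labels are made up, and both function names are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return {class_name: fraction_of_samples} for a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class to the smallest; 1.0 means perfectly balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels; in practice, collect them from your dataset.
labels = ["cat"] * 80 + ["dog"] * 15 + ["bird"] * 5

for name, share in sorted(class_balance(labels).items()):
    print(f"{name}: {share:.0%}")
print(f"imbalance ratio: {imbalance_ratio(labels):.1f}")
```

A large ratio suggests resampling or class weighting may be worth considering. What counts as "large" depends on the task.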
Code
- Don't modify entry points - Extend via modules
- Test changes - Before full training
- Follow conventions - Match existing patterns
- Document extensions - Help future users
- Keep backups - Before major changes
Reproducibility
- Set seed - Always use a fixed seed
- Save config - With each run
- Document environment - Record Python and PyTorch versions
- Use deterministic mode - When exact reproduction is needed
- Track hardware - GPU model can affect results
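The seed, environment, and hardware tips can be combined into two small helpers. This is a sketch that assumes a PyTorch-based setup and is not the framework's own mechanism: `set_seed` and `describe_environment` are hypothetical names, and the NumPy and PyTorch parts are skipped when those packages are not installed.

```python
import json
import platform
import random


def set_seed(seed, deterministic=False):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        if deterministic:
            # Trades speed for repeatability; some ops may raise an error
            # if they have no deterministic implementation.
            torch.use_deterministic_algorithms(True)
            torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


def describe_environment():
    """Collect the version and hardware details worth storing with each run."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    try:
        import torch
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        info["torch"] = None
    return info


set_seed(42)
print(json.dumps(describe_environment(), indent=2))
```

Writing the returned dictionary to a JSON file beside the saved config keeps the seed, versions, and GPU model together with the run.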
Performance
- Maximize batch size - Within memory limits
- Use appropriate workers - 4-8 data-loading workers is typically good
- Keep the non-deterministic default - It trains faster; enable deterministic mode only when exact reproduction is needed
- Monitor bottlenecks - CPU, GPU, or I/O
- Profile if needed - Find slow components
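Before reaching for a full profiler, it can help to measure how a training step's time splits between data loading and compute. The sketch below uses only the standard library. `StageTimer` is a hypothetical helper, and the `time.sleep` calls are stand-ins for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a training loop."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def track(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[stage] += time.perf_counter() - start

    def report(self):
        """Return {stage: fraction_of_total_time}."""
        total = sum(self.totals.values()) or 1.0
        return {stage: spent / total for stage, spent in self.totals.items()}


timer = StageTimer()
for _ in range(5):
    with timer.track("data"):
        time.sleep(0.02)  # stand-in for loading a batch
    with timer.track("compute"):
        time.sleep(0.01)  # stand-in for the forward/backward pass

for stage, share in timer.report().items():
    print(f"{stage}: {share:.0%}")
```

If the data stage dominates, the number of workers or the storage is the likely bottleneck rather than the GPU. GPU work runs asynchronously, so when timing real PyTorch code, call `torch.cuda.synchronize()` before leaving the compute block or the compute time will be under-reported.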
Deployment
- Use best.pt - The checkpoint with the highest validation accuracy
- Test inference - Before deployment
- Document model - Record architecture and training details
- Save transforms - The same preprocessing is needed at inference
- Version models - Track which version is deployed
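The documentation, transform, and versioning tips can be captured in one manifest file written next to the checkpoint. This is a minimal sketch using only the standard library. The manifest layout, `write_manifest`, and every example value are assumptions for illustration, not a format the framework defines.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(checkpoint, version, architecture, transforms, val_accuracy):
    """Write <checkpoint>.manifest.json next to the checkpoint and return its path."""
    checkpoint = Path(checkpoint)
    manifest = {
        "version": version,
        "checkpoint": checkpoint.name,
        "sha256": file_sha256(checkpoint),
        "architecture": architecture,
        "transforms": transforms,
        "val_accuracy": val_accuracy,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    out_path = checkpoint.with_name(checkpoint.name + ".manifest.json")
    out_path.write_text(json.dumps(manifest, indent=2))
    return out_path


# Demonstration with a placeholder file; all values below are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "best.pt"
    ckpt.write_bytes(b"placeholder weights")  # stands in for a real checkpoint
    manifest_path = write_manifest(
        ckpt,
        version="1.0.0",
        architecture="example-architecture",
        transforms=[{"name": "Resize", "size": 224}],
        val_accuracy=0.0,  # placeholder, not a real result
    )
    print(manifest_path.read_text())
```

The hash ties the manifest to one exact file, so you can later confirm which checkpoint a deployed version corresponds to.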