ML Field Notes: Building Models That Work

For ML engineers who've hit the limits of tutorials. Battle-tested insights on training, debugging, and shipping models that actually work in the real world.

Field Notes

  1. Stop Using 80/20 Blindly: That 80/20 train-validation split you're using? It's probably either wasting thousands of labeled samples or giving you metrics too noisy to trust, and the fix requires thinking in absolute sample counts, not percentages (a quick numeric sketch follows this list).
  2. The Label Noise That Actually Kills Your Model: Perfect training labels are overrated, but perfect evaluation data is non-negotiable, and systematic labeling bias can destroy your model. Here's a simple technique to root out these biases in under an hour.
  3. Which Samples Should You Label Next?: You're probably selecting the wrong samples to label. This two-step strategy shows you how to maximize model improvement per labeled example while avoiding catastrophic forgetting (a generic code sketch follows this list).
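
Why absolute counts matter (note 1): as a minimal illustration, not taken from the posts, the standard error of an accuracy estimate depends on the absolute number of validation samples, so a fixed 20% slice is noisy on small datasets and wasteful on large ones. The numbers below assume a model at roughly 90% accuracy and i.i.d. validation samples.

```python
import math

def accuracy_standard_error(accuracy: float, n_validation: int) -> float:
    """Standard error of an accuracy estimate from n_validation i.i.d. samples
    (binomial approximation)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_validation)

# The same 80/20 split gives wildly different metric precision at different scales:
for n_total in (1_000, 10_000, 1_000_000):
    n_val = int(0.2 * n_total)
    half_width = 1.96 * accuracy_standard_error(accuracy=0.90, n_validation=n_val)
    print(f"total={n_total:>9,}  val={n_val:>7,}  accuracy ~ 90% +/- {half_width:.2%}")
```

With 1,000 total samples, a 200-sample validation set gives roughly a +/- 4% read and can't tell 88% from 92% accuracy; with 1,000,000 samples, 200,000 validation labels buy far more precision than most decisions need and could be spent on training instead.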
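The two-step strategy from note 3 isn't spelled out in this README; purely as a generic sketch of the pattern it alludes to, the snippet below ranks unlabeled samples by predictive entropy (label the most uncertain first) and then mixes the fresh labels with a replay sample of previously labeled data so fine-tuning doesn't erase what the model already learned. The function names and the replay_fraction parameter are illustrative, not from the posts.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-sample entropy of predicted class probabilities; higher = more uncertain."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_for_labeling(probs_unlabeled: np.ndarray, budget: int) -> np.ndarray:
    """Step 1 (illustrative): indices of the `budget` most uncertain unlabeled samples."""
    return np.argsort(predictive_entropy(probs_unlabeled))[::-1][:budget]

def build_finetune_batch(x_new, y_new, x_old, y_old, replay_fraction=0.5, seed=0):
    """Step 2 (illustrative): combine freshly labeled samples with a random replay
    of previously labeled data to reduce catastrophic forgetting when fine-tuning."""
    rng = np.random.default_rng(seed)
    n_replay = min(int(replay_fraction * len(x_new)), len(x_old))
    keep = rng.choice(len(x_old), size=n_replay, replace=False)
    return np.concatenate([x_new, x_old[keep]]), np.concatenate([y_new, y_old[keep]])
```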

Series Overview

Your model hits 95% accuracy on the validation set, then crashes to 30% in production. Your active learning loop labels the same examples repeatedly. Your "balanced" dataset still fails on minority classes that actually matter. ML Field Notes tackles these real-world problems with battle-tested solutions from 16+ years of applied machine learning and computer vision experience across the financial, defense, and critical infrastructure industries.

Each post dissects one specific problem with a practical solution you can apply immediately: the kind of knowledge that only comes from debugging failures late into the night, optimizing models under production deadlines, and navigating the messy reality where clean datasets don't exist.

What You'll Learn

This series covers the nuanced, often overlooked aspects of building production ML systems:

  • 🚀 Production Reality: Planning for model drift, monitoring that matters, and building systems that survive deployment
  • 🏗️ Model Validation: Detecting when your model learns shortcuts, identifying systematic errors, and avoiding validation set overfitting
  • 🔍 Data & Evaluation: Why 80/20 splits fail, how much label noise you can tolerate, and what to label next
  • 📊 Practical Strategies: Active learning that works in practice, handling class imbalance pragmatically, and solving cold-start problems
  • ⚙️ Architecture Decisions: When to add more data vs. better models, understanding learning curves, and recognizing architectural limits

Who This Is For

You'll get the most value from these posts if you:

  • Ship models to production (or need to soon) and face real-world constraints
  • Debug models that work in development but fail in deployment
  • Make architecture decisions without clear guidance (more data vs. better model?)
  • Deal with messy data, label noise, and class imbalance in practice
  • Want to make better decisions faster, informed by battle-tested experience
