Machine Learning: Notes and Advice
Am I overfitting/underfitting?
Training loss high, and validation loss just as high. That is underfitting: the model can't even fit the training data.
Training loss much less than validation loss. That is overfitting: the model has memorized the training set but doesn't generalize.
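As a quick sketch, the two rules above can be wrapped in a tiny diagnostic helper. The thresholds here (`gap_ratio`, `high_loss`) are arbitrary assumptions for illustration; sensible values depend on your loss function and task.

```python
def fit_diagnosis(train_loss, val_loss, gap_ratio=1.5, high_loss=1.0):
    """Rough heuristic: classify a train/val loss pair.

    gap_ratio and high_loss are made-up defaults; tune them per task.
    """
    # Validation loss much worse than training loss -> memorizing the train set.
    if val_loss > gap_ratio * train_loss:
        return "overfitting"
    # Both losses high -> the model can't even fit the training data.
    if train_loss > high_loss and val_loss > high_loss:
        return "underfitting"
    return "healthy"

print(fit_diagnosis(0.1, 0.9))   # overfitting
print(fit_diagnosis(2.0, 2.1))   # underfitting
```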
Bagging vs boosting
bagging = independent models trained in parallel on bootstrap samples, then averaged (ex. random forest)
boosting = models trained sequentially, each one focusing on the previous ones' errors (ex. AdaBoost, gradient boosting; the Haar cascade detector is built with AdaBoost)
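A minimal side-by-side sketch with scikit-learn, assuming a synthetic dataset from `make_classification`. Random forest stands in for bagging and gradient boosting for boosting; the hyperparameters are defaults, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: 100 trees trained independently on bootstrap samples, votes averaged.
bagged = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Boosting: 100 shallow trees trained one after another, each fit to the
# residual errors of the ensemble so far.
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(bagged.score(X_te, y_te), boosted.score(X_te, y_te))
```

Bagging mostly reduces variance (averaging independent models), while boosting mostly reduces bias (each stage corrects the last), which is why boosted models are more prone to overfitting if run too long.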
Good ideas to improve a model
Cluster the training data (no labels needed), then add the cluster assignments as extra features; they're essentially free.
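One way to do this, sketched with k-means in scikit-learn (the dataset and cluster count are arbitrary assumptions): append the cluster id, and optionally the distance to each centroid, as new columns.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)

# Fit k-means on the features alone; the labels y are never used.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)

# Extra "free" features: the hard cluster assignment and the
# distance from each sample to every centroid.
cluster_id = km.labels_.reshape(-1, 1)
dists = km.transform(X)  # shape (300, 8): distance to each centroid

X_aug = np.hstack([X, cluster_id, dists])
print(X_aug.shape)  # (300, 29): 20 original features + 1 id + 8 distances
```

For linear models it usually helps to one-hot encode the cluster id rather than feed it in as an integer.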
Let's say you have multiple labels per data point in your training data. If labelers disagree a lot on a single data point, that sample is noisy. Sort your data by disagreement and drop the noisiest samples first; you'll often get a small boost in model performance.
Even smarter is to penalize the model less when it misclassifies a noisier data point, i.e. weight the loss by labeler agreement instead of dropping samples outright.
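A small sketch of the weighting idea, with a made-up `votes` matrix (one row per sample, one column per labeler): derive an agreement score per sample and pass it as `sample_weight`, which most scikit-learn estimators accept in `fit`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: three labelers' binary labels for four samples.
votes = np.array([[1, 1, 1],
                  [0, 0, 1],
                  [1, 0, 1],
                  [0, 0, 0]])

y = np.round(votes.mean(axis=1)).astype(int)      # majority-vote label
agreement = np.abs(votes.mean(axis=1) - 0.5) * 2  # 1.0 = unanimous, 1/3 = 2-vs-1 split

X = np.array([[0.9], [0.2], [0.7], [0.1]])        # toy single feature

# Down-weight the loss on contested samples instead of dropping them.
clf = LogisticRegression().fit(X, y, sample_weight=agreement)
```

Dropping the noisiest samples is the blunt version of the same idea: it's equivalent to setting their weights to zero.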
Precision vs Recall (w/ statistics translation)