In the previous post, we learned about two common problems that show up when validating models: overfitting and underfitting.
If you are not sure about this topic,
check here: Overfitting and Underfitting in terms of Model Validation
We saw the overview of the relationship between overfitting/underfitting and bias/variance in the previous post:
In this post, we will learn:
The ultimate goal of our machine learning task here is:
The whole dataset we have looks like the graph below.
Each gray dot in the graph represents one mouse.
Given this whole dataset, we would like to find the best model to predict the height of a mouse given its weight.
The best model will capture the true relationship curve as shown in the graph above.
For training and validating models, we need to split the dataset into a training dataset and a validation dataset.
If you are not sure about why and how we split the dataset,
check here: Training, Testing and Validation Datasets in Machine Learning
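The split described above can be sketched in plain NumPy. The mouse data here is synthetic and purely hypothetical (invented for illustration, including the curve used to generate heights); only the shuffling-and-splitting pattern matters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mouse data: weight in grams, height in cm.
# The true relationship is a curve, plus some measurement noise.
weight = rng.uniform(15, 45, size=100)
height = 5 + 0.4 * weight - 0.004 * weight**2 + rng.normal(0, 0.5, size=100)

# Shuffle the indices, then hold out 25% of the mice for validation.
idx = rng.permutation(len(weight))
n_val = len(weight) // 4
val_idx, train_idx = idx[:n_val], idx[n_val:]

w_train, h_train = weight[train_idx], height[train_idx]
w_val, h_val = weight[val_idx], height[val_idx]

print(len(w_train), len(w_val))  # 75 25
```

Shuffling before splitting matters: if the data were sorted by weight, the validation set would contain only the heaviest mice and would not represent the whole population.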
We trained two models on the training dataset:
Bias is the inability to capture the true relationship during the training process.
Model 1 shows high bias because it could not capture the relationship between height and weight in the training dataset.
Model 2 shows low bias because it did capture the relationship between height and weight in the training dataset.
But maybe too well!
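We can mimic the two models with polynomial fits: a straight line plays the role of Model 1 and a high-degree polynomial plays the role of Model 2. This is a sketch under the same hypothetical data as above, not the post's actual models; the weights are rescaled only to keep the high-degree fit numerically stable:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: height follows a curve in weight, plus noise.
w = rng.uniform(15, 45, size=30)
h = 2 + 0.5 * w - 0.006 * w**2 + rng.normal(0, 0.3, size=30)
x = (w - w.mean()) / w.std()  # rescale for a stable polynomial fit

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial fit on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

model1 = np.polyfit(x, h, deg=1)  # simple: a straight line
model2 = np.polyfit(x, h, deg=9)  # complex: a very wiggly curve

print(mse(model1, x, h))  # larger training error  -> high bias
print(mse(model2, x, h))  # smaller training error -> low bias
```

Because the degree-9 model can bend through the training points, its training error is lower than the line's; that is exactly what low bias means here.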
Now it is time to validate the trained models, to see whether they will also work well on data they did not see during training.
Variance refers to the differences in fits between datasets (in our case, between training and validation datasets).
Model 2 shows high variance because its fits on the training and validation datasets are quite different: it worked well on the training dataset but poorly on the validation dataset.
Model 1 shows low variance because its fits on the training and validation datasets are quite similar: it worked okay on both the training and validation datasets. But not great on either.
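We can make the variance comparison concrete by measuring the gap between each model's error on the training data and on the validation data. Again, this is a sketch on hypothetical data, with a straight line standing in for Model 1 and a high-degree polynomial for Model 2:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Hypothetical mice drawn from the same underlying curve.
    w = rng.uniform(15, 45, size=n)
    h = 2 + 0.5 * w - 0.006 * w**2 + rng.normal(0, 0.4, size=n)
    return (w - 30.0) / 10.0, h  # rescale weights for a stable fit

x_tr, h_tr = make_data(25)  # training dataset
x_va, h_va = make_data(25)  # validation dataset

def mse(coeffs, xs, ys):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

model1 = np.polyfit(x_tr, h_tr, deg=1)   # simple model
model2 = np.polyfit(x_tr, h_tr, deg=12)  # complex model

# Variance shows up as the gap between training and validation error.
gap1 = abs(mse(model1, x_va, h_va) - mse(model1, x_tr, h_tr))
gap2 = abs(mse(model2, x_va, h_va) - mse(model2, x_tr, h_tr))
print(gap1, gap2)  # the complex model's gap is far larger -> high variance
```

The complex model drives its training error close to zero, but the wiggles it learned are noise, so its validation error, and therefore the gap, blows up.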
Now, we can understand the relationship between overfitting/underfitting and bias/variance.
Model 1 has high bias & low variance showing underfitting.
Model 2 has low bias & high variance showing overfitting.
It is hard to find a perfect model with both low bias and low variance, because the two trade off against each other.
What we need to do is to find the sweet spot between a simple model (Model 1) and a complex model (Model 2).
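One simple way to look for that sweet spot, sketched here on the same hypothetical data, is to sweep over model complexity (polynomial degree, in this toy setup) and pick the one with the lowest validation error. Training error keeps shrinking as complexity grows, but validation error bottoms out in between:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mouse data: the true height/weight relationship is a curve.
w = rng.uniform(15, 45, size=60)
h = 2 + 0.5 * w - 0.006 * w**2 + rng.normal(0, 0.4, size=60)
x = (w - 30.0) / 10.0          # rescale weights for a stable fit

x_tr, h_tr = x[:40], h[:40]    # training dataset
x_va, h_va = x[40:], h[40:]    # validation dataset

def mse(coeffs, xs, ys):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

train_err, val_err = {}, {}
for deg in range(1, 10):
    coeffs = np.polyfit(x_tr, h_tr, deg=deg)
    train_err[deg] = mse(coeffs, x_tr, h_tr)
    val_err[deg] = mse(coeffs, x_va, h_va)

# The sweet spot is the complexity with the lowest validation error.
best = min(val_err, key=val_err.get)
print("best degree by validation error:", best)
```

Picking the complexity by validation error, rather than training error, is the essential move: training error alone would always choose the most complex (most overfit) model.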
So how can we find the best model, one with an optimal fit that avoids both overfitting and underfitting?
In the next post, we will learn: