
[3.6.5] Dive into Deep Learning : exercise answers


3.6. Generalization — Dive into Deep Learning 1.0.3 documentation (d2l.ai)

[1]

Although polynomial regression can fit nonlinear data, it is prone to overfitting. This is especially true when there are n training examples and we set the degree of the hypothesis function to n − 1 or higher: in theory there is a polynomial of degree n − 1 that passes through all n points exactly, which yields zero loss on the training set but a huge disappointment on the validation set.
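A minimal sketch of this failure mode, assuming NumPy and a synthetic noisy sine dataset (my own choice, not from the exercise): a degree n − 1 polynomial interpolates all n training points, so the training error is numerically zero, yet fresh samples from the same distribution are fit far worse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# n noisy training points drawn from an underlying sine curve
x_train = np.linspace(0, 1, n)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(n)

# Degree n - 1 polynomial: enough coefficients to pass through every point
coeffs = np.polyfit(x_train, y_train, deg=n - 1)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

# Fresh validation samples from the same distribution
x_val = rng.uniform(0, 1, 100)
y_val = np.sin(2 * np.pi * x_val) + 0.3 * rng.standard_normal(100)
val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

print(f"train MSE: {train_mse:.2e}, validation MSE: {val_mse:.2e}")
# train MSE is ~0 up to floating point; validation MSE is orders of magnitude larger
```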


[2]

1. test score = h(study hours, academy hours)

2. age = h(height, weight, gender)

3. GPA = h(math ability, English ability, science ability, IQ)

4. I want to 


[3]

If we use linear regression, i.e. degree 1, the training loss can only reach zero if the training data are 'completely linear': every point lies on a single straight line, and the model has been trained for enough epochs to converge to it. Alternatively, we can simply overfit: for n training examples, set the polynomial degree to n − 1 so the model interpolates every point.

 

Thus we can see zero generalization error only when the test data follow exactly the same distribution as the training data (and the model fits it perfectly). This is very rare, and even if it appears to happen, we should check for overfitting.
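As a quick check of the 'completely linear' case above, here is a sketch (assuming NumPy and made-up collinear data): a degree-1 fit reaches zero training loss, up to floating-point rounding.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x + 2.0                      # perfectly collinear data: y = 3x + 2

w, b = np.polyfit(x, y, deg=1)         # least-squares line recovers w=3, b=2
train_mse = np.mean((w * x + b - y) ** 2)
print(train_mse)                       # 0.0 (up to float precision)
```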


[4]

With K folds, K-fold cross-validation must train K separate models, each on the K − 1 remaining folds. The computational cost is therefore roughly K times that of the original method, where we split the data into training/validation sets once and train and evaluate a single model.
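A minimal sketch of why the cost scales with K, assuming NumPy and hypothetical train_fn / eval_fn callables (not from the text): the loop below launches one full training run per fold, versus a single run for a plain train/validation split.

```python
import numpy as np

def k_fold_cv(X, y, train_fn, eval_fn, K=5, seed=0):
    """Average validation score over K folds; trains K models in total."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, K)
    scores = []
    for i in range(K):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != i])
        model = train_fn(X[train_idx], y[train_idx])   # one full training run per fold
        scores.append(eval_fn(model, X[val_idx], y[val_idx]))
    return float(np.mean(scores))
```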


[5]

There are a couple of reasons this error estimate can be biased.

  1. The split may follow inherent patterns in the dataset: if the data are ordered (e.g., by label or by time) and the folds are taken without shuffling, each fold is not representative of the whole (see the sketch below).
  2. The technique is typically used on small datasets, so some label ranges may be underrepresented. For example, suppose we estimate a test score (target variable, ranging from 0 to 100) from study hours (feature variable); if there are few examples in a particular score range (e.g., 80–100), the folds covering that range give a biased estimate.
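A small sketch of reason 1, assuming NumPy and synthetic sorted scores (toy data of my own): contiguous, unshuffled folds each cover only a narrow slice of the 0–100 score range, so every held-out fold looks unlike the data its model was trained on.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sort(rng.integers(0, 101, size=50))   # 50 test scores, sorted 0..100

# Contiguous folds taken without shuffling inherit the ordering
folds = np.array_split(np.arange(len(y)), 5)
for i, f in enumerate(folds):
    print(f"fold {i}: scores {y[f].min()}..{y[f].max()}")
# Each fold spans only a narrow score band, so the validation folds are
# systematically different from the corresponding training data.
```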

[6]

The VC dimension is defined in terms of binary classification with labels {±1}, so it does not apply directly to regression.


[7]

The difficult dataset may suffer from multicollinearity. We can therefore merge or drop some of the correlated features, obtaining a simpler dataset.
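A minimal sketch of one way to do this, assuming NumPy; the drop_collinear helper and the 0.95 threshold are hypothetical choices, not from the text. It drops one feature from each highly correlated pair, keeping the rest.

```python
import numpy as np

def drop_collinear(X, threshold=0.95):
    """Return X with one feature dropped from each pair whose |correlation| > threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))   # d x d feature correlation matrix
    dropped = set()
    d = X.shape[1]
    for i in range(d):
        if i in dropped:
            continue
        for j in range(i + 1, d):
            if j not in dropped and corr[i, j] > threshold:
                dropped.add(j)                    # keep feature i, drop redundant feature j
    keep = [k for k in range(d) if k not in dropped]
    return X[:, keep], keep
```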
