[1]
$\lambda$ can be any non-negative real number. However, to preserve the shrinking effect of weight decay, the factor $(1-\eta\lambda)$ must stay positive, i.e. $\eta\lambda < 1$. Since the learning rate is 0.01, we can try $\lambda$ over the range 0–99.
data = Data(num_train=100, num_val=100, num_inputs=200, batch_size=20)
trainer = d2l.Trainer(max_epochs=10)
test_lambds = list(range(100))  # eta = 0.01, so lambda < 100 keeps eta * lambda < 1
board = d2l.ProgressBoard('lambda')

def accuracy(y_hat, y):
    # Accuracy as a percentage, based on the relative error of the mean prediction
    return (1 - ((y_hat - y).mean() / y.mean()).abs()) * 100

def train_ex1(lambd):
    model = WeightDecay(wd=lambd, lr=0.01)
    model.board.yscale = 'log'
    trainer.fit(model, data)
    y_hat = model(data.X)  # equivalent to calling model.forward(data.X)
    acc_train = accuracy(y_hat[:data.num_train], data.y[:data.num_train])
    acc_val = accuracy(y_hat[data.num_train:], data.y[data.num_train:])
    return acc_train, acc_val

for item in test_lambds:
    acc_train, acc_val = train_ex1(item)
    board.draw(item, acc_train.item(), 'acc_train', every_n=1)
    board.draw(item, acc_val.item(), 'acc_val', every_n=1)
[2]
Judging from the resulting plot, pinpointing a single optimal $\lambda$ hardly matters: many values across 0–99 reach essentially the same top accuracy.
[3]
With an $\ell_1$ penalty $\lambda\sum_i |w_i|$ instead of the squared $\ell_2$ norm, the multiplicative shrinkage of the update turns into a fixed-size subtraction in the direction of each weight's sign:

$$(1-\eta \lambda)w \to w-\eta \lambda\,\operatorname{sign}(w)$$
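The difference between the two updates can be sketched numerically; the weights and hyperparameter values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Toy weights and hyperparameters (illustrative values, not from the exercise)
w = np.array([0.5, -0.2, 1.0])
eta, lambd = 0.01, 0.5

# L2 penalty: weight decay shrinks every weight proportionally to its size
w_l2 = (1 - eta * lambd) * w

# L1 penalty: every nonzero weight moves toward zero by the same amount eta*lambda
w_l1 = w - eta * lambd * np.sign(w)

print(w_l2)  # proportional shrinkage
print(w_l1)  # constant-size shrinkage
```

Note that under $\ell_1$ small weights are driven all the way to zero, which is why $\ell_1$ regularization tends to produce sparse solutions.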
[4]
$$\left\| \textbf{X}\right\|_{F}=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}x_{ij}^{2}}$$
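The definition above can be checked directly in NumPy, which computes the same quantity via `np.linalg.norm` with `ord='fro'`; the matrix below is an arbitrary example:

```python
import numpy as np

# Arbitrary 2x3 matrix used only to illustrate the Frobenius norm
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Square every entry, sum over both indices, then take the square root
fro_manual = np.sqrt((X ** 2).sum())

# NumPy's built-in Frobenius norm
fro_numpy = np.linalg.norm(X, ord='fro')

print(fro_manual, fro_numpy)  # both equal sqrt(91)
```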
[5]
training error : Measures the model's performance on the training data as a real number, and is what drives the parameter updates. We want this error to be low as the model learns from the training data.
generalization error : Cannot be computed exactly, since we can't access all the unseen data in the world. For practical purposes, generalization error matters more than training error.
A low training error doesn't guarantee a low generalization error; the gap between them is overfitting. To mitigate it we use validation data and regularization.
[6]
If we take the negative log of the posterior and minimize $-\log P(w \mid x)$ instead of maximizing $P(w \mid x)$, the prior $P(w)$ contributes a term that acts as weight decay.
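Concretely, applying Bayes' rule and taking the negative log (a sketch, assuming a Gaussian prior on the weights):

$$-\log P(w \mid x) = -\log P(x \mid w) - \log P(w) + \text{const}$$

$$P(w) \propto \exp\!\left(-\tfrac{\lambda}{2}\left\|w\right\|^{2}\right) \;\Rightarrow\; -\log P(w) = \tfrac{\lambda}{2}\left\|w\right\|^{2} + \text{const}$$

The first term is the usual data loss, and the prior term is exactly the $\frac{\lambda}{2}\left\|w\right\|^{2}$ weight-decay penalty.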