
[3.7.6] Dive into Deep Learning : exercise answers


3.7. Weight Decay — Dive into Deep Learning 1.0.3 documentation (d2l.ai)

[1]

$\lambda$ can be any positive real number. However, for weight decay to keep its shrinking effect, we should keep $\eta \lambda < 1$. Since the learning rate is 0.01, we can try $\lambda$ over 0–99.
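To see where that condition comes from, restate the SGD update with weight decay from the section:

$$w \leftarrow (1-\eta \lambda)\,w - \eta\, \partial_{w} \ell(w)$$

If $\eta \lambda \geq 1$, the factor $(1-\eta \lambda)$ is zero or negative, so the penalty no longer shrinks $w$ toward zero but instead zeroes it out or flips its sign.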

import torch
from d2l import torch as d2l

# Data and WeightDecay are the classes defined in section 3.7 of the book
# (a minimal sketch of both follows the loop below).
data = Data(num_train=100, num_val=100, num_inputs=200, batch_size=20)
trainer = d2l.Trainer(max_epochs=10)
test_lambds = list(range(100))
board = d2l.ProgressBoard('lambda')

def accuracy(y_hat, y):
    # Crude accuracy proxy: 100% minus the relative error of the mean prediction.
    return (1 - ((y_hat - y).mean() / y.mean()).abs()) * 100

def train_ex1(lambd):
    model = WeightDecay(wd=lambd, lr=0.01)
    model.board.yscale = 'log'
    trainer.fit(model, data)
    with torch.no_grad():
        y_hat = model(data.X)  # predictions on the full (train + val) data
    acc_train = accuracy(y_hat[:data.num_train], data.y[:data.num_train])
    acc_val = accuracy(y_hat[data.num_train:], data.y[data.num_train:])
    return acc_train, acc_val

for lambd in test_lambds:
    acc_train, acc_val = train_ex1(lambd)
    board.draw(lambd, acc_train.item(), 'acc_train', every_n=1)
    board.draw(lambd, acc_val.item(), 'acc_val', every_n=1)
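For reference, the code above assumes the Data and WeightDecay classes from the book's concise implementation in section 3.7; a minimal sketch from memory (details may differ slightly from the book):

import torch
from d2l import torch as d2l

class Data(d2l.DataModule):
    """Synthetic linear data: y = 0.01 * sum(x) + 0.05 + noise."""
    def __init__(self, num_train, num_val, num_inputs, batch_size):
        self.save_hyperparameters()
        n = num_train + num_val
        self.X = torch.randn(n, num_inputs)
        noise = torch.randn(n, 1) * 0.01
        w, b = torch.ones((num_inputs, 1)) * 0.01, 0.05
        self.y = torch.matmul(self.X, w) + b + noise

    def get_dataloader(self, train):
        i = slice(0, self.num_train) if train else slice(self.num_train, None)
        return self.get_tensorloader([self.X, self.y], train, i)

class WeightDecay(d2l.LinearRegression):
    """Linear regression with the L2 penalty folded into the optimizer."""
    def __init__(self, wd, lr):
        super().__init__(lr)
        self.save_hyperparameters()
        self.wd = wd

    def configure_optimizers(self):
        # Apply weight decay to the weight only, not the bias.
        return torch.optim.SGD([
            {'params': self.net.weight, 'weight_decay': self.wd},
            {'params': self.net.bias}], lr=self.lr)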


[2]

As the plot above shows, finding the single optimal $\lambda$ hardly matters here: many values across 0–99 reach essentially the same top accuracy.
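If we still want the value that maximizes validation accuracy, a small follow-up sketch (hypothetical, reusing train_ex1 and test_lambds from exercise 1):

# Score every lambda on the validation split and pick the best one.
results = {lambd: train_ex1(lambd)[1].item() for lambd in test_lambds}
best_lambd = max(results, key=results.get)
print(f'validation-optimal lambda: {best_lambd} (acc: {results[best_lambd]:.2f}%)')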


[3]

Replacing the $\ell_2$ penalty with the $\ell_1$ penalty $\lambda \left\| \textbf{w} \right\|_1$ changes its gradient from $\lambda w$ to $\lambda\, \mathrm{sgn}(w)$, so the multiplicative shrinkage of weight decay becomes a constant-magnitude subtraction:

$$(1-\eta \lambda)w \to w-\eta \lambda\, \mathrm{sgn}(w)$$
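The same step in code, as an illustrative sketch (the sgd_l1_step helper and its values are made up for this example):

import torch

# One SGD step with an L1 penalty. Unlike L2 weight decay, which shrinks w
# multiplicatively, the L1 subgradient subtracts a fixed-magnitude term,
# which pushes small weights to exactly zero (sparsity).
def sgd_l1_step(w, grad, lr=0.01, lambd=3.0):
    return w - lr * grad - lr * lambd * torch.sign(w)

print(sgd_l1_step(torch.tensor([0.5, -0.2, 0.0]), grad=torch.zeros(3)))
# tensor([ 0.4700, -0.1700,  0.0000])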


[4]

$$\left\| \textbf{X}\right\|_{F}=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}x_{ij}^{2}}$$
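A quick numerical sanity check of this formula against PyTorch's built-in Frobenius norm (random matrix chosen just for illustration):

import torch

X = torch.randn(3, 4)
manual = torch.sqrt((X ** 2).sum())        # the double sum of squares above
builtin = torch.linalg.norm(X, ord='fro')  # PyTorch's Frobenius norm
print(torch.isclose(manual, builtin))      # tensor(True)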


[5]

training error : the model's measured error (loss) on the training data, expressed as a real number. It is the signal fed back into the parameter updates; ideally we drive it low by learning from the training data.

generalization error : the expected error on unseen data. Its exact value is out of reach, since we cannot access all the unseen data in the universe, so in practice we estimate it on held-out data. For practical purposes, generalization error matters more than training error.

A low training error does not guarantee a low generalization error; a large gap between the two is overfitting. To prevent this we use validation data and regularization.


[6]

Taking the log and maximizing $P(w|x)$ (equivalently, minimizing $-\log P(w|x)$), the prior term $P(w)$ enters the objective as a regularizer and acts as weight decay.
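Spelling the step out (the standard MAP derivation, matching the book's reasoning):

$$-\log P(w|x) = -\log P(x|w) - \log P(w) + \text{const}$$

With a Gaussian prior $P(w) \propto e^{-\lambda \left\| \textbf{w} \right\|^{2}}$, the prior term becomes $-\log P(w) = \lambda \left\| \textbf{w} \right\|^{2} + \text{const}$, which is exactly the weight-decay penalty.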
