
[4.3.4] Dive into Deep Learning : exercise answers

 

Reference: 4.3. The Base Classification Model — Dive into Deep Learning 1.0.3 documentation (d2l.ai)

[1]

 

$L_{v}$ denotes the averaged total validation loss and $L_{v}^{q}$ denotes the averaged validation loss of the $q$-th minibatch. The question asks for the relationship between $L_{v}$ and $L_{v}^{q}$.

 

Let 

sample size $= N$ (total number of examples in the dataset)

minibatch size $= M$ (number of examples in a minibatch)

Assuming $M$ divides $N$, the number of minibatches is $\alpha = \frac{N}{M}$.

 

$L_{v}=\frac{1}{\alpha}\sum_{q=1}^{\alpha}L_{v}^{q}$

That is, $L_{v}$ is the average of the per-minibatch validation losses $L_{v}^{q}$.
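As a quick sanity check, here is a minimal NumPy sketch (the per-example losses, $N$, and $M$ are made up for illustration) verifying that averaging the per-minibatch means $L_{v}^{q}$ reproduces the full-dataset average when $M$ divides $N$:

```python
import numpy as np

# Hypothetical per-example validation losses: N examples, minibatch size M.
N, M = 1000, 50
losses = np.random.rand(N)      # stand-in for real per-example losses
alpha = N // M                  # number of minibatches (assumes M divides N)

# Per-minibatch averaged losses L_v^q, q = 1..alpha.
minibatch_losses = losses.reshape(alpha, M).mean(axis=1)

# L_v = (1/alpha) * sum_q L_v^q equals the plain average over all N examples.
L_v = minibatch_losses.mean()
assert np.isclose(L_v, losses.mean())
```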


[2]

 

In our case the sampling is unbiased: every minibatch occurs with the same probability $1/\alpha$. This means that the expected value matches the averaged value.

 

Since $L_{v}$ is the average of the $L_{v}^{q}$, their expected values are the same: $E[L_{v}] = E[L_{v}^{q}]$.

 

Reasons why we should still use $L_{v}$ even though $L_{v}^{q}$ is unbiased:

 

1. There is no guarantee that the data is evenly distributed across minibatches, so a single minibatch loss can be a noisy estimate of the true validation loss.

 

2. In statistics, an unbiased estimator becomes more reliable as the number of examples it averages over grows. $L_{v}$ averages over the whole validation set, so it has lower variance than an estimate from a single minibatch, as the simulation below illustrates.
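Here is a minimal simulation of both points (the per-example losses, $N$, and $M$ are again hypothetical): a single minibatch's averaged loss fluctuates around $L_{v}$, while $L_{v}$ itself is exact for the given validation set.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1000, 50
losses = rng.normal(loc=1.0, scale=0.5, size=N)   # stand-in per-example losses

# Full-set average L_v (exact for this validation set).
L_v = losses.mean()

# Draw many random minibatches and compute their averaged losses L_v^q.
estimates = [rng.permutation(losses)[:M].mean() for _ in range(10_000)]

print(f"L_v           = {L_v:.4f}")
print(f"mean of L_v^q = {np.mean(estimates):.4f}")   # ~L_v: unbiased
print(f"std of L_v^q  = {np.std(estimates):.4f}")    # noise of one minibatch
```

The mean of the minibatch estimates matches $L_{v}$, but any single estimate carries noticeable noise, which is exactly why we prefer the full-set average.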


[3]

 

Let $L(\hat{y}|x)$ be the expected loss. For the optimal $\hat{y}$, we have to find the $\hat{y}$ that minimizes $L$. Thus $\hat{y}_{\text{optimal}}=\operatorname{argmin}_{\hat{y}} L(\hat{y}|x)$.

 

By the same reasoning as for an expected value, the expected loss is a weighted average over the conditional distribution $P(y|x)$:

 

$L(\hat{y}|x)=\sum_{y} l(y,\hat{y})P(y|x)$

 

Here, $P(y|x)$ is the conditional probability of label $y$ given the input $x$, and $l(y,\hat{y})$ is the loss incurred by predicting $\hat{y}$ when the true label is $y$.

 

Thus the optimal selection of $\hat{y}$ is

$\hat{y}_{\text{optimal}}=\operatorname{argmin}_{\hat{y}}\sum_{y} l(y,\hat{y})P(y|x)$
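For a concrete instance, here is a minimal sketch (the class probabilities and the loss matrix are made up for illustration) that evaluates this argmin over a discrete label set; with 0-1 loss the optimum is simply the most probable class:

```python
import numpy as np

# Hypothetical conditional distribution P(y|x) over 3 classes.
P_y_given_x = np.array([0.2, 0.5, 0.3])

# Hypothetical loss matrix: loss[y, y_hat] = l(y, y_hat); here the 0-1 loss.
loss = 1.0 - np.eye(3)

# Expected loss of each candidate prediction: sum_y l(y, y_hat) * P(y|x).
expected_loss = P_y_given_x @ loss

y_hat_optimal = expected_loss.argmin()
print(y_hat_optimal)   # -> 1, the argmax of P(y|x) under 0-1 loss
```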
