[1]
$L_{v}$ denotes the averaged total validation loss and $L_{v}^{(i)}$ denotes the averaged validation loss of the $i$-th minibatch. The question asks us to find the relationship between $L_{v}$ and $L_{v}^{(i)}$.
Let
$N$ = sample size (total number of examples in the dataset),
$M$ = minibatch size (number of examples in one minibatch).
Thus the number of minibatches is $\alpha = \frac{N}{M}$.
$L_{v}=\frac{1}{\alpha}\sum_{i=1}^{\alpha}L_{v}^{(i)}$
That is, $L_{v}$ is the average of the validation losses over all minibatches.
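A minimal NumPy sketch of this identity, using made-up per-example losses and assumed values for $N$ and $M$ (the variable names are mine, not from the book): when the dataset splits into $\alpha$ equal-sized minibatches, the mean of the per-minibatch average losses equals the average loss over the whole set.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 1000, 50                # sample size and minibatch size (assumed values)
alpha = N // M                 # number of minibatches

losses = rng.random(N)         # per-example validation losses (stand-in values)

L_v_total = losses.mean()                  # averaged total validation loss
minibatches = losses.reshape(alpha, M)     # split into alpha minibatches of size M
L_v_i = minibatches.mean(axis=1)           # averaged loss of each minibatch
L_v_from_minibatches = L_v_i.mean()        # (1/alpha) * sum of minibatch losses

assert np.isclose(L_v_total, L_v_from_minibatches)
print(L_v_total, L_v_from_minibatches)     # identical up to float rounding
```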
[2]
In our case the sampling is not biased: every minibatch occurs with the same probability, $1/\alpha$. This means that the expected value will match the averaged value.
Since $L_{v}$ is the average of the $L_{v}^{(i)}$, the expected value of a randomly drawn minibatch loss equals $L_{v}$.
Reasons why we should still use $L_{v}$ even though the minibatch loss is unbiased:
1. There is no guarantee that the data inside a single minibatch is well distributed, i.e., representative of the whole dataset.
2. In statistics, an unbiased estimator is reliable only when the batch size and the number of batches are large enough; a single minibatch estimate can still have high variance (see the simulation sketch below).
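A toy simulation illustrating both points, with synthetic per-example losses I made up for illustration: a randomly drawn minibatch gives an unbiased estimate of $L_{v}$, but a single draw can be far off, and the spread shrinks as the minibatch size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.exponential(size=10_000)   # per-example losses (stand-in values)
L_v = losses.mean()                     # the full averaged validation loss

for M in (10, 100, 1000):               # minibatch sizes to compare
    estimates = np.array([
        rng.choice(losses, size=M, replace=False).mean()
        for _ in range(2000)            # many independent minibatch draws
    ])
    # mean of estimates ~ L_v (unbiased), while the standard deviation
    # shrinks roughly as 1/sqrt(M)
    print(M, estimates.mean() - L_v, estimates.std())
```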
[3]
Let $L(\hat{y}|x)$ be the expected loss. To find the optimal $\hat{y}$, we have to find the $\hat{y}$ that minimizes $L$. Thus $\hat{y}_{optimal}=\textbf{argmin}_{\hat{y}} L(\hat{y}|x)$
As with the expected value above, the expected loss is the probability-weighted average of the per-outcome losses:
$L(\hat{y}|x)=\sum_{y} l(y,\hat{y})P(y|x)$
Here, $P(y|x)$ is the conditional probability of $y$ given $x$, and $l(y,\hat{y})$ is the loss incurred when the true outcome is $y$ and we predict $\hat{y}$.
Thus the optimal selection of $\hat{y}$ is
$\hat{y}_{optimal}=\textbf{argmin}_{\hat{y}}\sum_{y} l(y,\hat{y})P(y|x)$
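A sketch of this argmin for a discrete $y$; the distribution $P(y|x)$ and the loss functions are assumed for illustration. Minimizing expected squared loss recovers the conditional mean, while 0-1 loss recovers the most probable $y$.

```python
import numpy as np

y_vals = np.array([0.0, 1.0, 2.0])       # possible outcomes y
p_y_given_x = np.array([0.2, 0.5, 0.3])  # assumed conditional distribution P(y|x)

def expected_loss(y_hat, loss):
    # L(y_hat | x) = sum_y l(y, y_hat) P(y|x)
    return sum(loss(y, y_hat) * p for y, p in zip(y_vals, p_y_given_x))

candidates = np.linspace(0.0, 2.0, 201)  # search grid for y_hat

sq = min(candidates, key=lambda y_hat: expected_loss(y_hat, lambda y, g: (y - g) ** 2))
zo = min(y_vals, key=lambda y_hat: expected_loss(y_hat, lambda y, g: float(y != g)))

print(sq)  # ~1.1, the conditional mean, under squared loss
print(zo)  # 1.0, the most probable y, under 0-1 loss
```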