Mini-batch Gradient Descent
Batch Gradient Descent - one gradient step per pass through all m training examples
Mini-batch Gradient Descent - split the m examples into t mini-batches of size m / t and take one gradient step per mini-batch (t steps per pass)
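A minimal NumPy sketch of splitting the training set into mini-batches; the helper name make_mini_batches and the commented loop (compute_grads, params, alpha) are illustrative assumptions, not code from the course.

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the m examples (columns of X, Y) and split them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

# One epoch of mini-batch gradient descent (hypothetical backprop helper):
# for X_batch, Y_batch in make_mini_batches(X, Y, batch_size=64):
#     grads = compute_grads(params, X_batch, Y_batch)
#     for key in params:
#         params[key] -= alpha * grads["d" + key]
```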
Understanding Mini-batch Gradient Descent
Batch Gradient Descent - mini-batch size is m
Stochastic Gradient Descent - mini-batch size is 1
In practice, the mini-batch size is somewhere between 1 and m (commonly a power of two such as 64, 128, 256, or 512)
Exponentially Weighted Averages
\( v_t = \beta v_{t - 1} + (1 - \beta) \theta_t \)
\( v_t \) is approximately an average over the last \( \frac{1}{1 - \beta} \) days of temperature
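A minimal Python sketch of this recursion on a short list of made-up daily temperatures; the function name ewa and the data are for illustration only.

```python
import numpy as np

def ewa(thetas, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v, out = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return np.array(out)

temps = np.array([40, 49, 45, 44, 43, 41, 38, 35, 36, 38])  # made-up daily temperatures
print(ewa(temps, beta=0.9))  # behaves roughly like a 10-day average (1 / (1 - 0.9) = 10)
```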
Understanding Exponentially Weighted Averages
With \( \epsilon = 1 - \beta \), \( (1 - \epsilon)^{\frac{1}{\epsilon}} \approx \frac{1}{e} \), so the weight on a day's temperature decays to about \( \frac{1}{e} \) after roughly \( \frac{1}{1 - \beta} \) days
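A quick numeric check of this approximation for \( \beta = 0.9 \), i.e. \( \epsilon = 0.1 \):

```python
import math

beta = 0.9
eps = 1 - beta                 # epsilon = 0.1
print((1 - eps) ** (1 / eps))  # 0.9**10 ≈ 0.3487
print(1 / math.e)              # ≈ 0.3679
```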
Bias Correction in Exponentially Weighted Averages
Compute \( v_t = \beta v_{t - 1} + (1 - \beta) \theta_t \) as before, then use \( \frac{v_t}{1 - \beta^t} \) in its place; this removes the bias toward 0 during the first few steps, and the correction fades as t grows
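A minimal sketch of the corrected average, again on made-up temperatures; the function name is an illustrative assumption.

```python
import numpy as np

def ewa_bias_corrected(thetas, beta=0.9):
    """EWA with bias correction: dividing by (1 - beta**t) removes the startup bias toward 0."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))
    return np.array(corrected)

temps = np.array([40, 49, 45, 44, 43])      # made-up daily temperatures
print(ewa_bias_corrected(temps, beta=0.9))  # early values no longer start near 0
```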
Gradient Descent with Momentum
Momentum: For each iteration t,
\( v_{dw} = \beta v_{dw} + (1 - \beta) dw \), \( v_{db} = \beta v_{db} + (1 - \beta) db \)
\( w = w - \alpha v_{dw} \), \( b = b - \alpha v_{db} \)
In practice, bias correction is usually not used with momentum.
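A minimal sketch of one momentum update for parameters stored in a dict; the dict layout and the placeholder gradient values are assumptions for illustration.

```python
import numpy as np

def momentum_step(params, grads, v, beta=0.9, alpha=0.01):
    """One gradient-descent-with-momentum update (no bias correction)."""
    for key in params:  # e.g. key = "W1", "b1", ...
        v[key] = beta * v[key] + (1 - beta) * grads["d" + key]
        params[key] -= alpha * v[key]
    return params, v

# Usage with placeholder values:
params = {"W1": np.ones((2, 2)), "b1": np.zeros((2, 1))}
grads = {"dW1": np.full((2, 2), 0.1), "db1": np.full((2, 1), 0.2)}
v = {k: np.zeros_like(p) for k, p in params.items()}
params, v = momentum_step(params, grads, v)
```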
RMSprop (Root Mean Square)
On iteration t:
Compute dw, db on current mini-batch
\( S_{dw} = \beta S_{dw} + (1 - \beta) dw^2 \) (element-wise square) ← relatively small
\( S_{db} = \beta S_{db} + (1 - \beta) db^2 \) ← relatively large
\( w := w - \alpha \frac{dw}{\sqrt{S_{dw}} + \epsilon} \), \( b := b - \alpha \frac{db}{\sqrt{S_{db}} + \epsilon} \) (a small \( \epsilon \) keeps the division numerically stable)
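A minimal sketch of one RMSprop update in the same dict style as the momentum sketch; the default beta = 0.9 here is a typical choice, not a value fixed by the course notes.

```python
import numpy as np

def rmsprop_step(params, grads, s, beta=0.9, alpha=0.01, eps=1e-8):
    """One RMSprop update; eps keeps the division numerically stable."""
    for key in params:
        s[key] = beta * s[key] + (1 - beta) * grads["d" + key] ** 2  # element-wise square
        params[key] -= alpha * grads["d" + key] / (np.sqrt(s[key]) + eps)
    return params, s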
Adam Optimization Algorithm
Momentum + RMSprop
\( \alpha \): needs to be tuned
\( \beta_1 \): 0.9
\( \beta_2 \): 0.999
\( \epsilon \): \( 10^{-8} \)
Adam: Adaptive moment estimation
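A minimal sketch of one Adam step combining the momentum term (v) and the RMSprop term (s), both bias-corrected; the dict layout is the same illustrative assumption as above.

```python
import numpy as np

def adam_step(params, grads, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (v) + RMSprop (s), both with bias correction.
    t is the iteration counter and must start at 1."""
    for key in params:
        g = grads["d" + key]
        v[key] = beta1 * v[key] + (1 - beta1) * g
        s[key] = beta2 * s[key] + (1 - beta2) * g ** 2
        v_hat = v[key] / (1 - beta1 ** t)  # bias-corrected first moment
        s_hat = s[key] / (1 - beta2 ** t)  # bias-corrected second moment
        params[key] -= alpha * v_hat / (np.sqrt(s_hat) + eps)
    return params, v, s
```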
Learning Rate Decay
\( \alpha = \frac{1}{1 + \text{decay\_rate} \cdot \text{epoch\_num}} \alpha_0 \)
\( \alpha = 0.95^{\text{epoch\_num}} \alpha_0 \)
\( \alpha = \frac{k}{\sqrt{\text{epoch\_num}}} \alpha_0 \)
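A small sketch of these three schedules; the function name and the schedule labels are hypothetical, and the printed values are just a sanity check.

```python
def decayed_lr(alpha0, epoch_num, decay_rate=1.0, k=1.0, schedule="inverse"):
    """Return the learning rate for a given epoch under a few common decay schedules."""
    if schedule == "inverse":       # alpha = alpha0 / (1 + decay_rate * epoch_num)
        return alpha0 / (1 + decay_rate * epoch_num)
    if schedule == "exponential":   # alpha = 0.95**epoch_num * alpha0
        return 0.95 ** epoch_num * alpha0
    if schedule == "sqrt":          # alpha = k / sqrt(epoch_num) * alpha0
        return k / epoch_num ** 0.5 * alpha0
    raise ValueError(schedule)

print(decayed_lr(0.2, epoch_num=1, decay_rate=1.0))   # 0.1
print(decayed_lr(0.2, epoch_num=4, schedule="sqrt"))  # 0.1
```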