Mini-batch Gradient Descent
Batch Gradient Descent - one gradient step per pass through all m training examples
Mini-batch Gradient Descent - split the m examples into t mini-batches of size m / t and take one gradient step per mini-batch (t steps per pass)
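A minimal NumPy sketch of splitting the training set into mini-batches; the helper name make_mini_batches and the commented loop (compute_grads, params, alpha) are illustrative assumptions, not code from the course.

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the m examples (columns of X, Y) and split them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

# One epoch of mini-batch gradient descent (hypothetical backprop helper):
# for X_batch, Y_batch in make_mini_batches(X, Y, batch_size=64):
#     grads = compute_grads(params, X_batch, Y_batch)
#     for key in params:
#         params[key] -= alpha * grads["d" + key]
```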
Understanding Mini-batch Gradient Descent
Batch Gradient Descent - mini-batch size is m
Stochastic Gradient Descent - mini-batch size is 1
In practice, the mini-batch size is somewhere between 1 and m (commonly a power of two such as 64, 128, 256, or 512)
Exponentially Weighted Averages
\( v_t = \beta v_{t - 1} + (1 - \beta) \theta_t \)
\( v_t \) is approximately an average over the last \( \frac{1}{1 - \beta} \) days of temperature
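A minimal Python sketch of this recursion on a short list of made-up daily temperatures; the function name ewa and the data are for illustration only.

```python
import numpy as np

def ewa(thetas, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v, out = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return np.array(out)

temps = np.array([40, 49, 45, 44, 43, 41, 38, 35, 36, 38])  # made-up daily temperatures
print(ewa(temps, beta=0.9))  # behaves roughly like a 10-day average (1 / (1 - 0.9) = 10)
```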
Understanding Exponentially Weighted Averages
With \( \epsilon = 1 - \beta \), \( (1 - \epsilon)^{\frac{1}{\epsilon}} \approx \frac{1}{e} \), so the weight on a day's temperature decays to about \( \frac{1}{e} \) after roughly \( \frac{1}{1 - \beta} \) days
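A quick numeric check of this approximation for \( \beta = 0.9 \), i.e. \( \epsilon = 0.1 \):

```python
import math

beta = 0.9
eps = 1 - beta                 # epsilon = 0.1
print((1 - eps) ** (1 / eps))  # 0.9**10 ≈ 0.3487
print(1 / math.e)              # ≈ 0.3679
```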
Bias Correction in Exponentially Weighted Averages
Compute \( v_t = \beta v_{t - 1} + (1 - \beta) \theta_t \) as before, then use \( \frac{v_t}{1 - \beta^t} \) in its place; this removes the bias toward 0 during the first few steps, and the correction fades as t grows
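A minimal sketch of the corrected average, again on made-up temperatures; the function name is an illustrative assumption.

```python
import numpy as np

def ewa_bias_corrected(thetas, beta=0.9):
    """EWA with bias correction: dividing by (1 - beta**t) removes the startup bias toward 0."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))
    return np.array(corrected)

temps = np.array([40, 49, 45, 44, 43])      # made-up daily temperatures
print(ewa_bias_corrected(temps, beta=0.9))  # early values no longer start near 0
```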
Gradient Descent with Momentum
Momentum: For each iteration t,
\( v_{dw} = \beta v_{dw} + (1 - \beta) dw \), \( v_{db} = \beta v_{db} + (1 - \beta) db \)
\( w = w - \alpha v_{dw} \), \( b = b - \alpha v_{db} \)
In practice, bias correction is usually not used with momentum.
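A minimal sketch of one momentum update for parameters stored in a dict; the dict layout and the placeholder gradient values are assumptions for illustration.

```python
import numpy as np

def momentum_step(params, grads, v, beta=0.9, alpha=0.01):
    """One gradient-descent-with-momentum update (no bias correction)."""
    for key in params:  # e.g. key = "W1", "b1", ...
        v[key] = beta * v[key] + (1 - beta) * grads["d" + key]
        params[key] -= alpha * v[key]
    return params, v

# Usage with placeholder values:
params = {"W1": np.ones((2, 2)), "b1": np.zeros((2, 1))}
grads = {"dW1": np.full((2, 2), 0.1), "db1": np.full((2, 1), 0.2)}
v = {k: np.zeros_like(p) for k, p in params.items()}
params, v = momentum_step(params, grads, v)
```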
RMSprop (Root Mean Square)
On iteration t:
Compute dw, db on current mini-batch
\( S_{dw} = \beta S_{dw} + (1 - \beta) dw^2 \) (element-wise square) ← relatively small
\( S_{db} = \beta S_{db} + (1 - \beta) db^2 \) ← relatively large
\( w := w - \alpha \frac{dw}{\sqrt{S_{dw}} + \epsilon} \), \( b := b - \alpha \frac{db}{\sqrt{S_{db}} + \epsilon} \) (a small \( \epsilon \) keeps the division numerically stable)
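A minimal sketch of one RMSprop update in the same dict style as the momentum sketch; the default beta = 0.9 here is a typical choice, not a value fixed by the course notes.

```python
import numpy as np

def rmsprop_step(params, grads, s, beta=0.9, alpha=0.01, eps=1e-8):
    """One RMSprop update; eps keeps the division numerically stable."""
    for key in params:
        s[key] = beta * s[key] + (1 - beta) * grads["d" + key] ** 2  # element-wise square
        params[key] -= alpha * grads["d" + key] / (np.sqrt(s[key]) + eps)
    return params, s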
Adam Optimization Algorithm
Momentum + RMSprop
\( \alpha \): needs to be tuned
\( \beta_1 \): 0.9
\( \beta_2 \): 0.999
\( \epsilon \): \( 10^{-8} \)
Adam: Adaptive moment estimation
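A minimal sketch of one Adam step combining the momentum term (v) and the RMSprop term (s), both bias-corrected; the dict layout is the same illustrative assumption as above.

```python
import numpy as np

def adam_step(params, grads, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (v) + RMSprop (s), both with bias correction.
    t is the iteration counter and must start at 1."""
    for key in params:
        g = grads["d" + key]
        v[key] = beta1 * v[key] + (1 - beta1) * g
        s[key] = beta2 * s[key] + (1 - beta2) * g ** 2
        v_hat = v[key] / (1 - beta1 ** t)  # bias-corrected first moment
        s_hat = s[key] / (1 - beta2 ** t)  # bias-corrected second moment
        params[key] -= alpha * v_hat / (np.sqrt(s_hat) + eps)
    return params, v, s
```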
Learning Rate Decay
\( \alpha = \frac{1}{1 + \text{decay\_rate} \cdot \text{epoch\_num}} \alpha_0 \)
\( \alpha = 0.95^{\text{epoch\_num}} \alpha_0 \)
\( \alpha = \frac{k}{\sqrt{\text{epoch\_num}}} \alpha_0 \)
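A small sketch of these three schedules; the function name and the schedule labels are hypothetical, and the printed values are just a sanity check.

```python
def decayed_lr(alpha0, epoch_num, decay_rate=1.0, k=1.0, schedule="inverse"):
    """Return the learning rate for a given epoch under a few common decay schedules."""
    if schedule == "inverse":       # alpha = alpha0 / (1 + decay_rate * epoch_num)
        return alpha0 / (1 + decay_rate * epoch_num)
    if schedule == "exponential":   # alpha = 0.95**epoch_num * alpha0
        return 0.95 ** epoch_num * alpha0
    if schedule == "sqrt":          # alpha = k / sqrt(epoch_num) * alpha0
        return k / epoch_num ** 0.5 * alpha0
    raise ValueError(schedule)

print(decayed_lr(0.2, epoch_num=1, decay_rate=1.0))   # 0.1
print(decayed_lr(0.2, epoch_num=4, schedule="sqrt"))  # 0.1
```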