
3. Hyperparameter Tuning, Batch Normalization and Programming Frameworks

by 사향낭 2022. 7. 15.

Hyperparameter Tuning

 

 

Tuning Process

 

 

Try random values for the hyperparameters; don't use a grid.

 

Coarse to fine
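A rough NumPy sketch of random sampling plus a coarse-to-fine pass (the hyperparameter names, ranges, and trial counts here are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Coarse pass: sample hyperparameters at random rather than on a grid,
# so every trial tries a new value of each hyperparameter.
coarse = [
    {"learning_rate": 10 ** rng.uniform(-4, 0),   # log scale (next section)
     "hidden_units": int(rng.integers(50, 200))}
    for _ in range(20)
]

# Fine pass: zoom into a smaller region around the best coarse result
# and sample more densely there (the region is hard-coded for illustration).
fine = [
    {"learning_rate": 10 ** rng.uniform(-3, -2),
     "hidden_units": int(rng.integers(80, 120))}
    for _ in range(20)
]
```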

 

 

Using an Appropriate Scale to pick Hyperparameters

 

 

For example, sample \( \alpha \) on a log scale.

 

For \( \beta \), sample \( 1 - \beta \) on a log scale instead.
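A minimal sketch of both ideas (the ranges are just examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Learning rate alpha: sample the exponent uniformly, so values are spread
# evenly across decades (0.0001 ... 1) instead of crowding near the top.
r = rng.uniform(-4, 0)
alpha = 10 ** r

# Momentum beta: sample 1 - beta on a log scale, so 0.9, 0.99, 0.999
# all get comparable attention.
r = rng.uniform(-3, -1)
beta = 1 - 10 ** r
```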

 

 

Hyperparameters Tuning in Practice: Pandas vs. Caviar

 

 

Babysitting one model (panda) vs. training many models in parallel (caviar)

 

 

 

Batch Normalization

 

 

Normalizing Activations in a Network

 

 

Batch Norm: normalize \( z \), then make its mean and variance learnable via \( \beta \) and \( \gamma \).
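A rough NumPy sketch of this transform for one layer's pre-activations (the function name, shapes, and epsilon value are my own choices, not from the lecture):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """z: (n_units, m) pre-activations for one mini-batch."""
    mu = z.mean(axis=1, keepdims=True)       # per-unit mean over the mini-batch
    var = z.var(axis=1, keepdims=True)       # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)   # mean 0, variance 1
    z_tilde = gamma * z_norm + beta          # learnable mean/variance (beta, gamma)
    # Note: any bias b added to z would be cancelled when mu is subtracted,
    # which is why b^[l] can be dropped when batch norm is used.
    return z_tilde, z_norm, mu, var
```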

 

 

Fitting Batch Norm into a Neural Network

 

 

If we use batch norm, the bias term \( b^{[l]} \) is meaningless (it is subtracted out with the mean), so we can get rid of it.

 

 

Why does Batch Norm work?

 

 

Covariate shift

 

This refers to the case where the training set and the test set follow different distributions, but the relationship between x and y stays the same.

 

Even a small change in the input can have a growing effect as it passes through the layers; normalization keeps this effect in check.

 

Each mini-batch is scaled by that mini-batch's own mean and variance.

 

This adds noise to the z values, giving a slight regularization effect similar to dropout.

 

 

Batch Norm at Test Time

 

 

\( z \rightarrow \) normalize (mean 0, variance 1) \( \rightarrow \) shift and scale so its mean and variance are set by \( \beta, \gamma \). At test time, \( \mu \) and \( \sigma^2 \) come from exponentially weighted averages computed over the mini-batches during training.
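A sketch of how those running statistics might be tracked and then used at test time; the momentum value 0.9 and the names are illustrative assumptions:

```python
import numpy as np

# Running estimates of mu and sigma^2, updated after every mini-batch
# during training.
stats = {"mu": 0.0, "var": 1.0}
momentum = 0.9

def update_running_stats(mu, var, stats=stats):
    stats["mu"] = momentum * stats["mu"] + (1 - momentum) * mu
    stats["var"] = momentum * stats["var"] + (1 - momentum) * var

def batch_norm_test(z, gamma, beta, eps=1e-8):
    # At test time, normalize with the running estimates instead of the
    # statistics of the (possibly single-example) test batch.
    z_norm = (z - stats["mu"]) / np.sqrt(stats["var"] + eps)
    return gamma * z_norm + beta
```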

 

 

 

Multi-class Classification

 

 

Softmax Regression

 

 

Compute the probability of each class

 

 

Training a Softmax Classifier

 

 

\( z^{[L]} \rightarrow t = e^{z^{[L]}} \rightarrow a^{[L]} = \frac{t}{\sum_i t_i} \)

 

Hardmax -> 1 (100%) for the largest entry, 0 (0%) for the others

 

Softmax -> a probability for every class
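A small NumPy illustration of both (the subtract-max step is a common numerical-stability trick, not part of the lecture notation; it does not change the result):

```python
import numpy as np

z_L = np.array([5.0, 2.0, -1.0, 3.0])   # z^[L] for one example

# Softmax: exponentiate, then normalize into a probability distribution.
t = np.exp(z_L - z_L.max())             # subtract max for numerical stability
a_L = t / t.sum()                       # probabilities, sums to 1

# Hardmax: 1 for the largest entry, 0 for the rest.
hard = np.zeros_like(z_L)
hard[np.argmax(z_L)] = 1.0

print(a_L)    # [0.842, 0.042, 0.002, 0.114] (approximately)
print(hard)   # [1., 0., 0., 0.]
```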

 

 

 

Introduction to Programming Frameworks

 

 

Deep Learning Frameworks

 

 

Choosing deep learning frameworks

 

- Ease of programming (development and deployment)

 

- Running speed

 

- Truly open (open source with good governance)
