Google Machine Learning Bootcamp 2022 / Improving Deep Neural Networks

3. Hyperparameter Tuning, Batch Normalization and Programming Frameworks
Hyperparameter Tuning
Tuning Process: try random values for the hyperparameters; don't use a grid. Search coarse to fine.
Using an Appropriate Scale to Pick Hyperparameters: for example, sample \( \alpha \) on a log scale; for \( \beta \), work with \( 1 - \beta \) instead (sketch below).
Hyperparameters Tuning in Practice: Pandas vs. Caviar: babysitting one model vs. training many models in parallel.
Batch Normalization: Normaliz..
2022. 7. 15.

2. Optimization Algorithms
Mini-batch Gradient Descent
Batch Gradient Descent: one training pass over all m training examples.
Mini-batch Gradient Descent: t training passes, each over m / t training examples (sketch below).
Understanding Mini-batch Gradient Descent
Batch Gradient Descent: mini-batch size is m.
Stochastic Gradient Descent: mini-batch size is 1.
In practice, the mini-batch size lies somewhere between 1 and m.
Exponentially Weighted Averages: \( v_t = \beta \).. (sketch below)
2022. 7. 14.

1. Practical Aspects of Deep Learning #3
Setting Up your Optimization Problem
Normalizing Inputs (sketch below):
1. Subtract the mean: \( \mu = \frac{1}{m} \sum^m_{i=1} x^{(i)} \), \( x := x - \mu \)
2. Normalize the variance: \( \sigma^2 = \frac{1}{m} \sum^m_{i=1} x^{(i)} \ast x^{(i)} \) (element-wise), \( x /= \sigma \)
Vanishing / Exploding Gradients: if the network is very deep, the gradients can become so small that they vanish or so large that they explode.
Weight Initialization for Deep Networks
Numerical Approximation of Gradients
Gradient ..
2022. 7. 10.

1. Practical Aspects of Deep Learning #2
Regularizing your Neural Network
Regularization: summing the squared L2 norms of the weight vectors gives the squared Frobenius norm of the weight matrix (sketch below).
Why Regularization Reduces Overfitting? If the regularization parameter lambda is large, the weights are pushed toward 0. Then 1. some nodes are effectively removed, which can be seen as shrinking the model, and 2. the activations stay in their near-linear region, so the activation functions can no longer give the model proper non-linearity (as if the activation functions disappeared, which again can be seen as shrinking the model). To the extent that overfitting becomes impossible, the m..
2022. 7. 6.
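
A minimal sketch of the log-scale sampling idea from "3. Hyperparameter Tuning, Batch Normalization and Programming Frameworks": \( \alpha \) is drawn uniformly in log space, and \( \beta \) is sampled through \( 1 - \beta \) so that values close to 1 are explored with the same resolution as values far from it. The search ranges (roughly 1e-4 to 1e-1 for \( \alpha \), 0.9 to 0.999 for \( \beta \)) are illustrative assumptions, not values from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample alpha uniformly on a log scale, e.g. in [1e-4, 1e-1] (assumed range).
r = rng.uniform(-4, -1)      # exponent drawn uniformly
alpha = 10 ** r              # alpha = 10^r

# Sample beta by working with 1 - beta, e.g. beta in [0.9, 0.999] (assumed range).
r = rng.uniform(-3, -1)      # exponent for 1 - beta
beta = 1 - 10 ** r           # beta = 1 - 10^r

print(f"alpha = {alpha:.5f}, beta = {beta:.4f}")
```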
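
For the mini-batch notes in "2. Optimization Algorithms", a sketch of shuffling the m examples and splitting them into mini-batches; the (n_x, m) column-per-example layout and the helper name random_mini_batches are assumptions for illustration. With mini_batch_size = m this reduces to batch gradient descent, and with mini_batch_size = 1 to stochastic gradient descent.

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Shuffle the m examples and split them into mini-batches.

    X: (n_x, m) inputs, Y: (1, m) labels -- assumed layout, one example per column.
    The last mini-batch may be smaller if m is not divisible by mini_batch_size.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]

    mini_batches = []
    for start in range(0, m, mini_batch_size):
        end = start + mini_batch_size
        mini_batches.append((X_shuf[:, start:end], Y_shuf[:, start:end]))
    return mini_batches
```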
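
The "Exponentially Weighted Averages" excerpt is cut off after \( v_t = \beta \); the standard recurrence from the course is \( v_t = \beta v_{t-1} + (1 - \beta)\theta_t \). Below is a sketch that also applies the usual bias correction \( v_t / (1 - \beta^t) \); the noisy sample sequence is made up for the example.

```python
import numpy as np

def exp_weighted_average(theta, beta=0.9, bias_correction=True):
    """Compute v_t = beta * v_{t-1} + (1 - beta) * theta_t over a 1-D sequence."""
    v = 0.0
    out = []
    for t, theta_t in enumerate(theta, start=1):
        v = beta * v + (1 - beta) * theta_t
        out.append(v / (1 - beta ** t) if bias_correction else v)
    return np.array(out)

# Illustrative data: a noisy upward trend (made up for this example).
theta = 10 + 0.1 * np.arange(100) + np.random.default_rng(0).normal(0, 1, 100)
smoothed = exp_weighted_average(theta, beta=0.9)
```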
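
The two "Normalizing Inputs" steps from "1. Practical Aspects of Deep Learning #3" as a sketch. The (n_x, m) column-per-example layout and the small eps guard against division by zero are assumptions; reusing the training-set \( \mu \) and \( \sigma^2 \) on the test set follows the course's advice.

```python
import numpy as np

def normalize_inputs(X_train, X_test, eps=1e-8):
    """Zero-center and variance-normalize the inputs.

    X_train, X_test: (n_x, m) arrays, one example per column (assumed layout).
    """
    mu = np.mean(X_train, axis=1, keepdims=True)            # mu = (1/m) * sum_i x^(i)
    X_train = X_train - mu                                   # x := x - mu
    sigma2 = np.mean(X_train ** 2, axis=1, keepdims=True)    # sigma^2 = (1/m) * sum_i x^(i) * x^(i)
    X_train = X_train / np.sqrt(sigma2 + eps)                # x /= sigma

    # Apply the *training* statistics to the test set as well.
    X_test = (X_test - mu) / np.sqrt(sigma2 + eps)
    return X_train, X_test
```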
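
For the regularization note in "1. Practical Aspects of Deep Learning #2", a sketch of the L2 (Frobenius-norm) penalty added to the cost, \( \frac{\lambda}{2m} \sum_l \| W^{[l]} \|_F^2 \), and the corresponding \( \frac{\lambda}{m} W \) term added to each weight gradient during backprop. The parameters dictionary layout ("W1", "b1", "W2", ...) is an assumption.

```python
import numpy as np

def l2_regularization_cost(parameters, lambd, m):
    """(lambda / (2*m)) * sum_l ||W^[l]||_F^2.

    parameters: dict like {"W1": ..., "b1": ..., "W2": ...} (assumed layout);
    only the weight matrices are penalized, not the biases.
    """
    frob_sq = sum(np.sum(np.square(W)) for name, W in parameters.items()
                  if name.startswith("W"))
    return (lambd / (2 * m)) * frob_sq

def add_l2_to_gradient(dW, W, lambd, m):
    """L2 regularization adds (lambda / m) * W to each weight gradient dW."""
    return dW + (lambd / m) * W
```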