Transformers
Transformer Network Intuition
Intuition: the Transformer combines attention-based representations with CNN-style parallel processing of the whole sequence. Its core building blocks are Self-Attention and Multi-Head Attention.
Self-Attention
\( A(q, K, V) \) = attention-based vector representation of a word
= \( \sum_i \frac{\exp(q \cdot k^{\langle i \rangle})}{\sum_j \exp(q \cdot k^{\langle j \rangle})} v^{\langle i \rangle} \)
= \( \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right) V \) (vectorized form, with \( Q, K, V \) stacking all queries, keys, and values)
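As a minimal NumPy sketch of the vectorized formula (the function name and shapes are my own choices, not from the lecture):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q: (n, d_k) queries, K: (m, d_k) keys, V: (m, d_v) values.
    Returns (n, d_v) attention-based representations.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (n, m) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                          # weighted sum of values
```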
Multi-Head Attention
\( \text{head}_i = \text{Attention}(W_i^Q Q, W_i^K K, W_i^V V) \) for \( i = 1, \ldots, h \) (\( h \): number of heads)
\( \text{MultiHead}(Q, K, V) = \text{concat}(\text{head}_1, \text{head}_2, \ldots, \text{head}_h) W_o \)
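Continuing the sketch above (reusing `scaled_dot_product_attention`; the row-vector convention \( Q W_i^Q \) used here is the transpose of the lecture's \( W_i^Q Q \), and the weight-list layout is an assumption for illustration):

```python
def multi_head_attention(Q, K, V, WQ, WK, WV, WO):
    """Concatenate h attention heads and project with WO.

    WQ, WK, WV: lists of h per-head projection matrices, shape (d_model, d_k) / (d_model, d_v).
    WO: (h * d_v, d_model) output projection.
    """
    heads = [scaled_dot_product_attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(WQ, WK, WV)]
    return np.concatenate(heads, axis=-1) @ WO
```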
Transformer Network
A Transformer model is assembled from multi-head attention, positional encoding, residual connections, masking, and related components, as sketched below.
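The following sketch (helper names `positional_encoding`, `encoder_block`, and the parameter layout are illustrative assumptions, not the course code) shows how these pieces wire together in one encoder block; a decoder block would additionally apply a look-ahead mask inside its self-attention.

```python
def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: sin on even dimensions, cos on odd dimensions."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

def layer_norm(x, eps=1e-6):
    """Normalize each position's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def encoder_block(x, attn_params, ffn_params):
    """One encoder block: multi-head self-attention, then a position-wise feed-forward
    network, each wrapped in a residual connection followed by layer normalization."""
    WQ, WK, WV, WO = attn_params          # per-head projections + output projection
    W1, W2 = ffn_params                   # feed-forward weights
    x = layer_norm(x + multi_head_attention(x, x, x, WQ, WK, WV, WO))  # residual + norm
    ffn = np.maximum(0, x @ W1) @ W2                                    # ReLU feed-forward
    return layer_norm(x + ffn)                                          # residual + norm
```

Before the first block, positional encodings are added to the input embeddings, e.g. `x = embeddings + positional_encoding(seq_len, d_model)`, so that the otherwise order-agnostic attention layers see word positions.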