
4. Transformer Network

by 사향낭 2022. 8. 16.

Transformers

Transformer Network Intuition

 

Attention + CNN-style parallel processing: the key building blocks are Self-Attention and Multi-Head Attention.

Self-Attention

 

\( A(q, K, V) \) = attention-based vector representation of a word

 

= \( \sum_i \frac{\exp(q \cdot k^{<i>})}{\sum_j \exp(q \cdot k^{<j>})} \, v^{<i>} \)

 

= \( \text{softmax} (\frac{QK^T}{\sqrt{d_k}}) V \)
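
A minimal NumPy sketch of this formula (the function name, the shapes, and the Q = K = V usage below are my own illustrative choices, not from the lecture):

```python
import numpy as np

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n, n) pairwise q.k scores
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights = e / e.sum(axis=-1, keepdims=True)              # each row sums to 1
    return weights @ V                                       # attention-weighted sum of values

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 64))   # 5 tokens, d_k = 64
A = self_attention(X, X, X)        # self-attention: Q, K, V all come from the same sequence
```

Dividing by \( \sqrt{d_k} \) keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with very small gradients.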

Multi-Head Attention

 

\( \text{head}_i = \text{Attention} (W_i^Q Q, W_i^K K, W_i^V V) \) for \( i = 1, \dots, h \), where \( h \) is the number of heads

 

\( \text{MultiHead} (Q, K, V) = \text{concat} (\text{head}_1, \text{head}_2, \dots, \text{head}_h) W_o \)
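
A sketch of the multi-head computation built on the scaled dot-product attention above; treating rows as token vectors, the per-head projection \( W_i^Q Q \) becomes `Q @ Wq_i`. The matrix shapes here are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(Q, K, V, Wq, Wk, Wv, Wo):
    """MultiHead(Q, K, V) = concat(head_1, ..., head_h) Wo.

    Wq, Wk, Wv: lists of h projection matrices of shape (d_model, d_k);
    Wo: output projection of shape (h * d_k, d_model).
    """
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(Wq, Wk, Wv):      # one (W_i^Q, W_i^K, W_i^V) triple per head
        q, k, v = Q @ Wq_i, K @ Wk_i, V @ Wv_i    # project inputs into head i's subspace
        scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot-product scores
        heads.append(softmax(scores) @ v)         # head_i
    return np.concatenate(heads, axis=-1) @ Wo    # concat along features, then project
```

Each head can learn to attend to a different kind of relationship (the lecture's "who / what / when" example), and the heads share no computation, so they can run in parallel.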

Transformer Network

 

A Transformer model is built by combining Multi-Head Attention with positional encoding, residual connections, masking, and so on; a sketch of the positional encoding follows below.
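
Of these components, positional encoding injects the word-order information that attention alone discards. A minimal sketch of the sinusoidal encoding from the original Transformer paper, assuming an even d_model:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]             # (max_len, 1) token positions
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2) dimension pairs
    angles = pos / 10000 ** (2 * i / d_model)     # (max_len, d_model/2)
    pe = np.empty((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                  # odd dimensions: cosine
    return pe                                     # added element-wise to the token embeddings
```

Masking (hiding future tokens in the decoder, or padding tokens) is applied by setting the corresponding attention scores to a large negative value before the softmax, and residual connections plus normalization wrap each attention and feed-forward sub-layer.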
