2. Natural Language Processing & Word Embeddings #1

There are several ways to represent words.

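1-hot representation -> easy to build, but the representation says nothing about a word's relationship to other words (any two distinct one-hot vectors are orthogonal, so "apple" is no closer to "orange" than to "king").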
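
To make the limitation concrete, here is a minimal sketch with a hypothetical six-word vocabulary (`vocab`, `one_hot`, `dot`, and `euclidean` are illustrative names, not from any library). Every pair of distinct one-hot vectors has dot product 0 and the same distance, so the representation carries no notion of similarity.

```python
import math

# Hypothetical toy vocabulary; real vocabularies have 10k+ words.
vocab = ["apple", "orange", "king", "queen", "man", "woman"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """All zeros except a 1 at the word's index."""
    v = [0.0] * len(vocab)
    v[index[word]] = 1.0
    return v

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distinct words are orthogonal and all equally far apart (sqrt(2)).
print(dot(one_hot("apple"), one_hot("orange")))        # 0.0
print(euclidean(one_hot("apple"), one_hot("orange")))  # same as the next line
print(euclidean(one_hot("apple"), one_hot("king")))
```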

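Featurized representation: word embedding -> each word is represented as a vector whose elements indicate how strongly the word is associated with each feature.

Relationships between words can be read off the representation (e.g., words with similar meanings are close to each other).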
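
A small sketch of the featurized idea, assuming hand-named features and made-up values (the `embedding` table and `cosine_similarity` helper are hypothetical; learned embeddings typically have a few hundred dimensions that are rarely this interpretable). Cosine similarity shows that similar words end up close together.

```python
import math

# Hypothetical 4-d embeddings; elements = association with each feature.
FEATURES = ["gender", "royal", "age", "food"]
embedding = {
    "man":    [-1.00,  0.01,  0.03, 0.09],
    "woman":  [ 1.00,  0.02,  0.02, 0.01],
    "king":   [-0.95,  0.93,  0.70, 0.02],
    "queen":  [ 0.97,  0.95,  0.69, 0.01],
    "apple":  [ 0.00, -0.01,  0.03, 0.95],
    "orange": [ 0.01,  0.00, -0.02, 0.97],
}

def cosine_similarity(a, b):
    """cos(theta) between two vectors: 1 = same direction, 0 = unrelated."""
    d = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return d / (norm_a * norm_b)

print(cosine_similarity(embedding["apple"], embedding["orange"]))  # close to 1
print(cosine_similarity(embedding["apple"], embedding["king"]))    # close to 0
```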

Using word embeddings, we can expect the model's performance to improve.

(Similar words have similar embeddings, so once the model has learned something about one word, we can expect it to generalize to the other.)

1. Learn word embeddings from a large text corpus (1-100B words)

(Or download a pre-trained embedding online)

2. Transfer the embedding to a new task with a smaller training set

(say, 100k words)

3. Optional: Continue to fine-tune the word embeddings with the new data
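
The three steps can be sketched in plain Python. Everything here is hypothetical: `pretrained` stands in for embeddings learned in step 1 (or downloaded), `build_task_embeddings` and `sgd_step` are illustrative helpers, and giving unseen words small random vectors is one common choice assumed for the sketch, not the only option.

```python
import random

# Step 1 (assumed already done): a tiny stand-in for pre-trained embeddings.
pretrained = {
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
    "king":   [-0.95, 0.93, 0.70, 0.02],
}
DIM = 4

def build_task_embeddings(task_vocab, pretrained, dim, seed=0):
    """Step 2: copy pre-trained vectors into the new task's table.
    Words missing from the pre-trained set get small random vectors."""
    rng = random.Random(seed)
    table = {}
    for word in task_vocab:
        if word in pretrained:
            table[word] = list(pretrained[word])  # copy, keep original intact
        else:
            table[word] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return table

def sgd_step(table, word, grad, lr=0.1, finetune=False):
    """Step 3 (optional): apply a gradient from the new task only when
    fine-tuning is enabled; otherwise the embedding stays frozen."""
    if finetune:
        table[word] = [e - lr * g for e, g in zip(table[word], grad)]
    return table

task_vocab = ["apple", "orange", "durian"]
table = build_task_embeddings(task_vocab, pretrained, DIM)
frozen = sgd_step(dict(table), "apple", [1.0] * DIM, finetune=False)
tuned = sgd_step(dict(table), "apple", [1.0] * DIM, finetune=True)
print(frozen["apple"])  # unchanged pre-trained vector
print(tuned["apple"])   # each element shifted by -lr * grad
```

Freezing is the safer default when the new training set is small; fine-tuning tends to pay off only when there is enough task data to move the vectors meaningfully.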

Here is a very famous example involving word embeddings.

\( e_{\text{man}} - e_{\text{woman}} \approx e_{\text{king}} - e_{\text{queen}} \)

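Because the representation captures the features of each word, an equation like this can hold: both differences are driven mainly by the gender feature, while features such as "royal" largely cancel out.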
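
Rearranging the equation gives \( e_{\text{queen}} \approx e_{\text{king}} - e_{\text{man}} + e_{\text{woman}} \), so an analogy can be completed by searching for the word whose embedding is most similar to that target. The sketch below assumes hypothetical 4-d toy embeddings and an illustrative `complete_analogy` helper; it excludes the three input words, which would otherwise trivially score high.

```python
import math

# Hypothetical 4-d embeddings (features: gender, royal, age, food).
embedding = {
    "man":    [-1.00,  0.01,  0.03, 0.09],
    "woman":  [ 1.00,  0.02,  0.02, 0.01],
    "king":   [-0.95,  0.93,  0.70, 0.02],
    "queen":  [ 0.97,  0.95,  0.69, 0.01],
    "apple":  [ 0.00, -0.01,  0.03, 0.95],
    "orange": [ 0.01,  0.00, -0.02, 0.97],
}

def cosine_similarity(a, b):
    d = sum(x * y for x, y in zip(a, b))
    return d / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def complete_analogy(a, b, c, embedding):
    """Solve 'a is to b as c is to ?': find w with e_a - e_b ~ e_c - e_w,
    i.e. the word most similar to e_c - e_a + e_b."""
    target = [ec - ea + eb for ea, eb, ec in
              zip(embedding[a], embedding[b], embedding[c])]
    best_word, best_sim = None, -math.inf
    for word, vec in embedding.items():
        if word in (a, b, c):  # skip the input words
            continue
        sim = cosine_similarity(target, vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

print(complete_analogy("man", "woman", "king", embedding))  # queen
```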

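Embedding matrix \( E \): the matrix that yields a word's embedding when multiplied by that word's one-hot vector, i.e. \( E o_j = e_j \), where \( o_j \) is the one-hot vector of word \( j \).

Its columns are the word embeddings of the individual words.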
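
A minimal sketch of \( E o_j = e_j \) with a hypothetical 4 x 5 matrix (`vocab`, `E`, `one_hot`, `matvec`, and `column` are illustrative names). Multiplying by the one-hot vector simply selects column \( j \).

```python
# Hypothetical E with shape (embedding_dim=4, vocab_size=5);
# column j is the embedding of vocab[j].
vocab = ["apple", "orange", "king", "queen", "man"]
E = [
    [ 0.00,  0.01, -0.95, 0.97, -1.00],  # gender
    [-0.01,  0.00,  0.93, 0.95,  0.01],  # royal
    [ 0.03, -0.02,  0.70, 0.69,  0.03],  # age
    [ 0.95,  0.97,  0.02, 0.01,  0.09],  # food
]

def one_hot(j, size):
    o = [0.0] * size
    o[j] = 1.0
    return o

def matvec(M, v):
    """Plain matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def column(M, j):
    """Direct lookup of column j: what the product ends up selecting."""
    return [row[j] for row in M]

j = vocab.index("king")
e_king = matvec(E, one_hot(j, len(vocab)))  # E o_king
print(e_king)                   # [-0.95, 0.93, 0.7, 0.02]
print(column(E, j) == e_king)   # True
```

Since the one-hot vector is almost entirely zeros, the multiplication wastes work; in practice, frameworks provide an embedding-lookup layer that indexes the column (or row) directly.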
