2. Natural Language Processing & Word Embeddings #3

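Consider a task that takes a review as input and predicts its star rating as output.

An approach that simply averages the word embeddings can be expected to fail when a positive word appears in a negative context (for example, "not good"), because averaging throws away word order.

An RNN can have a similar problem: the most important word from earlier in the sequence may not be propagated properly to the final prediction.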
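To make the averaging problem concrete, here is a minimal sketch. The vocabulary, the random 4-dimensional vectors, and the helper `average_embedding` are all made up for illustration; real embeddings would come from a trained model. It only shows that two sentences with opposite meaning but the same words get the same averaged feature vector.

```python
import numpy as np

# Hypothetical toy embeddings (random); not taken from any trained model.
rng = np.random.default_rng(0)
vocab = ["the", "food", "was", "not", "good", "bad"]
E = {w: rng.normal(size=4) for w in vocab}

def average_embedding(sentence):
    """Mean of the word vectors in the sentence; word order is discarded."""
    return np.mean([E[w] for w in sentence.split()], axis=0)

a = average_embedding("the food was good not bad")   # positive review
b = average_embedding("the food was bad not good")   # negative review

# Same bag of words, so the averaged features are identical (up to rounding).
same = np.allclose(a, b)
print(same)
```

Any classifier placed on top of these averaged features has to give both reviews the same rating, which is the failure described above.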

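Word embeddings are trained on a training set made up of many different sentences.

These sentences are usually collected from the internet, so undesirable biases that existed in the past can be mixed into the text.

(For example, Man : Computer_Programmer as Woman : Homemaker.)

Unsurprisingly, such bias does not have a positive effect on the model.

(It makes the model biased as well.)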

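The bias can be handled as follows.

1. Average differences such as e_{he} - e_{she} and e_{male} - e_{female} to obtain the bias direction.

2. For words that are not inherently related to gender, remove the component along the bias direction, that is, project them onto the non-bias directions.

3. Adjust pairs such as grandmother - grandfather so that they sit at the same position along the non-bias directions and differ only along the bias direction, which makes both words equally distant from every neutralized word.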
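The three steps above can be sketched with NumPy roughly as follows. This is a simplified illustration, not the exact procedure from the course or from any paper: the embeddings are random toy vectors, the function names (`bias_direction`, `neutralize`, `equalize`) are hypothetical, and the equalize step uses a plain symmetric split around the shared non-bias component rather than a norm-preserving formula.

```python
import numpy as np

# Hypothetical toy embeddings (random); real ones would come from a trained model.
rng = np.random.default_rng(0)
words = ["he", "she", "male", "female", "grandfather", "grandmother", "babysitter"]
E = {w: rng.normal(size=5) for w in words}

def bias_direction(pairs):
    """Step 1: average the difference vectors and normalize to unit length."""
    diffs = [E[a] - E[b] for a, b in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def neutralize(v, g):
    """Step 2: remove the component of v along the unit bias direction g."""
    return v - np.dot(v, g) * g

def equalize(v1, v2, g):
    """Step 3 (simplified): share one non-bias component, mirror the bias component."""
    shared = neutralize((v1 + v2) / 2, g)            # common non-bias part
    s = (np.dot(v1, g) - np.dot(v2, g)) / 2          # half the gap along g
    return shared + s * g, shared - s * g

g = bias_direction([("he", "she"), ("male", "female")])
babysitter = neutralize(E["babysitter"], g)
grandfather, grandmother = equalize(E["grandfather"], E["grandmother"], g)

# After both steps the pair is equidistant from the neutralized word.
d1 = np.linalg.norm(grandfather - babysitter)
d2 = np.linalg.norm(grandmother - babysitter)
print(np.isclose(d1, d2))
```

Because the neutralized word has no component along g and the pair differs only by a mirrored offset along g, the two distances match by construction.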
