Coding Diary
[Paper Review] Beyond Goldfish Memory: Long-Term Open-Domain Conversation (feat. Blenderbot2.0, Long-Term chatbot, Facebook AI Research, Parlai, MSC)
daje 2022. 6. 14. 19:07
Personas
-. We use the 1,155 personas crowdsourced from Zhang et al. (2018)
session1
-. For the first chat session we use the PERSONACHAT dataset (Zhang et al., 2018), which already involves short conversations where two speakers get to know each other for the first time.
-. We note that these conversations rarely go beyond the superficial stage because speakers simply do not have enough turns to discuss any topic deeply
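In session 1 the speakers are meeting for the first time, so the conversation stays superficial throughout.
The authors say they almost never saw a conversation that got past this superficial stage.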
session2,3,4
-. Subsequent sessions: sessions 2, 3 and 4
-. We first select a random amount of time that has elapsed since the previous session.
-. We ask the crowdworkers to play the same roles that were played in the previous session, acting as if that amount of time has transpired.
-. We note these crowdworkers may not be the same ones that played those characters in previous sessions, but will be playing the same roles because this makes the task tractable in a crowdworking framework where jobs are typically short, and matching pairs over a long duration would be infeasible.
Crowdworking jobs are short, much like part-time gigs, so it is hard to keep matching and tracking the same pair of workers across sessions.
The authors say this is why they set the task up as described above.
Session Length
Training conversations: 4,000 episodes with 3 sessions
Conversation Summaries
-. We then show these summaries as the primary reference for subsequent session dialogues
-. They can also be seen to function as extensions of the original given personas. As the two speakers continue to converse, they add more depth to those characters.
Dataset Examples
-. TODO: add example image
Dataset Statistics
4. Modeling Multi-Session Chat
4.1 Transformer Encoder-Decoders
-. We consider using the BST 2.7B parameter model from BlenderBot as an initial pre-trained model, which we then fine-tune on the Multi-Session Chat task
1) Encoder-Truncation
-. As BST 2.7B has a truncation of 128 tokens in the encoder, we consider extending this to a larger input.
-. To do this, we extend its learnable positional encodings from 128 to 256, 512 or 1024 tokens.
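The extension of the positional encodings can be sketched as below. This is a minimal illustration in plain Python, not the BlenderBot or ParlAI code: the function name `extend_positional_embeddings`, the toy 4-dimensional table, and the choice to initialize the new positions with small Gaussian noise are all assumptions, since the notes above do not say how the extra rows are initialized.

```python
import random


def extend_positional_embeddings(old_table, new_length, seed=0, scale=0.02):
    """Return a longer table: pretrained rows kept as-is, extra rows freshly initialized."""
    if new_length < len(old_table):
        raise ValueError("new_length must be at least the current length")
    dim = len(old_table[0])
    rng = random.Random(seed)
    # Copy the pretrained rows so the first len(old_table) positions behave as before.
    new_table = [list(row) for row in old_table]
    # Assumption: new positions start as small Gaussian noise and are learned in fine-tuning.
    for _ in range(new_length - len(old_table)):
        new_table.append([rng.gauss(0.0, scale) for _ in range(dim)])
    return new_table


# Toy table: 128 positions with 4-dimensional embeddings (real models are far wider).
pretrained = [[float(pos)] * 4 for pos in range(128)]
extended = extend_positional_embeddings(pretrained, 512)
print(len(extended), extended[:128] == pretrained)
```

Keeping the first 128 rows unchanged means short inputs are encoded exactly as in the pretrained model, and only the added positions have to be learned from scratch.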
4.2 Retrieval-Augmentation
-. One way to handle a Transformer encoder with a large context, only some of which is relevant, is to use retrieval augmentation.
-. A retrieval system is used to find and select part of the context to be included in the final encoding, which is attended to by the decoder.
RAG
-. RAG utilizes a neural-retriever-in-the-loop to retrieve documents or passages stored in an approximate nearest neighbor FAISS index.
-. DPR (a Transformer bi-encoder model) is used to score document-context pairs in order to rank them based on their match.
-. The DPR model is thus used to both retrieve from the FAISS index, and then score the top N candidates.
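The retrieve-then-rank step can be illustrated with a small sketch. It assumes the documents and the dialogue context have already been embedded by some bi-encoder; the 3-dimensional vectors are made up, and the brute-force dot-product search stands in for the approximate nearest neighbor FAISS index, so this is not how RAG or DPR are actually implemented.

```python
def dot(u, v):
    """Inner product, used here as the bi-encoder match score."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_n(query_vec, doc_vecs, n):
    """Score every document against the query and return the n best (index, score) pairs."""
    scored = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical embeddings; a real system would get these from the trained encoders.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]

top = retrieve_top_n(query_vec, doc_vecs, n=2)
print([index for index, _ in top])
```

An exhaustive scan like this scales linearly with the number of stored passages, which is the cost an approximate index is meant to avoid.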
FiD and FiD-RAG
-. Each of the top N documents returned is prepended to the context and encoded separately by the encoder, and finally all the results are concatenated. The decoder then attends to these encodings to produce a final response.
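The encode-separately-then-concatenate pattern can be sketched as follows. The encoder here is a deliberately trivial stand-in (`toy_encode` maps each token to a 2-dimensional vector) and all names are hypothetical; only the data flow is meant to mirror the description above.

```python
def toy_encode(tokens):
    """Stand-in encoder: one 2-dim vector per token (token length, vowel count)."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in tokens]


def fid_encode(context_tokens, documents):
    """Prepend each document to the context, encode each pair separately, then concatenate."""
    fused = []
    for doc_tokens in documents:
        # Each (document + context) sequence goes through the encoder on its own.
        fused.extend(toy_encode(doc_tokens + context_tokens))
    # The decoder would attend over this single concatenated sequence of encodings.
    return fused


context = ["how", "is", "your", "dog"]
documents = [["i", "have", "a", "dog"], ["i", "like", "hiking"]]

fused = fid_encode(context, documents)
print(len(fused))
```

Because each document is encoded independently, the encoder never sees more than one document plus the context at a time, while the decoder still sees the encodings of all of them.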
Retriever and Documents
-. Then given a dialogue context, we score each memory using the bi-encoder, and use the top N for generation.
4.3 Summarization memory-Augmentation
-. The retrieval-augmentation model described in the previous section retrieves from the set of past dialogues.
-. However, those approaches have two potential drawbacks:
(i) there is a lot of context to store, and hence retrieve from;
(ii) no processing has been done on that content, so the reading, retrieving and combining to finally generate leaves a lot of work for the model to do.
1) An encoder-decoder abstractive summarizer
2) A memory-augmented generator
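How the two components might fit together is sketched below. This is only an assumed control flow: the summarizer, retriever and generator are passed in as placeholder callables, the toy versions are keyword heuristics rather than neural models, and skipping turns that yield an empty summary is an assumption of this sketch.

```python
def update_memory(memory, last_turn, summarize):
    """Summarize the latest turn and store the result, skipping empty summaries."""
    summary = summarize(last_turn)
    if summary:
        memory.append(summary)
    return memory


def respond(context, memory, retrieve, generate, n=2):
    """Pick the n most relevant memories and hand them to the generator with the context."""
    return generate(context, retrieve(context, memory, n))


# Toy placeholders standing in for the trained summarizer, retriever and generator.
def toy_summarize(turn):
    lowered = turn.lower()
    return lowered if lowered.startswith("i ") else ""


def toy_retrieve(context, memory, n):
    words = set(context.lower().split())
    ranked = sorted(memory, key=lambda m: len(words & set(m.split())), reverse=True)
    return ranked[:n]


def toy_generate(context, memories):
    return "[memories: " + "; ".join(memories) + "] reply to: " + context


memory = []
for turn in ["I have a dog named Max", "Nice weather today", "I work as a nurse"]:
    update_memory(memory, turn, toy_summarize)

reply = respond("How is your dog", memory, toy_retrieve, toy_generate, n=1)
print(reply)
```

Storing short summaries instead of raw dialogue addresses both drawbacks listed above: there is less text to retrieve from, and part of the reading work is done before generation.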