
[Paper Review] Beyond Goldfish Memory: Long-Term Open-Domain Conversation (feat. BlenderBot 2.0, long-term chatbot, Facebook AI Research, ParlAI, MSC)

Paper Reviews


daje 2022. 6. 14. 19:07

Personas 

-. We use the 1,155 personas crowdsourced from Zhang et al. (2018)

 

session1

-. For the first chat session we use the PERSONACHAT dataset (Zhang et al., 2018), which already involves short conversations where two speakers get to know each other for the first time.

-. We note that these conversations rarely go beyond the superficial stage because speakers simply do not have enough turns to discuss any topic deeply

Note: in session 1 the two speakers are meeting for the first time, so the conversation stays superficial throughout; the authors observe that these conversations almost never move beyond that superficial stage.

session2,3,4 

-. subsequent session : session2,3,4

-. first select a random amount of time that has elapsed since the previous session.

-. We ask the crowdworkers to play the same roles that were played in the previous session, acting as if that amount of time has transpired.

-. We note these crowdworkers may not be the same ones that played those characters in previous sessions, but will be playing the same roles, because this makes the task tractable in a crowdworking framework where jobs are typically short, and matching pairs over a long duration would be infeasible.

Note: because crowdworking jobs are typically short, matching and tracking the same pairs of workers across sessions would be infeasible, so different workers may play the same roles in later sessions.

Session Length

-. training conversations: 4,000 episodes with 3 sessions

 

 

Conversation Summaries 

-. We then show these summaries as the primary reference for subsequent session dialogues

-. they can also be seen to function as extensions of the original given personas. As the two speakers continue to converse they create more depth to those characters.

 

Dataset Examples 

-. (TODO: insert example images)

 

Dataset Statistics

 

4. Modeling Multi-Session Chat

4.1 Transformer Encoder-Decoders

-. We consider using the BST 2.7B parameter model from BlenderBot as an initial pre-trained model, which we then fine-tune on the Multi-Session Chat task

 

1) Encoder-Truncation 

-. As BST 2.7B has a truncation of 128 tokens in the encoder, we consider extending this to a larger input.

-. To do this, we extend its learnable positional encodings from 128 to 256, 512 or 1024 tokens.
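The extension above can be sketched as follows. This is an assumption about the mechanics, not the paper's exact recipe: one simple scheme is to copy the 128 trained position vectors into the front of a longer embedding table and randomly initialize the new slots, which are then fine-tuned. Toy dimensions are used instead of BST 2.7B's real embedding size.

```python
import random

def extend_positional_embeddings(old_emb, new_len, dim):
    """Extend a learnable positional-embedding table (list of vectors)
    from len(old_emb) positions to new_len positions.

    Sketch under assumptions: trained positions are kept verbatim,
    new positions get small random init and are learned in fine-tuning.
    """
    new_emb = [[random.gauss(0.0, 0.02) for _ in range(dim)]
               for _ in range(new_len)]
    for i, vec in enumerate(old_emb):   # keep trained positions 0..old_len-1
        new_emb[i] = list(vec)
    return new_emb

# toy table: 128 positions, dim 4 (BST 2.7B would be 128 x 2560)
old = [[0.0] * 4 for _ in range(128)]
new = extend_positional_embeddings(old, 512, 4)
```

The same idea works for any of the 256/512/1024 targets mentioned above; only `new_len` changes.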

 

4.2 Retrieval-Augmentation 

-. An alternative to a Transformer encoder over a large context, only some of which is relevant, is to use retrieval augmentation.

-. a retrieval system is used to find and select part of the context to be included in the final encoding which is attended to by the decoder.

RAG

-. RAG utilizes a neural-retriever-in-the-loop to retrieve documents or passages stored in an approximate nearest neighbor FAISS index. 

-. DPR (a Transformer bi-encoder model) is used to score document-context pairs in order to rank them based on their match

-. The DPR model is thus used to both retrieve from the FAISS index, and then score the top N candidates.
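The DPR retrieval step above can be sketched with a dot-product ranker. Assumptions: embeddings are stand-in toy vectors, and exact search replaces the approximate nearest-neighbor search FAISS performs at scale; the point is only the scoring/ranking shape.

```python
def dot(u, v):
    """Dot-product score between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_n(context_vec, doc_vecs, n):
    """DPR-style retrieval sketch: score every document embedding against
    the context embedding and return the indices of the top-N matches.
    (FAISS would do this approximately over millions of vectors.)"""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: dot(context_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:n]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy document embeddings
ctx = [1.0, 0.2]                             # toy context embedding
top = retrieve_top_n(ctx, docs, 2)
# scores: 1.0, 0.2, 0.84 -> top = [0, 2]
```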

 

FiD and FiD-RAG

-. each of the top N documents returned is prepended to the context and encoded separately by the encoder, and finally all the results are concatenated. The decoder then attends to these encodings to produce a final response
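The FiD encoding scheme above can be sketched like this. The `encoder` here is a hypothetical stand-in callable for the Transformer encoder; documents and context are token lists.

```python
def fid_encode(context_tokens, documents, encoder):
    """FiD sketch: prepend each retrieved document to the context, encode
    each (document + context) pair separately, then concatenate all the
    encoder outputs so the decoder can cross-attend over everything."""
    encodings = []
    for doc_tokens in documents:
        encodings.extend(encoder(doc_tokens + context_tokens))
    return encodings  # the decoder attends to this concatenation

# toy encoder: one (token, length) "hidden state" per token
toy_encoder = lambda tokens: [(t, len(t)) for t in tokens]
out = fid_encode(["hello"], [["doc1"], ["doc2"]], toy_encoder)
```

Note the cost trade-off this structure implies: each document is encoded independently (cheap, parallel), while the joint reasoning across documents happens only in the decoder's cross-attention.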

 

Retriever and Documents

-. Then given a dialogue context, we score each memory using the bi-encoder, and use the top N for generation.
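The memory-ranking step above can be sketched as follows. A trivial word-overlap score stands in for the bi-encoder (an assumption purely for illustration); the real model scores context-memory pairs with learned embeddings.

```python
def overlap_score(context, memory):
    """Toy stand-in for the bi-encoder score: shared lowercase words."""
    c = set(context.lower().split())
    m = set(memory.lower().split())
    return len(c & m)

def top_memories(context, memories, n):
    """Rank stored memories against the dialogue context and keep the
    top-N to condition generation on, as in the retrieval setup."""
    return sorted(memories,
                  key=lambda m: overlap_score(context, m),
                  reverse=True)[:n]

memories = ["I have two dogs", "I work as a nurse", "I love hiking in summer"]
ctx = "shall we go hiking this summer"
best = top_memories(ctx, memories, 1)
```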

 

4.3 Summarization Memory-Augmentation

-. The retrieval-augmentation model described in the previous section retrieves from the set of past dialogues.

-. However, those approaches have two potential drawbacks:

  (i) there is a lot of context to store, and hence retrieve from;

  (ii) no processing has been done on that content, so the reading, retrieving and combining to finally generate leaves a lot of work for the model to do. 

 

1) An encoder-decoder abstractive summarizer 

2) A memory-augmented generator 
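The two modules above can be sketched as one pipeline. Assumptions: `summarizer` and `generator` are hypothetical stand-in callables for the trained encoder-decoder summarizer and the memory-augmented generator; the real system also retrieves only the most relevant memories rather than taking the first few.

```python
class SummaryMemoryBot:
    """Sketch of the summarization memory-augmented setup: summarize each
    finished session into compact memory lines (addressing drawback ii,
    pre-processed content), then condition generation on that small
    memory store (addressing drawback i, less context to retrieve from)."""

    def __init__(self, summarizer, generator):
        self.summarizer = summarizer
        self.generator = generator
        self.memory = []  # summary lines accumulated across sessions

    def end_session(self, session_lines):
        # Module 1: abstractive summarizer compresses the raw session.
        self.memory.extend(self.summarizer(session_lines))

    def reply(self, context):
        # Module 2: generator conditions on stored memory + current context.
        return self.generator(self.memory, context)

# toy stand-ins: keep first-person facts as "summary", echo memory in reply
toy_summarizer = lambda lines: [l for l in lines if l.startswith("I ")]
toy_generator = lambda mem, ctx: f"(memory: {'; '.join(mem)}) -> reply to: {ctx}"

bot = SummaryMemoryBot(toy_summarizer, toy_generator)
bot.end_session(["Hi!", "I love hiking", "Nice weather today"])
answer = bot.reply("What do you enjoy?")
```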
