Retrieval-augmented generation
Retrieval-augmented generation (RAG) is a type of information retrieval process. It modifies interactions with a large language model (LLM) so that the model responds to queries with reference to a specified set of documents, drawing on them in preference to information from its own vast, static training data. This allows LLMs to use domain-specific and/or updated information.[1] Use cases include providing chatbot access to internal company data, or supplying factual information drawn only from an authoritative source.[2]
Process
The RAG process is made up of four key stages. First, all the data must be prepared and indexed for use by the LLM. Thereafter, each query passes through a retrieval, an augmentation, and a generation phase.[1]
Indexing
The data to be referenced must first be converted into LLM embeddings, numerical representations in the form of large vectors. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs).[1] These embeddings are then stored in a vector database to allow for document retrieval.
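The following is a minimal sketch of the indexing stage, not a production implementation: it assumes a hypothetical embed() function standing in for any sentence-embedding model, and uses a plain NumPy array as the vector store in place of a real vector database.

```python
# Indexing sketch: convert each document to an embedding vector and
# keep the vectors in an in-memory index (a stand-in for a vector database).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function: maps text to a fixed-size vector.
    A real system would call an embedding model here; this toy version
    just derives a deterministic pseudo-random vector from the text."""
    rng = np.random.default_rng(sum(text.encode()))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)  # unit length, so dot product = cosine similarity

documents = [
    "RAG augments LLM prompts with retrieved documents.",
    "Embeddings are stored in a vector database for retrieval.",
    "LLMs can hallucinate when they lack grounding data.",
]
index = np.stack([embed(d) for d in documents])  # shape: (n_docs, 384)
```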
Retrieval
Given a user query, a document retriever is first called to select the most relevant documents, which will be used to augment the query.[3] This is done by encoding the query as a vector embedding and then comparing it to the vectors of the source documents.[2] This comparison can be done using a variety of methods, which depend in part on the type of indexing used.[1]
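Continuing the illustrative sketch above (reusing the same hypothetical embed() function and in-memory index), retrieval can be as simple as a cosine-similarity search over the stored vectors:

```python
def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query.
    Because all vectors are unit-normalised, the dot product equals
    cosine similarity."""
    q = embed(query)
    scores = index @ q                  # similarity of query to every document
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return [documents[i] for i in top]
```

Real systems typically replace this exhaustive comparison with approximate nearest-neighbour search so retrieval stays fast on large corpora.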
Augmentation
The retrieved information is then fed into the LLM by augmenting the user's original query through prompt engineering.[2] Newer implementations can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains, and using memory and self-improvement to learn from previous retrievals.[1]
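As a sketch of the augmentation step, the retrieved passages can simply be prepended to the user's question as context; the prompt wording here is illustrative, not a fixed format:

```python
def augment(query: str, retrieved: list[str]) -> str:
    """Build an augmented prompt: retrieved passages first, then the question."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```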
Generation
Finally, the LLM generates output based on both the query and the retrieved documents.[4] Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning.[1]
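Putting the pieces of the sketch together, a full RAG query chains retrieval, augmentation, and generation; llm_generate() below is a hypothetical placeholder for a call to any text-generation model or API:

```python
def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real system would invoke
    a language model with the augmented prompt."""
    return f"(model output conditioned on {len(prompt)} characters of prompt)"

def rag_answer(query: str) -> str:
    """End-to-end RAG: retrieve relevant documents, augment the query,
    then generate an answer grounded in the retrieved context."""
    retrieved = retrieve(query)
    prompt = augment(query, retrieved)
    return llm_generate(prompt)

print(rag_answer("Why do RAG systems use a vector database?"))
```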
Challenges
If the external data source is large, retrieval can be slow. The use of RAG does not completely eliminate the general challenges faced by LLMs, including hallucination.[3]
References
- ↑ a b c d e f Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang: Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint, 2023, doi:10.48550/arXiv.2312.10997.
- ↑ a b c What is RAG? - Retrieval-Augmented Generation AI Explained - AWS. In: Amazon Web Services, Inc. Retrieved 16 July 2024.
- ↑ a b Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook. In: freeCodeCamp.org. 11 June 2024, retrieved 16 July 2024.
- ↑ Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In: Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474, arXiv:2005.11401 (neurips.cc).