Benutzer:Cybjoerk/Retrieval-augmented generation


Retrieval-augmented generation (RAG) is a type of information retrieval process. It modifies interactions with a large language model (LLM) so that the model responds to queries with reference to a specified set of documents, using them in preference to information drawn from its own vast, static training data. This allows LLMs to use domain-specific and up-to-date information.[1] Use cases include providing chatbot access to internal company data, or giving factual information only from an authoritative source.[2]

The RAG process is made up of four key stages. First, all the data must be prepared and indexed for use by the LLM. Thereafter, each query consists of a retrieval, augmentation and a generation phase.[1]

The data to be referenced must first be converted into LLM embeddings, numerical representations in the form of large vectors. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs).[1] These embeddings are then stored in a vector database to allow for document retrieval.
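The indexing step described above can be illustrated with a minimal sketch. Real systems use a neural embedding model and a dedicated vector database; the hash-based bag-of-words embedding and in-memory list below are toy stand-ins used only to show the shape of the process.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each token into a slot of a fixed-size vector,
    then L2-normalise. Real systems use a neural embedding model instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# "Vector database": here just an in-memory list of (document, embedding) pairs.
documents = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings for similarity search.",
]
index = [(doc, embed(doc)) for doc in documents]
```

Once built, such an index supports the similarity comparisons used in the retrieval phase.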

Overview of RAG process: user input and context from documents are combined into an LLM prompt to get tailored responses

Given a user query, a document retriever is first called to select the most relevant documents which will be used to augment the query.[3] This is done by encoding the query as a vector embedding and then comparing it to the vectors of the source documents.[2] This comparison can be done using a variety of methods, which depend in part on the type of indexing used.[1]

The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query.[2] Newer implementations can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains, and using memory and self-improvement to learn from previous retrievals.[1]
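The augmentation step amounts to assembling a new prompt that places the retrieved passages alongside the user's question. The template below is a minimal illustrative sketch; the exact wording and structure vary between implementations.

```python
def build_prompt(query: str, retrieved: list[str]) -> str:
    """Assemble an augmented prompt: retrieved passages as context,
    followed by the user's original question."""
    context = "\n".join(f"- {passage}" for passage in retrieved)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_prompt(
    "When was the warranty extended?",
    ["Policy update 2023: warranty extended to 36 months."],
)
print(prompt)
```

The resulting string is what is actually sent to the LLM in place of the bare query.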

Finally, the LLM can generate output based on both the query and the retrieved documents.[4] Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection and fine-tuning.[1]
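Re-ranking can be sketched as a second, cheaper scoring pass over the retrieved candidates before they are placed into the prompt. The term-overlap score below is a deliberately simple stand-in; production systems typically use a cross-encoder model for this step.

```python
def rerank(query: str, passages: list[str], top_n: int = 2) -> list[str]:
    """Illustrative re-ranking: order retrieved passages by how many query
    terms they share, keeping only the top_n for the final prompt."""
    q_terms = set(query.lower().split())

    def overlap(passage: str) -> int:
        return len(q_terms & set(passage.lower().split()))

    return sorted(passages, key=overlap, reverse=True)[:top_n]

candidates = [
    "Shipping times vary by region.",
    "The warranty covers parts and labour for 36 months.",
    "Warranty claims require proof of purchase.",
]
print(rerank("how long does the warranty last", candidates))
```

Dropping weakly related passages in this way keeps the prompt focused, which also serves as a simple form of context selection.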

If the external data source is large, retrieval can be slow. The use of RAG does not completely eliminate the general challenges faced by LLMs, including hallucination.[3]

{{Reflist}}

{{compu-ai-stub}} [[Category:Large language models]] [[Category:Natural language processing]] [[Category:Information retrieval systems]]

  1. Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang: Retrieval-Augmented Generation for Large Language Models: A Survey. In: arXiv eprint. 2023, doi:10.48550/arXiv.2312.10997.
  2. What is RAG? – Retrieval-Augmented Generation AI Explained – AWS. In: Amazon Web Services, Inc. Retrieved 16 July 2024.
  3. Next-Gen Large Language Models: The Retrieval-Augmented Generation (RAG) Handbook. In: freeCodeCamp.org. 11 June 2024, retrieved 16 July 2024.
  4. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In: Advances in Neural Information Processing Systems. Vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474, arxiv:2005.11401 (neurips.cc).