What is RAG?
Retrieval-Augmented Generation (RAG) pairs a search system with a language model. Instead of answering only from memorized training data, the system first retrieves relevant passages from a knowledge source and the model answers from that evidence. This helps keep responses current and traceable to their sources, though accuracy still depends on retrieving the right passages.
What are embeddings?
An embedding turns text into a vector of numbers that captures meaning. Texts with similar meaning get similar vectors, which is why embeddings power semantic search: they match ideas rather than exact keywords, finding the right passage even when the wording is different.
How vector search works
Vector search ranks stored vectors by how close they are to a query vector, usually with cosine similarity. A vector database finds the chunks closest to the question and can return them in milliseconds, typically by using an approximate nearest-neighbor index rather than comparing the question against every chunk. This is the retrieval step that makes real-time RAG possible.
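As a rough illustration of the ranking step, here is a minimal sketch that scores every stored vector against a query with cosine similarity and keeps the top matches. It is a brute-force scan, which is reasonable only for small collections; it is not how any particular vector database is implemented. The function names, the document ids, and the 3-dimensional vectors are all hypothetical and chosen for readability.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def top_k(query, stored, k=2):
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors; real embeddings usually have hundreds of dimensions.
stored = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, stored, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, the two chunks pointing in nearly the same direction as the query rank first, and the unrelated one is dropped.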
How AI assistants use retrieval
A production assistant ingests documents ahead of time — chunking, embedding, and storing them. At query time it embeds your question, retrieves the most relevant chunks, and builds a grounded prompt so the model answers from real sources instead of guessing.
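The ingest-then-query flow described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the helper names are hypothetical, chunking is a simple fixed word window, and toy_embed is a word-count stand-in for a real embedding model, so it matches shared words rather than meaning. A production system would swap in an actual embedding model and a vector database.

```python
import math
import re
from collections import Counter

def chunk_text(text, max_words=20):
    """Split text into chunks of at most max_words words (a deliberately simple strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector.
    It matches shared words only, not meaning."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ingest(documents):
    """Ahead of time: chunk each document, embed each chunk, store both."""
    store = []
    for doc in documents:
        for chunk in chunk_text(doc):
            store.append((chunk, toy_embed(chunk)))
    return store

def retrieve(store, question, k=2):
    """At query time: embed the question and return the k closest chunks."""
    q = toy_embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble a grounded prompt that tells the model to answer from the sources."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return ("Answer using only the sources below. "
            "If they do not contain the answer, say so.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")

# Hypothetical documents used only to exercise the sketch.
documents = [
    "Refunds are issued within 14 days of purchase when the item is unused.",
    "Standard shipping takes 3 to 5 business days within the country.",
]
store = ingest(documents)
question = "How many days does shipping take?"
chunks = retrieve(store, question, k=1)
prompt = build_prompt(question, chunks)
print(prompt)
```

The resulting prompt string is what would be sent to the language model; numbering the sources is one simple way to let the answer cite where each claim came from.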