bootcamp/Evaluation/eval_ragas.ipynb (1 addition, 1 deletion)
@@ -14,7 +14,7 @@
"Ragas is an open source project for evaluating RAG components. [Paper](https://arxiv.org/abs/2309.15217), [Code](https://docs.ragas.io/en/stable/getstarted/index.html), [Docs](https://docs.ragas.io/en/stable/getstarted/index.html), [Intro blog](https://medium.com/towards-data-science/rag-evaluation-using-ragas-4645a4c6c477).\n",
bootcamp/Integration/openai_embedding.ipynb (2 additions, 2 deletions)
@@ -15,11 +15,11 @@
"\n",
"On January 25, OpenAI released 2 latest embedding models, `text-embedding-3-small` and `text-embedding-3-large`. Both embedding models has better performance over `text-embedding-ada-002`. The `text-embedding-3-small` is a highly efficient model. With 5X cost reduction, it achieves slight higher [MTEB](https://huggingface.co/spaces/mteb/leaderboard) score of 62.3% compared to 61%. `text-embedding-3-large` is OpenAI's best performing model, with 64.6% MTEB score.\n",
"More impressively, both models support trading-off performance and cost with a technique called \"Matryoshka Representation Learning\". Users can get shorten embeddings for vast reduction of the vector storage cost, without sacrificing the retrieval quality much. For example, reducing the vector dimension from 3072 to 256 only reduces the MTEB score from 64.6% to 62%. However, it achieves 12X cost reduction!\n",
Milvus uses a shared-storage [architecture](https://milvus.io/docs/architecture_overview.md) with four layers that are mutually independent in terms of scaling and disaster recovery: 1) access layer, 2) coordinator service, 3) worker nodes, and 4) storage. Milvus also includes data sharding, logs-as-data persistence, and streaming data ingestion.
bootcamp/OpenAIAssistants/custom_RAG_workflow.ipynb (4 additions, 2 deletions)
@@ -20,7 +20,7 @@
"Using open-source Q&A with retrieval saves money since we make free calls to our own data almost all the time - retrieval, evaluation, and development iterations. We only make a paid call to OpenAI once for the final chat generation step. \n",
bootcamp/RAG/advanced_rag/README.md (15 additions, 15 deletions)
@@ -9,7 +9,7 @@ It's important to note that we'll only provide a high-level exploration of these
The diagram below shows the most straightforward vanilla RAG pipeline. First, document chunks are loaded into a vector store (such as [Milvus](https://milvus.io/docs) or [Zilliz cloud](https://zilliz.com/cloud)). Then, the vector store retrieves the Top-K most relevant chunks related to the query. These relevant chunks are then injected into the [LLM](https://zilliz.com/glossary/large-language-models-\(llms\))'s context prompt, and finally, the LLM returns the final answer.
*(diagram: the vanilla RAG pipeline)*
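For reference, here is a minimal sketch of that vanilla pipeline, assuming an OpenAI API key in the environment and Milvus Lite via `pymilvus`; the model names are examples, and the `embed()` and `ask()` helpers defined here are reused by the sketches that follow.

```python
from openai import OpenAI
from pymilvus import MilvusClient

oai = OpenAI()
milvus = MilvusClient("vanilla_rag.db")  # Milvus Lite: a local file, for illustration

def embed(text: str) -> list[float]:
    # text-embedding-3-small returns 1536-dimensional vectors
    return oai.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def ask(prompt: str) -> str:
    resp = oai.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# 1) Load document chunks into the vector store.
milvus.create_collection(collection_name="docs", dimension=1536)
chunks = [
    "Milvus supports HNSW and IVF vector indexes.",
    "Zilliz Cloud is a fully managed Milvus service.",
]
milvus.insert("docs", [{"id": i, "vector": embed(c), "text": c} for i, c in enumerate(chunks)])

# 2) Retrieve the Top-K chunks most relevant to the query.
query = "Which index types does Milvus support?"
hits = milvus.search("docs", data=[embed(query)], limit=2, output_fields=["text"])[0]
context = "\n".join(hit["entity"]["text"] for hit in hits)

# 3) Inject the retrieved chunks into the prompt and let the LLM produce the final answer.
print(ask(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```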
## Various Types of RAG Enhancement Techniques
@@ -31,15 +31,15 @@ Let's explore four effective methods to enhance your query experience: Hypotheti
Creating hypothetical questions involves utilizing an LLM to generate multiple questions that users might ask about the content within each document chunk. Before the user's actual query reaches the LLM, the vector store retrieves the most relevant hypothetical questions related to the real query, along with their corresponding document chunks, and forwards them to the LLM.
This methodology bypasses the cross-domain asymmetry problem in the vector search process by directly engaging in query-to-query searches, alleviating the burden on vector searches. However, it introduces additional overhead and uncertainty in generating hypothetical questions.
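A minimal sketch of this idea, reusing the illustrative `embed()`/`ask()` helpers and `milvus` client from the vanilla pipeline above (collection and field names are assumptions, the prompt wording is only an example):

```python
# Generate hypothetical questions per chunk and index the questions themselves;
# each question record keeps a pointer back to its source chunk.
question_records = []
for chunk in chunks:
    lines = ask(
        "Write three short questions a user might ask that this passage answers, "
        f"one per line:\n{chunk}"
    ).splitlines()
    for q in (line.strip() for line in lines):
        if q:
            question_records.append(
                {"id": len(question_records), "vector": embed(q),
                 "question": q, "source_chunk": chunk}
            )

milvus.create_collection(collection_name="hypo_questions", dimension=1536)
milvus.insert("hypo_questions", question_records)

# At query time: a query-to-query search, then the matched questions' source
# chunks become the LLM context.
hits = milvus.search("hypo_questions", data=[embed(query)], limit=3,
                     output_fields=["source_chunk"])[0]
context = "\n".join(hit["entity"]["source_chunk"] for hit in hits)
```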
### HyDE (Hypothetical Document Embeddings)
HyDE stands for Hypothetical Document Embeddings. It leverages an LLM to craft a "***Hypothetical Document***", a ***fake*** answer written in response to the user query without any retrieved context. This fake answer is then converted into vector embeddings and used to query the most relevant document chunks in a vector database. The vector database retrieves the Top-K most relevant chunks and transmits them, together with the original user query, to the LLM to generate the final answer.
*(diagram: the HyDE workflow)*
This method is similar to the hypothetical question technique in addressing cross-domain asymmetry in vector searches. However, it also has drawbacks, such as the added computational costs and uncertainties of generating fake answers.
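A minimal HyDE sketch under the same assumptions (illustrative helpers and collection names from the vanilla pipeline above):

```python
user_query = "How does Milvus handle scalar filtering during vector search?"

# 1) Ask the LLM for a "hypothetical document": a plausible but unverified answer.
fake_answer = ask(f"Write a short passage that plausibly answers: {user_query}")

# 2) Embed the fake answer and use it to retrieve real chunks.
hits = milvus.search("docs", data=[embed(fake_answer)], limit=3, output_fields=["text"])[0]
context = "\n".join(hit["entity"]["text"] for hit in hits)

# 3) Generate the final answer from the real chunks plus the original query.
final_answer = ask(f"Context:\n{context}\n\nQuestion: {user_query}")
```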
@@ -56,7 +56,7 @@ Imagine a user asking: "***What are the differences in features between Milvus a
Once we have these sub-queries, we send them all to the vector database after converting them into vector embeddings. The vector database then finds the Top-K document chunks most relevant to each sub-query. Finally, the LLM uses this information to generate a better answer.
*(diagram: sub-query decomposition)*
By breaking down the user query into sub-queries, we make it easier for our system to find relevant information and provide accurate answers, even to complex questions.
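A sketch of sub-query decomposition with the same illustrative helpers; the prompt wording is only an example:

```python
complex_query = "What are the differences in features between Milvus and Zilliz Cloud?"

# 1) Decompose the complex question into simpler sub-questions.
sub_queries = [
    line.strip()
    for line in ask(
        "Split this question into the minimal set of simpler sub-questions, "
        f"one per line:\n{complex_query}"
    ).splitlines()
    if line.strip()
]

# 2) Retrieve Top-K chunks for each sub-question.
retrieved = []
for sq in sub_queries:
    hits = milvus.search("docs", data=[embed(sq)], limit=3, output_fields=["text"])[0]
    retrieved.extend(hit["entity"]["text"] for hit in hits)

# 3) Answer the original question over the de-duplicated union of results.
context = "\n".join(dict.fromkeys(retrieved))
answer = ask(f"Context:\n{context}\n\nOriginal question: {complex_query}")
```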
@@ -72,7 +72,7 @@ To simplify this user query, we can use an LLM to generate a more straightforwar
***Stepback Question: "What is the dataset size limit that Milvus can handle?"***
*(diagram: step-back prompting)*
This method can help us get better and more accurate answers to complex queries. It breaks down the original question into a simpler form, making it easier for our system to find relevant information and provide accurate responses.
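A sketch of step-back prompting with the same illustrative helpers; the user question below is an invented example:

```python
original_query = "Can Milvus handle my dataset of 250 million 1024-dimensional vectors?"

# 1) Ask the LLM for a more general "step-back" question.
stepback_query = ask(
    "Rewrite this question as a single, more general step-back question "
    f"about the underlying capability:\n{original_query}"
)

# 2) Retrieve with the step-back question, then 3) answer the original one.
hits = milvus.search("docs", data=[embed(stepback_query)], limit=3, output_fields=["text"])[0]
context = "\n".join(hit["entity"]["text"] for hit in hits)
answer = ask(f"Context:\n{context}\n\nQuestion: {original_query}")
```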
@@ -84,15 +84,15 @@ Enhancing indexing is another strategy for enhancing the performance of your RAG
When building an index, we can employ two granularity levels: child chunks and their corresponding parent chunks. Initially, we search for child chunks at a finer level of detail. Then, we apply a merging strategy: if a specific number, ***n***, of child chunks from the first ***k*** child chunks belong to the same parent chunk, we provide this parent chunk to the LLM as contextual information.
This methodology has been implemented in [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/retrievers/recursive_retriever_nodes.html).
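A minimal sketch of the merging rule itself, independent of any particular framework (the chunk ids and the threshold `n` are illustrative):

```python
# If at least `n` of the top-k child chunks share a parent, promote the parent
# chunk into the LLM context instead of the individual children.
from collections import Counter

def merge_to_parents(top_k_children, child_to_parent, n=2):
    """top_k_children: retrieved child-chunk ids, best-first; child_to_parent: id -> parent id."""
    parent_counts = Counter(child_to_parent[c] for c in top_k_children)
    promoted = {p for p, count in parent_counts.items() if count >= n}
    context = []
    for child in top_k_children:
        parent = child_to_parent[child]
        unit = parent if parent in promoted else child
        if unit not in context:  # de-duplicate while keeping rank order
            context.append(unit)
    return context

# e.g. children c1 and c2 both belong to parent P, so P replaces them in the context
print(merge_to_parents(["c1", "c3", "c2"], {"c1": "P", "c2": "P", "c3": "Q"}, n=2))
# ['P', 'c3']
```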
### Constructing Hierarchical Indices
When creating indices for documents, we can establish a two-level index: one for document summaries and another for document chunks. The vector search process comprises two stages: initially, we filter relevant documents based on the summary, and subsequently, we retrieve corresponding document chunks exclusively within these relevant documents.
This approach proves beneficial in situations involving extensive data volumes or instances where data is hierarchical, such as content retrieval within a library collection.
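A sketch of the two-stage search, assuming two collections, `summaries` (one vector per document) and `chunks` (one vector per chunk with an integer `doc_id` scalar field), and reusing the illustrative `embed()` helper and `milvus` client:

```python
q_vec = embed("How does Milvus replicate data across nodes?")

# Stage 1: pick the most relevant documents by their summaries.
doc_hits = milvus.search("summaries", data=[q_vec], limit=3, output_fields=["doc_id"])[0]
doc_ids = [hit["entity"]["doc_id"] for hit in doc_hits]

# Stage 2: search chunks, restricted to those documents via a metadata filter
# (doc_id is assumed to be an integer scalar field on the chunk collection).
chunk_hits = milvus.search(
    "chunks",
    data=[q_vec],
    limit=5,
    filter=f"doc_id in {doc_ids}",
    output_fields=["text"],
)[0]
```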
@@ -102,7 +102,7 @@ The Hybrid Retrieval and Reranking technique integrates one or more supplementar
Common supplementary retrieval algorithms include lexical frequency-based methods like [BM25](https://milvus.io/docs/embed-with-bm25.md) or models that produce sparse embeddings, like [Splade](https://zilliz.com/learn/discover-splade-revolutionize-sparse-data-processing). Re-ranking algorithms include RRF (Reciprocal Rank Fusion) or more sophisticated models such as [Cross-Encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html), which uses a BERT-like architecture.
This approach leverages diverse retrieval methods to improve retrieval quality and address potential gaps in vector recall.
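For example, Reciprocal Rank Fusion (RRF) can merge a dense result list and a BM25 result list using nothing more than rank positions; a self-contained sketch:

```python
# Reciprocal Rank Fusion: score each document by the sum of 1 / (k + rank)
# across all result lists, then sort by the fused score.
def rrf(result_lists, k=60):
    """result_lists: lists of doc ids, each ordered best-first."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc3", "doc1", "doc7"]  # from vector search
bm25_hits = ["doc1", "doc9", "doc3"]   # from lexical search
print(rrf([dense_hits, bm25_hits]))    # doc1 and doc3 rise to the top
```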
@@ -114,15 +114,15 @@ Refinement of the retriever component within the RAG system can also improve RAG
In a basic RAG system, the document chunk used for embedding retrieval is the same chunk handed to the LLM. The Sentence Window Retrieval technique decouples the two: the chunk given to the LLM is a larger window encompassing the retrieved embedding chunk, so the information provided to the LLM covers a broader range of contextual details and minimizes information loss.
However, expanding the window size may introduce additional interfering information. We can adjust the size of the window expansion based on the specific business needs.
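A minimal sketch of the window-expansion step, assuming the document has already been split into an ordered list of sentences:

```python
# Embed and retrieve individual sentences, but hand the LLM a window of
# neighboring sentences around each hit.
def expand_window(sentences, hit_index, window=2):
    """Return the retrieved sentence plus `window` sentences on each side."""
    start = max(0, hit_index - window)
    end = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[start:end])

sentences = ["S0.", "S1.", "S2.", "S3.", "S4.", "S5."]
# Suppose vector search matched sentence index 3; the LLM receives S1..S5.
print(expand_window(sentences, hit_index=3, window=2))
```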
### Meta-data Filtering
To ensure more precise answers, we can refine the retrieved documents by filtering metadata like time and category before passing them to the LLM. For instance, if financial reports spanning multiple years are retrieved, filtering based on the desired year will refine the information to meet specific requirements. This method proves effective in situations with extensive data and detailed metadata, such as content retrieval in library collections.
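A sketch of such a filter using a Milvus boolean expression, assuming the collection stores scalar fields `year` and `category` alongside each chunk (reusing the illustrative client and `embed()` helper from above):

```python
# Combine vector similarity with a scalar metadata filter in one search call.
hits = milvus.search(
    "docs",
    data=[embed("What was the operating margin in 2023?")],
    limit=5,
    filter='year == 2023 and category == "financial_report"',
    output_fields=["text", "year"],
)[0]
```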
@@ -132,7 +132,7 @@ Let’s explore more RAG optimizing techniques by improving the generator within
The noise information within retrieved document chunks can significantly impact the accuracy of RAG's final answer. The limited prompt window in LLMs also presents a hurdle for more accurate answers. To address this challenge, we can compress irrelevant details, emphasize key paragraphs, and reduce the overall context length of retrieved document chunks.
This approach is similar to the earlier discussed hybrid retrieval and reranking method, wherein a reranker is utilized to sift out irrelevant document chunks.
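A minimal sketch of one such compression step: keep only chunks whose reranker score clears a threshold, within a rough context budget (the scores below are placeholders for whatever reranker you use):

```python
# Drop low-relevance chunks and cap the total context length before prompting.
def compress_context(scored_chunks, min_score=0.5, max_chars=2000):
    kept, used = [], 0
    for text, score in sorted(scored_chunks, key=lambda x: x[1], reverse=True):
        if score < min_score or used + len(text) > max_chars:
            continue
        kept.append(text)
        used += len(text)
    return "\n".join(kept)

scored = [("Relevant paragraph about index build times.", 0.91),
          ("Loosely related marketing copy.", 0.22),
          ("Key paragraph answering the question.", 0.88)]
print(compress_context(scored))
```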
@@ -142,7 +142,7 @@ In the paper "[Lost in the middle](https://arxiv.org/abs/2307.03172)," researche
Based on this observation, we can adjust the order of retrieved chunks to improve the answer quality: when retrieving multiple knowledge chunks, chunks with relatively low confidence are placed in the middle, and chunks with relatively high confidence are positioned at both ends.
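A self-contained sketch of that reordering, given chunks already ranked best-first:

```python
# Place the most relevant chunks at the two ends of the context and the least
# relevant ones in the middle ("lost in the middle" mitigation).
def reorder_for_llm(chunks_best_first):
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

ranked = ["c1 (best)", "c2", "c3", "c4", "c5 (worst)"]
print(reorder_for_llm(ranked))  # ['c1 (best)', 'c3', 'c5 (worst)', 'c4', 'c2']
```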
@@ -156,16 +156,16 @@ Some initially retrieved Top-K document chunks are ambiguous and may not answer
We can conduct the reflection using efficient methods such as Natural Language Inference (NLI) models or additional tools such as internet searches for verification.
This concept of self-reflection has been explored in several papers or projects, including [Self-RAG](https://arxiv.org/pdf/2310.11511.pdf), [Corrective RAG](https://arxiv.org/pdf/2401.15884.pdf), [LangGraph](https://github.com/langchain-ai/langgraph/blob/main/examples/reflexion/reflexion.ipynb), etc.
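A rough sketch of the reflection loop; `nli_entails()` and `web_search()` are hypothetical stand-ins for an NLI model and a search tool, and `ask()` is the illustrative LLM helper sketched earlier:

```python
# Check whether the retrieved context actually supports the draft answer
# before returning it; otherwise verify/augment with an external search.
def reflect_and_answer(query, context_chunks):
    context = "\n".join(context_chunks)
    draft = ask(f"Context:\n{context}\n\nQuestion: {query}")
    # Does at least one retrieved chunk entail the draft answer? (hypothetical NLI call)
    supported = any(nli_entails(premise=c, hypothesis=draft) for c in context_chunks)
    if supported:
        return draft
    extra_context = web_search(query)  # hypothetical verification tool
    return ask(f"Context:\n{extra_context}\n\nQuestion: {query}")
```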
### Query Routing with an Agent
Sometimes, we don't have to use a RAG system to answer simple questions, since doing so might introduce misunderstanding or inferences drawn from misleading information. In such cases, we can use an agent as a router at the querying stage. This agent assesses whether the query needs to go through the RAG pipeline: if it does, the subsequent RAG pipeline is initiated; otherwise, the LLM addresses the query directly.
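A minimal routing sketch, reusing the illustrative `ask()` helper; `answer_with_rag()` stands for whichever RAG pipeline you built above, and the prompt wording is only an example:

```python
# Let an LLM decide whether the query needs retrieval at all.
def route(query):
    decision = ask(
        "Reply with exactly RAG or DIRECT. Does answering this question require "
        f"looking up facts from our private document collection?\n{query}"
    ).strip().upper()
    if decision.startswith("RAG"):
        return answer_with_rag(query)  # whichever pipeline from the sections above
    return ask(query)                  # simple question: let the LLM answer directly
```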
bootcamp/RAG/bedrock_langchain_zilliz_rag.ipynb (1 addition, 1 deletion)
@@ -131,7 +131,7 @@
"The zilliz cloud uri and zilliz api key can be obtained from the [Zilliz cloud console guide](https://docs.zilliz.com/docs/on-zilliz-cloud-console).\n",
"\n",
"In simple terms, you can access them on your zilliz cloud cluster page.\n",