Commit b6b172d

Refactor folder structure (#1550)
Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
1 parent 0e36baf commit b6b172d

File tree

282 files changed (+191 additions, -214 deletions)


.github/workflows/black.yml

Lines changed: 1 addition & 1 deletion
@@ -10,5 +10,5 @@ jobs:
       - uses: psf/black@stable
         with:
           options: "--check --diff --verbose"
-          src: "./bootcamp/tutorials"
+          src: "./tutorials"
           jupyter: true

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-<img src="images/logo.png" alt="milvus bootcamp banner">
+<img src="pics/logo.png" alt="milvus bootcamp banner">
 
 <div class="column" align="middle">
 <a href="https://github.com/milvus-io/bootcamp/blob/master/LICENSE"><img height="20" src="https://img.shields.io/github/license/milvus-io/bootcamp" alt="license"/></a>

blog/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Codes & Tutorials used in blogs

bootcamp/Evaluation/eval_ragas.ipynb

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 "Ragas is an open source project for evaluating RAG components. [Paper](https://arxiv.org/abs/2309.15217), [Code](https://docs.ragas.io/en/stable/getstarted/index.html), [Docs](https://docs.ragas.io/en/stable/getstarted/index.html), [Intro blog](https://medium.com/towards-data-science/rag-evaluation-using-ragas-4645a4c6c477).\n",
 "\n",
 "<div>\n",
-"<img src=\"../../images/ragas_eval_image.png\" width=\"80%\"/>\n",
+"<img src=\"../../pics/ragas_eval_image.png\" width=\"80%\"/>\n",
 "</div>\n",
 "\n",
 "**Please note that RAGAS can use a large amount of OpenAI api token consumption.** <br> \n",

bootcamp/Integration/bge_m3_embedding.ipynb

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 "\n",
 "This tutorial shows how to use **BGE M3 embedding model with Milvus** for semantic similarity search.\n",
 "\n",
-"![](../../images/bge_m3.png)\n"
+"![](../../pics/bge_m3.png)\n"
 ]
 },
 {

bootcamp/Integration/openai_embedding.ipynb

Lines changed: 2 additions & 2 deletions
@@ -15,11 +15,11 @@
 "\n",
 "On January 25, OpenAI released 2 latest embedding models, `text-embedding-3-small` and `text-embedding-3-large`. Both embedding models has better performance over `text-embedding-ada-002`. The `text-embedding-3-small` is a highly efficient model. With 5X cost reduction, it achieves slight higher [MTEB](https://huggingface.co/spaces/mteb/leaderboard) score of 62.3% compared to 61%. `text-embedding-3-large` is OpenAI's best performing model, with 64.6% MTEB score.\n",
 "\n",
-"![](../../images/openai_embedding_scores.png)\n",
+"![](../../pics/openai_embedding_scores.png)\n",
 "\n",
 "More impressively, both models support trading-off performance and cost with a technique called \"Matryoshka Representation Learning\". Users can get shorten embeddings for vast reduction of the vector storage cost, without sacrificing the retrieval quality much. For example, reducing the vector dimension from 3072 to 256 only reduces the MTEB score from 64.6% to 62%. However, it achieves 12X cost reduction!\n",
 "\n",
-"![](../../images/openai_embedding_vector_size.png)\n",
+"![](../../pics/openai_embedding_vector_size.png)\n",
 "\n",
 "This tutorial shows how to use OpenAI's newest embedding models with Milvus for semantic similarity search."
 ]
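To make the Matryoshka trade-off above concrete, a shortened embedding is requested simply by passing the `dimensions` parameter to the embeddings endpoint. This is a minimal sketch assuming the current `openai` Python SDK and an `OPENAI_API_KEY` in the environment; the model and dimension values are illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

text = "Milvus is an open-source vector database."

# Full-size embedding: 3072 dimensions for text-embedding-3-large.
full = client.embeddings.create(model="text-embedding-3-large", input=text)

# Shortened embedding: the same model truncated to 256 dimensions via
# Matryoshka Representation Learning, trading a small MTEB drop for ~12x less storage.
short = client.embeddings.create(
    model="text-embedding-3-large",
    input=text,
    dimensions=256,
)

print(len(full.data[0].embedding), len(short.data[0].embedding))  # 3072 256
```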

bootcamp/MilvusCheatSheet.md

Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@
 - [Community & Help](#community--help)
 
 <div>
-<img src="../images/milvus_zilliz_overview.png" width="90%"/>
+<img src="../pics/milvus_zilliz_overview.png" width="90%"/>
 </div>
 
 ## Milvus Introduction
@@ -53,7 +53,7 @@
 Milvus uses a shared-storage [architecture](https://milvus.io/docs/architecture_overview.md) with 4 layers which are mutually independent for scaling or disaster recovery: 1)access layer, 2)coordinator service, 3)worker nodes, and 4)storage. Milvus also includes data sharding, logs-as-data persistence, and streaming data ingestion.
 
 <div>
-<img src="../images/oss_zilliz_architecture.png" width="90%"/>
+<img src="../pics/oss_zilliz_architecture.png" width="90%"/>
 </div>
 
 ### Documentation & Releases

bootcamp/OpenAIAssistants/custom_RAG_workflow.ipynb

Lines changed: 4 additions & 2 deletions
@@ -20,7 +20,7 @@
 "Using open-source Q&A with retrieval saves money since we make free calls to our own data almost all the time - retrieval, evaluation, and development iterations. We only make a paid call to OpenAI once for the final chat generation step. \n",
 "\n",
 "<div>\n",
-"<img src=\"../../images/rag_image.png\" width=\"80%\"/>\n",
+"<img src=\"../../pics/rag_image.png\" width=\"80%\"/>\n",
 "</div>\n",
 "\n",
 "Let's get started!"
@@ -194,6 +194,7 @@
 },
 {
 "cell_type": "markdown",
+"id": "60a51aa1",
 "metadata": {},
 "source": [
 "## Create a Milvus collection\n",
@@ -234,6 +235,7 @@
 {
 "cell_type": "code",
 "execution_count": 4,
+"id": "341bf019",
 "metadata": {},
 "outputs": [
 {
@@ -306,7 +308,7 @@
 "For each original text chunk, we'll write the quadruplet (`vector, text, source, h1, h2`) into the database.\n",
 "\n",
 "<div>\n",
-"<img src=\"../../images/db_insert.png\" width=\"80%\"/>\n",
+"<img src=\"../../pics/db_insert.png\" width=\"80%\"/>\n",
 "</div>\n",
 "\n",
 "**The Milvus Client wrapper can only handle loading data from a list of dictionaries.**\n",

bootcamp/RAG/advanced_rag/README.md

Lines changed: 15 additions & 15 deletions
@@ -9,7 +9,7 @@ It's important to note that we'll only provide a high-level exploration of these
 
 The diagram below shows the most straightforward vanilla RAG pipeline. First, document chunks are loaded into a vector store (such as [Milvus](https://milvus.io/docs) or [Zilliz cloud](https://zilliz.com/cloud)). Then, the vector store retrieves the Top-K most relevant chunks related to the query. These relevant chunks are then injected into the [LLM](https://zilliz.com/glossary/large-language-models-\(llms\))'s context prompt, and finally, the LLM returns the final answer.
 
-![](../../../images/advanced_rag/vanilla_rag.png)
+![](../../../pics/advanced_rag/vanilla_rag.png)
 
 ## Various Types of RAG Enhancement Techniques
 
@@ -31,15 +31,15 @@ Let's explore four effective methods to enhance your query experience: Hypotheti
 
 Creating hypothetical questions involves utilizing an LLM to generate multiple questions that users might ask about the content within each document chunk. Before the user's actual query reaches the LLM, the vector store retrieves the most relevant hypothetical questions related to the real query, along with their corresponding document chunks, and forwards them to the LLM.
 
-![](../../../images/advanced_rag/hypothetical_question.png)
+![](../../../pics/advanced_rag/hypothetical_question.png)
 
 This methodology bypasses the cross-domain asymmetry problem in the vector search process by directly engaging in query-to-query searches, alleviating the burden on vector searches. However, it introduces additional overhead and uncertainty in generating hypothetical questions.
 
 ### HyDE (Hypothetical Document Embeddings)
 
 HyDE stands for Hypothetical Document Embeddings. It leverages an LLM to craft a "***Hypothetical Document***" or a ***fake*** answer in response to a user query devoid of contextual information. This fake answer is then converted into vector embeddings and employed to query the most relevant document chunks within a vector database. Subsequently, the vector database retrieves the Top-K most relevant document chunks and transmits them to the LLM and the original user query to generate the final answer.
 
-![](../../../images/advanced_rag/hyde.png)
+![](../../../pics/advanced_rag/hyde.png)
 
 This method is similar to the hypothetical question technique in addressing cross-domain asymmetry in vector searches. However, it also has drawbacks, such as the added computational costs and uncertainties of generating fake answers.
 
@@ -56,7 +56,7 @@ Imagine a user asking: "***What are the differences in features between Milvus a
 
 Once we have these sub-queries, we send them all to the vector database after converting them into vector embeddings. The vector database then finds the Top-K document chunks most relevant to each sub-query. Finally, the LLM uses this information to generate a better answer.
 
-![](../../../images/advanced_rag/sub_query.png)
+![](../../../pics/advanced_rag/sub_query.png)
 
 By breaking down the user query into sub-queries, we make it easier for our system to find relevant information and provide accurate answers, even to complex questions.
 
@@ -72,7 +72,7 @@ To simplify this user query, we can use an LLM to generate a more straightforwar
 
 ***Stepback Question: "What is the dataset size limit that Milvus can handle?"***
 
-![](../../../images/advanced_rag/stepback.png)
+![](../../../pics/advanced_rag/stepback.png)
 
 This method can help us get better and more accurate answers to complex queries. It breaks down the original question into a simpler form, making it easier for our system to find relevant information and provide accurate responses.
 
@@ -84,15 +84,15 @@ Enhancing indexing is another strategy for enhancing the performance of your RAG
 
 When building an index, we can employ two granularity levels: child chunks and their corresponding parent chunks. Initially, we search for child chunks at a finer level of detail. Then, we apply a merging strategy: if a specific number, ***n***, of child chunks from the first ***k*** child chunks belong to the same parent chunk, we provide this parent chunk to the LLM as contextual information.
 
-![](../../../images/advanced_rag/merge_chunks.png)
+![](../../../pics/advanced_rag/merge_chunks.png)
 
 This methodology has been implemented in [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/retrievers/recursive_retriever_nodes.html).
 
 ### Constructing Hierarchical Indices
 
 When creating indices for documents, we can establish a two-level index: one for document summaries and another for document chunks. The vector search process comprises two stages: initially, we filter relevant documents based on the summary, and subsequently, we retrieve corresponding document chunks exclusively within these relevant documents.
 
-![](../../../images/advanced_rag/hierarchical_index.png)
+![](../../../pics/advanced_rag/hierarchical_index.png)
 
 This approach proves beneficial in situations involving extensive data volumes or instances where data is hierarchical, such as content retrieval within a library collection.
 
@@ -102,7 +102,7 @@ The Hybrid Retrieval and Reranking technique integrates one or more supplementar
 
 Common supplementary retrieval algorithms include lexical frequency-based methods like [BM25](https://milvus.io/docs/embed-with-bm25.md) or big models utilizing sparse embeddings like [Splade](https://zilliz.com/learn/discover-splade-revolutionize-sparse-data-processing). Re-ranking algorithms include RRF or more sophisticated models such as [Cross-Encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html), which resembles BERT-like architectures.
 
-![](../../../images/advanced_rag/hybrid_and_rerank.png)
+![](../../../pics/advanced_rag/hybrid_and_rerank.png)
 
 This approach leverages diverse retrieval methods to improve retrieval quality and address potential gaps in vector recall.
 
@@ -114,15 +114,15 @@ Refinement of the retriever component within the RAG system can also improve RAG
 
 In a basic RAG system, the document chunk given to the LLM is a larger window encompassing the retrieved embedding chunk. This ensures that the information provided to the LLM includes a broader range of contextual details, minimizing information loss. The Sentence Window Retrieval technique decouples the document chunk used for embedding retrieval from the chunk provided to the LLM.
 
-![](../../../images/advanced_rag/sentence_window.png)
+![](../../../pics/advanced_rag/sentence_window.png)
 
 However, expanding the window size may introduce additional interfering information. We can adjust the size of the window expansion based on the specific business needs.
 
 ### Meta-data Filtering
 
 To ensure more precise answers, we can refine the retrieved documents by filtering metadata like time and category before passing them to the LLM. For instance, if financial reports spanning multiple years are retrieved, filtering based on the desired year will refine the information to meet specific requirements. This method proves effective in situations with extensive data and detailed metadata, such as content retrieval in library collections.
 
-![](../../../images/advanced_rag/metadata_filtering.png)
+![](../../../pics/advanced_rag/metadata_filtering.png)
 
 ## Generator Enhancement
 
@@ -132,7 +132,7 @@ Let’s explore more RAG optimizing techniques by improving the generator within
 
 The noise information within retrieved document chunks can significantly impact the accuracy of RAG's final answer. The limited prompt window in LLMs also presents a hurdle for more accurate answers. To address this challenge, we can compress irrelevant details, emphasize key paragraphs, and reduce the overall context length of retrieved document chunks.
 
-![](../../../images/advanced_rag/compress_prompt.png)
+![](../../../pics/advanced_rag/compress_prompt.png)
 
 This approach is similar to the earlier discussed hybrid retrieval and reranking method, wherein a reranker is utilized to sift out irrelevant document chunks.
 
@@ -142,7 +142,7 @@ In the paper "[Lost in the middle](https://arxiv.org/abs/2307.03172)," researche
 
 Based on this observation, we can adjust the order of retrieved chunks to improve the answer quality: when retrieving multiple knowledge chunks, chunks with relatively low confidence are placed in the middle, and chunks with relatively high confidence are positioned at both ends.
 
-![](../../../images/advanced_rag/adjust_order.png)
+![](../../../pics/advanced_rag/adjust_order.png)
 
 ## RAG Pipeline Enhancement
 
@@ -156,16 +156,16 @@ Some initially retrieved Top-K document chunks are ambiguous and may not answer
 
 We can conduct the reflection using efficient reflection methods such as Natural Language Inference(NLI) models or additional tools like internet searches for verification.
 
-![](../../../images/advanced_rag/self_reflection.png)
+![](../../../pics/advanced_rag/self_reflection.png)
 
 This concept of self-reflection has been explored in several papers or projects, including [Self-RAG](https://arxiv.org/pdf/2310.11511.pdf), [Corrective RAG](https://arxiv.org/pdf/2401.15884.pdf), [LangGraph](https://github.com/langchain-ai/langgraph/blob/main/examples/reflexion/reflexion.ipynb), etc.
 
 ### Query Routing with an Agent
 
 Sometimes, we don’t have to use a RAG system to answer simple questions as it might result in more misunderstanding and inference from misleading information. In such cases, we can use an agent as a router at the querying stage. This agent assesses whether the query needs to go through the RAG pipeline. If it does, the subsequent RAG pipeline is initiated; otherwise, the LLM directly addresses the query.
 
-![](../../../images/advanced_rag/query_routing.png)
-![](../../../images/advanced_rag/query_routing_with_sub_query.png)
+![](../../../pics/advanced_rag/query_routing.png)
+![](../../../pics/advanced_rag/query_routing_with_sub_query.png)
 
 The agent could take various forms, including an LLM, a small classification model, or even a set of rules.
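Among the reranking options mentioned in this README, Reciprocal Rank Fusion (RRF) is simple enough to sketch in a few lines of plain Python. The document IDs below are made up and k = 60 is the conventional constant; the fused score is score(d) = sum over rankings of 1 / (k + rank(d)).

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy rankings from two retrievers (dense vectors and BM25); IDs are placeholders.
dense_hits = ["doc_3", "doc_1", "doc_7"]
bm25_hits = ["doc_1", "doc_9", "doc_3"]

for doc_id, score in rrf([dense_hits, bm25_hits]):
    print(doc_id, round(score, 4))
```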

bootcamp/RAG/bedrock_langchain_zilliz_rag.ipynb

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@
 "The zilliz cloud uri and zilliz api key can be obtained from the [Zilliz cloud console guide](https://docs.zilliz.com/docs/on-zilliz-cloud-console).\n",
 "\n",
 "In simple terms, you can access them on your zilliz cloud cluster page.\n",
-" ![](../../images/zilliz_uri_and_key.png)"
+" ![](../../pics/zilliz_uri_and_key.png)"
 ]
 },
 {
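Once copied from the cluster page, the URI and API key are typically passed straight to the client. The following is a minimal sketch; the environment-variable names and the direct `pymilvus` usage are assumptions for illustration, not necessarily what this notebook does.

```python
import os
from pymilvus import MilvusClient

# Values copied from the Zilliz Cloud cluster page, kept out of the notebook itself.
ZILLIZ_CLOUD_URI = os.environ["ZILLIZ_CLOUD_URI"]          # e.g. "https://<cluster-id>.api.<region>.zillizcloud.com"
ZILLIZ_CLOUD_API_KEY = os.environ["ZILLIZ_CLOUD_API_KEY"]  # the cluster API key / token

client = MilvusClient(uri=ZILLIZ_CLOUD_URI, token=ZILLIZ_CLOUD_API_KEY)
print(client.list_collections())
```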
