Merge branch 'main' into docs_for_vl

SWHL · web-flow · commit 4c294b2c5293 · 2026-02-03T09:26:50.000+08:00
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md
@@ -48,9 +48,8 @@ The inference performance under default configurations is not fully optimized an
 Install the MLX-VLM inference framework:
 
 ```shell
-git clone https://github.com/Blaizzy/mlx-vlm.git
-cd mlx-vlm
-pip install -e .
+python -m pip install -U mlx-vlm
+python -m pip install "transformers<5.0.0"
 ```
 
 Start the MLX-VLM inference service:
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.md
@@ -48,9 +48,8 @@ python -m pip install -U "paddleocr[doc-parser]"
 安装 MLX-VLM 推理框架：
 
 ```shell
-git clone https://github.com/Blaizzy/mlx-vlm.git
-cd mlx-vlm
-pip install -e .
+python -m pip install -U mlx-vlm
+python -m pip install "transformers<5.0.0"
 ```
 
 启动 MLX-VLM 推理服务：
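The two Apple Silicon hunks above replace the editable source install of MLX-VLM with a PyPI install and pin `transformers` below 5.0.0. The following is a minimal sketch for confirming that pin in the active environment. `transformers_major` and `satisfies_pin` are hypothetical helpers, not part of PaddleOCR or MLX-VLM. The check assumes only the leading major-version number matters, so 5.x pre-releases such as `5.0.0rc1` also count as violating `<5.0.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def transformers_major(version_str: str) -> int:
    """Return the leading major-version number of a version string (hypothetical helper)."""
    match = re.match(r"\s*v?(\d+)", version_str)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_str!r}")
    return int(match.group(1))


def satisfies_pin(version_str: str, upper_major: int = 5) -> bool:
    """True if the version is below `upper_major`.0.0, as required by "transformers<5.0.0"."""
    return transformers_major(version_str) < upper_major


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if satisfies_pin(installed) else "violates transformers<5.0.0"
        print(f"transformers {installed}: {status}")
```

If the check fails, rerunning `python -m pip install "transformers<5.0.0"` from the install step should restore a compatible version.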
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md
@@ -238,6 +238,12 @@ paddleocr doc_parser -i ./paddleocr_vl_demo.png --device dcu
 # MetaX GPU
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --device metax_gpu
 
+# Apple Silicon
+paddleocr doc_parser -i ./paddleocr_vl_demo.png --device cpu
+
+# Huawei Ascend NPU
+# For Huawei Ascend NPU, please refer to Section 3 for inference using PaddlePaddle + vLLM
+
 # Use --use_doc_orientation_classify to enable document orientation classification
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --use_doc_orientation_classify True
 
@@ -654,6 +660,10 @@ pipeline = PaddleOCRVL()
 # pipeline = PaddleOCRVL(device="dcu")
 # MetaX GPU
 # pipeline = PaddleOCRVL(device="metax_gpu")
+# Apple Silicon
+# pipeline = PaddleOCRVL(device="cpu")
+# Huawei Ascend NPU
+# For Huawei Ascend NPU, please refer to Section 3 for inference using PaddlePaddle + vLLM
 
 # pipeline = PaddleOCRVL(use_doc_orientation_classify=True) # Use use_doc_orientation_classify to enable/disable document orientation classification model
 # pipeline = PaddleOCRVL(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable document unwarping module
@@ -666,21 +676,14 @@ for res in output:
     res.save_to_markdown(save_path="output") ## Save the current image's result in Markdown format
 ```
 
-For PDF files, each page will be processed individually, and a separate Markdown file will be generated for each page. If you wish to perform cross-page table merging, reconstruct multi-level labels, or merge multi-page results, you can achieve this using the following method:
+For PDF files, each page is processed individually and produces its own Markdown file. To merge tables that span pages, rebuild multi-level headings, or merge the results of multiple pages into one, use the following method:
 
 ```python
 from paddleocr import PaddleOCRVL
 
 input_file = "./your_pdf_file.pdf"
 
-# NVIDIA GPU
 pipeline = PaddleOCRVL()
-# KUNLUNXIN XPU
-# pipeline = PaddleOCRVL(device="xpu")
-# HYGON DCU
-# pipeline = PaddleOCRVL(device="dcu")
-# MetaX GPU
-# pipeline = PaddleOCRVL(device="metax_gpu")
 
 output = pipeline.predict(input=input_file)
 
@@ -1443,7 +1446,8 @@ Setting it to <code>None</code> means using the instantiation parameter; otherwi
 - Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, visualized images for layout region detection, global OCR, layout reading order, etc., will be saved. If a file is specified, it will be saved directly to that file. (Pipelines typically contain many result images, so it is not recommended to directly specify a specific file path, as multiple images will be overwritten, retaining only the last one.)
 - Calling the `save_to_markdown()` method will save the converted Markdown file to the specified `save_path`. The saved file path will be `save_path/{your_img_basename}.md`. If the input is a PDF file, it is recommended to directly specify a directory; otherwise, multiple markdown files will be overwritten.
 
-Additionally, it also supports obtaining visualized images and prediction results with results through attributes, as follows:<table>
+<li>Additionally, the visualized images and prediction results can also be obtained through attributes, as follows:
+<table>
 <thead>
 <tr>
 <th>Attribute</th>
@@ -1477,7 +1481,7 @@ Additionally, it also supports obtaining visualized images and prediction result
   <li>The prediction result returned by the <code>img</code> attribute is data of dict type. The keys are <code>layout_det_res</code>, <code>overall_ocr_res</code>, <code>text_paragraphs_ocr_res</code>, <code>formula_res_region1</code>, <code>table_cell_img</code>, and <code>seal_res_region1</code>, with corresponding values being <code>Image.Image</code> objects: used to display visualized images of layout region detection, OCR, OCR text paragraphs, formulas, tables, and seal results, respectively. If optional modules are not used, the dict only contains <code>layout_det_res</code>.</li>
   <li>The prediction result returned by the <code>markdown</code> attribute is data of dict type. The keys are <code>markdown_texts</code>, <code>markdown_images</code>, and <code>page_continuation_flags</code>, with corresponding values being markdown text, images displayed in Markdown (<code>Image.Image</code> objects), and a bool tuple used to identify whether the first element on the current page is the start of a paragraph and whether the last element is the end of a paragraph, respectively.</li>
 </ul>
-
+</li>
 </details>
 
 ## 3. Enhancing VLM Inference Performance Using Inference Acceleration Frameworks
@@ -1588,7 +1592,7 @@ The parameters supported by this command are as follows:
 
 ### 3.2 Client Usage Methods
 
-After launching the VLM inference service, the client can call the service through PaddleOCR. **Please note that because the client needs to invoke the sequential model for layout detection, it is still recommended to run the client on GPU or other acceleration devices to achieve more stable and efficient performance. Please refer to Section 1 for the client-side environment configuration. The configuration described in Section 3.1 applies only to starting the service and is not applicable to the client.**
+After launching the VLM inference service, the client can call the service through PaddleOCR. **Please note that because the client needs to call the layout detection model, it is still recommended to run the client on GPU or other acceleration devices to achieve more stable and efficient performance. Please refer to Section 1 for the client-side environment configuration. The configuration described in Section 3.1 applies only to starting the service and is not applicable to the client.**
 
 #### 3.2.1 CLI Invocation
 
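The English hunks above add Apple Silicon, which runs the pipeline with `device="cpu"` (`--device cpu` on the CLI). They also note that Huawei Ascend NPU should use the PaddlePaddle + vLLM route from Section 3. The following is a minimal sketch of how a script could map hardware labels to the `device` strings listed in these examples. `DEVICE_BY_HARDWARE`, `pipeline_kwargs`, and the hardware labels are hypothetical names. The sketch omits `device` entirely for the NVIDIA GPU case, matching the plain `PaddleOCRVL()` call in the examples.

```python
from typing import Dict, Optional

# Device strings taken from the examples above; None means "omit `device`"
# and keep the pipeline default (the NVIDIA GPU case in the examples).
DEVICE_BY_HARDWARE: Dict[str, Optional[str]] = {
    "nvidia_gpu": None,
    "kunlunxin_xpu": "xpu",
    "hygon_dcu": "dcu",
    "metax_gpu": "metax_gpu",
    "apple_silicon": "cpu",
}


def pipeline_kwargs(hardware: str) -> Dict[str, str]:
    """Build keyword arguments for PaddleOCRVL(...) from a hardware label (hypothetical helper)."""
    if hardware == "huawei_ascend_npu":
        # The docs route Ascend NPU to the PaddlePaddle + vLLM setup in Section 3.
        raise ValueError("Huawei Ascend NPU: use the PaddlePaddle + vLLM setup from Section 3")
    if hardware not in DEVICE_BY_HARDWARE:
        raise KeyError(f"unknown hardware label: {hardware!r}")
    device = DEVICE_BY_HARDWARE[hardware]
    return {} if device is None else {"device": device}
```

A caller would then write something like `pipeline = PaddleOCRVL(**pipeline_kwargs("apple_silicon"))`. Raising for Ascend NPU keeps a script from silently falling back to a configuration that the docs do not recommend.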
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL.md
@@ -240,6 +240,12 @@ paddleocr doc_parser -i ./paddleocr_vl_demo.png --device dcu
 # 沐曦 GPU
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --device metax_gpu
 
+# Apple Silicon
+paddleocr doc_parser -i ./paddleocr_vl_demo.png --device cpu
+
+# 华为昇腾 NPU
+# 华为昇腾 NPU 请参考第 3 节，使用 PaddlePaddle + vLLM 的方式进行推理
+
 # 通过 --use_doc_orientation_classify 指定是否使用文档方向分类模型
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --use_doc_orientation_classify True
 
@@ -632,6 +638,10 @@ pipeline = PaddleOCRVL()
 # pipeline = PaddleOCRVL(device="dcu")
 # 沐曦 GPU
 # pipeline = PaddleOCRVL(device="metax_gpu")
+# Apple Silicon
+# pipeline = PaddleOCRVL(device="cpu")
+# 华为昇腾 NPU
+# 华为昇腾 NPU 请参考第 3 节，使用 PaddlePaddle + vLLM 的方式进行推理
 
 # pipeline = PaddleOCRVL(use_doc_orientation_classify=True) # 通过 use_doc_orientation_classify 指定是否使用文档方向分类模型
 # pipeline = PaddleOCRVL(use_doc_unwarping=True) # 通过 use_doc_unwarping 指定是否使用文本图像矫正模块
@@ -644,22 +654,15 @@ for res in output:
     res.save_to_markdown(save_path="output") ## 保存当前图像的markdown格式的结果
 ```
 
-如果是 PDF 文件，会将 PDF 的每一页单独处理，每一页的 Markdown 文件也会对应单独的结果。如果您希望对多页的推理结果进行跨页表格合并、重建多级标和合并多页结果等需求，可以通过如下方式实现：
+如果是 PDF 文件，会将 PDF 的每一页单独处理，每一页的 Markdown 文件也会对应单独的结果。如果您希望对多页的推理结果进行跨页表格合并、重建多级标题和合并多页结果等需求，可以通过如下方式实现：
 
 ```python
 from paddleocr import PaddleOCRVL
 
 input_file = "./your_pdf_file.pdf"
 output_path = Path("./output")
 
-# 英伟达 GPU
 pipeline = PaddleOCRVL()
-# 昆仑芯 XPU
-# pipeline = PaddleOCRVL(device="xpu")
-# 海光 DCU
-# pipeline = PaddleOCRVL(device="dcu")
-# 沐曦 GPU
-# pipeline = PaddleOCRVL(device="metax_gpu")
 
 output = pipeline.predict(input=input_file)
 
@@ -671,7 +674,6 @@ output = pipeline.restructure_pages(pages_res)
 # output = pipeline.restructure_pages(pages_res, merge_table=True, relevel_titles=True) # 合并跨页表格，重建多级标题
 # output = pipeline.restructure_pages(pages_res, merge_table=True, relevel_titles=True, merge_pages=True) # 合并跨页表格，重建多级标题，合并多页结果为一页
 
-
 for res in output:
     res.print() ## 打印预测的结构化输出
     res.save_to_json(save_path="output") ## 保存当前图像的结构化json结果
@@ -691,19 +693,6 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]
 #     output = pipeline.predict(file)
 ```
 
-如果您需要处理多个文件，**建议将包含文件的目录路径，或者文件路径列表传入 `predict` 方法**，以最大化处理效率。例如：
-
-```python
-# `imgs` 目录中包含多张待处理图像：file1.png、file2.png、file3.png
-# 传入目录路径
-output = pipeline.predict("imgs")
-# 或者传入文件路径列表
-output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"])
-# 以上两种方式的处理效率高于下列方式：
-# for file in ["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]:
-#     output = pipeline.predict(file)
-```
-
 **注：**
 
 - 在示例代码中，`use_doc_orientation_classify`、`use_doc_unwarping` 参数默认均设置为 `False`，分别表示关闭文档方向分类、文本图像矫正功能，如果需要使用这些功能，可以手动设置为 `True`。
@@ -1222,6 +1211,8 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]
 <th>参数说明</th>
 <th>参数类型</th>
 <th>默认值</th>
+</tr>
+</thead>
 <tr>
 <td><code>res_list</code></td>
 <td><b>含义：</b>多页 PDF 推理预测出的结果列表。</td>
@@ -1424,68 +1415,7 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]
   </li>
 <li>调用<code>save_to_img()</code> 方法会将可视化结果保存到指定的<code>save_path</code>中，如果指定为目录，则会将版面区域检测可视化图像、全局OCR可视化图像、版面阅读顺序可视化图像等内容保存，如果指定为文件，则直接保存到该文件中。</li>
 <li>调用<code>save_to_markdown()</code> 方法会将转化后的 Markdown 文件保存到指定的<code>save_path</code>中，保存的文件路径为<code>save_path/{your_img_basename}.md</code>，如果输入是 PDF 文件，建议直接指定目录，否责多个 markdown 文件会被覆盖。</li>
-
-    - `page_index`: `(Union[int, None])` 如果输入是PDF文件，则表示当前是PDF的第几页，否则为 `None`
-
-    - `page_count`: `(Union[int, None])` 如果输入是PDF文件，表示当前是PDF的总页数，否则为 `None`
-
-    - `width`: `(int)` 原始输入图像的宽度。
-
-    - `height`: `(int)` 原始输入图像的高度。
-
-    - `model_settings`: `(Dict[str, bool])` 配置 PaddleOCR-VL 所需的模型参数
-
-        - `use_doc_preprocessor`: `(bool)` 控制是否启用文档预处理子产线 
-        - `use_layout_detection`: `(bool)` 控制是否启用版面检测模块
-        - `use_chart_recognition`: `(bool)` 控制是否开启图表识别功能
-        - `format_block_content`: `(bool)` 控制是否在`JSON`中保存格式化后的markdown内容
-        - `markdown_ignore_labels`: `(List[str])` 需要在Markdown中忽略的版面标签
-
-    - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` 文档预处理结果dict，仅当`use_doc_preprocessor=True`时存在
-        - `input_path`: `(str)` 文档预处理子接受的图像路径，当输入为`numpy.ndarray`时，保存为`None`，此处为`None`
-        - `page_index`: `None`，此处的输入为`numpy.ndarray`，所以值为`None`
-        - `model_settings`: `(Dict[str, bool])` 文档预处理子的模型配置参数
-          - `use_doc_orientation_classify`: `(bool)` 控制是否启用文档图像方向分类子模块
-          - `use_doc_unwarping`: `(bool)` 控制是否启用文本图像扭曲矫正子模块
-        - `angle`: `(int)` 文档图像方向分类子模块的预测结果，启用时返回实际角度值
-
-    - `parsing_res_list`: `(List[Dict])` 解析结果的列表，每个元素为一个字典，列表顺序为解析后的阅读顺序。
-        - `block_bbox`: `(np.ndarray)` 版面区域的边界框。
-        - `block_label`: `(str)` 版面区域的标签，例如`text`, `table`等。
-        - `block_content`: `(str)` 内容为版面区域内的内容。
-        - `block_id`: `(int)` 版面区域的索引，用于显示版面排序结果。
-        - `block_order` `(int)` 版面区域的顺序，用于显示版面阅读顺序,对于非排序部分，默认值为 `None`。
-
-- 调用`save_to_json()` 方法会将上述内容保存到指定的 `save_path` 中，如果指定为目录，则保存的路径为`save_path/{your_img_basename}_res.json`，如果指定为文件，则直接保存到该文件中。由于 json 文件不支持保存numpy数组，因此会将其中的 `numpy.array` 类型转换为列表形式。json中的字段内容如下：
-    - `input_path`: `(str)` 待预测图像或者PDF的输入路径
-
-    - `page_index`: `(Union[int, None])` 如果输入是PDF文件，则表示当前是PDF的第几页，否则为 `None`
-
-    - `model_settings`: `(Dict[str, bool])` 配置 PaddleOCR-VL 所需的模型参数
-
-        - `use_doc_preprocessor`: `(bool)` 控制是否启用文档预处理子产线 
-        - `use_layout_detection`: `(bool)` 控制是否启用版面检测模块
-        - `use_chart_recognition`: `(bool)` 控制是否开启图表识别功能
-        - `format_block_content`: `(bool)` 控制是否在`JSON`中保存格式化后的markdown内容
-
-    - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` 文档预处理结果dict，仅当`use_doc_preprocessor=True`时存在
-        - `input_path`: `(str)` 文档预处理子接受的图像路径，当输入为`numpy.ndarray`时，保存为`None`，此处为`None`
-        - `page_index`: `None`，此处的输入为`numpy.ndarray`，所以值为`None`
-        - `model_settings`: `(Dict[str, bool])` 文档预处理子的模型配置参数
-          - `use_doc_orientation_classify`: `(bool)` 控制是否启用文档图像方向分类子模块
-          - `use_doc_unwarping`: `(bool)` 控制是否启用文本图像扭曲矫正子模块
-        - `angle`: `(int)` 文档图像方向分类子模块的预测结果，启用时返回实际角度值
-
-    - `parsing_res_list`: `(List[Dict])` 解析结果的列表，每个元素为一个字典，列表顺序为解析后的阅读顺序。
-        - `block_bbox`: `(np.ndarray)` 版面区域的边界框。
-        - `block_label`: `(str)` 版面区域的标签，例如`text`, `table`等。
-        - `block_content`: `(str)` 内容为版面区域内的内容。
-        - `block_id`: `(int)` 版面区域的索引，用于显示版面排序结果。
-        - `block_order` `(int)` 版面区域的顺序，用于显示版面阅读顺序,对于非排序部分，默认值为 `None`。
-- 调用`save_to_img()` 方法会将可视化结果保存到指定的 `save_path` 中，如果指定为目录，则会将版面区域检测可视化图像、全局OCR可视化图像、版面阅读顺序可视化图像等内容保存，如果指定为文件，则直接保存到该文件中。
-- 调用`save_to_markdown()` 方法会将转化后的 Markdown 文件保存到指定的 `save_path` 中，保存的文件路径为`save_path/{your_img_basename}.md`，如果输入是 PDF 文件，建议直接指定目录，否责多个 markdown 文件会被覆盖。
-
-此外，也支持通过属性获取带结果的可视化图像和预测结果，具体如下：
+<li>此外，也支持通过属性获取带结果的可视化图像和预测结果，具体如下：
 <table>
 <thead>
 <tr>
@@ -1519,7 +1449,7 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]
   <li><code>json</code> 属性获取的预测结果为dict类型的数据，相关内容与调用 <code>save_to_json()</code> 方法保存的内容一致。</li>
   <li><code>img</code> 属性返回的预测结果是一个dict类型的数据。其中，键分别为 <code>ocr_res_img</code> 和 <code>preprocessed_img</code>，对应的值是两个 <code>Image.Image</code> 对象：一个用于显示 OCR 结果的可视化图像，另一个用于展示图像预处理的可视化图像。如果没有使用图像预处理子模块，则dict中只包含 <code>ocr_res_img</code>。</li>
 </ul>
-
+</li>
 </details>
 
 
@@ -1641,7 +1571,7 @@ paddleocr genai_server --model_name PaddleOCR-VL-1.5-0.9B --backend vllm --port
 
 ### 3.2 客户端使用方法
 
-启动 VLM 推理服务后，客户端即可通过 PaddleOCR 调用该服务。**请注意，由于客户端需要调用版面检测的顺序模型，仍建议在 GPU 等加速设备上运行客户端，以获得更稳定和高效的性能。客户端环境配置请参考第 1 节，3.1 节介绍的环境配置仅适用于启动服务，不适用于客户端。**
+启动 VLM 推理服务后，客户端即可通过 PaddleOCR 调用该服务。**请注意，由于客户端需要调用版面检测模型，仍建议在 GPU 等加速设备上运行客户端，以获得更稳定和高效的性能。客户端环境配置请参考第 1 节，3.1 节介绍的环境配置仅适用于启动服务，不适用于客户端。**
 
 #### 3.2.1 CLI 调用
 
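The duplicated field list removed from the Chinese page above describes `parsing_res_list` in the `save_to_json()` output. It is already in reading order, each entry carries `block_label`, `block_content`, and `block_order`, and `block_order` is `None` for blocks outside the ordering. The following is a minimal sketch that pulls one page's ordered text out of such a result after it has been loaded with `json.load`. `ordered_blocks` is a hypothetical helper. Accepting a wrapping `"res"` key is an assumption about the result's shape, and the sample data is made up.

```python
import json
from typing import Any, Dict, Iterable, List


def ordered_blocks(page: Dict[str, Any], skip_labels: Iterable[str] = ()) -> List[str]:
    """Return block_content strings in reading order (hypothetical helper).

    Assumes `parsing_res_list` is already in reading order and that blocks
    outside the ordering have `block_order` set to None.
    """
    # Accept either the bare fields or a dict that wraps them under "res" (assumed shape).
    page = page.get("res", page)
    skip = set(skip_labels)
    return [
        block["block_content"]
        for block in page.get("parsing_res_list", [])
        if block.get("block_order") is not None and block.get("block_label") not in skip
    ]


if __name__ == "__main__":
    sample = json.loads(
        '{"page_index": null, "parsing_res_list": ['
        '{"block_label": "header", "block_content": "Running header", "block_order": null},'
        '{"block_label": "doc_title", "block_content": "Title", "block_order": 1},'
        '{"block_label": "text", "block_content": "Body text.", "block_order": 2}]}'
    )
    print(ordered_blocks(sample))
```

Filtering on `block_order` instead of re-sorting relies on the documented list order. If that guarantee does not hold for a given version, sorting by `block_order` would be the safer choice.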
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -308,8 +308,8 @@ nav:
     - PaddleOCR-VL-1.5简介: version3.x/algorithm/PaddleOCR-VL/PaddleOCR-VL-1.5.md
     - PaddleOCR-VL简介: version3.x/algorithm/PaddleOCR-VL/PaddleOCR-VL.md
     - PaddleOCR-VL NVIDIA Blackwell 架构 GPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-NVIDIA-Blackwell.md
-    - PaddleOCR-VL 昆仑芯 XPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-Kunlunxin-XPU.md
-    - PaddleOCR-VL 海光 DCU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-Hygon-DCU.md
+    - PaddleOCR-VL 昆仑芯 XPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-KUNLUNXIN-XPU.md
+    - PaddleOCR-VL 海光 DCU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-HYGON-DCU.md
     - PaddleOCR-VL 沐曦 GPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-MetaX-GPU.md
     - PaddleOCR-VL 天数 GPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-Iluvatar-GPU.md
     - PaddleOCR-VL 华为昇腾 NPU 环境配置教程: version3.x/pipeline_usage/PaddleOCR-VL-Huawei-Ascend-NPU.md
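The `mkdocs.yml` hunk changes two nav entries to `PaddleOCR-VL-KUNLUNXIN-XPU.md` and `PaddleOCR-VL-HYGON-DCU.md`. This presumably aligns the nav with the files' actual capitalization, since on a case-sensitive file system a nav path that differs only in case does not resolve. The following is a minimal sketch for catching that kind of mismatch. `case_mismatches` is a hypothetical helper that compares nav paths against paths known to exist under `docs/`. Extracting the paths from `mkdocs.yml` is left out, and the sample paths are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def case_mismatches(nav_paths: Iterable[str], existing_paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (nav_path, on_disk_path) pairs that match only when case is ignored."""
    existing = list(existing_paths)
    exact = set(existing)
    by_lower: Dict[str, str] = {p.lower(): p for p in existing}
    pairs: List[Tuple[str, str]] = []
    for nav_path in nav_paths:
        if nav_path in exact:
            continue  # resolves on both case-sensitive and case-insensitive file systems
        match = by_lower.get(nav_path.lower())
        if match is not None:
            pairs.append((nav_path, match))
    return pairs


if __name__ == "__main__":
    nav = ["version3.x/pipeline_usage/PaddleOCR-VL-Kunlunxin-XPU.md"]
    on_disk = ["version3.x/pipeline_usage/PaddleOCR-VL-KUNLUNXIN-XPU.md"]
    print(case_mismatches(nav, on_disk))
```

In a real check, `on_disk` could come from something like `[p.relative_to("docs").as_posix() for p in Path("docs").rglob("*.md")]`. Paths missing entirely are not reported here, because MkDocs already warns about those.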