Commit 3eb3ad9

update mlx-vlm version and vl docs (#17605) (#17608)
* update mlx-vlm version
* update
* update
* update vl docs
1 parent cb820ee commit 3eb3ad9

4 files changed, +26 -36 lines changed

docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.en.md

Lines changed: 2 additions & 3 deletions
@@ -48,9 +48,8 @@ The inference performance under default configurations is not fully optimized an
 Install the MLX-VLM inference framework:

 ```shell
-git clone https://github.com/Blaizzy/mlx-vlm.git
-cd mlx-vlm
-pip install -e .
+python -m pip install -U mlx-vlm
+python -m pip install "transformers<5.0.0"
 ```

 Start the MLX-VLM inference service:
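
The new install lines pin `transformers` below 5.0.0. As a minimal sketch (not part of this commit), the check below reports whether an installed `transformers` satisfies that bound. The helper names `parse_release`, `is_below`, and `check_installed` are hypothetical. The version parsing is deliberately simplified and only reads the leading numeric release segment, so it is not a full PEP 440 implementation.

```python
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '4.57.1' -> (4, 57, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as 'rc1' or 'dev0'
        if not digits:
            break  # a piece with no leading digits ends the release segment
        parts.append(int(digits))
    return tuple(parts)


def is_below(version: str, upper_bound: str = "5.0.0") -> bool:
    """True if the numeric release of `version` is strictly below `upper_bound`."""
    a, b = parse_release(version), parse_release(upper_bound)
    width = max(len(a), len(b))
    # Pad with zeros so that '5' compares equal to '5.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_installed(package: str = "transformers", upper_bound: str = "5.0.0") -> str:
    """Describe whether the installed `package` respects the upper bound."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    status = "OK" if is_below(installed, upper_bound) else "too new"
    return f"{package} {installed}: {status} (expected <{upper_bound})"


if __name__ == "__main__":
    print(check_installed())
```

Pre-releases of the bound itself (for example `5.0.0rc1`) are treated as not below 5.0.0, which matches how the `<5.0.0` specifier is expected to exclude them.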

docs/version3.x/pipeline_usage/PaddleOCR-VL-Apple-Silicon.md

Lines changed: 2 additions & 3 deletions
@@ -48,9 +48,8 @@ python -m pip install -U "paddleocr[doc-parser]"
 安装 MLX-VLM 推理框架:

 ```shell
-git clone https://github.com/Blaizzy/mlx-vlm.git
-cd mlx-vlm
-pip install -e .
+python -m pip install -U mlx-vlm
+python -m pip install "transformers<5.0.0"
 ```

 启动 MLX-VLM 推理服务:

docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md

Lines changed: 11 additions & 8 deletions
@@ -238,6 +238,12 @@ paddleocr doc_parser -i ./paddleocr_vl_demo.png --device dcu
 # MetaX GPU
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --device metax_gpu

+# Apple Silicon
+paddleocr doc_parser -i ./paddleocr_vl_demo.png --device cpu
+
+# Huawei Ascend NPU
+# Huawei Ascend NPU please refer to Chapter 3 for inference using PaddlePaddle + vLLM
+
 # Use --use_doc_orientation_classify to enable document orientation classification
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --use_doc_orientation_classify True

@@ -654,6 +660,10 @@ pipeline = PaddleOCRVL()
 # pipeline = PaddleOCRVL(device="dcu")
 # MetaX GPU
 # pipeline = PaddleOCRVL(device="metax_gpu")
+# Apple Silicon
+# pipeline = PaddleOCRVL(device="cpu")
+# Huawei Ascend NPU
+# Huawei Ascend NPU please refer to Chapter 3 for inference using PaddlePaddle + vLLM

 # pipeline = PaddleOCRVL(use_doc_orientation_classify=True) # Use use_doc_orientation_classify to enable/disable document orientation classification model
 # pipeline = PaddleOCRVL(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable document unwarping module
@@ -666,21 +676,14 @@ for res in output:
     res.save_to_markdown(save_path="output") ## Save the current image's result in Markdown format
 ```

-For PDF files, each page will be processed individually, and a separate Markdown file will be generated for each page. If you wish to perform cross-page table merging, reconstruct multi-level labels, or merge multi-page results, you can achieve this using the following method:
+For PDF files, each page will be processed individually, and a separate Markdown file will be generated for each page. If you wish to perform cross-page table merging, reconstruct multi-level headings, or merge multi-page results, you can achieve this using the following method:

 ```python
 from paddleocr import PaddleOCRVL

 input_file = "./your_pdf_file.pdf"

-# NVIDIA GPU
 pipeline = PaddleOCRVL()
-# KUNLUNXIN XPU
-# pipeline = PaddleOCRVL(device="xpu")
-# HYGON DCU
-# pipeline = PaddleOCRVL(device="dcu")
-# MetaX GPU
-# pipeline = PaddleOCRVL(device="metax_gpu")

 output = pipeline.predict(input=input_file)

docs/version3.x/pipeline_usage/PaddleOCR-VL.md

Lines changed: 11 additions & 22 deletions
@@ -240,6 +240,12 @@ paddleocr doc_parser -i ./paddleocr_vl_demo.png --device dcu
 # 沐曦 GPU
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --device metax_gpu

+# Apple Silicon
+paddleocr doc_parser -i ./paddleocr_vl_demo.png --device cpu
+
+# 华为昇腾 NPU
+# 华为昇腾 NPU 请参考第 3 章节使用 PaddlePaddle + vLLM 的方式进行推理
+
 # 通过 --use_doc_orientation_classify 指定是否使用文档方向分类模型
 paddleocr doc_parser -i ./paddleocr_vl_demo.png --use_doc_orientation_classify True

@@ -632,6 +638,10 @@ pipeline = PaddleOCRVL()
 # pipeline = PaddleOCRVL(device="dcu")
 # 沐曦 GPU
 # pipeline = PaddleOCRVL(device="metax_gpu")
+# Apple Silicon
+# pipeline = PaddleOCRVL(device="cpu")
+# 华为昇腾 NPU
+# 华为昇腾 NPU 请参考第 3 章节使用 PaddlePaddle + vLLM 的方式进行推理

 # pipeline = PaddleOCRVL(use_doc_orientation_classify=True) # 通过 use_doc_orientation_classify 指定是否使用文档方向分类模型
 # pipeline = PaddleOCRVL(use_doc_unwarping=True) # 通过 use_doc_unwarping 指定是否使用文本图像矫正模块
@@ -644,22 +654,15 @@ for res in output:
     res.save_to_markdown(save_path="output") ## 保存当前图像的markdown格式的结果
 ```

-如果是 PDF 文件,会将 PDF 的每一页单独处理,每一页的 Markdown 文件也会对应单独的结果。如果您希望对多页的推理结果进行跨页表格合并、重建多级标和合并多页结果等需求,可以通过如下方式实现:
+如果是 PDF 文件,会将 PDF 的每一页单独处理,每一页的 Markdown 文件也会对应单独的结果。如果您希望对多页的推理结果进行跨页表格合并、重建多级标题和合并多页结果等需求,可以通过如下方式实现:

 ```python
 from paddleocr import PaddleOCRVL

 input_file = "./your_pdf_file.pdf"
 output_path = Path("./output")

-# 英伟达 GPU
 pipeline = PaddleOCRVL()
-# 昆仑芯 XPU
-# pipeline = PaddleOCRVL(device="xpu")
-# 海光 DCU
-# pipeline = PaddleOCRVL(device="dcu")
-# 沐曦 GPU
-# pipeline = PaddleOCRVL(device="metax_gpu")

 output = pipeline.predict(input=input_file)

@@ -671,7 +674,6 @@ output = pipeline.restructure_pages(pages_res)
 # output = pipeline.restructure_pages(pages_res, merge_table=True, relevel_titles=True) # 合并跨页表格,重建多级标题
 # output = pipeline.restructure_pages(pages_res, merge_table=True, relevel_titles=True, merge_pages=True) # 合并跨页表格,重建多级标题,合并多页结果为一页

-
 for res in output:
     res.print() ## 打印预测的结构化输出
     res.save_to_json(save_path="output") ## 保存当前图像的结构化json结果
@@ -691,19 +693,6 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]
 # output = pipeline.predict(file)
 ```

-如果您需要处理多个文件,**建议将包含文件的目录路径,或者文件路径列表传入 `predict` 方法**,以最大化处理效率。例如:
-
-```python
-# `imgs` 目录中包含多张待处理图像:file1.png、file2.png、file3.png
-# 传入目录路径
-output = pipeline.predict("imgs")
-# 或者传入文件路径列表
-output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"])
-# 以上两种方式的处理效率高于下列方式:
-# for file in ["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]:
-# output = pipeline.predict(file)
-```
-
 **注:**

 - 在示例代码中,`use_doc_orientation_classify`、`use_doc_unwarping` 参数默认均设置为 `False`,分别表示关闭文档方向分类、文本图像矫正功能,如果需要使用这些功能,可以手动设置为 `True`
