Commit fdb1165

fix doc (#15160)
1 parent b25dcaa commit fdb1165

5 files changed: +116 -33 lines changed

docs/algorithm/formula_recognition/algorithm_rec_ppformulanet.md

Lines changed: 15 additions & 11 deletions
@@ -2,20 +2,24 @@

 ## 1. Introduction

-`PP-FormulaNet` is a formula recognition model developed in-house by Baidu PaddlePaddle. It is trained on a 5-million-sample dataset built internally by PaddleX, and its accuracy on the corresponding test sets is as follows:
+`PP-FormulaNet` is an advanced formula recognition model developed by the Baidu PaddlePaddle vision team that recognizes 50,000 common LaTeX source tokens. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone and applies techniques such as parallel masking and model distillation to greatly speed up inference while keeping recognition accuracy high; it suits simple printed formulas and simple printed formulas that span multiple lines. The PP-FormulaNet-L version is based on Vary_VIT_B and was trained extensively on a large-scale formula dataset; it is markedly better at complex formulas and suits simple printed, complex printed, and handwritten formulas.

+To improve performance further, the Baidu PaddlePaddle team developed `PP-FormulaNet_plus`, an enhanced version built on PP-FormulaNet. It is trained on a rich dataset drawn from Chinese dissertations, professional books, textbooks and exam papers, and mathematics journals, which greatly improves its recognition ability. PP-FormulaNet_plus-M and PP-FormulaNet_plus-L add support for Chinese formulas and raise the maximum number of predicted tokens from 1024 to 2560, significantly improving recognition of complex formulas, while PP-FormulaNet_plus-S focuses on stronger English formula recognition. Together these changes make the PP-FormulaNet_plus series perform better on complex and varied formula recognition tasks.

+The accuracy of the above models on the corresponding test sets is as follows:

-| Model | Backbone | Config file | SPE-<br/>BLEU↑ | CPE-<br/>BLEU↑ | Easy-<br/>BLEU↑ | Middle-<br/>BLEU↑ | Hard-<br/>BLEU↑| Avg-<br/>BLEU↑ | Download link |
-|-----------|------------|------------------|:--------------:|:---------:|:----------:|:----------------:|:---------:|:-----------------:|:-----------------:|
-| UniMERNet | Donut Swin | [UniMERNet.yaml](../../../configs/rec/UniMERNet.yaml) | 0.9187 | 0.9252 | 0.8658 | 0.8228 | 0.7740 | 0.8613 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_unimernet_train.tar)|
-| PP-FormulaNet-S | PPHGNetV2_B4 | [PP-FormulaNet-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml) | 0.8694 | 0.8071 | 0.9294 | 0.9112 | 0.8391 | 0.8712 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_s_train.tar)|
-| PP-FormulaNet-L | Vary_VIT_B | [PP-FormulaNet-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-L.yaml) | 0.9055 | 0.9206 | 0.9392 | 0.9273 | 0.9141 | 0.9213 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_l_train.tar )|
-| PP-FormulaNet_plus-S | PPHGNetV2_B4 | [PP-FormulaNet_plus-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-S.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_s_train.tar )|
-| PP-FormulaNet_plus-M | PPHGNetV2_B6 | [PP-FormulaNet_plus-M.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-M.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_m_train.tar )|
-| PP-FormulaNet_plus-L | Vary_VIT_B | [PP-FormulaNet_plus-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-L.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_l_train.tar )|
+| Model | Backbone | Config file | En-<br/>BLEU↑ |Zh-<br/>BLEU(%)↑ |OmniDocBench-<br/>BLEU(%)↑ |GPU inference time (ms)| Download link |
+|-----------|--------|----------------------------------------|:----------------:|:---------:|:-----------------:|:--------------:|:--------------:|
+| UniMERNet | Donut Swin | [UniMERNet.yaml](../../../configs/rec/UniMERNet.yaml) | 85.91 | 43.50 | 67.75 | 2266.96 | [trained model](https://paddleocr.bj.bcebos.com/contribution/rec_unimernet_train.tar)|
+| PP-FormulaNet-S | PPHGNetV2_B4 | [PP-FormulaNet-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml) | 87.00 | 45.71 | 59.57| 202.25 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_s_train.tar)|
+| PP-FormulaNet-L | Vary_VIT_B | [PP-FormulaNet-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-L.yaml) | 90.36 | 45.78 | 64.81 | 1976.52 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_l_train.tar )|
+| PP-FormulaNet_plus-S | PPHGNetV2_B4 | [PP-FormulaNet_plus-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-S.yaml) | 88.71 | 53.32 | 70.54 | 191.69 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_s_train.tar )|
+| PP-FormulaNet_plus-M | PPHGNetV2_B6 | [PP-FormulaNet_plus-M.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-M.yaml) | 91.45 | 89.76 | 72.07 | 1301.56 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_m_train.tar )|
+| PP-FormulaNet_plus-L | Vary_VIT_B | [PP-FormulaNet_plus-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-L.yaml) | 92.22 | 90.64 | 72.45 | 1745.25 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_l_train.tar )|
+| LaTeX-OCR | Hybrid ViT |[LaTeX_OCR_rec.yaml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/LaTeX_OCR_rec.yaml)| 74.55 | 39.96 | 47.59| 1244.61 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|

-Here, SPE and CPE are UniMERNet's simple and complex formula datasets; Easy, Middle, and Hard are the simple (LaTeX code length 0-64), medium (LaTeX code length 64-256), and complex (LaTeX code length 256+) formula datasets built internally by PaddleX.
+
+Here, En-BLEU, Zh-BLEU(%), and OmniDocBench-BLEU(%) are the BLEU scores on English formulas, Chinese formulas, and OmniDocBench, respectively. The English evaluation set contains UniMERNet's simple and complex formulas plus the simple, medium, and complex formulas built internally by PaddleX. The Chinese evaluation set comes from PaddleX's in-house Chinese formula dataset. The OmniDocBench evaluation set consists of the inter-line formula images in [OmniDocBench](https://github.com/opendatalab/OmniDocBench).

 ## 2. Environment
 First configure the PaddleOCR runtime environment by following ["Environment Preparation"](../../ppocr/environment.md), then clone the project code by following ["Project Clone"](../../ppocr/blog/clone.md).
@@ -89,7 +93,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' --ips=127.0.0.1 tools/t
 ```shell
 # Note: set the pretrained_model path to a local path.
 python3 tools/infer_rec.py -c configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml \
-    -o Global.infer_img='./docs/datasets/images/pme_demo/0000099.png'\
+    -o Global.infer_img='./docs/datasets/images/pme_demo/0000295.png'\
     Global.pretrained_model=./rec_ppformulanet_s_train/best_accuracy.pdparams
 # To predict all images in a folder, set infer_img to the folder, e.g. Global.infer_img='./doc/datasets/pme_demo/'.
 ```

docs/algorithm/formula_recognition/algorithm_rec_ppformulanet_en.md

Lines changed: 17 additions & 11 deletions
@@ -3,18 +3,24 @@
 ## 1. Introduction


-PP-FormulaNet is a formula recognition model independently developed by Baidu PaddlePaddle. It is trained on a self-built dataset of 5 million samples within PaddleX, achieving the following accuracy on the corresponding test set:
+`PP-FormulaNet` is an advanced formula recognition model developed by Baidu’s PaddlePaddle Vision Team, supporting the recognition of 50,000 common LaTeX source terms. The PP-FormulaNet-S version uses PP-HGNetV2-B4 as its backbone network, leveraging techniques like parallel masking and model distillation to significantly enhance inference speed while maintaining high recognition accuracy, making it suitable for scenarios involving simple printed formulas and cross-line simple printed formulas. On the other hand, the PP-FormulaNet-L version is based on Vary_VIT_B and has undergone extensive training on a large-scale formula dataset, showing significant improvement in recognizing complex formulas, and is applicable to simple printed, complex printed, and handwritten formulas.

-| Model | Backbone | config |SPE-<br/>BLEU↑ | CPE-<br/>BLEU↑ | Easy-<br/>BLEU↑ | Middle-<br/>BLEU↑ | Hard-<br/>BLEU↑| Avg-<br/>BLEU↑ | Download link |
-|-----------|--------|---------------------------------------------------|:--------------:|:-----------------:|:----------:|:----------------:|:---------:|:-----------------:|:--------------:|
-| UniMERNet | Donut Swin | [UniMERNet.yaml](../../../configs/rec/UniMERNet.yaml) | 0.9187 | 0.9252 | 0.8658 | 0.8228 | 0.7740 | 0.8613 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_unimernet_train.tar)|
-| PP-FormulaNet-S | PPHGNetV2_B4 | [PP-FormulaNet-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml) | 0.8694 | 0.8071 | 0.9294 | 0.9112 | 0.8391 | 0.8712 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_s_train.tar)|
-| PP-FormulaNet-L | Vary_VIT_B | [PP-FormulaNet-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-L.yaml) | 0.9055 | 0.9206 | 0.9392 | 0.9273 | 0.9141 | 0.9213 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_l_train.tar )|
-| PP-FormulaNet_plus-S | PPHGNetV2_B4 | [PP-FormulaNet_plus-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-S.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_s_train.tar )|
-| PP-FormulaNet_plus-M | PPHGNetV2_B6 | [PP-FormulaNet_plus-M.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-M.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_m_train.tar )|
-| PP-FormulaNet_plus-L | Vary_VIT_B | [PP-FormulaNet_plus-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-L.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_l_train.tar )|
+To further enhance performance, the Baidu PaddlePaddle team developed an enhanced version called `PP-FormulaNet_plus` based on PP-FormulaNet. This version utilizes a rich dataset sourced from Chinese theses, professional books, textbooks, exam papers, and mathematical journals, greatly improving recognition capability. Among them, PP-FormulaNet_plus-M and PP-FormulaNet_plus-L add support for Chinese formulas and increase the maximum predicted token count from 1024 to 2560, significantly enhancing the recognition performance of complex formulas. Meanwhile, PP-FormulaNet_plus-S focuses on enhancing English formula recognition, making the PP-FormulaNet_plus series models perform more exceptionally in handling complex and diverse formula recognition tasks.

-Among them, SPE and CPE refer to the simple and complex formula datasets of UniMERNet, respectively. Easy, Middle, and Hard are simple (LaTeX code length 0-64), medium (LaTeX code length 64-256), and complex formula datasets (LaTeX code length 256+) built internally by PaddleX.
+The accuracy of the above models on the corresponding test sets is as follows:
+
+| Model | Backbone | config | En-<br/>BLEU↑ |Zh-<br/>BLEU(%)↑ |OmniDocBench-<br/>BLEU(%)↑ | GPU Inference Time (ms)| Download link |
+|-----------|--------|----------------------------------------|:----------------:|:---------:|:-----------------:|:--------------:|:--------------:|
+| UniMERNet | Donut Swin | [UniMERNet.yaml](../../../configs/rec/UniMERNet.yaml) | 85.91 | 43.50 | 67.75 | 2266.96 | [trained model](https://paddleocr.bj.bcebos.com/contribution/rec_unimernet_train.tar)|
+| PP-FormulaNet-S | PPHGNetV2_B4 | [PP-FormulaNet-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml) | 87.00 | 45.71 | 59.57| 202.25 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_s_train.tar)|
+| PP-FormulaNet-L | Vary_VIT_B | [PP-FormulaNet-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-L.yaml) | 90.36 | 45.78 | 64.81 | 1976.52 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_l_train.tar )|
+| PP-FormulaNet_plus-S | PPHGNetV2_B4 | [PP-FormulaNet_plus-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-S.yaml) | 88.71 | 53.32 | 70.54 | 191.69 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_s_train.tar )|
+| PP-FormulaNet_plus-M | PPHGNetV2_B6 | [PP-FormulaNet_plus-M.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-M.yaml) | 91.45 | 89.76 | 72.07 | 1301.56 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_m_train.tar )|
+| PP-FormulaNet_plus-L | Vary_VIT_B | [PP-FormulaNet_plus-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-L.yaml) | 92.22 | 90.64 | 72.45 | 1745.25 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_l_train.tar )|
+| LaTeX-OCR | Hybrid ViT |[LaTeX_OCR_rec.yaml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/LaTeX_OCR_rec.yaml)| 74.55 | 39.96 | 47.59| 1244.61 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
+
+
+En-BLEU, Zh-BLEU (%), and OmniDocBench-BLEU (%) represent the BLEU scores for English formulas, Chinese formulas, and the OmniDocBench, respectively. The evaluation dataset for English formulas includes simple and complex formulas from UniMERNet, as well as simple, intermediate, and complex formulas from PaddleX’s internally developed dataset. The evaluation dataset for Chinese formulas comes from PaddleX’s internally developed Chinese formula dataset. The OmniDocBench evaluation set consists of inter-line formula images from [OmniDocBench](https://github.com/opendatalab/OmniDocBench).


 ## 2. Environment
@@ -74,7 +80,7 @@ Prediction:
 ```shell
 # The configuration file used for prediction must match the training
 python3 tools/infer_rec.py -c configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml \
-    -o Global.infer_img='./docs/datasets/images/pme_demo/0000099.png'\
+    -o Global.infer_img='./docs/datasets/images/pme_demo/0000295.png'\
     Global.pretrained_model=./rec_ppformulanet_s_train/best_accuracy.pdparams
 ```
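
The benchmark table added in this doc change lends itself to a quick speed-versus-accuracy comparison. The following is a minimal sketch, not part of the commit: the numbers are copied from the new table rows, while the `ROWS` dictionary and the helper names `speedup_over`, `fastest`, and `best_by` are hypothetical and exist only for this illustration.

```python
# Values copied from the new benchmark table:
# (En-BLEU, Zh-BLEU, OmniDocBench-BLEU, GPU inference time in ms).
ROWS = {
    "UniMERNet": (85.91, 43.50, 67.75, 2266.96),
    "PP-FormulaNet-S": (87.00, 45.71, 59.57, 202.25),
    "PP-FormulaNet-L": (90.36, 45.78, 64.81, 1976.52),
    "PP-FormulaNet_plus-S": (88.71, 53.32, 70.54, 191.69),
    "PP-FormulaNet_plus-M": (91.45, 89.76, 72.07, 1301.56),
    "PP-FormulaNet_plus-L": (92.22, 90.64, 72.45, 1745.25),
    "LaTeX-OCR": (74.55, 39.96, 47.59, 1244.61),
}

EN, ZH, OMNI, GPU_MS = range(4)  # column indices into each row tuple


def speedup_over(baseline: str, model: str) -> float:
    """How many times faster `model` is than `baseline` (ratio of GPU times)."""
    return ROWS[baseline][GPU_MS] / ROWS[model][GPU_MS]


def fastest() -> str:
    """Name of the model with the lowest GPU inference time."""
    return min(ROWS, key=lambda name: ROWS[name][GPU_MS])


def best_by(column: int) -> str:
    """Name of the model with the highest score in the given BLEU column."""
    return max(ROWS, key=lambda name: ROWS[name][column])


if __name__ == "__main__":
    # List models from fastest to slowest with their speedup over UniMERNet.
    for name in sorted(ROWS, key=lambda n: ROWS[n][GPU_MS]):
        ratio = speedup_over("UniMERNet", name)
        print(f"{name:22s} {ROWS[name][GPU_MS]:8.2f} ms  x{ratio:5.2f} vs UniMERNet")
```

Ranking by a single column ignores the other three, so a real model choice would weigh the target language and the latency budget together.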

ppocr/metrics/rec_metric.py

Lines changed: 5 additions & 7 deletions
@@ -190,7 +190,6 @@ def __init__(self, main_indicator="exp_rate", cal_bleu_score=False, **kwargs):
         self.e1_right = []
         self.e2_right = []
         self.e3_right = []
-        self.editdistance_total_length = 0
         self.exp_total_num = 0
         self.edit_dist = 0
         self.exp_rate = 0
@@ -210,11 +209,12 @@ def __call__(self, preds, batch, **kwargs):
         word_pred = preds
         word_label = batch
         line_right, e1, e2, e3 = 0, 0, 0, 0
-        lev_dist = []
+        bleu_list, lev_dist = [], []
         for labels, prediction in zip(word_label, word_pred):
             if prediction == labels:
                 line_right += 1
             distance = compute_edit_distance(prediction, labels)
+            bleu_list.append(compute_bleu_score([prediction], [labels]))
             lev_dist.append(Levenshtein.normalized_distance(prediction, labels))
             if distance <= 1:
                 e1 += 1
@@ -228,19 +228,17 @@ def __call__(self, preds, batch, **kwargs):
         self.edit_dist = sum(lev_dist)  # float
         self.exp_rate = line_right  # float
         if self.cal_bleu_score:
-            self.bleu_score = compute_bleu_score(word_pred, word_label)
+            self.bleu_score = sum(bleu_list)
+            self.bleu_right.append(self.bleu_score)
         self.e1 = e1
         self.e2 = e2
         self.e3 = e3
         exp_length = len(word_label)
         self.edit_right.append(self.edit_dist)
         self.exp_right.append(self.exp_rate)
-        if self.cal_bleu_score:
-            self.bleu_right.append(self.bleu_score * batch_size)
         self.e1_right.append(self.e1)
         self.e2_right.append(self.e2)
         self.e3_right.append(self.e3)
-        self.editdistance_total_length = self.editdistance_total_length + exp_length
         self.exp_total_num = self.exp_total_num + exp_length

     def get_metric(self):
@@ -254,7 +252,7 @@ def get_metric(self):
         cur_edit_distance = sum(self.edit_right) / self.exp_total_num
         cur_exp_rate = sum(self.exp_right) / self.exp_total_num
         if self.cal_bleu_score:
-            cur_bleu_score = sum(self.bleu_right) / self.editdistance_total_length
+            cur_bleu_score = sum(self.bleu_right) / self.exp_total_num
         cur_exp_1 = sum(self.e1_right) / self.exp_total_num
         cur_exp_2 = sum(self.e2_right) / self.exp_total_num
         cur_exp_3 = sum(self.e3_right) / self.exp_total_num
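
The `rec_metric.py` hunks replace one BLEU value computed over the whole batch (scaled by `batch_size` and divided by a separate length counter) with per-sample BLEU scores that are summed per batch and divided by the total sample count, `exp_total_num`. Below is a minimal sketch of that accumulation pattern, not the PaddleOCR implementation: `toy_score` is a hypothetical stand-in for `compute_bleu_score`, and the `RunningSampleMean` class is invented for this illustration.

```python
from typing import Callable, List, Sequence


def toy_score(prediction: str, label: str) -> float:
    """Hypothetical stand-in for a per-sample BLEU: token overlap ratio."""
    pred_tokens, label_tokens = prediction.split(), label.split()
    if not pred_tokens:
        return 0.0
    hits = sum(1 for tok in pred_tokens if tok in label_tokens)
    return hits / len(pred_tokens)


class RunningSampleMean:
    """Accumulates per-sample scores across batches and reports their mean."""

    def __init__(self, score_fn: Callable[[str, str], float]) -> None:
        self.score_fn = score_fn
        self.batch_sums: List[float] = []  # plays the role of bleu_right
        self.total_samples = 0             # plays the role of exp_total_num

    def update(self, preds: Sequence[str], labels: Sequence[str]) -> None:
        # Score each (prediction, label) pair on its own, then store the sum.
        scores = [self.score_fn(p, l) for p, l in zip(preds, labels)]
        self.batch_sums.append(sum(scores))
        self.total_samples += len(labels)

    def result(self) -> float:
        # Dividing by the sample count makes the mean independent of batching.
        if self.total_samples == 0:
            return 0.0
        return sum(self.batch_sums) / self.total_samples


if __name__ == "__main__":
    metric = RunningSampleMean(toy_score)
    metric.update(["a b", "x y"], ["a b", "x z"])  # batch of two samples
    metric.update(["q"], ["q"])                    # batch of one sample
    print(metric.result())
```

Because every sample contributes one score and the denominator is the number of samples, splitting the same data into different batch sizes leaves the final mean unchanged.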
