Commit 0caa3e9

add_ppformulanet_plus (#15129)
* add_ppformulanet_plus
* rename ppformulanet_l_plus2plus_l
1 parent 53432e3 commit 0caa3e9

12 files changed: +523 -10 lines changed

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+Global:
+  model_name: PP-FormulaNet_plus-L # To use static model for inference.
+  use_gpu: True
+  epoch_num: 10
+  log_smooth_window: 10
+  print_batch_step: 10
+  save_model_dir: ./output/rec/pp_formulanet_plus_l/
+  save_epoch_step: 2
+  # evaluation is run every 417 iterations (1 epoch)(batch_size = 24) # max_seq_len: 1024
+  eval_batch_step: [0, 417 ]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/datasets/pme_demo/0000013.png
+  infer_mode: False
+  use_space_char: False
+  rec_char_dict_path: &rec_char_dict_path ppocr/utils/dict/unimernet_tokenizer
+  max_new_tokens: &max_new_tokens 2560
+  input_size: &input_size [768, 768]
+  save_res_path: ./output/rec/predicts_pp_formulanet_plus_l.txt
+  allow_resize_largeImg: False
+  start_ema: True
+  d2s_train_image_shape: [1,768,768]
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  weight_decay: 0.05
+  lr:
+    name: LinearWarmupCosine
+    learning_rate: 0.0001
+
+Architecture:
+  model_type: rec
+  algorithm: PP-FormulaNet_plus-L
+  in_channels: 3
+  Transform:
+  Backbone:
+    name: Vary_VIT_B_Formula
+    image_size: 768
+    encoder_embed_dim: 768
+    encoder_depth: 12
+    encoder_num_heads: 12
+    encoder_global_attn_indexes: [2, 5, 8, 11]
+  Head:
+    name: PPFormulaNet_Head
+    max_new_tokens: *max_new_tokens
+    decoder_start_token_id: 0
+    decoder_ffn_dim: 2048
+    decoder_hidden_size: 512
+    decoder_layers: 8
+    temperature: 0.2
+    do_sample: False
+    top_p: 0.95
+    encoder_hidden_size: 1024
+    is_export: False
+    length_aware: False
+    use_parallel: False
+    parallel_step: 0
+
+Loss:
+  name: PPFormulaNet_L_Loss
+
+PostProcess:
+  name: UniMERNetDecode
+  rec_char_dict_path: *rec_char_dict_path
+
+Metric:
+  name: LaTeXOCRMetric
+  main_indicator: exp_rate
+  cal_bleu_score: True
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/train.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+          random_padding: True
+          random_resize: True
+          random_crop: True
+      - UniMERNetTrainTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          rec_char_dict_path: *rec_char_dict_path
+          max_seq_len: *max_new_tokens
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask']
+
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 3
+    num_workers: 0
+    collate_fn: UniMERNetCollator
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/val.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTestTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          max_seq_len: *max_new_tokens
+          rec_char_dict_path: *rec_char_dict_path
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask', 'filename']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 10
+    num_workers: 0
+    collate_fn: UniMERNetCollator
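
The comment above `eval_batch_step` in this config ties the evaluation interval to one epoch at an effective batch size of 24. The Python sketch below reproduces that arithmetic. It rests on two assumptions that the commit does not state: that the effective batch of 24 comes from `batch_size_per_card: 3` on 8 cards, and that the training set holds about 10,000 samples. `iters_per_epoch` is a hypothetical helper, not a PaddleOCR function.

```python
import math


def iters_per_epoch(num_samples: int, batch_size_per_card: int, num_cards: int) -> int:
    """Iterations needed to see every sample once, keeping the last partial batch
    (the loader in the config sets drop_last: False)."""
    global_batch = batch_size_per_card * num_cards
    return math.ceil(num_samples / global_batch)


# Assumed values: the sample count and card count are illustrative, not from the commit.
NUM_SAMPLES = 10_000
NUM_CARDS = 8

step = iters_per_epoch(NUM_SAMPLES, batch_size_per_card=3, num_cards=NUM_CARDS)
eval_batch_step = [0, step]  # [start iteration, evaluation interval]
print(eval_batch_step)
```

Under the same assumed dataset size, an effective batch of 56 (for example `batch_size_per_card: 14` on 4 cards) gives 179 iterations, which matches the comment in the other two configs of this commit.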
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+Global:
+  model_name: PP-FormulaNet_plus-M # To use static model for inference.
+  use_gpu: True
+  epoch_num: 20
+  log_smooth_window: 10
+  print_batch_step: 10
+  save_model_dir: ./output/rec/pp_formulanet_plus_m/
+  save_epoch_step: 2
+  # evaluation is run every 179 iterations (1 epoch)(batch_size = 56) # max_seq_len: 1024
+  eval_batch_step: [0, 179]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/datasets/pme_demo/0000013.png
+  infer_mode: False
+  use_space_char: False
+  rec_char_dict_path: &rec_char_dict_path ppocr/utils/dict/unimernet_tokenizer
+  max_new_tokens: &max_new_tokens 2560
+  input_size: &input_size [384, 384]
+  save_res_path: ./output/rec/predicts_pp_formulanet_plus_m.txt
+  allow_resize_largeImg: False
+  start_ema: True
+  d2s_train_image_shape: [1,384,384]
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  weight_decay: 0.05
+  lr:
+    name: LinearWarmupCosine
+    learning_rate: 0.0001
+
+Architecture:
+  model_type: rec
+  algorithm: PP-FormulaNet_plus-M
+  in_channels: 3
+  Transform:
+  Backbone:
+    name: PPHGNetV2_B6_Formula
+    class_num: 1024
+
+  Head:
+    name: PPFormulaNet_Head
+    max_new_tokens: *max_new_tokens
+    decoder_start_token_id: 0
+    decoder_ffn_dim: 2048
+    decoder_hidden_size: 512
+    decoder_layers: 6
+    temperature: 0.2
+    do_sample: False
+    top_p: 0.95
+    encoder_hidden_size: 2048
+    is_export: False
+    length_aware: False
+    use_parallel: False
+    parallel_step: 0
+
+Loss:
+  name: PPFormulaNet_L_Loss
+
+PostProcess:
+  name: UniMERNetDecode
+  rec_char_dict_path: *rec_char_dict_path
+
+Metric:
+  name: LaTeXOCRMetric
+  main_indicator: exp_rate
+  cal_bleu_score: True
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/train.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+          random_padding: True
+          random_resize: True
+          random_crop: True
+      - UniMERNetTrainTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          rec_char_dict_path: *rec_char_dict_path
+          max_seq_len: *max_new_tokens
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask']
+
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 14
+    num_workers: 0
+    collate_fn: UniMERNetCollator
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/val.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTestTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          max_seq_len: *max_new_tokens
+          rec_char_dict_path: *rec_char_dict_path
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask', 'filename']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 30
+    num_workers: 0
+    collate_fn: UniMERNetCollator
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+Global:
+  model_name: PP-FormulaNet_plus-S # To use static model for inference.
+  use_gpu: True
+  epoch_num: 20
+  log_smooth_window: 10
+  print_batch_step: 10
+  save_model_dir: ./output/rec/pp_formulanet_plus_s/
+  save_epoch_step: 2
+  # evaluation is run every 179 iterations (1 epoch)(batch_size = 56) # max_seq_len: 1024
+  eval_batch_step: [0, 179]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/datasets/pme_demo/0000013.png
+  infer_mode: False
+  use_space_char: False
+  rec_char_dict_path: &rec_char_dict_path ppocr/utils/dict/unimernet_tokenizer
+  max_new_tokens: &max_new_tokens 1024
+  input_size: &input_size [384, 384]
+  save_res_path: ./output/rec/predicts_pp_formulanet_plus_s.txt
+  allow_resize_largeImg: False
+  start_ema: True
+  d2s_train_image_shape: [1,384,384]
+
+Optimizer:
+  name: AdamW
+  beta1: 0.9
+  beta2: 0.999
+  weight_decay: 0.05
+  lr:
+    name: LinearWarmupCosine
+    learning_rate: 0.0001
+
+Architecture:
+  model_type: rec
+  algorithm: PP-FormulaNet_plus-S
+  in_channels: 3
+  Transform:
+  Backbone:
+    name: PPHGNetV2_B4_Formula
+    class_num: 1024
+
+  Head:
+    name: PPFormulaNet_Head
+    max_new_tokens: *max_new_tokens
+    decoder_start_token_id: 0
+    decoder_ffn_dim: 1536
+    decoder_hidden_size: 384
+    decoder_layers: 2
+    temperature: 0.2
+    do_sample: False
+    top_p: 0.95
+    encoder_hidden_size: 2048
+    is_export: False
+    length_aware: True
+    use_parallel: True,
+    parallel_step: 3
+
+Loss:
+  name: PPFormulaNet_S_Loss
+  parallel_step: 3
+
+PostProcess:
+  name: UniMERNetDecode
+  rec_char_dict_path: *rec_char_dict_path
+
+Metric:
+  name: LaTeXOCRMetric
+  main_indicator: exp_rate
+  cal_bleu_score: True
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/train.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+          random_padding: True
+          random_resize: True
+          random_crop: True
+      - UniMERNetTrainTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          rec_char_dict_path: *rec_char_dict_path
+          max_seq_len: *max_new_tokens
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask']
+
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 14
+    num_workers: 0
+    collate_fn: UniMERNetCollator
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./ocr_rec_latexocr_dataset_example
+    label_file_list: ["./ocr_rec_latexocr_dataset_example/val.txt"]
+    transforms:
+      - UniMERNetImgDecode:
+          input_size: *input_size
+      - UniMERNetTestTransform:
+      - LatexImageFormat:
+      - UniMERNetLabelEncode:
+          max_seq_len: *max_new_tokens
+          rec_char_dict_path: *rec_char_dict_path
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'attention_mask', 'filename']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 30
+    num_workers: 0
+    collate_fn: UniMERNetCollator

docs/algorithm/formula_recognition/algorithm_rec_ppformulanet.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@
 | UniMERNet | Donut Swin | [UniMERNet.yaml](../../../configs/rec/UniMERNet.yaml) | 0.9187 | 0.9252 | 0.8658 | 0.8228 | 0.7740 | 0.8613 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_unimernet_train.tar)|
 | PP-FormulaNet-S | PPHGNetV2_B4 | [PP-FormulaNet-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml) | 0.8694 | 0.8071 | 0.9294 | 0.9112 | 0.8391 | 0.8712 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_s_train.tar)|
 | PP-FormulaNet-L | Vary_VIT_B | [PP-FormulaNet-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-L.yaml) | 0.9055 | 0.9206 | 0.9392 | 0.9273 | 0.9141 | 0.9213 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_l_train.tar )|
+| PP-FormulaNet_plus-S | PPHGNetV2_B4 | [PP-FormulaNet_plus-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-S.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_s_train.tar )|
+| PP-FormulaNet_plus-M | PPHGNetV2_B6 | [PP-FormulaNet_plus-M.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-M.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_m_train.tar )|
+| PP-FormulaNet_plus-L | Vary_VIT_B | [PP-FormulaNet_plus-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-L.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_l_train.tar )|
 
 Here, SPE and CPE are UniMERNet's simple and complex formula datasets; Easy, Middle, and Hard are the simple (LaTeX code length 0-64), medium (LaTeX code length 64-256), and complex (LaTeX code length 256+) formula datasets built in-house by PaddleX.

docs/algorithm/formula_recognition/algorithm_rec_ppformulanet_en.md

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ PP-FormulaNet is a formula recognition model independently developed by Baidu Pa
 | UniMERNet | Donut Swin | [UniMERNet.yaml](../../../configs/rec/UniMERNet.yaml) | 0.9187 | 0.9252 | 0.8658 | 0.8228 | 0.7740 | 0.8613 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_unimernet_train.tar)|
 | PP-FormulaNet-S | PPHGNetV2_B4 | [PP-FormulaNet-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-S.yaml) | 0.8694 | 0.8071 | 0.9294 | 0.9112 | 0.8391 | 0.8712 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_s_train.tar)|
 | PP-FormulaNet-L | Vary_VIT_B | [PP-FormulaNet-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet-L.yaml) | 0.9055 | 0.9206 | 0.9392 | 0.9273 | 0.9141 | 0.9213 |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_l_train.tar )|
+| PP-FormulaNet_plus-S | PPHGNetV2_B4 | [PP-FormulaNet_plus-S.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-S.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_s_train.tar )|
+| PP-FormulaNet_plus-M | PPHGNetV2_B6 | [PP-FormulaNet_plus-M.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-M.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_m_train.tar )|
+| PP-FormulaNet_plus-L | Vary_VIT_B | [PP-FormulaNet_plus-L.yaml](../../../configs/rec/PP-FormuaNet/PP-FormulaNet_plus-L.yaml) | - | - | - | - | - | - |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_ppformulanet_plus_l_train.tar )|
 
 Among them, SPE and CPE refer to the simple and complex formula datasets of UniMERNet, respectively. Easy, Middle, and Hard are simple (LaTeX code length 0-64), medium (LaTeX code length 64-256), and complex formula datasets (LaTeX code length 256+) built internally by PaddleX.
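
The Easy/Middle/Hard split described above can be sketched as a small Python function. The doc gives the ranges as 0-64, 64-256, and 256+, which overlap at the boundaries, so this sketch assumes half-open intervals and measures length in characters. Both choices are assumptions (the real datasets may count tokens, or place the boundaries differently), and `difficulty_bucket` is a hypothetical name.

```python
def difficulty_bucket(latex_code: str) -> str:
    """Assign a formula to Easy, Middle, or Hard by LaTeX code length.

    Assumes half-open ranges [0, 64), [64, 256), [256, inf) and
    character-level length; neither detail is confirmed by the docs.
    """
    length = len(latex_code)
    if length < 64:
        return "Easy"
    if length < 256:
        return "Middle"
    return "Hard"


print(difficulty_bucket(r"\frac{a}{b}"))
```

Moving the boundary comparison from `<` to `<=` is all it takes to try the other reading of the ranges.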