
enhancing recovery_to_doc #14396

Closed
GreatV wants to merge 2 commits into PaddlePaddle:main from GreatV:recovery_to_doc

GreatV (Collaborator) commented Dec 16, 2024

This pull request improves layout analysis and document generation in ppstructure/recovery/recovery_to_doc.py, chiefly by enhancing the sorted_layout_boxes function. It also adds new test cases and updates the test configuration.

Enhancements to layout analysis:

  • ppstructure/recovery/recovery_to_doc.py: Improved the sorted_layout_boxes function by adding clarifying comments, refining the criteria that classify a box as belonging to the left or right column, and ensuring that single-column (full-width) boxes are identified correctly.
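The PR description does not include the diff, so the following is only a minimal sketch of the kind of left/right/single-column classification described above, not the code in this PR. It assumes each layout box is a dict whose bbox is [x0, y0, x1, y1] in pixels and that the page width is known; the helpers classify_column and sort_layout_boxes are hypothetical names. The rule shown here treats any box that straddles the vertical centerline as full width, so a centered title stays single-column instead of being pushed into the right column.

```python
from typing import Any, Dict, List, Sequence


def classify_column(bbox: Sequence[float], page_width: float) -> str:
    """Hypothetical rule: label a box 'left', 'right', or 'single' (full width)."""
    x0, _, x1, _ = bbox
    mid = page_width / 2
    if x0 < mid < x1:
        # The box crosses the centerline (e.g. a centered title): full width.
        return "single"
    if x1 <= mid:
        return "left"
    return "right"


def sort_layout_boxes(boxes: List[Dict[str, Any]], page_width: float) -> List[Dict[str, Any]]:
    """Hypothetical sorter: top-to-bottom, reading each two-column run left column first."""
    # Sort by top edge, then by left edge.
    ordered = sorted(boxes, key=lambda b: (b["bbox"][1], b["bbox"][0]))
    result: List[Dict[str, Any]] = []
    left: List[Dict[str, Any]] = []
    right: List[Dict[str, Any]] = []
    for box in ordered:
        column = classify_column(box["bbox"], page_width)
        if column == "single":
            # Flush the pending two-column run before emitting the full-width block.
            result += left + right
            left, right = [], []
            result.append({**box, "layout": "single"})
        elif column == "left":
            left.append({**box, "layout": "double"})
        else:
            right.append({**box, "layout": "double"})
    # Flush whatever two-column run remains at the bottom of the page.
    return result + left + right


# Usage: a centered title above a two-column body on a 1000-px-wide page.
page = [
    {"name": "title", "bbox": [420, 10, 700, 40]},
    {"name": "left-1", "bbox": [50, 60, 480, 200]},
    {"name": "right-1", "bbox": [520, 60, 950, 200]},
    {"name": "left-2", "bbox": [50, 210, 480, 400]},
    {"name": "right-2", "bbox": [520, 210, 950, 400]},
]
print([b["name"] for b in sort_layout_boxes(page, page_width=1000)])
# ['title', 'left-1', 'left-2', 'right-1', 'right-2']
```

By contrast, a rule that sends every box with x0 > w/4 and x1 > w/2 to the right column would put this title (x0 = 420, x1 = 700, w = 1000) on the right. The centerline test is one simple alternative; real pages with uneven gutters may need a tolerance margin.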

New test cases:

  • tests/test_recovery_to_doc.py: Added comprehensive test cases for double-column and single-column document structure analysis and docx generation, including validations for layout detection, column separation, and document content.

Testing configuration updates:

  • pytest.ini: Added configurations to ignore specific deprecation warnings and enabled verbose output for tests.
  • tests/test_paddleocr.py: Removed an unnecessary encoding declaration.

Closes #14308

paddle-bot commented Dec 16, 2024

Thanks for your contribution!

paddle-bot added the contrib/contributor label ("Contributor-related discussion or task.") on Dec 16, 2024
SWHL (Collaborator) commented Dec 16, 2024

Please provide unit tests for the corresponding case.

GreatV requested a review from SWHL on December 23, 2024 08:32
GreatV closed this on Feb 5, 2025
GreatV deleted the recovery_to_doc branch on February 5, 2025 04:58

Labels

contrib/contributor Contributor-related discussion or task.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inaccurate recognition: titles are always placed on the right side

2 participants