Hope Documents is a Django-based application for automated document scanning and text analysis. Its primary purpose is to validate document number consistency, ensuring that identifiers (such as IDs, registration numbers, or codes) detected in document images match the expected reference values.
The system processes uploaded images, extracts textual information using OCR, and searches for the provided number or ID within the document’s contents. This enables reliable, automated verification of official documents such as ID cards, invoices, and certificates.
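The matching step described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names `normalize` and `contains_number` are hypothetical, and it assumes the OCR text has already been extracted as a plain string.

```python
import re


def normalize(value: str) -> str:
    """Keep only letters and digits, upper-cased, so spacing and punctuation are ignored."""
    return re.sub(r"[^0-9A-Za-z]", "", value).upper()


def contains_number(ocr_text: str, expected: str) -> bool:
    """Return True if the expected identifier appears in the OCR text.

    Hypothetical helper: both sides are normalized first, because OCR output
    often inserts spaces or separators inside an identifier.
    """
    needle = normalize(expected)
    if not needle:
        # An empty reference value can never be confirmed.
        return False
    return needle in normalize(ocr_text)


if __name__ == "__main__":
    sample_text = "CARTA DI IDENTITA\nN. CA 123 45 AB"
    print(contains_number(sample_text, "CA12345AB"))
```

Normalizing the whole text is tolerant of OCR spacing errors, but it can also join neighbouring tokens and produce false positives, so a stricter matcher may be preferable for short identifiers.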
- OCR Processing: Extracts text from various image formats and PDF files.
- Command-Line Interface: Easy-to-use CLI for batch processing of documents.
- Configurable OCR Engine: Options to fine-tune Tesseract and OpenCV parameters for better results.
- Django Integration: Seamlessly integrates with Django projects.
To install Hope Documents, you can use pip:

    pip install hope-documents

Hope Documents provides a command-line interface to extract text from documents.

    extract [OPTIONS] FILEPATHS...

- FILEPATHS...: One or more paths to files or directories to process.
- -a, --auto: Automatic mode.
- -t, --threshold: CV2 threshold value (0-255). Default is 128.
- -p, --psm: Tesseract Page Segmentation Mode (0-13). Default is 11.
- -o, --oem: Tesseract OCR Engine Mode (0-3). Default is 3.
- -n, --number-only: Only extract numbers from the documents.
- --debug: Enable debug mode.
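To make the threshold, PSM, and OEM options more concrete, here is a small pure-Python sketch. It assumes the threshold is applied as a simple binary threshold (the behaviour of OpenCV's `THRESH_BINARY`) and that the PSM and OEM values are passed to Tesseract through its `--psm` and `--oem` flags. Whether Hope Documents does exactly this internally is an assumption, and the helper names `binarize` and `tesseract_config` are hypothetical.

```python
def binarize(pixels, threshold=128, max_value=255):
    """Binary threshold on a grayscale image given as rows of 0-255 integers.

    Pixels strictly above the threshold become max_value, all others become 0,
    which mirrors the semantics of OpenCV's THRESH_BINARY mode.
    """
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be in the range 0-255")
    return [[max_value if p > threshold else 0 for p in row] for row in pixels]


def tesseract_config(psm=11, oem=3):
    """Build a Tesseract option string from the documented value ranges."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be in the range 0-13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be in the range 0-3")
    return f"--psm {psm} --oem {oem}"


if __name__ == "__main__":
    print(binarize([[0, 128, 129, 255]]))
    print(tesseract_config())
```

A lower threshold keeps more of the faint strokes as foreground, while a higher one suppresses background noise, so the best value depends on the scan quality.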

    extract tests/images/ita/ci1.png

This will output the extracted text from the ci1.png image.
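For batch jobs driven from Python, the CLI can be wrapped in a small helper. The sketch below only uses the flags listed in this document. The helper names `build_extract_command` and `run_extract` are hypothetical, and it assumes that `extract` is on the `PATH` and writes the extracted text to standard output.

```python
import subprocess


def build_extract_command(filepaths, threshold=None, psm=None, oem=None,
                          number_only=False, auto=False, debug=False):
    """Assemble the argument list for the `extract` CLI.

    Options left as None are omitted so the CLI defaults apply.
    """
    cmd = ["extract"]
    if auto:
        cmd.append("--auto")
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if number_only:
        cmd.append("--number-only")
    if debug:
        cmd.append("--debug")
    cmd += [str(path) for path in filepaths]
    return cmd


def run_extract(filepaths, **options):
    """Run the CLI and return its standard output (assumed to hold the extracted text)."""
    cmd = build_extract_command(filepaths, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(build_extract_command(["tests/images/ita/ci1.png"], threshold=150, number_only=True))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths that contain spaces.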
To set up the development environment, you need to have tox installed.
To run the test suite, use the following command:

    tox -e d52-py313

To check for code style and linting errors, run:

    tox lint

To build the documentation, use the following command:

    tox docs

The documentation will be generated in the site directory.
This project is licensed under the MIT License - see the LICENSE file for details.