
Commit e1b83f6

Docs: add user-friendly example for scanned multipage PDFs
Adds a minimal workflow explaining required hOCR usage and common errors.
1 parent 8fe8b5b commit e1b83f6


1 file changed (+37 -0 lines changed)


README.rst

Lines changed: 37 additions & 0 deletions
@@ -165,6 +165,43 @@ followed by the foreground image, which uses the mask as its alpha layer.

Usage
-----

Example: Re-encoding a Scanned Multipage PDF
--------------------------------------------

This section gives a minimal example of a common workflow: re-encoding a
scanned, multipage PDF with ``recode_pdf``.

.. important::

   Re-encoding a scanned PDF requires an hOCR file. If no hOCR file is
   provided, ``recode_pdf`` may fail with an error that does not mention
   hOCR at all.

Step 1: Start with a scanned PDF. The commands below assume it is named
``scan.pdf``.

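Step 2: Generate an hOCR file that covers every page of the PDF.

One way to do this is to render each page to an image with ``pdftoppm``
(part of Poppler) and run ``tesseract`` on the list of images with its
``hocr`` output config. ``pdftoppm`` zero-pads page numbers, so ``ls``
lists the images in page order. Render at the resolution the document was
scanned at; 300 DPI is assumed here::

    pdftoppm -r 300 -png scan.pdf page
    ls page-*.png > pages.txt
    tesseract pages.txt scan hocr

This produces ``scan.hocr``, a single hOCR file with one ``ocr_page``
element per page. Note that ``ocrmypdf --sidecar`` cannot be used for this
step: it writes plain text, not hOCR.

Step 3: Re-encode the PDF using archive-pdf-tools::

    recode_pdf \
        --from-pdf scan.pdf \
        --hocr-file scan.hocr \
        --out-pdf output.pdf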
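
Before running ``recode_pdf``, it can be worth checking that the hOCR file
really is hOCR and has one page per PDF page. The sketch below assumes
Python 3 and uses only the standard library; ``count_hocr_pages`` is a
hypothetical helper, not part of archive-pdf-tools. It counts elements whose
``class`` attribute includes ``ocr_page``, the class hOCR uses to mark pages.

.. code-block:: python

    from html.parser import HTMLParser


    class _PageCounter(HTMLParser):
        """Counts start tags whose class attribute includes "ocr_page"."""

        def __init__(self):
            super().__init__()
            self.pages = 0

        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if "ocr_page" in classes:
                self.pages += 1


    def count_hocr_pages(path):
        """Return the number of ocr_page elements in the hOCR file at path.

        Hypothetical helper for illustration only.
        """
        counter = _PageCounter()
        with open(path, encoding="utf-8", errors="replace") as f:
            counter.feed(f.read())
        counter.close()
        return counter.pages

For example, ``count_hocr_pages("scan.hocr")`` should match the ``Pages:``
value printed by ``pdfinfo scan.pdf``. A plain-text OCR file yields 0.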

Common Pitfall
~~~~~~~~~~~~~~

Running ``recode_pdf`` without an hOCR file may fail with an error such as::

    AttributeError: 'NoneType' object has no attribute 'seek'

The message does not mention hOCR, but in this workflow it indicates that
the required hOCR file was not supplied.
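
A batch script can fail early with a clearer message instead. The helper
below, ``require_hocr``, is hypothetical (it is not part of archive-pdf-tools)
and assumes Python 3. It only checks that a path was given, that the file
exists, and that it contains the text ``ocr_page``; that last test is a
heuristic, not full hOCR validation.

.. code-block:: python

    from pathlib import Path


    def require_hocr(hocr_path):
        """Return hocr_path as a Path, or raise a descriptive error.

        Hypothetical helper: call it before invoking recode_pdf.
        """
        if not hocr_path:
            raise ValueError("no hOCR file given; recode_pdf needs one here")
        path = Path(hocr_path)
        if not path.is_file():
            raise FileNotFoundError(f"hOCR file not found: {path}")
        # Heuristic: hOCR marks each page with class "ocr_page";
        # plain-text OCR output will not contain that string.
        text = path.read_text(encoding="utf-8", errors="replace")
        if "ocr_page" not in text:
            raise ValueError(f"{path} does not look like hOCR (no ocr_page element)")
        return path

Raising before ``recode_pdf`` starts turns the ``AttributeError`` above into
an error that names the actual problem.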

Creating a PDF from a set of images is pretty straightforward::
