emerose commented Jan 24, 2026

When processing PowerPoint files containing picture shapes that reference external images (rather than embedded images), the python-pptx library raises a `ValueError("no embedded image")` when accessing the `image` property.

Previously, this caused the entire document conversion to fail because:

1. The `hasattr(shape, "image")` check at line 690 would trigger the property getter, which raises `ValueError` (`hasattr` only catches `AttributeError`, not `ValueError`)

2. The exception handler in `_handle_pictures()` only caught `UnidentifiedImageError` and `OSError`, not `ValueError`
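
The first point can be reproduced without python-pptx. The sketch below is illustrative only: `ExternalPicture` and `probe` are hypothetical names, and the class merely imitates a shape whose `image` property raises the same `ValueError`.

```python
class ExternalPicture:
    """Hypothetical stand-in for a picture shape that links to an external image."""

    @property
    def image(self):
        # Mimics the error described above for shapes without an embedded image.
        raise ValueError("no embedded image")


def probe(shape):
    """Return hasattr()'s result, or a description of any ValueError that escapes it."""
    try:
        return hasattr(shape, "image")
    except ValueError as exc:
        # hasattr() swallows AttributeError only, so the getter's ValueError lands here.
        return f"ValueError: {exc}"


print(probe(ExternalPicture()))
print(probe(object()))
```

For a plain `object()` the attribute is simply missing, so `hasattr` returns `False` as usual; only a property getter that raises something other than `AttributeError` leaks through.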

This fix:

- Removes the unnecessary `hasattr` check, since we already verify the shape type is `MSO_SHAPE_TYPE.PICTURE`
- Adds `ValueError` to the exception handler in `_handle_pictures()` so that picture shapes with external references are skipped with a warning instead of crashing the pipeline
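
The resulting control flow can be sketched roughly as follows. This is not the backend code itself: `FakePicture`, `handle_picture`, and `convert` are hypothetical names, and `UnidentifiedImageError` is a local stand-in for Pillow's exception so the snippet runs without third-party dependencies.

```python
import logging

_log = logging.getLogger(__name__)


class UnidentifiedImageError(OSError):
    """Local stand-in for Pillow's exception of the same name (assumption for this sketch)."""


class FakePicture:
    """Hypothetical picture shape; blob=None imitates an externally linked image."""

    def __init__(self, name, blob=None):
        self.name = name
        self._blob = blob

    @property
    def image(self):
        if self._blob is None:
            raise ValueError("no embedded image")
        return self._blob


def handle_picture(shape):
    """Return the image data, or None if the picture has to be skipped."""
    try:
        # No hasattr() guard: the caller is assumed to have checked the shape type already.
        return shape.image
    except (UnidentifiedImageError, OSError, ValueError) as exc:
        _log.warning("Skipping picture %r: %s", shape.name, exc)
        return None


def convert(shapes):
    """Collect images from all shapes, dropping the ones that were skipped."""
    return [img for img in map(handle_picture, shapes) if img is not None]


images = convert([FakePicture("embedded", b"png-bytes"), FakePicture("linked")])
```

The linked picture produces a warning and is left out of the result, while the embedded one is still converted.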

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions bot commented Jan 24, 2026

DCO Check Passed

Thanks @emerose, all your commits are properly signed off. 🎉

dosubot bot commented Jan 24, 2026

Related Documentation

Checked 7 published document(s) in 1 knowledge base(s). No updates required.

mergify bot commented Jan 24, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
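
To check a title locally before pushing, the same pattern can be tried in Python. This is a sketch that assumes the rule behaves like a regular-expression search on the title; `is_conventional` and the example titles are hypothetical.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit type prefix."""
    return TITLE_RE.search(title) is not None


for title in ("fix(pptx): skip linked pictures", "feat!: new backend", "Fix pptx crash"):
    print(title, "->", is_conventional(title))
```

The pattern is case-sensitive and requires the colon, so a title such as `Fix pptx crash` would not satisfy it.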

I, Sam Quigley <quigley@emerose.com>, hereby add my Signed-off-by to this commit: 8b1646f

Signed-off-by: Sam Quigley <quigley@emerose.com>

codecov bot commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| docling/backend/mspowerpoint_backend.py | 33.33% | 2 Missing ⚠️ |
