Fix: Add langchain-core dependency to resolve deprecated import issue #17341
Added support for Latin characters with diacritics (é, è, à, ç, etc.) and French contractions (n'êtes) in word grouping logic of BaseRecLabelDecode.get_word_info(). This fix ensures that French words are no longer split at accented characters during OCR text recognition.
- Moved test_french_accents.py to tests/ directory following project structure
- Removed invalid 'FRENCH' prefix from Unicode name check
- Unicode standard only uses 'LATIN' prefix for all Latin-based characters
- All French accented characters (é, è, à, ç, etc.) are correctly matched
- Verified with comprehensive character set including uppercase/lowercase variants
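The Unicode name check described in these commits can be illustrated with a small standalone sketch. This is not the code in BaseRecLabelDecode.get_word_info(); the helper names is_latin_letter and group_latin_words are hypothetical, and the sketch assumes precomposed (NFC) characters.

```python
import unicodedata


def is_latin_letter(ch):
    """Return True for letters whose Unicode name starts with 'LATIN'.

    Hypothetical helper: accented French letters such as é or ç carry
    names like 'LATIN SMALL LETTER E WITH ACUTE', so no separate
    'FRENCH' prefix is needed.
    """
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def group_latin_words(text):
    """Group consecutive Latin letters into words (illustrative only)."""
    words, current = [], []
    for ch in text:
        # An apostrophe is kept only when a word is already in progress,
        # so contractions such as n'êtes stay in one piece.
        if is_latin_letter(ch) or (ch == "'" and current):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(group_latin_words("vous n'êtes pas là"))
```

With this grouping, the accented characters and the contraction no longer split their words.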
This commit addresses the ModuleNotFoundError for 'langchain.docstore' that occurs in PaddleX's retriever module. The langchain library deprecated the langchain.docstore.document import path in favor of langchain_core.documents.
Changes:
- Add langchain-core>=0.1.0 to project dependencies in pyproject.toml
This ensures compatibility with the current PaddleX dependency while the langchain ecosystem transitions to the new import structure. The fix resolves CI/CD test failures without introducing breaking changes to the PaddleOCR API.
Thanks for your contribution!
@liuhongen1234567 Hi, the fix has been made as requested in the other PR, "Prevent auto-splitting of French accented words in text recognition".
Since PaddleOCR does not directly depend on
Sure thing, I'll submit the PR there 😄
Fix: ModuleNotFoundError for langchain.docstore
Description
This PR fixes the CI/CD test failure caused by a deprecated langchain import path in the PaddleX dependency.
Problem
The test suite was failing with a ModuleNotFoundError for 'langchain.docstore'.
This error occurs in PaddleX's retriever module at paddlex/inference/pipelines/components/retriever/base.py:25, which imports from the deprecated langchain.docstore.document path.
Root Cause
The langchain library deprecated the langchain.docstore.document import path and moved it to langchain_core.documents. The PaddleX dependency still uses the old import path.
Reference: https://reference.langchain.com/python/integrations/langchain_google_community/?h=document#langchain_google_community.DocumentAIWarehouseRetriever
Solution
Added langchain-core>=0.1.0 to the project dependencies in pyproject.toml. This provides the new import path that the updated langchain ecosystem expects while maintaining backward compatibility with PaddleX until it updates its import statements.
Changes
- pyproject.toml: added langchain-core>=0.1.0 to the dependencies list
Testing
The fix ensures that:
Impact
Related Issues
Additional Notes
This is a compatibility fix. The long-term solution would be for PaddleX to update its import statements to use the new langchain_core path. Once PaddleX releases an updated version, this dependency addition will remain harmless as it's part of the langchain ecosystem.
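As a rough illustration of the long-term fix on the PaddleX side, a module could try the new import path first and fall back to the old one. This is a minimal sketch, not PaddleX's actual code: the helper import_first_available is hypothetical, and the only assumption about langchain is the two module paths named above.

```python
import importlib


def import_first_available(candidates, attr):
    """Return `attr` from the first importable module in `candidates`.

    Hypothetical helper for illustration; it raises ImportError listing
    every failed candidate if none of them provides the attribute.
    """
    errors = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{module_name}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{module_name}: no attribute {attr!r}")
    raise ImportError("; ".join(errors))


# Intended use (left commented out so the sketch runs without langchain):
# Document = import_first_available(
#     ["langchain_core.documents", "langchain.docstore.document"],
#     "Document",
# )
```

Trying langchain_core.documents first means the fallback is only exercised on older installations that still ship langchain.docstore.document.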