[SPARK-55405][PYTHON][TESTS] Change tests for pa.Array.cast to use golden file #54188
Open
Yicong-Huang wants to merge 3 commits into apache:master
…nd matrix comparison
- repr_type(): normalise PyArrow float names (halffloat→float16, etc.)
- setup_timezone(): use hasattr checks so mixin works without Spark
- compare_or_generate_golden_matrix(): generic generate-or-compare method for matrix-style golden file tests
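As a rough illustration of the float-name normalisation mentioned for repr_type(), the sketch below maps Arrow's string names for float types to width-explicit names. The helper name `normalise_type_name` is hypothetical, and only the halffloat→float16 pair comes from the commit message; the other two entries are assumptions about what the "etc." covers. The real repr_type() may work quite differently, for example by inspecting pa.DataType objects rather than strings.

```python
# Hypothetical helper for illustration; not the actual repr_type() implementation.
_FLOAT_ALIASES = {
    "halffloat": "float16",  # stated in the commit message
    "float": "float32",      # assumption: covered by the "etc."
    "double": "float64",     # assumption: covered by the "etc."
}


def normalise_type_name(name: str) -> str:
    """Return a width-explicit name for float types; pass other names through."""
    return _FLOAT_ALIASES.get(name, name)


# Names that are not float aliases are left untouched.
column_names = [normalise_type_name(n) for n in ["int8", "halffloat", "double"]]
```

This sketch only handles top-level type names; nested types such as lists of floats would need extra handling.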
- Replace custom _PyArrowCastTestBase with direct GoldenFileTestMixin
- Use repr_type() to derive target column names from pa.DataType objects, eliminating parallel name→type mappings
- Use compare_or_generate_golden_matrix() from mixin instead of custom _compare_or_generate_golden method
- Use clean_result() from mixin in _try_cast
- Split source types into isolated test cases (type:case format)
- Regenerate golden files (CSV + MD) with repr_type-derived names
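The idea of isolated type:case rows can be sketched without PyArrow. In the example below, struct format codes stand in for pa.DataType targets and `struct.pack` stands in for pa.Array.cast; the names `TARGETS`, `SOURCES`, `try_cast`, and `build_matrix` are all hypothetical and are not taken from the PR. The point is only that each case gets its own row, so an overflow in one case cannot hide the result of another.

```python
import struct

# Stand-in targets: struct format codes play the role of pa.DataType objects.
TARGETS = {"int8": "b", "int16": "h"}

# Each source type is split into isolated cases, keyed as "type:case".
SOURCES = {
    "int8:standard": [0, 1, 2],
    "int8:negative": [-1, -2],
    "int16:max_min": [32767, -32768],
}


def try_cast(values, fmt):
    """Return the cast values as a string, or the exception name on failure."""
    layout = f"<{len(values)}{fmt}"
    try:
        packed = struct.pack(layout, *values)
        return str(list(struct.unpack(layout, packed)))
    except struct.error as exc:
        return f"ERR:{type(exc).__name__}"


def build_matrix(sources, targets):
    """Build a {source_case: {target_name: result}} matrix, one cell per cast."""
    return {
        src: {tgt: try_cast(vals, fmt) for tgt, fmt in targets.items()}
        for src, vals in sources.items()
    }


matrix = build_matrix(SOURCES, TARGETS)
```

Because a failure is recorded as a cell value rather than raised, the out-of-range int16 case fails only in its own int8 cell while every other cell still reports a result.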
JIRA Issue Information
=== Sub-task SPARK-55405 ===
This comment was automatically generated by GitHub Actions
Move top-level 'import pandas' into the methods that actually use it, so importing GoldenFileTestMixin no longer requires pandas at module load time.
zhengruifeng
approved these changes
Feb 7, 2026
Contributor
LGTM pending CI, thanks for the change
What changes were proposed in this pull request?
Rewrite test_pyarrow_array_cast.py (~9000 lines) into ~500 lines using GoldenFileTestMixin with golden CSV/MD files. Each source type is split into isolated test cases (e.g. int8:standard, int8:negative, int8:max_min) to prevent edge cases from masking each other.
Also enhanced GoldenFileTestMixin:
- repr_type(): Arrow DataType support (normalises float names)
- setup_timezone(): works without Spark session
- compare_or_generate_golden_matrix(): new reusable matrix comparison method
Why are the changes needed?
To simplify testing: the test file shrinks from ~9000 lines to ~500, with expected results kept in golden files.
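A generate-or-compare golden file check of the kind described for compare_or_generate_golden_matrix() might look roughly like the sketch below. It is a minimal stand-alone version using only the standard library; the function name `compare_or_generate`, its `generate` flag, and the CSV layout with a `source` column are assumptions made for illustration, not the mixin's actual interface.

```python
import csv
import tempfile
from pathlib import Path


def compare_or_generate(matrix, golden_path, generate):
    """Write the matrix as CSV when generate is true; otherwise diff against it.

    Returns a list of (row, column, expected, actual) mismatches.
    """
    columns = sorted({col for row in matrix.values() for col in row})
    path = Path(golden_path)
    if generate:
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["source"] + columns)
            for src in sorted(matrix):
                writer.writerow([src] + [matrix[src].get(c, "") for c in columns])
        return []
    mismatches = []
    with path.open(newline="") as f:
        for record in csv.DictReader(f):
            src = record.pop("source")
            for col, expected in record.items():
                actual = matrix.get(src, {}).get(col, "")
                if actual != expected:
                    mismatches.append((src, col, expected, actual))
    return mismatches


# Usage: generate once, compare, then compare again after a simulated drift.
with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "golden.csv"
    results = {"int8:standard": {"int16": "[0, 1]", "float16": "[0.0, 1.0]"}}
    generated = compare_or_generate(results, golden, generate=True)
    clean = compare_or_generate(results, golden, generate=False)
    results["int8:standard"]["int16"] = "[0, 2]"
    drift = compare_or_generate(results, golden, generate=False)
```

The sketch reports only cells that differ from the golden file; rows that exist in the matrix but not in the golden file are not flagged, which a real implementation would likely want to check as well.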
Does this PR introduce any user-facing change?
No.
How was this patch tested?
python -m pytest python/pyspark/tests/upstream/pyarrow/test_pyarrow_array_cast.py passes in both generation and comparison modes.
Was this patch authored or co-authored using generative AI tooling?
No.