[SPARK-55405][PYTHON][TESTS] Change tests for pa.Array.cast to use golden file #54188

Open
Yicong-Huang wants to merge 3 commits into apache:master from Yicong-Huang:SPARK-55405/refactor/pyarrow-cast-golden-file

@Yicong-Huang
Contributor

What changes were proposed in this pull request?

Rewrite test_pyarrow_array_cast.py (~9000 lines) into ~500 lines using GoldenFileTestMixin with golden CSV/MD files. Each source type is split into isolated test cases (e.g. int8:standard, int8:negative, int8:max_min) to prevent edge cases from masking each other.
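
The masking problem can be illustrated without PyArrow. The sketch below uses a hypothetical stand-in, `checked_cast_to_int8`, which rejects a whole input if any value is out of range, loosely mirroring how a checked array cast fails as a unit. The case names and values are illustrative and are not taken from the PR.

```python
INT8_MIN, INT8_MAX = -128, 127


def checked_cast_to_int8(values):
    """Hypothetical stand-in for a checked cast: fails as a whole on any overflow."""
    for v in values:
        if not INT8_MIN <= v <= INT8_MAX:
            raise OverflowError(f"{v} does not fit in int8")
    return list(values)


def try_cast(values):
    """Return the cast result, or the error class name if the cast fails."""
    try:
        return checked_cast_to_int8(values)
    except OverflowError as e:
        return type(e).__name__


# Illustrative cases keyed in the "type:case" format.
cases = {
    "int16:standard": [0, 1, 2],
    "int16:negative": [-1, -2],
    "int16:max_min": [32767, -32768],
}

# One result per isolated case: only the edge case fails.
isolated = {name: try_cast(values) for name, values in cases.items()}

# All values in a single array: the edge case hides the other results.
combined = try_cast([v for values in cases.values() for v in values])
```

With isolated cases, the golden file records a real result for `standard` and `negative` and an error only for `max_min`; with one combined array, the only recorded outcome would be the error.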

Also enhanced GoldenFileTestMixin:

  • repr_type(): Arrow DataType support (normalises float names)
  • setup_timezone(): works without Spark session
  • compare_or_generate_golden_matrix(): new reusable matrix comparison method
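
As a rough illustration of the generate-or-compare pattern behind a matrix-style golden file, here is a minimal standard-library sketch. The function name `compare_or_generate_matrix`, its parameters, and the CSV layout are assumptions made for this example; they are not the signature or file format of the mixin method in this PR.

```python
import csv
import os
import tempfile


def compare_or_generate_matrix(path, row_names, col_names, compute_cell, generate=False):
    """Hypothetical helper: write the matrix to `path`, or compare against it.

    Returns a list of (row, column, expected, actual) mismatches.
    In generate mode the file is (re)written and the list is empty.
    """
    actual = {(r, c): str(compute_cell(r, c)) for r in row_names for c in col_names}

    if generate:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["source"] + list(col_names))
            for r in row_names:
                writer.writerow([r] + [actual[(r, c)] for c in col_names])
        return []

    mismatches = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)[1:]  # skip the "source" label column
        for row in reader:
            r, cells = row[0], row[1:]
            for c, expected in zip(header, cells):
                if actual.get((r, c)) != expected:
                    mismatches.append((r, c, expected, actual.get((r, c))))
    return mismatches


def cell(row, col):
    """Placeholder for a real cast attempt; returns a deterministic string."""
    return f"{row}->{col}"


golden_path = os.path.join(tempfile.mkdtemp(), "golden_cast_matrix.csv")
rows = ["int8:standard", "int8:negative"]
cols = ["int16", "string"]

generated = compare_or_generate_matrix(golden_path, rows, cols, cell, generate=True)
clean = compare_or_generate_matrix(golden_path, rows, cols, cell)
drifted = compare_or_generate_matrix(
    golden_path, rows, cols, lambda r, c: "changed" if c == "string" else cell(r, c)
)
```

Here `clean` is empty because the recomputed matrix matches the file, while `drifted` lists one mismatch per row for the `string` column. A test would typically fail with that list as its message.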

Why are the changes needed?

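To simplify testing: the rewrite cuts test_pyarrow_array_cast.py from ~9000 lines to ~500, and isolating each case keeps one edge case from masking another.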

Does this PR introduce any user-facing change?

No.

How was this patch tested?

python -m pytest python/pyspark/tests/upstream/pyarrow/test_pyarrow_array_cast.py passes in both generation and comparison modes.

Was this patch authored or co-authored using generative AI tooling?

No.

…nd matrix comparison

- repr_type(): normalise PyArrow float names (halffloat→float16, etc.)
- setup_timezone(): use hasattr checks so mixin works without Spark
- compare_or_generate_golden_matrix(): generic generate-or-compare method
  for matrix-style golden file tests
- Replace custom _PyArrowCastTestBase with direct GoldenFileTestMixin
- Use repr_type() to derive target column names from pa.DataType objects,
  eliminating parallel name→type mappings
- Use compare_or_generate_golden_matrix() from mixin instead of custom
  _compare_or_generate_golden method
- Use clean_result() from mixin in _try_cast
- Split source types into isolated test cases (type:case format)
- Regenerate golden files (CSV + MD) with repr_type-derived names
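
The float-name normalisation and the derived column names can be sketched as follows. This is a string-level illustration only, not the mixin's `repr_type()`: the helper names are hypothetical, and while `halffloat` → `float16` comes from the commit message, the `float` → `float32` and `double` → `float64` entries are assumptions about what "etc." covers.

```python
# Names PyArrow prints for its float types, mapped to bit-width names.
# "halffloat" -> "float16" is from the commit message; the other two
# entries are assumed for illustration.
_FLOAT_NAME_ALIASES = {
    "halffloat": "float16",
    "float": "float32",
    "double": "float64",
}


def normalise_type_name(name: str) -> str:
    """Hypothetical helper: map a printed float type name to its bit-width name."""
    return _FLOAT_NAME_ALIASES.get(name, name)


def column_names(type_names):
    """Derive golden-file column names from the type names themselves."""
    return [normalise_type_name(n) for n in type_names]


printed = ["int8", "halffloat", "float", "double", "string"]
columns = column_names(printed)
```

With PyArrow installed, the inputs would come from `str(t)` for each `pa.DataType`, so the column names follow from the type objects and no separate name-to-type table has to be kept in sync.
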
@github-actions

github-actions bot commented Feb 6, 2026

JIRA Issue Information

=== Sub-task SPARK-55405 ===
Summary: Change tests for pa.Array.cast to use golden file
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

Move top-level 'import pandas' into the methods that actually use it,
so importing GoldenFileTestMixin no longer requires pandas at module
load time.
Contributor

@zhengruifeng left a comment

LGTM pending CI, thanks for the change

2 participants