Commit c10fe77
Add tests and fixes for schema resolution bug (#9237)
# Which issue does this PR close?
- Closes #9231.
# Rationale for this change
Avro schema resolution allows a reader schema to represent “nullable”
values using a two-branch union (`["null", T]` or `[T, "null"]`) while
still reading data written with the non-union schema `T` (i.e. without
union discriminants in the encoded data).
In `arrow-avro`, resolving a non-union writer type against a reader
union (notably for array/list item schemas like `items: ["null",
"int"]`) could incorrectly treat the encoded stream as a union and
attempt to decode a union discriminant. This would misalign decoding and
could surface as `ParseError("bad varint")` for certain files (see
#9231).
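
The misalignment described above can be reproduced with a minimal, self-contained sketch. It does not use the `arrow-avro` decoder: `read_long`, `decode_plain_items`, and `decode_items_as_nullable_union` are hypothetical helpers written for illustration, and they assume single-level arrays of `int` with non-negative block counts.

```rust
/// Read one Avro zigzag-encoded varint (`int`/`long`) starting at `*pos`.
fn read_long(buf: &[u8], pos: &mut usize) -> Result<i64, String> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos).ok_or_else(|| "bad varint".to_string())?;
        *pos += 1;
        if shift >= 64 {
            return Err("bad varint".to_string());
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    Ok(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Correct path: items were written as plain `int`, so no discriminant is read.
fn decode_plain_items(buf: &[u8]) -> Result<Vec<i64>, String> {
    let mut pos = 0;
    let mut items = Vec::new();
    loop {
        let count = read_long(buf, &mut pos)?;
        if count == 0 {
            return Ok(items); // a zero count terminates the array
        }
        if count < 0 {
            return Err("negative block counts are not handled in this sketch".into());
        }
        for _ in 0..count {
            items.push(read_long(buf, &mut pos)?);
        }
    }
}

/// Buggy path: treats each item as a `["null", "int"]` union and reads a
/// branch index that the writer never emitted.
fn decode_items_as_nullable_union(buf: &[u8]) -> Result<Vec<Option<i64>>, String> {
    let mut pos = 0;
    let mut items = Vec::new();
    loop {
        let count = read_long(buf, &mut pos)?;
        if count == 0 {
            return Ok(items);
        }
        if count < 0 {
            return Err("negative block counts are not handled in this sketch".into());
        }
        for _ in 0..count {
            match read_long(buf, &mut pos)? {
                0 => items.push(None),
                1 => items.push(Some(read_long(buf, &mut pos)?)),
                other => return Err(format!("invalid union branch {other}")),
            }
        }
    }
}
```

For the array `[1, 2, 3]` written with `items: "int"`, the encoded bytes are `[0x06, 0x02, 0x04, 0x06, 0x00]`. `decode_plain_items` returns `[1, 2, 3]`, while `decode_items_as_nullable_union` consumes the first item as a branch index, reads the second item as the value, and then fails on the next "branch index". With other inputs the same misalignment could plausibly surface as a varint error instead.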
# What changes are included in this PR?
- Fix schema resolution when the *writer* schema is non-union and the
  *reader* schema is a union:
  - Special-case two-branch unions containing `null` to be treated as
    “nullable” (capturing whether `null` is first or second), and resolve
    against the non-null branch.
  - Improve matching for general reader unions by attempting to resolve
    against each union variant, preferring a direct match, and constructing
    the appropriate union resolution mapping for the selected branch.
  - Ensure promotions are represented at the union-resolution level
    (avoiding nested promotion resolution on the selected union child).
- Add regression coverage for the bug and the fixed behavior:
  - `test_resolve_array_writer_nonunion_items_reader_nullable_items`
    (schema resolution / codec)
  - `test_array_decoding_writer_nonunion_items_reader_nullable_items`
    (record decoding; ensures correct byte consumption and decoded values)
  - `test_bad_varint_bug_nullable_array_items` (end-to-end reader
    regression using a small Avro fixture)
- Add a small compressed Avro fixture under
  `arrow-avro/test/data/bad-varint-bug.avro.gz` used by the regression
  test.
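
The resolution rule in the first bullet can be sketched roughly as follows. This is not the `arrow-avro` implementation: `Schema`, `Resolved`, `resolve_primitive`, and `resolve_nonunion_writer` are hypothetical names, only a few primitive types and promotions are modeled, and the sketch assumes the writer schema is never a union.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Null,
    Int,
    Long,
    Double,
    Str,
    Union(Vec<Schema>),
}

#[derive(Debug, PartialEq)]
enum Resolved {
    /// Writer and reader types are identical.
    Direct,
    /// Writer type is promoted to a wider reader type.
    Promoted,
    /// Reader is a two-branch union with `null`; the data has no discriminant.
    Nullable { null_first: bool, inner: Box<Resolved> },
    /// Reader is a general union; writer values map to branch `index`.
    /// The promotion flag lives here rather than on a nested child.
    ToUnion { index: usize, promoted: bool },
}

/// Resolve two non-union schemas (only a subset of Avro promotions is modeled).
fn resolve_primitive(writer: &Schema, reader: &Schema) -> Option<Resolved> {
    use Schema::*;
    match (writer, reader) {
        (w, r) if w == r => Some(Resolved::Direct),
        (Int, Long) | (Int, Double) | (Long, Double) => Some(Resolved::Promoted),
        _ => None,
    }
}

/// Resolve a non-union writer schema against any reader schema.
fn resolve_nonunion_writer(writer: &Schema, reader: &Schema) -> Option<Resolved> {
    let Schema::Union(branches) = reader else {
        return resolve_primitive(writer, reader);
    };
    // Two-branch union containing `null`: treat as nullable and resolve
    // against the non-null branch, remembering where `null` sits.
    if branches.len() == 2 && *writer != Schema::Null {
        if let Some(null_idx) = branches.iter().position(|b| *b == Schema::Null) {
            let inner = resolve_primitive(writer, &branches[1 - null_idx])?;
            return Some(Resolved::Nullable {
                null_first: null_idx == 0,
                inner: Box::new(inner),
            });
        }
    }
    // General union: prefer a direct match, then fall back to a promotion.
    if let Some(index) = branches.iter().position(|b| b == writer) {
        return Some(Resolved::ToUnion { index, promoted: false });
    }
    branches
        .iter()
        .position(|b| matches!(resolve_primitive(writer, b), Some(Resolved::Promoted)))
        .map(|index| Resolved::ToUnion { index, promoted: true })
}
```

Under these assumptions, a writer `Int` against a reader `["null", "int"]` resolves to `Nullable { null_first: true, .. }`, so a decoder built from it would read plain values and never look for a branch index in the stream.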
# Are these changes tested?
Yes. This PR adds targeted unit/integration tests that reproduce the
prior failure mode and validate correct schema resolution and decoding
for nullable-union array items.
# Are there any user-facing changes?
Yes (bug fix): reading Avro files with arrays whose element type is
represented as a nullable union in the reader schema (e.g. `items:
["null", "int"]`) now succeeds instead of failing with `ParseError("bad
varint")`. No public API changes are intended.
---------
Co-authored-by: Mikhail Zabaluev <[email protected]>

1 parent 096751f commit c10fe77
4 files changed, +191 -17 lines changed:
- arrow-avro
  - src
    - reader
  - test/data