@klion26
Member

@klion26 klion26 commented Jan 14, 2026

Which issue does this PR close?

What changes are included in this PR?

This PR adds a lossy field to CastOptions, which controls whether casts that lose precision are allowed.

Behavior matrix:

| safe | lossy | Behavior |
|------|-------|----------|
| true | false | Default. Invalid casts return NULL. Lossy casts (e.g., float to int) return NULL. |
| true | true | Invalid casts return NULL. Lossy casts are allowed (truncation occurs). |
| false | false | Invalid casts return an error. Lossy casts return an error. |
| false | true | Invalid casts return an error. Lossy casts are allowed (truncation occurs). |
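
The matrix can be illustrated with a minimal, standalone sketch for a single f64 to i32 value. Everything here is hypothetical: `SketchCastOptions` and `cast_f64_to_i32` are illustration-only names, not the arrow-rs API, and the sketch assumes that "invalid" means NaN, infinite, or out of range, and that "lossy" means the value has a fractional part.

```rust
/// Hypothetical stand-in for the proposed options (not the real CastOptions).
#[derive(Clone, Copy)]
struct SketchCastOptions {
    safe: bool,
    lossy: bool,
}

/// Casts one f64 to i32 following the proposed matrix.
/// Ok(Some(v)) = value produced, Ok(None) = NULL, Err = cast error.
fn cast_f64_to_i32(v: f64, opts: SketchCastOptions) -> Result<Option<i32>, String> {
    let truncated = v.trunc();
    // Assumption: NaN, infinities, and out-of-range values are "invalid".
    let in_range = truncated.is_finite()
        && truncated >= i32::MIN as f64
        && truncated <= i32::MAX as f64;
    // Assumption: a cast is "lossy" when truncation changes the value.
    let is_lossy = in_range && truncated != v;

    let rejected = !in_range || (is_lossy && !opts.lossy);
    if !rejected {
        return Ok(Some(truncated as i32));
    }
    if opts.safe {
        Ok(None) // safe = true: rejected casts become NULL
    } else {
        Err(format!("cannot cast {v} to Int32")) // safe = false: rejected casts error
    }
}
```

With safe = true and lossy = false, casting 3.1 yields NULL in this sketch, while 3.0 yields 3 under every combination because nothing is lost.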

The generated docs

[Screenshot of the generated docs]

Are these changes tested?

No tests; this PR only adds a new flag.

Are there any user-facing changes?

Yes, this changes the public CastOptions struct.

@github-actions github-actions bot added arrow Changes to the arrow crate parquet-variant parquet-variant* crates labels Jan 14, 2026
@klion26 klion26 marked this pull request as draft January 14, 2026 09:06
@klion26 klion26 force-pushed the add-lossy-in-cast-option branch 3 times, most recently from d94d8ce to d25024b Compare January 14, 2026 10:47
@klion26 klion26 force-pushed the add-lossy-in-cast-option branch from d25024b to e9ccb67 Compare January 14, 2026 10:57
@klion26 klion26 marked this pull request as ready for review January 14, 2026 11:32
@klion26
Member Author

klion26 commented Jan 14, 2026

@alamb @scovich Please help review this when you're free, thanks.

This PR adds a lossy field to CastOptions so that we can support lossy casts in variant_get. Because it adds a new field to a pub struct, it can only be merged in a major version.

It seems I can't add the api-change tag to this PR. @alamb, could you please add it? Thank you.

@tustvold
Contributor

tustvold commented Jan 14, 2026

I'm possibly missing some context here, but FWIW we already support lossy casts for decimals. I'm not sure about float -> int casting, but I thought it rounded toward zero and only returned null on overflow.

@scovich
Contributor

scovich commented Jan 14, 2026

> I'm possibly missing some context here, but FWIW we already support lossy casts for decimals.

As in, losing scale in e.g. Decimal(6, 4) -> Decimal(4, 2) Just Works (rounding to zero, so e.g. 99.9999 becomes 99.99)? What about losing precision, e.g. Decimal(6, 2) -> Decimal(4, 2)? Presumably values larger than 99 would have to become error/null?

@tustvold
Contributor

Yes, truncation (i.e. precision loss) is permitted, overflow will error (or NULL).

@klion26
Member Author

klion26 commented Jan 16, 2026

Sorry for the late reply, I missed this notification.

Thanks for the reply. This came up in the variant_get code, where we hope to support conversions such as decimal/float to int, TimeUnit::Nanosecond -> TimeUnit::Second, etc.

I found the Decimal-related tests (test_decimal_to_decimal_decrease_scale_and_precision_unchecked and test_decimal_to_decimal_increase_scale_and_precision_unchecked) in arrow-cast/src/cast/mod.rs. Some of the test cases are copied below.

test behavior for decimal cast

input -> casted_result

  • 99999(Decimal(5, 0)) -> 100(Decimal(3, -3))
  • -99999(Decimal(5, 0)) -> -1(Decimal(1, -5))
  • 123456789(Decimal(10, 2)) -> 12346(Decimal(5, -2))
  • -9876543210(Decimal(10, 4)) -> -987654(Decimal(7, 0))
  • 9999999(Decimal(7, 4)) -> Err(Invalid argument error: 1000.000 is too large to store in a Decimal128 of precision 6. Max is 999.999)
  • 99999(Decimal(5,0)) -> 9999900000(Decimal(10,5))
  • -99999(Decimal(5, 0)) -> -9999900000(Decimal(10, 5))
  • 99999(Decimal(5, 2)) -> 99999000(Decimal(10, 5))
  • -99999(Decimal(5, -2)) -> -9999900000(Decimal(10, 3))
  • -12345(Decimal(5, 3)) cast to Decimal(6, 5) -> Err(Invalid argument error: -12.34500 is too small to store in a Decimal128 of precision 6. Min is -9.99999)
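
To make the pattern in these cases easier to see, here is a rough standalone sketch of a decimal rescale on the raw i128 value. It is not the arrow-cast implementation: `rescale_decimal` is a hypothetical helper, and the rounding rule (half away from zero when the scale decreases) is an assumption inferred from the results listed above, such as 99999 becoming 100.

```rust
/// Hypothetical helper: rescales a raw decimal value from `from_scale` to
/// `to_scale`, then checks that it fits in `to_precision` digits.
fn rescale_decimal(
    value: i128,
    from_scale: i8,
    to_scale: i8,
    to_precision: u32,
) -> Result<i128, String> {
    let delta = i32::from(to_scale) - i32::from(from_scale);
    let factor = 10i128
        .checked_pow(delta.unsigned_abs())
        .ok_or_else(|| "scale change too large".to_string())?;

    let rescaled = if delta >= 0 {
        // Increasing the scale is exact unless the multiplication overflows.
        value
            .checked_mul(factor)
            .ok_or_else(|| "overflow while increasing scale".to_string())?
    } else {
        // Decreasing the scale drops digits (assumed: round half away from zero).
        let quotient = value / factor;
        let remainder = (value % factor).abs();
        if remainder >= factor - remainder {
            quotient + value.signum()
        } else {
            quotient
        }
    };

    // Values that do not fit the target precision are an error
    // (or NULL in safe mode, which this sketch does not model).
    let max = 10i128
        .checked_pow(to_precision)
        .ok_or_else(|| "precision too large".to_string())?
        - 1;
    if rescaled > max || rescaled < -max {
        Err(format!("{rescaled} does not fit in precision {to_precision}"))
    } else {
        Ok(rescaled)
    }
}
```

Under these assumptions, losing scale silently rounds the value, while exceeding the target precision is the only case that fails.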

Looking at cast_with_options, it already implements many conversions, so maybe we can use it directly or improve it. I'll try to figure that out and come back.

@klion26
Member Author

klion26 commented Jan 19, 2026

Currently, cast/cast_with_options supports many kinds of conversions, and lossy conversions are performed regardless of whether safe is true or false (e.g., 3.1f64 -> 3i32, 864000003005(TimeUnit::Millisecond) -> 864000003(TimeUnit::Second), etc.). Considering that this logic has been in place since its introduction in 2019, modifying it would result in a significant behavior change. Perhaps we should leave it as is and reuse the cast logic in variant_get.
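
The two examples above can be modeled in plain Rust as follows. This is only a sketch of the arithmetic, not the cast kernels themselves; `truncate_f64_to_i32` and `millis_to_seconds` are hypothetical names, and the None return for out-of-range floats is an assumption based on the overflow behavior described earlier in this thread.

```rust
/// Hypothetical model of a float -> int cast: the fractional part is
/// dropped silently, and only non-representable values yield None.
fn truncate_f64_to_i32(v: f64) -> Option<i32> {
    let t = v.trunc();
    if t.is_finite() && t >= i32::MIN as f64 && t <= i32::MAX as f64 {
        Some(t as i32)
    } else {
        None
    }
}

/// Hypothetical model of a millisecond -> second timestamp cast:
/// integer division discards the sub-second part.
fn millis_to_seconds(ms: i64) -> i64 {
    ms / 1000
}
```

Neither function consults a safe flag for the precision loss itself, which mirrors the observation that the loss happens whether safe is true or false.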

@alamb
Contributor

alamb commented Jan 19, 2026

> Considering that this logic has been in place since its introduction in 2019, modifying it would result in a significant behavior change. Perhaps we should leave it as is and reuse the cast logic in variant_get.

Yes, please let's not change how the cast kernels work.

Development

Successfully merging this pull request may close these issues.

Add lossy flag in CastOptions