-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: add with_nulls to PrimitiveArray, BooleanArray, and GenericByte… #9141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -93,6 +93,28 @@ impl BooleanArray { | |||||||||||||||||||||||||||||||||||
| Self { values, nulls } | ||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| /// It returns a new array with the same data and a new null buffer. | ||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||
| /// The resulting null buffer is the union of the existing nulls and the provided nulls. | ||||||||||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @keirsalterego I am not totally sure about this API now that I see it For the main use case of #6528 was to add pre-computed nulls -- so also running it through However, the commentary about arbitrarily setting the null mask (without causing undefined behavior is a good one) I am not sure how to proceed
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how about balancing the safety concern raised by @tustvold providing 2 methods ?
|
||||||||||||||||||||||||||||||||||||
| /// In other words, a slot is valid in the result only if it is valid in BOTH | ||||||||||||||||||||||||||||||||||||
| /// the existing array AND the provided `nulls`. | ||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
| /// | |
| /// | |
| /// # Example | |
| /// | |
| /// ``` | |
| /// # use arrow_array::{Array, BooleanArray}; | |
| /// # use arrow_buffer::NullBuffer; | |
| /// let array = BooleanArray::from(vec![true, false, true]); | |
| /// let nulls = NullBuffer::from(vec![true, false, true]); | |
| /// | |
| /// let array = array.with_nulls(Some(nulls)); | |
| /// | |
| /// assert_eq!(array.len(), 3); | |
| /// assert!(array.is_valid(0)); | |
| /// assert!(array.is_null(1)); | |
| /// ``` | |
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the let is making line lengths and line counts worse, not better?
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | |
| Self { | |
| values: self.values, | |
| nulls: new_nulls, | |
| Self { | |
| values: self.values, | |
| nulls: NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()), |
(more below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @scovich. I'll update the logic to use self.data_type, clean up the code style, and add the missing docs/tests. Pushing changes shortly.
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:
- Merging nulls with an array that has no existing nulls
- Merging nulls with an array that already has nulls
- Passing None as the nulls parameter
- Verifying the panic behavior when null buffer length mismatches
Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -154,6 +154,30 @@ impl<T: ByteArrayType> GenericByteArray<T> { | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| }) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// It returns a new array with the same data and a new null buffer. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// The resulting null buffer is the union of the existing nulls and the provided nulls. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// In other words, a slot is valid in the result only if it is valid in BOTH | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// the existing array AND the provided `nulls`. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||
| /// | |
| /// | |
| /// # Examples | |
| /// | |
| /// ``` | |
| /// use arrow_array::StringArray; | |
| /// use arrow_buffer::NullBuffer; | |
| /// | |
| /// // Create an array with an existing null in the second position | |
| /// let array = StringArray::from(vec![Some("a"), None, Some("c")]); | |
| /// | |
| /// // Create an additional null buffer to combine with the existing one | |
| /// // Here, the third position is marked as null | |
| /// let nulls = NullBuffer::from(vec![true, true, false]); | |
| /// | |
| /// let result = array.with_nulls(Some(nulls)); | |
| /// | |
| /// assert_eq!(result.len(), 3); | |
| /// // Still valid, as it is valid in both null buffers | |
| /// assert!(result.is_valid(0)); | |
| /// // Remains null, as it is null in the original array | |
| /// assert!(result.is_null(1)); | |
| /// // Now null, as it is null in the provided null buffer | |
| /// assert!(result.is_null(2)); | |
| /// ``` | |
| /// |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with PrimitiveArray::with_nulls and to future-proof the implementation, this should use self.data_type instead of T::DATA_TYPE. While they are currently always the same for GenericByteArray, PrimitiveArray supports overriding the data type via with_data_type, and using self.data_type would maintain the pattern of preserving the original instance's data type field.
| data_type: T::DATA_TYPE, | |
| data_type: self.data_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Is this is a valid observation/suggestion? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(update: on a different PR I reviewed yesterday, it was pointed out that reusing the (owned) self.data_type instead of passing T::DATA_TYPE avoids the need to drop it)
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:
- Merging nulls with an array that has no existing nulls
- Merging nulls with an array that already has nulls
- Passing None as the nulls parameter
- Verifying the panic behavior when null buffer length mismatches
Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -672,6 +672,42 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
| }) | ||
| } | ||
|
|
||
| /// Returns a new array with the same data and a new null buffer. | ||
| /// | ||
| /// The resulting null buffer is the union of the existing nulls and the provided nulls. | ||
| /// In other words, a slot is valid in the result only if it is valid in BOTH | ||
| /// the existing array AND the provided `nulls`. | ||
| /// | ||
| /// # Panics | ||
| /// | ||
| /// Panics if `nulls` has a different length than the array. | ||
| /// | ||
| /// # Example | ||
| /// | ||
| /// ``` | ||
| /// # use arrow_array::{Int32Array, Array}; | ||
| /// # use arrow_buffer::NullBuffer; | ||
| /// let array = Int32Array::from(vec![Some(1), Some(2), Some(3)]); | ||
| /// // Mask out the second element | ||
| /// let mask = NullBuffer::from(vec![true, false, true]); | ||
| /// let masked = array.with_nulls(Some(mask)); | ||
| /// assert_eq!(masked.values(), &[1, 2, 3]); // Values unchanged | ||
| /// assert!(masked.is_null(1)); // Second element is now null | ||
| /// ``` | ||
|
|
||
| pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self { | ||
| if let Some(n) = &nulls { | ||
| assert_eq!(n.len(), self.len(), "Null buffer length mismatch"); | ||
| } | ||
|
|
||
| let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()); | ||
|
|
||
| Self { | ||
| data_type: self.data_type, | ||
| values: self.values, | ||
| nulls: new_nulls, | ||
| } | ||
| } | ||
|
Comment on lines
+698
to
+710
|
||
| /// Create a new [`Scalar`] from `value` | ||
| pub fn new_scalar(value: T::Native) -> Scalar<Self> { | ||
| Scalar::new(Self { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.