Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions arrow-array/src/array/boolean_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,28 @@ impl BooleanArray {
Self { values, nulls }
}

/// It returns a new array with the same data and a new null buffer.
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.

Suggested change
/// It returns a new array with the same data and a new null buffer.
/// Returns a new array with the same data and a new null buffer.

Copilot uses AI. Check for mistakes.
///
/// The resulting null buffer is the union of the existing nulls and the provided nulls.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @keirsalterego

I am not totally sure about this API now that I see it

For the main use case of #6528 was to add pre-computed nulls -- so also running it through union seems to circumvent the intention

However, the commentary about arbitrarily setting the null mask (without causing undefined behavior is a good one)

I am not sure how to proceed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about balancing the safety concern raised by @tustvold providing 2 methods ?

  1. with_nulls that would be safe union version and,
  2. with_nulls_unchecked which would be without a union, we can document it as a unsafe or simply add a heavy warning?

/// In other words, a slot is valid in the result only if it is valid in BOTH
/// the existing array AND the provided `nulls`.
///
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for BooleanArray::with_nulls lacks a usage example, unlike PrimitiveArray::with_nulls which includes a comprehensive example. Adding an example would improve consistency across the API and help users understand how to use this method effectively.

Suggested change
///
///
/// # Example
///
/// ```
/// # use arrow_array::{Array, BooleanArray};
/// # use arrow_buffer::NullBuffer;
/// let array = BooleanArray::from(vec![true, false, true]);
/// let nulls = NullBuffer::from(vec![true, false, true]);
///
/// let array = array.with_nulls(Some(nulls));
///
/// assert_eq!(array.len(), 3);
/// assert!(array.is_valid(0));
/// assert!(array.is_null(1));
/// ```
///

Copilot uses AI. Check for mistakes.
/// # Panics
///
/// Panics if `nulls` has a different length than the array.
pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self {
if let Some(n) = &nulls {
assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
}

let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

Self {
values: self.values,
nulls: new_nulls,
Comment on lines +110 to +114
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the let is making line lengths and line counts worse, not better?

Suggested change
let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());
Self {
values: self.values,
nulls: new_nulls,
Self {
values: self.values,
nulls: NullBuffer::union(self.nulls.as_ref(), nulls.as_ref()),

(more below)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @scovich. I'll update the logic to use self.data_type, clean up the code style, and add the missing docs/tests. Pushing changes shortly.

}
}
Comment on lines +105 to +116
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:

  • Merging nulls with an array that has no existing nulls
  • Merging nulls with an array that already has nulls
  • Passing None as the nulls parameter
  • Verifying the panic behavior when null buffer length mismatches

Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.

Copilot uses AI. Check for mistakes.

/// Create a new [`BooleanArray`] with length `len` consisting only of nulls
pub fn new_null(len: usize) -> Self {
Self {
Expand Down
24 changes: 24 additions & 0 deletions arrow-array/src/array/byte_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,30 @@ impl<T: ByteArrayType> GenericByteArray<T> {
})
}

/// It returns a new array with the same data and a new null buffer.
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation starts with "It returns" which is grammatically awkward. The sentence should start with "Returns" to match the style of the PrimitiveArray implementation and follow Rust documentation conventions.

Copilot uses AI. Check for mistakes.
///
/// The resulting null buffer is the union of the existing nulls and the provided nulls.
/// In other words, a slot is valid in the result only if it is valid in BOTH
/// the existing array AND the provided `nulls`.
///
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for GenericByteArray::with_nulls lacks a usage example, unlike PrimitiveArray::with_nulls which includes a comprehensive example. Adding an example would improve consistency across the API and help users understand how to use this method effectively.

Suggested change
///
///
/// # Examples
///
/// ```
/// use arrow_array::StringArray;
/// use arrow_buffer::NullBuffer;
///
/// // Create an array with an existing null in the second position
/// let array = StringArray::from(vec![Some("a"), None, Some("c")]);
///
/// // Create an additional null buffer to combine with the existing one
/// // Here, the third position is marked as null
/// let nulls = NullBuffer::from(vec![true, true, false]);
///
/// let result = array.with_nulls(Some(nulls));
///
/// assert_eq!(result.len(), 3);
/// // Still valid, as it is valid in both null buffers
/// assert!(result.is_valid(0));
/// // Remains null, as it is null in the original array
/// assert!(result.is_null(1));
/// // Now null, as it is null in the provided null buffer
/// assert!(result.is_null(2));
/// ```
///

Copilot uses AI. Check for mistakes.
/// # Panics
///
/// Panics if `nulls` has a different length than the array.
pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self {
if let Some(n) = &nulls {
assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
}

let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

Self {
data_type: T::DATA_TYPE,
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with PrimitiveArray::with_nulls and to future-proof the implementation, this should use self.data_type instead of T::DATA_TYPE. While they are currently always the same for GenericByteArray, PrimitiveArray supports overriding the data type via with_data_type, and using self.data_type would maintain the pattern of preserving the original instance's data type field.

Suggested change
data_type: T::DATA_TYPE,
data_type: self.data_type,

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb Is this is a valid observation/suggestion? What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(update: on a different PR I reviewed yesterday, it was pointed out that reusing the (owned) self.data_type instead of passing T::DATA_TYPE avoids the need to drop it)

value_offsets: self.value_offsets,
value_data: self.value_data,
nulls: new_nulls,
}
}
Comment on lines +166 to +179
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:

  • Merging nulls with an array that has no existing nulls
  • Merging nulls with an array that already has nulls
  • Passing None as the nulls parameter
  • Verifying the panic behavior when null buffer length mismatches

Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.

Copilot uses AI. Check for mistakes.

/// Create a new [`GenericByteArray`] from the provided parts, without validation
///
/// # Safety
Expand Down
36 changes: 36 additions & 0 deletions arrow-array/src/array/primitive_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -672,6 +672,42 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
})
}

/// Returns a new array with the same data and a new null buffer.
///
/// The resulting null buffer is the union of the existing nulls and the provided nulls.
/// In other words, a slot is valid in the result only if it is valid in BOTH
/// the existing array AND the provided `nulls`.
///
/// # Panics
///
/// Panics if `nulls` has a different length than the array.
///
/// # Example
///
/// ```
/// # use arrow_array::{Int32Array, Array};
/// # use arrow_buffer::NullBuffer;
/// let array = Int32Array::from(vec![Some(1), Some(2), Some(3)]);
/// // Mask out the second element
/// let mask = NullBuffer::from(vec![true, false, true]);
/// let masked = array.with_nulls(Some(mask));
/// assert_eq!(masked.values(), &[1, 2, 3]); // Values unchanged
/// assert!(masked.is_null(1)); // Second element is now null
/// ```

pub fn with_nulls(self, nulls: Option<NullBuffer>) -> Self {
if let Some(n) = &nulls {
assert_eq!(n.len(), self.len(), "Null buffer length mismatch");
}

let new_nulls = NullBuffer::union(self.nulls.as_ref(), nulls.as_ref());

Self {
data_type: self.data_type,
values: self.values,
nulls: new_nulls,
}
}
Comment on lines +698 to +710
Copy link

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new with_nulls method lacks test coverage. Consider adding tests to verify behavior such as:

  • Merging nulls with an array that has no existing nulls
  • Merging nulls with an array that already has nulls
  • Passing None as the nulls parameter
  • Verifying the panic behavior when null buffer length mismatches

Similar methods in this file have comprehensive test coverage, and tests would help ensure this method works correctly and prevent regressions.

Copilot uses AI. Check for mistakes.
/// Create a new [`Scalar`] from `value`
pub fn new_scalar(value: T::Native) -> Scalar<Self> {
Scalar::new(Self {
Expand Down