Conversation

@vigneshsiva11

Closes #9154

Rationale for this change

The async parquet reader example under parquet/examples duplicated functionality
that is already exposed through the public ArrowReaderBuilder::try_new API.

Moving this example into the API documentation improves discoverability for users
while avoiding the need to maintain duplicate standalone examples.

What changes are included in this PR?

  • Added an async parquet reading example to the rustdoc comments for
    ArrowReaderBuilder::try_new
  • Removed the redundant async_read_parquet.rs example file from
    parquet/examples

Are these changes tested?

Yes. Existing tests were run locally using:

cargo test -p parquet

Copilot AI review requested due to automatic review settings January 14, 2026 07:40
@github-actions bot added the parquet (Changes to the parquet crate) label Jan 14, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to move an async parquet reading example from a standalone example file into the API documentation for ArrowReaderBuilder::try_new. However, there's a fundamental mismatch: the deleted file demonstrated async parquet reading using ParquetRecordBatchStreamBuilder and tokio::fs::File, while the newly added documentation example uses synchronous APIs (std::fs::File and ParquetRecordBatchReaderBuilder).

Changes:

  • Removed the async_read_parquet.rs example file that demonstrated async reading with ParquetRecordBatchStreamBuilder
  • Added a synchronous example to ParquetRecordBatchReaderBuilder::try_new documentation that is incorrectly wrapped in an async function

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| parquet/examples/async_read_parquet.rs | Removed standalone async example demonstrating ParquetRecordBatchStreamBuilder with async file I/O |
| parquet/src/arrow/arrow_reader/mod.rs | Added synchronous example incorrectly labeled as async to try_new documentation |

Comment on lines 1005 to 1017

/// # Example
/// ```rust
/// use parquet::arrow::arrow_reader::ArrowReaderBuilder;
/// # async fn example() -> parquet::errors::Result<()> {
/// let file = std::fs::File::open("data.parquet")?;
/// let builder = ArrowReaderBuilder::try_new(file)?;
/// let mut reader = builder.build()?;
/// # Ok(())
/// # }
/// ```

Copilot AI Jan 14, 2026

Extra blank lines before and after the example section. Standard Rust documentation style typically uses a single blank line between doc comment sections.

Copilot uses AI. Check for mistakes.
Comment on lines 1007 to 1016
/// # Example
/// ```rust
/// use parquet::arrow::arrow_reader::ArrowReaderBuilder;
/// # async fn example() -> parquet::errors::Result<()> {
/// let file = std::fs::File::open("data.parquet")?;
/// let builder = ArrowReaderBuilder::try_new(file)?;
/// let mut reader = builder.build()?;
/// # Ok(())
/// # }
/// ```

Copilot AI Jan 14, 2026

This example is marked as async (wrapped in async fn example()) but demonstrates synchronous file reading using std::fs::File. The deleted async_read_parquet.rs example actually demonstrated async reading using tokio::fs::File and ParquetRecordBatchStreamBuilder.

If the intent is to provide an async example for ArrowReaderBuilder::try_new, it should use the async API (ParquetRecordBatchStreamBuilder) instead of the synchronous API (ParquetRecordBatchReaderBuilder). Alternatively, if this is meant to be a synchronous example, the async fn example() wrapper should be removed and replaced with a regular function.
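To illustrate why the `async fn` wrapper does not make the example asynchronous, here is a minimal std-only sketch. The names `sync_in_disguise`, `NoopWaker`, and `completes_on_first_poll` are hypothetical and not part of the parquet crate; the string length stands in for blocking `std::fs::File` work. An `async fn` with no `.await` inside runs its whole body inline the first time it is polled, so it behaves like a regular function.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical stand-in for the reviewed example: synchronous work
// placed inside an `async fn`, with no `.await` anywhere in the body.
async fn sync_in_disguise() -> usize {
    // Any blocking call here would run inline on the polling thread.
    "data.parquet".len()
}

/// Polls `fut` exactly once and returns its output if it finished.
fn completes_on_first_poll<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

Because the future never yields, a single poll always returns `Poll::Ready`. A genuinely async reader would need await points backed by an async I/O source.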

/// # async fn example() -> parquet::errors::Result<()> {
/// let file = std::fs::File::open("data.parquet")?;
/// let builder = ArrowReaderBuilder::try_new(file)?;
/// let mut reader = builder.build()?;

Copilot AI Jan 14, 2026

The example creates a reader but never demonstrates using it to read data. Consider adding code that actually reads from the reader (e.g., iterating over batches) to provide a more complete demonstration of the API's functionality.

Suggested change
- /// let mut reader = builder.build()?;
+ /// let mut reader = builder.build()?;
+ ///
+ /// // Read all record batches from the reader
+ /// while let Some(batch) = reader.next().transpose()? {
+ ///     println!("Read {} rows", batch.num_rows());
+ /// }
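
The `next().transpose()?` idiom in the suggestion works for any iterator that yields `Result` items. The sketch below uses only the standard library; `count_rows` and the `Vec<i32>` "batches" are hypothetical stand-ins for the parquet reader and its record batches, chosen so the snippet compiles without external crates.

```rust
/// Sums the lengths of all batches, stopping at the first error.
///
/// `reader` is a hypothetical stand-in for a record batch reader:
/// any iterator over `Result<T, E>` items behaves the same way.
fn count_rows<I>(mut reader: I) -> Result<usize, String>
where
    I: Iterator<Item = Result<Vec<i32>, String>>,
{
    let mut total = 0;
    // `next()` yields Option<Result<T, E>>; `transpose()` flips it into
    // Result<Option<T>, E>, so `?` propagates an error and the loop
    // ends cleanly once the iterator returns `None`.
    while let Some(batch) = reader.next().transpose()? {
        total += batch.len();
    }
    Ok(total)
}
```

An equivalent form is `for batch in reader { let batch = batch?; ... }`, which some readers may find easier to scan.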


/// # Example
/// ```rust

Copilot AI Jan 14, 2026

The hardcoded file path "data.parquet" will cause this example to fail when run via cargo test --doc. Examples in documentation should either use files that are known to exist (like those in the test data directory), or be marked with no_run to prevent execution during doc tests.

Suggested change
- /// ```rust
+ /// ```rust,no_run
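
As a rough illustration of the `no_run` attribute, the sketch below documents a small helper whose doctest refers to a file that does not exist. The function `file_len`, the crate name `my_crate`, and the path `data.bin` are all hypothetical. With `no_run`, rustdoc still compiles the example, so API drift is caught, but it does not execute it, so the missing file cannot fail the test run.

```rust
use std::fs;
use std::io;

/// Returns the size in bytes of the file at `path`.
///
/// The example is compiled by `cargo test --doc` but never executed,
/// because `data.bin` is a hypothetical path that is not checked in.
///
/// ```rust,no_run
/// use my_crate::file_len; // hypothetical crate name
/// let n = file_len("data.bin").unwrap();
/// println!("{n} bytes");
/// ```
pub fn file_len(path: &str) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}
```

The alternative mentioned in the comment, pointing the example at a file that is known to exist in the test data directory, keeps the doctest executable at the cost of coupling the docs to that file's location.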

@vigneshsiva11
Author

Thanks for the feedback!

I’ve updated the example to be fully synchronous, added a minimal read loop,
marked it as no_run, and aligned it with standard Rustdoc conventions.
All parquet tests and doctests pass locally.

Please let me know if this looks good now.

Contributor

@alamb alamb left a comment

Thank you for the contribution @vigneshsiva11


/// # Example
/// ```rust
/// use parquet::arrow::arrow_reader::ArrowReaderBuilder;
Contributor

this is an example for a different structure, right?

The doc example is on ParquetRecordBatchReaderBuilder but the example is using ArrowReaderBuilder 🤔

Author

Hi @alamb, I’ve pushed an update converting the example into a doctest as suggested.
All checks pass locally (fmt, clippy, test).
Please let me know if you’d like any changes. Thanks!

Contributor

@alamb alamb left a comment

Thank you for the contribution @vigneshsiva11


impl<T: ChunkReader + 'static> ParquetRecordBatchReaderBuilder<T> {
/// Create a new [`ParquetRecordBatchReaderBuilder`]
///
Contributor

I think this change means the old example isn't run

@alamb alamb self-requested a review January 17, 2026 15:47
@alamb
Contributor

alamb commented Jan 17, 2026

@vigneshsiva11 can you check whether the example needs to be incorporated? It looks like the existing docs may already cover it.

@vigneshsiva11
Author

Thanks for the review!

My intention was to improve discoverability by moving the async example into a doctest, since it closely mirrors the sync builder example and keeps the documentation closer to the API.

That said, I agree that the existing docs (e.g. new_with_metadata) may already cover this sufficiently. If you think the async example is redundant, I’m happy to remove it entirely and keep the docs minimal.

Please let me know which direction you prefer.

@alamb
Contributor

alamb commented Jan 17, 2026

> That said, I agree that the existing docs (e.g. new_with_metadata) may already cover this sufficiently. If you think the async example is redundant, I’m happy to remove it entirely and keep the docs minimal.

I think it covers the functionality sufficiently -- let's just remove the example and not change the existing docs

Contributor

@alamb alamb left a comment

Thanks @vigneshsiva11 -- I took the liberty of refining the example a little and pushed to your branch

/// let mut builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
///
/// // Inspect metadata
/// // The builder has access to ParquetMetaData such
Contributor

I refined this example with some more comments as well as not calling with_row_groups to keep the example simpler

@vigneshsiva11
Author

Thanks a lot, @alamb!

The refinements look great to me, and keeping the example simpler makes sense. I’m happy with this as is.

Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Consolidate parquet examples into the doc comments

2 participants