-
Notifications
You must be signed in to change notification settings - Fork 1.1k
docs(parquet): move async parquet example into ArrowReaderBuilder docs #9167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR aims to move an async parquet reading example from a standalone example file into the API documentation for ArrowReaderBuilder::try_new. However, there's a fundamental mismatch: the deleted file demonstrated async parquet reading using ParquetRecordBatchStreamBuilder and tokio::fs::File, while the newly added documentation example uses synchronous APIs (std::fs::File and ParquetRecordBatchReaderBuilder).
Changes:
- Removed the
async_read_parquet.rsexample file that demonstrated async reading withParquetRecordBatchStreamBuilder - Added a synchronous example to
ParquetRecordBatchReaderBuilder::try_newdocumentation that is incorrectly wrapped in an async function
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| parquet/examples/async_read_parquet.rs | Removed standalone async example demonstrating ParquetRecordBatchStreamBuilder with async file I/O |
| parquet/src/arrow/arrow_reader/mod.rs | Added synchronous example incorrectly labeled as async to try_new documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| /// # Example | ||
| /// ```rust | ||
| /// use parquet::arrow::arrow_reader::ArrowReaderBuilder; | ||
| /// # async fn example() -> parquet::errors::Result<()> { | ||
| /// let file = std::fs::File::open("data.parquet")?; | ||
| /// let builder = ArrowReaderBuilder::try_new(file)?; | ||
| /// let mut reader = builder.build()?; | ||
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` | ||
Copilot
AI
Jan 14, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra blank lines before and after the example section. Standard Rust documentation style typically uses a single blank line between doc comment sections.
| /// # Example | ||
| /// ```rust | ||
| /// use parquet::arrow::arrow_reader::ArrowReaderBuilder; | ||
| /// # async fn example() -> parquet::errors::Result<()> { | ||
| /// let file = std::fs::File::open("data.parquet")?; | ||
| /// let builder = ArrowReaderBuilder::try_new(file)?; | ||
| /// let mut reader = builder.build()?; | ||
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` |
Copilot
AI
Jan 14, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is marked as async (wrapped in async fn example()) but demonstrates synchronous file reading using std::fs::File. The deleted async_read_parquet.rs example actually demonstrated async reading using tokio::fs::File and ParquetRecordBatchStreamBuilder.
If the intent is to provide an async example for ArrowReaderBuilder::try_new, it should use the async API (ParquetRecordBatchStreamBuilder) instead of the synchronous API (ParquetRecordBatchReaderBuilder). Alternatively, if this is meant to be a synchronous example, the async fn example() wrapper should be removed and replaced with a regular function.
| /// # async fn example() -> parquet::errors::Result<()> { | ||
| /// let file = std::fs::File::open("data.parquet")?; | ||
| /// let builder = ArrowReaderBuilder::try_new(file)?; | ||
| /// let mut reader = builder.build()?; |
Copilot
AI
Jan 14, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example creates a reader but never demonstrates using it to read data. Consider adding code that actually reads from the reader (e.g., iterating over batches) to provide a more complete demonstration of the API's functionality.
| /// let mut reader = builder.build()?; | |
| /// let mut reader = builder.build()?; | |
| /// | |
| /// // Read all record batches from the reader | |
| /// while let Some(batch) = reader.next().transpose()? { | |
| /// println!("Read {} rows", batch.num_rows()); | |
| /// } |
|
|
||
| /// # Example | ||
| /// ```rust |
Copilot
AI
Jan 14, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded file path "data.parquet" will cause this example to fail when run via cargo test --doc. Examples in documentation should either use files that are known to exist (like those in the test data directory), or be marked with no_run to prevent execution during doc tests.
| /// ```rust | |
| /// ```rust,no_run |
|
Thanks for the feedback! I’ve updated the example to be fully synchronous, added a minimal read loop, Please let me know if this looks good now. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution @vigneshsiva11
|
|
||
| /// # Example | ||
| /// ```rust | ||
| /// use parquet::arrow::arrow_reader::ArrowReaderBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an example for a different structure, right?
The doc example is on ParquetRecordBatchReaderBuilder but the example is using ArrowReaderBuilder 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alamb, I’ve pushed an update converting the example into a doctest as suggested.
All checks pass locally (fmt, clippy, test).
Please let me know if you’d like any changes. Thanks!
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution @vigneshsiva11
|
|
||
| impl<T: ChunkReader + 'static> ParquetRecordBatchReaderBuilder<T> { | ||
| /// Create a new [`ParquetRecordBatchReaderBuilder`] | ||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change means the old example isn't run
|
@vigneshsiva11 can you check if the example needs to be incorporated ? It looks like maybe the existing docs already cover it |
|
Thanks for the review! My intention was to improve discoverability by moving the async example into a doctest, since it closely mirrors the sync builder example and keeps the documentation closer to the API. That said, I agree that the existing docs (e.g. Please let me know which direction you prefer. |
I think it covers the functionality sufficiently -- let's just remove the example and not change the existing docs |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vigneshsiva11 -- I took the liberty of refining the example a little and pushed to your branch
| /// let mut builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap(); | ||
| /// | ||
| /// // Inspect metadata | ||
| /// // The builder has access to ParquetMetaData such |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refined this example with some more comments as well as not calling with_row_groups to keep the example simpler
|
Thanks a lot, @alamb! The refinements look great to me, and keeping the example simpler makes sense. I’m happy with this as is. |
Closes #9154
Rationale for this change
The async parquet reader example under
parquet/examplesduplicated functionalitythat is already exposed through the public
ArrowReaderBuilder::try_newAPI.Moving this example into the API documentation improves discoverability for users
while avoiding the need to maintain duplicate standalone examples.
What changes are included in this PR?
ArrowReaderBuilder::try_newasync_read_parquet.rsexample file fromparquet/examplesAre these changes tested?
Yes. Existing tests were run locally using:
cargo test -p parquet