[Parquet] Support skipping pages with mask based evaluation #9118
Conversation
alamb
left a comment
Thank you @sdf-jkl -- this actually makes a lot of sense to me 👏
I have a few concerns:
- I am worried about the performance overhead of this approach (copying the page index and the loop for each batch) -- I will run some benchmarks to assess this
- I do wonder if we have test coverage for this entire situation -- in particular, do we have tests that repeatedly call next_mask_chunk after the first page and make sure we get the right rows?
If the performance looks good, I think we should add some more tests -- maybe @hhhizzz has some ideas on how to do this (or I think I can try and find some time to help out / work with codex to do so)
/// Using the row selection to skip(4), page2 won't be read at all, so in this
/// case we can't decode all the rows and apply a mask. To correctly apply the
/// bit mask, we need all 6 values be read, but page2 is not in memory.
fn override_selector_strategy_if_needed(
nice -- the idea is to avoid this function 👍
array_reader,
schema: Arc::new(schema),
read_plan,
page_offsets: page_offsets.map(|slice| Arc::new(slice.to_vec())),
So I think this will effectively copy the entire OffsetIndexMetaData structure (which I worry could be quite large)
I wonder if we need to find a way to avoid this (e.g. making the entire thing Arc'd in https://github.com/apache/arrow-rs/blob/67e04e758f1e62ec3d78d2f678daf433a4c54e30/parquet/src/file/metadata/mod.rs#L197-L196 somehow 🤔 )
We could store only the &Vec<PageLocation> instead of the entire OffsetIndexMetaData: df9a493
while cursor < mask.len() && selected_rows < batch_size {
    let mut page_end = mask.len();
    if let Some(pages) = page_locations {
        for loc in pages {
I am also a little worried that this loop will take too long (it is O(N^2) in the number of pages, since each time it looks through all pages).
Maybe we could potentially add a PageLocationIterator to the cursor itself (so we know where to pick up)
Maybe a binary search through a vec of page offsets? We would have to construct the vec once beforehand to keep us from rebuilding it.
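A rough sketch of that binary-search idea (not code from this PR), relying on the page locations being sorted by first_row_index, which they are in a valid offset index:
use parquet::format::PageLocation;

/// Find the index of the page containing `row` in O(log N) rather than
/// scanning all pages for every batch.
fn page_containing_row(pages: &[PageLocation], row: i64) -> Option<usize> {
    if pages.is_empty() || row < pages[0].first_row_index {
        return None;
    }
    // Number of pages whose first row is at or before `row`; the containing
    // page is the last of those.
    let idx = pages.partition_point(|p| p.first_row_index <= row);
    Some(idx - 1)
}
Keeping the page locations alongside the cursor would make each lookup a single partition_point call.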
Fixed in df9a493
run benchmark arrow_reader_clickbench arrow_reader_row_filter
🤖: Benchmark completed
🤖: Benchmark completed
run benchmark arrow_reader_clickbench arrow_reader_row_filter
🤖: Benchmark completed
🤖: Benchmark completed
@alamb @Dandandan clickbench q12, 24, 30 show some degradation, but everything else looks like an overall improvement.
let reader = ParquetRecordBatchReader::new(array_reader, plan);
let reader =
    ParquetRecordBatchReader::new(array_reader, plan, page_offsets.cloned());
cloned() may cause extra expense here; can we use Arc<[PageLocation]> to avoid that?
It's a big API change to make PageLocation or OffsetIndexMetaData an Arc inside ParquetMetaData.
If we want to make that change, I can open an issue and work up a PR.
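For illustration, a minimal sketch of the cheaper middle ground (the helper is made up, not the PR's API): copy the locations out of the offset index once when the reader is built and share them behind an Arc<[PageLocation]>, so later clones are just a refcount bump:
use std::sync::Arc;

use parquet::format::PageLocation;

// One allocation and copy when the reader is constructed...
fn share_page_locations(locations: &[PageLocation]) -> Arc<[PageLocation]> {
    Arc::from(locations.to_vec())
}

// ...after which every consumer clones the Arc instead of the Vec:
// let for_reader = Arc::clone(&shared);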
I agree with @hhhizzz that copying the offsets here is not good
I thought about it some more, and I think the reason the copy is currently needed is that the decision of whether the page should be skipped is postponed until the next MaskChunk is needed
One potential idea I had to avoid this is to use the page index in the ReadPlanBuilder when building, rather than passing the page index to every call to next_batch.
So maybe that would look something like extending MaskCursor from
/// Cursor for iterating a mask-backed [`RowSelection`]
///
/// This is best for dense selections where there are many small skips
/// or selections. For example, selecting every other row.
#[derive(Debug)]
pub struct MaskCursor {
mask: BooleanBuffer,
/// Current absolute offset into the selection
position: usize,
}
to also track what ranges should be skipped entirely. Maybe something like
#[derive(Debug)]
pub struct MaskCursor {
mask: BooleanBuffer,
/// Current absolute offset into the selection
position: usize,
/// Which row ranges should be skipped entirely?
skip_ranges: Vec<Range<usize>>,
}
That I think would simplify the logic for next_mask_chunk significantly and it would avoid the need to copy the entire page index.
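For what it's worth, here is one rough sketch of how those skip ranges could be precomputed from the offset index and the selection mask in the builder (the function name and placement are my own, not this PR's code):
use std::ops::Range;

use arrow_buffer::BooleanBuffer;
use parquet::format::PageLocation;

/// Collect the row ranges of pages that contain no selected rows, so the
/// cursor can skip over them without ever decoding the page.
fn compute_skip_ranges(
    mask: &BooleanBuffer,
    pages: &[PageLocation],
    total_rows: usize,
) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    for (i, page) in pages.iter().enumerate() {
        let start = page.first_row_index as usize;
        let end = pages
            .get(i + 1)
            .map(|next| next.first_row_index as usize)
            .unwrap_or(total_rows);
        // A page is skippable only if none of its rows are selected
        if (start..end.min(mask.len())).all(|row| !mask.value(row)) {
            ranges.push(start..end);
        }
    }
    ranges
}
next_mask_chunk would then only consult the precomputed ranges instead of walking the page index on every call.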
Thank you! @sdf-jkl, the code looks great, just wondering if we could add more unit tests.
Here's the existing test: arrow-rs/parquet/src/arrow/async_reader/mod.rs Line 1218 in 13d497a
I think we can just add one more unit test to test skipping pages with RowSelectionPolicy set to Mask instead of Auto
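Not this PR's test, but a rough shape such a unit test could take (the writer properties mirror the existing test; forcing RowSelectionPolicy::Mask rather than Auto would additionally need whatever builder hook this PR exposes for the policy, which is not shown here):
use std::sync::Arc;

use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use bytes::Bytes;
use parquet::arrow::arrow_reader::{
    ArrowReaderOptions, ParquetRecordBatchReaderBuilder, RowSelection, RowSelector,
};
use parquet::arrow::ArrowWriter;
use parquet::file::properties::WriterProperties;

fn read_with_skipped_first_page() -> Result<(), Box<dyn std::error::Error>> {
    // 6 rows with a 2-row page limit -> three pages per column chunk
    let batch = RecordBatch::try_from_iter([(
        "a",
        Arc::new(Int32Array::from_iter_values(0..6)) as ArrayRef,
    )])?;
    let props = WriterProperties::builder()
        .set_write_batch_size(2)
        .set_data_page_row_count_limit(2)
        .build();
    let mut buf = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buf, batch.schema(), Some(props))?;
    writer.write(&batch)?;
    writer.close()?;

    // Load the page index so page-level skipping can kick in, then skip the
    // first page (rows 0..2) entirely via the row selection.
    let options = ArrowReaderOptions::new().with_page_index(true);
    let selection = RowSelection::from(vec![RowSelector::skip(2), RowSelector::select(4)]);
    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(Bytes::from(buf), options)?
        .with_batch_size(3)
        .with_row_selection(selection)
        .build()?;

    let rows: usize = reader
        .map(|batch| batch.map(|b| b.num_rows()))
        .sum::<Result<usize, _>>()?;
    assert_eq!(rows, 4);
    Ok(())
}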
Shouldn't these tests resolve to
This test doesn't cover the
Unit tests for Bitmask skip page
Thank you @hhhizzz, I merged the whole thing
I hope to review this soon
FYI it turns out we hit a bug in this logic when we deployed the new predicate evaluator to production at InfluxData. Thanks to @erratic-pattern we also have a nice test reproducer. Since it seems to be related to this logic, I plan to review this PR now and try to figure out how to help.
I'll merge main and fix clippy to make it easier to review. I could also remove the #[should_panic] from the new test.
alamb
left a comment
Thank you @sdf-jkl and @hhhizzz -- I took a look at this PR and it looks like it is heading in the right direction
I had some structural suggestions and I also have an idea for some additional coverage (related to predicates).
Please let me know if you are willing to work on this, otherwise I am happy to take over this PR as well (given we are hitting the problem at work, and it is blocking our upgrade)
)
}) {
self.row_group_offset_index(row_group_idx)
    .and_then(|columns| columns.first())
I think this is a bug -- it reads the page offsets from the first column rather than the column being read
Maybe something like
self.row_group_offset_index(row_group_idx).and_then(|columns| {
columns
.iter()
.enumerate()
.find(|(leaf_idx, _)| self.projection.leaf_included(*leaf_idx))
.map(|(_, column)| column.page_locations())
Wouldn't the page offsets be the same for every column? It is, thanks!
I think even this would not work, because we actually need to keep page offsets for all projected columns and use them in ReadPlanBuilder (once we move it from ParquetRecordBatchReader)
So I guess we go back to using the whole &[OffsetIndexMetaData]
I plan to find some time this afternoon to work on this PR -- maybe I will come up with something
Another issue with the current implementation is that ParquetRecordBatchReader works page-aware using page offsets from a single column.
However, the read happens for all columns at once, using the same boolean mask (which is column-chunk specific).
https://github.com/apache/arrow-rs/pull/9118/changes#diff-850b3a44587149637b8545f66603a2b1252959fd36f7ddc55f37d6b5357816c6L1403
It seems that supporting different page offsets for each column would require us to push page awareness further down into the arrow readers.
while !mask_cursor.is_empty() {
    let Some(mask_chunk) = mask_cursor.next_mask_chunk(batch_size) else {
    let Some(mask_chunk) = mask_cursor.next_mask_chunk(batch_size, page_locations)
I expect that this API needs to be extended -- it needs to be able to represent "skip the next N rows without trying to decode them"
As written here I think the first page that doesn't have any rows selected will return None (which will trigger the reader to think it is at the end of the file, even if there is data left)
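For example (purely illustrative, not the PR's actual MaskChunk type), the value handed back to the reader could distinguish the two cases explicitly:
use arrow_buffer::BooleanBuffer;

/// Sketch of a chunk that can say either "decode these rows under this mask"
/// or "advance past this many rows without decoding them at all"
enum MaskChunkKind {
    /// Decode `mask.len()` rows and keep the ones where the mask is set
    Decode { mask: BooleanBuffer },
    /// Skip `rows` rows without decoding any values
    SkipUndecoded { rows: usize },
}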
The reader only thinks it's the end of the file when no further rows remain in mask_cursor. An empty page is handled by the initial skip in next_mask_chunk.
Definitely willing to work on this, thanks for the review and your input!
Awesome -- thanks @sdf-jkl -- I will switch focus for the rest of today and check back in tomorrow.
let props = WriterProperties::builder()
    .set_write_batch_size(2)
    .set_data_page_row_count_limit(2)
    .build();
I think the reason the tests pass is that the page offsets are the same for every column.
We limit pages by row count, not by size.
It's actually the same in the new test too...
@alamb It seems like I'm on to something with codex. The test passes, but I want to give it a read and a little polish first before sending it your way.
It seems like the issue was caused by differently sized pages after all. Bigger types would have more, smaller ("finer") pages and smaller types would have fewer, bigger ("coarser") pages. If the column with coarse pages was used to enable page awareness, we would use its page offsets. In the example above, col A with "coarse" pages overlaps with "finer" pages in col B that were skipped during data fetch. This led to the invalid offsets issue.
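If we want a regression test for exactly that shape, one way to provoke different page boundaries per column (a sketch; the limits are illustrative, not what the PR's test uses) is to cap pages by size in bytes instead of by row count, so wide and narrow columns split at different rows:
use parquet::file::properties::WriterProperties;

fn per_column_page_boundaries() -> WriterProperties {
    WriterProperties::builder()
        // Small byte budget per page: a string column will hit it after far
        // fewer rows than an i32 column, giving "finer" vs "coarser" pages.
        .set_data_page_size_limit(256)
        // Small write batches so the size limit is checked frequently.
        .set_write_batch_size(8)
        .build()
}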
/// Add offset index metadata for each column in a row group to this `ReadPlanBuilder`
pub fn with_offset_index_metadata(
Using the offsets of the column with the smallest number of rows per page should prevent the invalid offset issue from happening.
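A sketch of that heuristic (the helper name is mine, and the exact import path is an assumption): among the projected leaves, pick the offset index with the most pages, i.e. the finest page boundaries:
use parquet::file::page_index::offset_index::OffsetIndexMetaData;
use parquet::format::PageLocation;

/// Return the page locations of the projected column whose pages are the
/// finest-grained, so mask-based skipping never assumes a coarser boundary
/// than any column actually has.
fn finest_page_locations<'a>(
    columns: &'a [OffsetIndexMetaData],
    leaf_included: impl Fn(usize) -> bool,
) -> Option<&'a [PageLocation]> {
    columns
        .iter()
        .enumerate()
        .filter(|(leaf_idx, _)| leaf_included(*leaf_idx))
        .map(|(_, column)| &column.page_locations()[..])
        .max_by_key(|locations| locations.len())
}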
Which issue does this PR close?
Rationale for this change
Check issue.
What changes are included in this PR?
Made next_mask_chunk page aware by adding page_offsets to ParquetRecordBatchReader.
Are these changes tested?
Should be covered by existing tests from #8733
Are there any user-facing changes?