feat: metric batch 2s #7613
base: main
Conversation
Signed-off-by: jeremyhi <[email protected]>
Summary of Changes

Hello @fengjiachun, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant performance optimization for Prometheus remote write requests by implementing a new batching mechanism. Instead of processing each incoming metric data point immediately, the system now buffers these requests and flushes them to the storage engine either after a configurable time interval or once a maximum number of rows has been accumulated. This change aims to reduce the overhead associated with frequent small writes, leading to more efficient data ingestion.
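To make the described batching model concrete, here is a minimal Tokio-based sketch of a size- and time-bounded worker loop. It is illustrative only: `Batch`, `batch_worker`, `flush`, and the use of `Vec<String>` for rows are assumptions for the sketch, not the PR's actual types, and the real flush interval and row limit come from configuration.

```rust
use std::time::Duration;

use tokio::sync::mpsc;
use tokio::time;

/// Illustrative batch buffer; the PR's real types and fields differ.
struct Batch {
    rows: Vec<String>,
}

/// A per-table worker: flush when `max_rows` is reached or `flush_interval` elapses.
async fn batch_worker(
    mut rx: mpsc::Receiver<Vec<String>>,
    max_rows: usize,
    flush_interval: Duration,
) {
    let mut batch = Batch { rows: Vec::new() };
    let mut ticker = time::interval(flush_interval);
    loop {
        tokio::select! {
            maybe_rows = rx.recv() => match maybe_rows {
                Some(rows) => {
                    batch.rows.extend(rows);
                    if batch.rows.len() >= max_rows {
                        flush(std::mem::take(&mut batch.rows)).await;
                    }
                }
                None => {
                    // Channel closed: perform a final flush and exit.
                    flush(std::mem::take(&mut batch.rows)).await;
                    break;
                }
            },
            _ = ticker.tick() => {
                flush(std::mem::take(&mut batch.rows)).await;
            }
        }
    }
}

/// Placeholder for the real write path into the storage engine.
async fn flush(rows: Vec<String>) {
    if !rows.is_empty() {
        println!("flushing {} rows", rows.len());
    }
}
```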
Code Review
This pull request introduces a batching mechanism for Prometheus remote write requests to improve performance. The implementation uses a per-table worker model with batching based on size and time intervals, which is a solid approach. However, I've found a critical issue in how the physical_table is retrieved from the query context, which will likely cause a compilation failure. Additionally, the graceful shutdown implementation could lead to data loss when the server is stopped. I've provided detailed comments on both issues with suggestions for fixes.
```rust
let physical_table = ctx
    .extension(PHYSICAL_TABLE_KEY)
    .unwrap_or(GREPTIME_PHYSICAL_TABLE)
    .to_string();
```
The logic for retrieving `physical_table` from the query context extension appears to be incorrect. The extension is set as a `String` in `http/prom_store.rs`, but this code seems to assume it's a `&str` or has compatible types for `unwrap_or`.

`ctx.extension(PHYSICAL_TABLE_KEY)` will likely return an `Option<&String>`. `unwrap_or` expects a value of the same type (`&String`), but `GREPTIME_PHYSICAL_TABLE` is a `&'static str`. This will lead to a compilation error due to a type mismatch.

The intended logic is likely to use the extension if present, or the default value otherwise. This can be correctly implemented as follows:
```rust
// Current
let physical_table = ctx
    .extension(PHYSICAL_TABLE_KEY)
    .unwrap_or(GREPTIME_PHYSICAL_TABLE)
    .to_string();

// Suggested
let physical_table = ctx
    .extension::<String>(PHYSICAL_TABLE_KEY)
    .cloned()
    .unwrap_or_else(|| GREPTIME_PHYSICAL_TABLE.to_string());
```
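For context, here is a standalone reproduction of the mismatch, under the review's assumption that the extension lookup yields `Option<&String>`; the `extension` helper and the constant below are hypothetical stand-ins, not GreptimeDB's actual `QueryContext` API.

```rust
use std::collections::HashMap;

const GREPTIME_PHYSICAL_TABLE: &str = "greptime_physical_table";

/// Stand-in for the query-context lookup, assumed to yield `Option<&String>`.
fn extension<'a>(extensions: &'a HashMap<String, String>, key: &str) -> Option<&'a String> {
    extensions.get(key)
}

fn main() {
    let extensions = HashMap::new();

    // Does not compile: `unwrap_or` on `Option<&String>` expects a `&String`,
    // but `GREPTIME_PHYSICAL_TABLE` is a `&'static str`.
    //
    // let physical_table = extension(&extensions, "physical_table")
    //     .unwrap_or(GREPTIME_PHYSICAL_TABLE)
    //     .to_string();

    // Compiles: clone the extension when present, otherwise build the default.
    let physical_table = extension(&extensions, "physical_table")
        .cloned()
        .unwrap_or_else(|| GREPTIME_PHYSICAL_TABLE.to_string());

    assert_eq!(physical_table, GREPTIME_PHYSICAL_TABLE);
}
```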
```rust
impl Drop for PendingRowsBatcher {
    fn drop(&mut self) {
        let _ = self.shutdown.send(());
    }
}
```
The current Drop implementation sends a shutdown signal to worker tasks but doesn't wait for them to complete their final flushes. This can lead to data loss during server shutdown, as the main process might exit before the workers have finished writing their pending data.
Additionally, the spawn_flush function creates detached tasks for flushing batches, which are also not tracked. If the server shuts down while these flush tasks are in-flight, they might be terminated prematurely, also resulting in data loss.
To ensure data is not lost on shutdown, I recommend implementing a more robust graceful shutdown mechanism. This would typically involve:
- An `async fn shutdown(&self)` method on `PendingRowsBatcher`.
- Tracking all spawned worker and flush tasks (e.g., using a `tokio::task::JoinSet`; see the sketch after this list).
- The `shutdown` method would first signal all workers to stop, then wait for all worker and flush tasks to complete.
- The server's shutdown sequence would need to be updated to call this new `shutdown` method.
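A minimal sketch of such a shutdown path, assuming a `watch` channel for the stop signal and a `tokio::task::JoinSet` holding the worker and flush tasks; the struct and field names are illustrative, not the PR's actual code.

```rust
use tokio::sync::watch;
use tokio::task::JoinSet;

/// Hypothetical handle; field and type names are illustrative only.
struct PendingRowsBatcher {
    shutdown_tx: watch::Sender<bool>,
    tasks: JoinSet<()>,
}

impl PendingRowsBatcher {
    /// Signal every worker to stop, then wait for all worker and flush tasks
    /// to finish their final writes before returning.
    async fn shutdown(&mut self) {
        // Workers watch this channel, drain their pending batch, and exit.
        let _ = self.shutdown_tx.send(true);
        while let Some(result) = self.tasks.join_next().await {
            if let Err(e) = result {
                eprintln!("batcher task failed during shutdown: {e}");
            }
        }
    }
}
```

Note that `JoinSet::join_next` takes `&mut self`, so a `shutdown(&self)` signature as suggested above would need interior mutability (e.g. a `Mutex` around the `JoinSet`).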
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a43f5f20de
```rust
if batch.row_count == 0 {
    batch.created_at = Some(Instant::now());
    batch.ctx = Some(ctx);
    // ...
}
```
Avoid batching requests with differing query hints
The batch key only includes catalog/schema/physical_table, and the batch context is captured once on the first submit (`batch.ctx = Some(ctx)`). If two remote_write requests target the same physical table but carry different `QueryContext` extensions (e.g., `x-greptime-hint-ttl`, `append_mode`, `skip_wal`), they can be merged into the same batch and flushed using the first request's context. That means later requests' hints are ignored, leading to writes with the wrong TTL/WAL or table options. Consider either including the relevant context options in the batch key or keeping per-request context on a per-chunk basis.
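As an illustration of the first option, a batch key could fold the per-request hints into its hash and equality; `BatchKey` and its fields below are assumptions for the sketch, not the PR's actual type.

```rust
use std::collections::BTreeMap;

/// Illustrative batch key. Including the relevant per-request hints keeps
/// requests with different TTL/WAL/append options out of the same batch.
#[derive(Clone, PartialEq, Eq, Hash)]
struct BatchKey {
    catalog: String,
    schema: String,
    physical_table: String,
    /// Relevant QueryContext extensions, e.g. x-greptime-hint-ttl, append_mode, skip_wal.
    hints: BTreeMap<String, String>,
}
```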
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
An exploration and test of a potential performance optimization, not necessarily to be merged.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.