plugin/decision: upload events as soon as a chunk is ready #8110
Conversation
Signed-off-by: Sebastian Spaink <sebastianspaink@gmail.com>
johanfylling left a comment:
Thanks!
Some thoughts/questions.
```go
	retry++
} else {
	retry = 0
	timer := time.NewTimer(delay)
```
We're not resetting the timer when a flush has been triggered in immediate mode? I.e., if we have a fraction of the timer delay left and a new log event has triggered an upload, that fraction will be added on top of the next timer delay?
Correct, that is the case. I struggled to implement a solution that would reset this timer consistently on immediate upload. I tried a new channel that would reset the timer, but the case where the timer fires before the channel can send the upload required a mutex that made things even more complicated. So I decided the added time in this scenario is acceptable, given that events should usually be uploaded immediately rather than relying on the timer.
johanfylling left a comment:
This solution should work, I think 👍.
Is the reconfiguration concern warranted, and if so, could something be done about it?
johanfylling left a comment:
Thanks!
v1/plugins/logs/eventBuffer.go (outdated)
```go
case item := <-b.buffer:
	b.immediateRead(ctx, item)
case done := <-b.stop:
	b.flush(ctx)
```
Is this flush necessary, considering we expect the outer plugin to immediately call Flush() on the buffer anyway?
And when reconfiguring the same buffer, Reconfigure() will move events between buffers anyway, right?
Moving events between the buffers made it possible to remove this; deleted 👍
v1/plugins/logs/sizeBuffer.go (outdated)
```diff
 }

-func (b *sizeBuffer) Reconfigure(bufferSizeLimitBytes int64, uploadSizeLimitBytes int64, maxDecisionsPerSecond *float64) {
+func (b *sizeBuffer) Reconfigure(
```
Complexity-wise, I wonder if we really need this buffer-specific reconfigure 🤔. If the plugin always tore down the old buffer, flushed, and set up a new buffer on config change, we'd have one less edge case to worry about.
That might come with some hit to performance, but config changes are rare enough that I'm not sure that's a big concern.
On the other hand, if we keep redesigning this, we'll never get it done 😄. Fine to leave it as-is.
Removing that edge case is much better! Getting rid of any complexity in this code is a huge win. I can't imagine people reconfigure often while OPA is running.
I removed the individual reconfigure methods; the plugin now always creates a new instance of the buffer.
…ng the plugin
…osed and restart main loop
Why are the changes in this PR needed?
Resolves #7455
What are the changes in this PR?
This introduces a new trigger mode for the decision log plugin: `decision_logs.reporting.trigger=immediate`.

The `immediate` trigger mode uploads events as soon as enough events are received to hit the configured upload limit. If not enough events are received within the configured min/max delay, the events received so far are flushed and uploaded.

For the event buffer type, this means that if enough events are received they can be uploaded sooner than the configured min/max delay, allowing the buffer to empty more quickly and preventing dropped events. For the size buffer, uploads can also happen sooner, but dropped events are less likely regardless of the trigger mode, given the default unlimited size and the fact that events are stored as chunks. In `immediate` mode, both buffer types upload chunks of events as a stream, as opposed to multiple chunks uploaded in bursts.

Notes to assist PR review:
A contrived example to help demonstrate the benefit, using a small buffer size limit and a long delay time:
Set up the following config (opa-conf.yaml):
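The original config file wasn't captured in this thread. Based on the description (a small event buffer and a long min/max delay, pointing at a local log sink), it might have looked roughly like the following — the service URL and the exact limits are assumptions, not the PR's actual values:

```yaml
services:
  logeater:
    url: http://localhost:4195   # hypothetical local service that receives the logs

decision_logs:
  service: logeater
  reporting:
    trigger: immediate           # the new mode added by this PR (default: periodic)
    buffer_type: event
    buffer_size_limit_events: 100
    min_delay_seconds: 10
    max_delay_seconds: 15
```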
Have this simple Rego file (example.rego):
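The policy itself wasn't captured either; since the load test below queries `/v1/data/example/allow`, a minimal placeholder policy would be something like:

```rego
package example

default allow := false

allow if input.method == "GET"
```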
Run OPA:

```shell
./opa_darwin_arm64 run -c opa-conf.yaml --server ./example.rego
```

Start the logeater service (just a service to receive the logs):

```shell
go run main.go
```

Attack OPA with vegeta:

```shell
echo 'POST http://localhost:8181/v1/data/example/allow' | vegeta attack --duration=10s -rate=500 | tee results.bin | vegeta report
```

Now if you check `http://localhost:8181/v1/status` you will see a shocking metric: `counter_decision_logs_dropped_buffer_size_limit_exceeded 4800`. This is because vegeta sends 500 requests per second for 10 seconds, and the buffer managed to send only 100 events. Another 100 are still in the buffer.

Now if you update the config to use the new trigger mode `trigger: immediate` and check `/v1/status` again, you will see no events were dropped! You do see some other fun metrics of the encoder adjusting its guessed uncompressed limit:

```json
{ "counter_enc_uncompressed_limit_scale_down": 7, "counter_enc_uncompressed_limit_scale_up": 10 }
```

These metrics didn't show up for the `periodic` mode because they are reported by the encoder, which didn't run enough times to scale the uncompressed limit.

Data
Attacking OPA configured with different buffer types and triggers for 30 seconds also illustrates what I described above: periodic uploads arrive in bursts, while immediate uploads arrive as a stream. The encoder also appears to stabilize its guess of the uncompressed limit sooner in immediate mode with the event buffer. Using the default size limits, no events are dropped.
Used an updated logeater service that spits out a graph (code here).
| Buffer, trigger | Average duration between uploads | Max duration between uploads |
|---|---|---|
| Event, Immediate | 970.387093ms | 983.76925ms |
| Event, Periodic | 593.645934ms | 16.326999458s |
| Size, Immediate | 974.381334ms | 3.96508625s |
| Size, Periodic (dropped chunks of events, and gaps between uploads) | 439.323843ms | 12.64677575s |