topdown: eliminate closure allocations in Set and virtual doc enumeration #8242
Conversation
anderseknert left a comment:
Change looks good to me! 👏
Could you rename the benchmark file eval_bench_test.go to align with the naming convention we use for benchmarks? Then it's OK to merge from my side 👍
Also, I'm curious to know what the impact on ns/op is here. I'd expect some given the substantial impact on B/op. Did you check that?
Before optimization (with closures):
After optimization (method values + pointer):
The optimization not only reduces memory allocations (B/op -3.6%) but also improves execution time by 5-6% on most benchmarks. The minimal increase in allocs/op (+0.0002%) is within measurement noise. Also renamed the benchmark file to follow project conventions.
Thanks, Alex!
Commits:
- topdown: eliminate closure allocations in Set and virtual doc enumeration. Replace closure allocations in evalTree.enumerate with method values for Set iteration and virtual document traversal. Set enumeration now uses Slice() instead of Iter(callback), and virtual doc enumeration uses an enumerateNext helper instead of inline closures. Signed-off-by: alex60217101990 <[email protected]>
- Add BenchmarkEnumerateComprehensions to measure the memory impact of closure optimizations in evalTree.enumerate with set/array comprehensions over large datasets. Generates 10K nested objects with realistic structure and exercises multiple comprehension patterns. Signed-off-by: alex60217101990 <[email protected]>
- Reduce evalTree copying by using a pointer in the enumerateNext structure and creating a single instance instead of two. Fields are ordered for optimal memory alignment (interface 16 bytes, then pointers 8 bytes). Rename enumerate_benchmark_test.go to enumerate_bench_test.go to follow project naming conventions. Signed-off-by: alex60217101990 <[email protected]>
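As an aside on the alignment note in that last commit: field order matters because Go inserts padding to satisfy alignment. Below is a toy sketch of the principle; the types are hypothetical and not the PR's actual enumerateNext struct.

```go
package main

import (
	"fmt"
	"unsafe"
)

// On typical 64-bit platforms an interface value occupies 16 bytes and a
// pointer 8, both 8-byte aligned. Placing narrow fields before or between
// wide ones forces padding; ordering wide-first packs the struct tighter.
type loose struct {
	a bool        // 1 byte + 7 bytes padding so i is 8-byte aligned
	i interface{} // 16 bytes
	b bool        // 1 byte + 7 bytes trailing padding
}

type tight struct {
	i    interface{} // 16 bytes
	a, b bool        // 2 bytes + 6 bytes trailing padding
}

func main() {
	fmt.Println(unsafe.Sizeof(loose{}), unsafe.Sizeof(tight{})) // 32 24
}
```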
Force-pushed from 32e6066 to 619f9d8.
Addresses #8240 (part 1 of 3, as suggested by @anderseknert)
What
Replaces closure allocations in `evalTree.enumerate()` with method values to reduce memory overhead during policy evaluation over large datasets. Applied to Array, Object, Set, and virtual document enumeration.
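For illustration, a minimal sketch of both patterns under simplified assumptions: the helper name echoes the PR's enumerateNext, but every type, signature, and call site here is invented for the sketch rather than taken from OPA's actual internals.

```go
package main

import "fmt"

// enumerateNext (name borrowed from the PR, fields invented) holds the
// state an inline closure would otherwise capture, so a single method
// value can replace a fresh closure allocation at the call site.
type enumerateNext struct {
	prefix string
}

func (n *enumerateNext) next(v int) error {
	fmt.Println(n.prefix, v)
	return nil
}

// iterStyle mimics a callback-based iterator in the style of Set.Iter.
func iterStyle(items []int, f func(int) error) error {
	for _, v := range items {
		if err := f(v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	items := []int{1, 2, 3}

	// Before: an inline closure capturing prefix, allocated at the call site.
	prefix := "v:"
	_ = iterStyle(items, func(v int) error {
		fmt.Println(prefix, v)
		return nil
	})

	// After (method value): n.next is bound once and passed where the
	// inline closure used to be.
	n := &enumerateNext{prefix: "v:"}
	_ = iterStyle(items, n.next)

	// After (Slice-style): range over a materialized slice instead of a
	// callback iterator, eliminating the iterator closure entirely.
	for _, v := range items {
		_ = n.next(v)
	}
}
```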
For Set specifically, also changed from `doc.Iter(callback)` to `doc.Slice()` to eliminate the iterator closure as well.
Why
Policies with comprehensions over 10K+ objects allocate closures on every iteration. For example:
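The example itself did not survive page extraction. As an illustrative stand-in (a hypothetical policy, embedded as a Rego source string the way OPA's Go tests do), here is a comprehension that visits every element of a large base document:

```go
// examplePolicy is an invented stand-in for the PR's example: a set
// comprehension that visits every element of input.users, which is
// where the enumerate hot path allocated one closure per element.
const examplePolicy = `package example

admin_names := {name |
	user := input.users[_]
	user.role == "admin"
	name := user.name
}`
```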
With 10K users, this creates 10K+ closures in the enumerate hot path.
Benchmark
Created `BenchmarkEnumerateComprehensions`, which exercises set/array comprehensions over 10K nested objects with multiple levels of field access.
Test environment:
Results (benchstat):
Memory profiling (`go tool pprof -alloc_space`) shows the real impact: direct enumerate allocations reduced by 87.5% (160 MB → 20 MB). This eliminates thousands of short-lived closure allocations that were creating GC pressure.
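For reference, a minimal sketch of what a benchmark in this style can look like; the module, dataset shape, and sizes are illustrative assumptions, not the PR's actual `BenchmarkEnumerateComprehensions` (import path assumes OPA v1):

```go
package topdown_test

import (
	"context"
	"fmt"
	"testing"

	"github.com/open-policy-agent/opa/v1/rego"
)

func BenchmarkEnumerateComprehensionsSketch(b *testing.B) {
	// 10K nested objects, loosely mirroring the dataset described above.
	users := make([]any, 10000)
	for i := range users {
		users[i] = map[string]any{
			"name": fmt.Sprintf("user-%d", i),
			"role": map[string]any{"label": "admin", "tier": i % 5},
		}
	}
	input := map[string]any{"users": users}

	// Hypothetical module: a set comprehension over the large input.
	module := `package bench

names := {n |
	u := input.users[_]
	u.role.label == "admin"
	n := u.name
}`

	ctx := context.Background()
	pq, err := rego.New(
		rego.Query("data.bench.names"),
		rego.Module("bench.rego", module),
	).PrepareForEval(ctx)
	if err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := pq.Eval(ctx, rego.EvalInput(input)); err != nil {
			b.Fatal(err)
		}
	}
}
```

Comparing `-benchmem -count=10` runs before and after the change with benchstat would produce tables like the ones summarized above.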
When this helps
Policies with comprehensions over large in-memory datasets; the benefit scales with dataset size and comprehension count.
Notes
While this isn't the most impactful of the three optimizations in terms of total memory (~3-4%), profiling reveals the real benefit is in allocation behavior. The 87.5% reduction in enumerate allocations (160 MB → 20 MB per query) eliminates thousands of short-lived closures that were creating allocation churn and GC pressure. Under high query load, the 140 MB saved per query works out to ~140 GB/sec of allocation bandwidth at 1000 QPS, leading to shorter GC pauses and more predictable latency.
Related to the other two PRs mentioned in #8240 (lazyObj optimizations and binding allocations).