v1/ast: Add lazy hash evaluation to Array type #8155
Conversation
Force-pushed from 4141a17 to cb4a46b (compare)
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify project configuration.
That race detected in tests is probably genuine. Can you have a look? Also, let's discuss improvements like this before diving in; I could have made you aware that this approach is prone to data races 💭
Force-pushed from d489ff0 to 9dc3e60 (compare)
@srenatus
Force-pushed from 73b2485 to 9f31e14 (compare)
johanfylling left a comment:
Hi! 👋
Some questions regarding concurrency.
v1/ast/term.go (Outdated)

```go
	elems  []*Term
	ground bool
	hash   atomic.Int64 // cached hash value (atomic for race-free access)
```
Do we have an understanding of how this affects performance for concurrent eval; e.g. when OPA is run as a server and multiple queries might contend over this value?
Maybe there should be a separate path of instantiation for when the array is generated in topdown during evaluation and when the array is coming from storage and is never expected to change. In the latter case, pre-hashing is likely always preferred.
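The reviewer's suggestion of a separate instantiation path could look roughly like the sketch below. The types and the constructor name `NewPreHashedArray` are hypothetical, not part of the PR; the point is only that storage-backed, immutable arrays can pay the hashing cost once up front, so concurrent readers never contend over first computation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Term stands in for ast.Term; only the hash matters for this sketch.
type Term struct{ hash int64 }

func (t *Term) Hash() int64 { return t.hash }

// Array caches its hash in an atomic so concurrent readers never race.
type Array struct {
	elems []*Term
	hash  atomic.Int64 // 0 doubles as "unset" in this simplified sketch
}

func computeHash(elems []*Term) int64 {
	var h int64 = 1
	for _, e := range elems {
		h = h*31 + e.Hash()
	}
	return h
}

// NewArray defers hashing: good for short-lived arrays built during eval.
func NewArray(elems ...*Term) *Array {
	return &Array{elems: elems}
}

// NewPreHashedArray (hypothetical name) eagerly computes the hash, as the
// review suggests for arrays loaded from storage that never change.
func NewPreHashedArray(elems ...*Term) *Array {
	a := &Array{elems: elems}
	a.hash.Store(computeHash(elems))
	return a
}

func (a *Array) Hash() int64 {
	if h := a.hash.Load(); h != 0 {
		return h // fast path: cached
	}
	h := computeHash(a.elems)
	a.hash.Store(h)
	return h
}

func main() {
	lazy := NewArray(&Term{2}, &Term{3})
	eager := NewPreHashedArray(&Term{2}, &Term{3})
	fmt.Println(lazy.Hash() == eager.Hash()) // same elements, same hash
}
```

Both paths yield the same hash; they differ only in when the work happens and, under contention, in who performs it.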
v1/ast/term.go (Outdated)

```go
	elems     []*Term
	ground    bool
	hash      atomic.Int64  // cached hash value (atomic for race-free access)
	hashValid atomic.Uint32 // 0 = invalid, 1 = valid
```
Even though hash and hashValid are atomic, they could still drift in a concurrent context, right? If the array was read from storage without pre-calculated hash, is it possible for one query to get an updated hashValid value, but a stale hash value, making a comparison invalid?
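The drift the reviewer describes can be replayed deterministically. This sketch (simplified, not the PR's code) serializes one hazardous interleaving: each atomic is individually race-free, but nothing ties the two together, so a reader can observe a "valid" flag paired with a stale hash.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// staleRead replays, step by step, the interleaving the review worries
// about: the validity flag becomes visible before the hash value does.
func staleRead() int64 {
	var hash atomic.Int64
	var hashValid atomic.Uint32

	// Writer, step 1: the flag is flipped before the hash is stored.
	hashValid.Store(1)

	// Reader interleaves here: it sees a "valid" flag...
	var observed int64 = -1
	if hashValid.Load() == 1 {
		observed = hash.Load() // ...but reads the stale zero hash
	}

	// Writer, step 2: the real hash lands too late for that reader.
	hash.Store(12345)

	return observed
}

func main() {
	fmt.Println(staleRead()) // the reader observed 0, not 12345
}
```

Any comparison made with that stale hash is then wrong, which is exactly the invalid-comparison scenario raised in the comment.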
v1/ast/term.go (Outdated)

```go
	// Try to atomically set valid flag using CAS
	if arr.hashValid.CompareAndSwap(0, 1) {
		// We won the race, store the hash
		arr.hash.Store(int64(h))
```
This doesn't look like it's happening atomically within the swap. Couldn't another routine be returned a stale hash value up top in this func, where arr.hashValid.Load() == 1 succeeds before the winner's arr.hash.Store() lands?
Refactor Array structure to use lazy hash evaluation instead of maintaining a separate hashs slice, reducing memory overhead by 8*N bytes per array.

Key changes:
- Remove hashs []int slice from Array struct
- Add hashValid bool flag to track hash computation state
- Implement lazy hash computation in Hash() method
- Optimize Append() with incremental hash updates when hash is already computed
- Update Copy(), Sorted(), Slice() to work with new hash model
- Simplify rehash() to just invalidate the hash cache

This optimization reduces memory allocations while maintaining performance through intelligent caching and incremental updates.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
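A minimal sketch of the design this commit describes, assuming an additive element-hash combination (which is what makes the `hash += newElement.Hash()` incremental update consistent with a full recompute). Types are simplified stand-ins for the real ast package; as the review later found, the plain bool flag is only safe for single-goroutine use.

```go
package main

import "fmt"

type Term struct{ hash int }

func (t *Term) Hash() int { return t.hash }

// Array as described in this commit: no hashs slice, just one cached
// hash plus a validity flag. Not safe for concurrent use as written.
type Array struct {
	elems     []*Term
	hash      int
	hashValid bool
}

// Hash computes the hash lazily on first call and caches it.
func (a *Array) Hash() int {
	if !a.hashValid {
		h := 0
		for _, e := range a.elems {
			h += e.Hash() // additive, so Append can update incrementally
		}
		a.hash, a.hashValid = h, true
	}
	return a.hash
}

// Append updates the cached hash incrementally instead of invalidating it,
// but only when a hash has already been computed.
func (a *Array) Append(t *Term) {
	a.elems = append(a.elems, t)
	if a.hashValid {
		a.hash += t.Hash()
	}
}

// rehash simply invalidates the cache; the next Hash() call recomputes.
func (a *Array) rehash() { a.hashValid = false }

func main() {
	a := &Array{elems: []*Term{{1}, {2}}}
	h := a.Hash()
	a.Append(&Term{5})
	fmt.Println(a.Hash() == h+5) // incremental update matches recompute
}
```

Arrays that never call Hash() skip the computation entirely, which is where the memory and CPU savings come from.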
Add comprehensive benchmarks to measure the impact of lazy hash computation optimization for Array operations.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Eliminate data race in lazy hash computation by using atomic operations for the hashValid flag. This ensures thread-safe access when multiple goroutines call Hash() concurrently on the same Array.

Changes:
- Use atomic.LoadUint32/StoreUint32 for hashValid flag
- Implement CompareAndSwapUint32 for race-free hash computation
- Preserve lazy evaluation and O(1) cached access performance

Performance impact:
- Maintains 3ns hash access time for cached values
- 54% faster overall array operations (geomean)
- 64% less memory usage across benchmarks
- Zero lock contention (lock-free fast path)

Passes all tests with -race flag enabled.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Remove obsolete hashs field initialization in NewArrayWithCapacity that was missed during the merge with upstream main branch. The hashs field was removed as part of the lazy hash computation optimization for Array type.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Replace raw atomic function calls with atomic types to eliminate data races in concurrent hash access and computation.

Changes:
- Use atomic.Int64 for hash field instead of int
- Use atomic.Uint32 for hashValid field instead of uint32
- Replace atomic.Load/Store/CompareAndSwap functions with methods
- Fix data races in Hash(), Copy(), Sorted(), Append(), and rehash()

All ast tests pass with race detector.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Fix linter errors related to copying lock values in Array struct.

Changes:
- Rewrite Array.Append() to avoid copying atomic fields by value
- Change SubResultMap.Update() to accept *ast.Array pointer
- Update all call sites to pass pointer instead of dereferenced value
- Fix gofmt formatting for atomic field alignment

Resolves govet copylocks warnings while maintaining thread-safety.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Replace separate hash/hashValid atomics with single atomic.Int64 using math.MinInt64 sentinel. Eliminates drift between hash and validity flag, prevents stale reads, ensures atomic operations. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
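The sentinel design in this commit can be sketched as follows (simplified types, not the PR's actual code). Folding validity into the hash word itself means a single atomic Load returns either the sentinel or a complete hash, so readers can never pair a "valid" flag with a stale value.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

type Term struct{ hash int64 }

func (t *Term) Hash() int64 { return t.hash }

// hashUnset marks "no hash computed yet". A real implementation would need
// to remap a legitimately computed math.MinInt64 hash away from the sentinel.
const hashUnset = math.MinInt64

type Array struct {
	elems []*Term
	hash  atomic.Int64
}

func NewArray(elems ...*Term) *Array {
	a := &Array{elems: elems}
	a.hash.Store(hashUnset)
	return a
}

func (a *Array) Hash() int64 {
	if h := a.hash.Load(); h != hashUnset {
		return h // fast path: one atomic load, no locks
	}
	var h int64
	for _, e := range a.elems {
		h += e.Hash()
	}
	// Concurrent callers may both compute h, but they derive the same
	// value from the same elements, so a duplicate Store is harmless.
	a.hash.Store(h)
	return h
}

func main() {
	a := NewArray(&Term{10}, &Term{20})
	fmt.Println(a.Hash()) // prints 30
}
```

Compared with the two-atomic version, the worst case is redundant recomputation under contention rather than an incorrect read, which is the safer failure mode.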
Force-pushed from 2f319a0 to 73faa63 (compare)
@johanfylling Thank you for the thorough review! You're absolutely right about the race condition; very constructive feedback. I've refactored the implementation to fix the issues you pointed out: replaced the separate hash/hashValid atomics with a single atomic.Int64 using a math.MinInt64 sentinel. Also added concurrent tests covering multiple goroutines accessing the same array simultaneously, which all pass cleanly with -race. When you have a chance, could you take a look? I believe this addresses all the concerns you raised.
Array Lazy Hash Computation Optimization
Why are the changes in this PR needed?
The current `Array` implementation maintains a separate `hashs []int` slice that stores precomputed hashes for each element. This costs 8*N bytes of extra memory per array, even though many arrays are created for temporary use and never participate in hash-based operations (e.g., map lookups or set operations).
What are the changes in this PR?
Refactor the `Array` structure to use lazy hash computation.

Key Changes:
- Removed `hashs []int` field from the `Array` struct
- Added `hashValid bool` flag to track hash computation state
- Implemented lazy evaluation in the `Hash()` method
- Incremental hash updates in `Array.Append()`: `hash += newElement.Hash()`
- Updated all related methods:
  - `NewArray()` - no longer computes hashes at creation time
  - `Copy()` - copies the `hashValid` flag
  - `Sorted()` - preserves computed hash (sorting doesn't change hash)
  - `Slice()` - creates a slice with invalid hash
  - `rehash()` - simplified to just invalidate the cache
  - `set()` - invalidates hash instead of recomputing

Benchmark Results
Key Improvements:
Array Creation (ArrayCreation)
Append Operations (ArrayAppend)
Array Copy (ArrayCopy)
Slice Operations (ArraySlice)
Set Operations (ArraySet)
Operations Without Hash Access (ArrayNoHashAccess)
This benchmark demonstrates the real benefit of lazy evaluation - when hash is not needed:
Overall Geometric Mean:
Full benchstat Results
Detailed results are available in: