-
Notifications
You must be signed in to change notification settings - Fork 759
feat(bindings): add support for metric aggregation #5709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// point it can be exported. This is only used in the drop impl. | ||
| exporter: std::sync::mpsc::Sender<HandshakeRecord>, | ||
|
|
||
| sample_count: AtomicU64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is sample count? Maybe a comment?
| negotiated_signatures: [AtomicU64; SIGNATURE_COUNT], | ||
|
|
||
| /// sum of handshake duration, including network latency and waiting | ||
| handshake_duration_us: AtomicU64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm not sure I understand the comment. Is this like, an average handshake duration? Seems very obvious that lines 48-51 are counts over time, but I can't tell about 54 and 56... they can't be counts, right?
| ///////////////////// fields from connection /////////////////////// | ||
| //////////////////////////////////////////////////////////////////////// | ||
|
|
||
| conn.selected_signature_scheme() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields from connection versus fields from event 🤔 what is the difference?
| } | ||
|
|
||
| #[derive(Debug, Clone)] | ||
| pub(crate) struct HandshakeRecord { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a more accurate name for this struct FrozenHandshakeRecord?
| let start = Instant::now(); | ||
| current_record.update(connection, event); | ||
| let elapsed = start.elapsed(); | ||
| println!("{elapsed:?}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
| /// The [`s2n_tls::events::EventSubscriber`] may be invoked concurrently, which | ||
| /// means that multiple threads might be incrementing the current record. To handle | ||
| /// this and ensure that the `S2NMetricRecord` is never flushed while an update | ||
| /// is in progress we use an [`arc_swap::ArcSwap`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of a nit, the arc_swap is in the MetricSubscriberInner, don't know if this comment makes sense on this struct.
| Self { handshake } | ||
| } | ||
| } | ||
| /// The S2NMetricRecord stores various metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you changed the name of S2NMetricRecord? It's now HandshakeRecordInProgress?
| metric_receiver: Receiver<HandshakeRecord>, | ||
| exporter: E, | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that my mental model of this feature is correct:
- s2n-tls callback triggers on handshake event for each handshake. Current handshake record is updated with new metrics.
- At a certain point the application calls finish_record(). This swaps out the old record with a new one, creates the non-atomic equivalent of the old record, and sends that record to the Exporter, which I guess hasn't really been defined yet.
| const SIGNATURE_COUNT: usize = SIGNATURE_SCHEMES_AVAILABLE_IN_S2N.len(); | ||
| const PROTOCOL_COUNT: usize = VERSIONS_AVAILABLE_IN_S2N.len(); | ||
|
|
||
| /// Metric Record is an opaque type which implements [`metrique_writer::Entry`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it actually implement metrique_writer? I don't see that anywhere.
| /// This is the preferred type for public s2n-tls-metric-subscriber traits and | ||
| /// interfaces. | ||
| // This currently just holds a single struct. In the future we will | ||
| // likely rely on an enum to handle different record types, e.g. SessionResumptionFailure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm you mean hypothetically SessionResumptionFailure is a type of HandshakeRecord? Trying to picture what this would look like.
Did you try to implement this for another "record type" and see if there are any breaking changes?
| mod test_utils; | ||
|
|
||
| pub use crate::record::MetricRecord; | ||
| pub use subscriber::AggregatedMetricsSubscriber; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we use crate:: for both re-exports (or neither)? Either is fine, but just pointing out for consistency.
| // Assumption: durations are less than 500,000 years, otherwise this cast | ||
| // will panic | ||
| self.handshake_compute_us.fetch_add( | ||
| event.synchronous_time().as_micros() as u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn’t this cast from u128 micros to u64 truncate rather than panic? This large of a number may be unlikely to occur in practice, but using u64::try_from(...) could make that upper bound explicit.
| } | ||
| } | ||
|
|
||
| /// Maps from the the s2n cipher string representation to the array of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfinished comment
| let start = Instant::now(); | ||
| current_record.update(connection, event); | ||
| let elapsed = start.elapsed(); | ||
| println!("{elapsed:?}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a stale print statement.
| /// | ||
| /// The [`s2n_tls::events::EventSubscriber`] may be invoked concurrently, which | ||
| /// means that multiple threads might be incrementing the current record. To handle | ||
| /// this and ensure that the `S2NMetricRecord` is never flushed while an update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S2NMetric Record should be HandshakeRecordInProgress now
Goal
Add the ability to aggregate handshake events into a single metric record.
Why
This functionality will be used to emit EMF records to CloudWatch, giving s2n-tls application a much clearer few of their TLS stack.
How
We want to avoid any locks, so the aggregation is done using arrays of atomic integers.
At a high level, there is an "in-progress" record which may be access from multiple threads. This in-progress records has the counters. Then when the record is finished, we "freeze" it and export it.
Testing
I added unit tests.
Related
Depends on #5708
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.