api: OIIO_CONTRACT_ASSERT and other hardening improvements #5006
Any comments on this?
jessey-git
left a comment
That 20% sounds pretty scary actually and I don't always want to hide behind the disk-IO time monster. All CPU cycles are important because OIIO is used for more than just offline/overnight jobs. Sometimes there's an artist staring at a progress bar or doing a viewport playblast that will appreciate faster pixel processing.
I can't do so tonight but I'd want to check some other scenarios of interest to see how they're affected (if at all). It might turn out we should, in the same release as this hardening, also change some important ImageBufAlgo's to use range-fors if possible or similar if they're impacted enough.
src/include/OpenImageIO/dassert.h
Outdated
    // OIIO_ASSERTION_RESPONSE_DEFAULT defines the default response to failed
    // contract assertions. By default, in NONE hardening mode and in release
    // builds, we do nothing. In all other cases, we abort. But any translation
This comment says that by default we'll do nothing, but it looks like we will actually enforce. Which is correct?
Right, I changed, now by default we enforce. I will update the comment.
    // contract assertions. By default, in NONE hardening mode and in release
    // builds, we do nothing. In all other cases, we abort. But any translation
    // unit (including clients of OIIO) may override this by defining
    // OIIO_ASSERTION_RESPONSE_DEFAULT before including any OIIO headers. But note
I'd probably remove this part of the comment since attempting to set the define like this will most likely lead to a lot of surprises. It'll give a false sense of accomplishment and wouldn't be a complete workaround. Probably best to instead update the build documentation with the correct way to set this for folks building on their own.
> I'd probably remove this part of the comment since attempting to set the define like this will most likely lead to a lot of surprises.
Well, what I'm trying to account for here is that these are public headers that might be used by other projects or apps. There's "the behavior of OIIO internals, determined at build time", and there's also "how MY software behaves when I'm using OIIO's utility classes."
src/libutil/span_test.cpp
Outdated
    bench("span operator[]", [&]() {
        int t = 0;
        for (size_t i = 0; i < f.size(); ++i)
            DoNotOptimize(t += f[i]);
These are testing std::array rather than our span it looks like.
ah, will fix and re-time
The 20% is just on the raw array access of the cheapest possible thing (int/float). In the context of doing anything with those values, it will be less. If you are worried about every last cycle, you can build with it turned off. Or argue that the default should be turned off, which I can easily be convinced is reasonable. But there is an important assumption that underlies all of this that I want to re-emphasize: new versions of gcc/libstdc++ and clang/libc++ already have these checks in operator[] of std::array, std::span, std::vector, and as many other places as they can jam them, and the default is to enforce it (and I believe C++26 more or less mandates it?). Whatever complaints we have about perf are going to be found virtually everywhere, unless care is taken to turn them off for the standard library as well. So I'm mostly going on the assumption that we're not doing anything worse than the C++ std classes moving forward, and also that there will be pressure on compilers to make stronger inferences about those common constructs in loops and whatnot, so that they become even more effective at optimizing the checks away entirely when it's clear from the loop bounds etc. that the contract will never be violated.
I'm very much in favor of finding places whose performance is impacted and refactoring them in a way that's guaranteed "safe by design" (in the way that range-for is) rather than empirically range checking every access. It was always my intention to follow up with that as any problems are discovered. One possible approach is to change the default from 'enforce' to 'none' for now, merge this (i.e. it only does enforcement if you build it that way), and benchmark various things, fixing as I go. When we are more confident that it doesn't meaningfully hurt anything we care about, then bump the default back to 'enforce'.
Review: we have long had two assertion macros: OIIO_ASSERT which
aborts upon failure in Debug builds and prints but continues in
Release builds, and OIIO_DASSERT which aborts in Debug builds and is
completely inactive for Release builds.
Inspired by C++26 contracts, and increasingly available "hardening
modes" in major compilers (especially with the LLVM/clang project's
libc++), I'm introducing some new verification helpers.
New macro `OIIO_CONTRACT_ASSERT` more closely mimics C++26
contract_assert in many ways, and perhaps will simply wrap C++
contract_assert when C++26 is on our menu.
OIIO_CONTRACT_ASSERT differs from OIIO_ASSERT and OIIO_DASSERT in a
few important ways, described below.
* Keeping in line with C++ contracts, there are 4 possible responses
to a failed contract assertion: Ignore, Observe (print only),
Enforce (print and abort) and Quick-Enforce (just abort).
* By default, the contract failure response is Ignore for release
builds and Enforce for debug builds. But it's overrideable
(independent of Release/Debug, and optionally on a
per-translation-unit basis) by setting
OIIO_ASSERTION_RESPONSE_DEFAULT before any OIIO headers are
included.
* Also define hardening levels: None, Fast, Extensive, and Debug,
mimicking the levels of libc++. The idea is that maybe there will
be some CONTRACT_ASSERT checks you only want to do for certain
hardening levels.
* Macros for explicit hardening levels: OIIO_HARDENING_ASSERT_FAST(),
EXTENSIVE(), and DEBUG(), which call CONTRACT_ASSERT only when the
hardening level is what's required or stricter.
I also changed the bounds checking in operator[] of string_view, span,
and image_span to use the contract assertions. Note that this adds a
little bit of overhead, since the default is "enforce" for release
builds. I added some benchmarking that proves that the bounds check
adds only about 20% overhead to an element access for a trivial
`span<float>`.
For more complex things, or code that does more than just repeatedly
access elements with bounds checks, I expect this overhead to be
negligible. Since libc++ and upcoming C++ standards do the same for
most container types, I expect the compilers to get better and better
at eliding these checks when they can determine that it's an in-bounds
access.
Also please note that one way to avoid these extra bounds checks
entirely is to change an index-oriented loop like
    span s;
    for (size_t i = 0; i < s.size(); ++i)
        foo(s[i]);  // maybe bounds check on each iteration?
to a range based loop:
    span s;
    for (auto& v : s)
        foo(v);
which is inherently safe and requires no in-loop checks at all.
Signed-off-by: Larry Gritz <lg@larrygritz.com>
I didn't immediately realize this when I read it originally, but just to be clear: none of the IBA functions use image_span. They all use ImageBuf::iterator, which is like range-for and knows the bounds without needing to check each address. Currently, we use image_span as a convenient way to encapsulate ptr + multi-dim sizes + multi-dim strides in one object. But I don't think there's any place where we iterate over it element by element and use operator[] every time. For performance, it's span that we should be worried about. But I fixed the tests last night and am being much more careful now, and on Mac/clang at least, I'm seeing barely any measurable penalty at all from the range check. I will confirm on Linux/gcc today and post the results and an updated PR. But unless you can spot something I've done wrong with my benchmarking methodology, I think (surprisingly) that there may not be a perf hit that we should worry about at all.
… to IGNORE Signed-off-by: Larry Gritz <lg@larrygritz.com>
I have pushed an update: changed some defaults, beefed up the benchmarks, and fixed some flaws in them. I totally replaced the PR description, please reread. I include benchmark results. Now that I'm being much more careful, I'm not seeing a 20% penalty, as I originally reported, under any circumstance. I think I was measuring the wrong thing. Now I'm seeing about a 1.5% hit for range checking (just the benchmarked raw span access, not the performance of any code overall in context) on a Linux box I control, no difference on Linux (with a newer gcc) and Windows (with MSVS) on machines that I don't fully control, no statistically discernible difference on a Mac Intel machine I control, and about 4% on a Mac ARM machine I do not control (I have a new Mac ARM machine arriving this week and will re-test on it after it arrives). In short, I can barely (and sometimes not at all) demonstrate a slowdown when isolating this one operation with the extra check. I'm inclined to keep the default of doing the range checking all the time, though it remains overridable in people's individual builds if you really want to YOLO.
Review: we have long had two assertion macros: OIIO_ASSERT which aborts upon failure in Debug builds and prints but continues in Release builds, and OIIO_DASSERT which aborts in Debug builds and is completely inactive for Release builds.
Inspired by C++26 contracts, and increasingly available "hardening modes" in major compilers (especially with the LLVM/clang project's libc++), I'm introducing some new verification helpers.
New macro OIIO_CONTRACT_ASSERT more closely mimics C++26 contract_assert in many ways, and perhaps will simply wrap C++ contract_assert when C++26 is on our menu.
Important ways that OIIO_CONTRACT_ASSERT differs from OIIO_ASSERT and OIIO_DASSERT:
* Keeping in line with C++ contracts, there are 4 possible responses to a failed contract assertion: Ignore, Observe (print only), Enforce (print and abort) and Quick-Enforce (just abort).
* Also define hardening levels: None, Fast, Extensive, and Debug, mimicking the levels of libc++. The idea is that maybe there will be some CONTRACT_ASSERT checks you only want to do for certain hardening levels.
* By default, the contract failure response is Enforce, unless it's both a release build and the hardening level is set to None (in which case the response will be Ignore). But it's also overridable, optionally on a per-translation-unit basis, by setting OIIO_ASSERTION_RESPONSE_DEFAULT before any OIIO headers are included (though obviously that only applies to inline functions or templates, not to any already-compiled code in the library).
* Macros for explicit hardening levels: OIIO_HARDENING_ASSERT_FAST(), EXTENSIVE(), and DEBUG(), which call CONTRACT_ASSERT only when the hardening level is what's required or stricter.
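As a rough illustration of how these pieces could fit together, here is a minimal sketch. It is not the contents of dassert.h: every `MY_`/`my_` name, the numeric values, and the `at_or_zero` helper are hypothetical, chosen only to show a per-translation-unit response override, a default derived from build type and hardening level, and a hardening-gated macro.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical response and hardening constants (assumed values, not OIIO's).
#define MY_RESPONSE_IGNORE 0         // do nothing
#define MY_RESPONSE_OBSERVE 1        // print only
#define MY_RESPONSE_ENFORCE 2        // print and abort
#define MY_RESPONSE_QUICK_ENFORCE 3  // just abort

#define MY_HARDENING_NONE 0
#define MY_HARDENING_FAST 1
#define MY_HARDENING_EXTENSIVE 2
#define MY_HARDENING_DEBUG 3

// A client translation unit overrides the response BEFORE the "header" part
// below. This only affects inline/template code compiled in this unit.
#define MY_ASSERTION_RESPONSE MY_RESPONSE_OBSERVE

// ---- what a header might do ----
#ifndef MY_HARDENING
#    define MY_HARDENING MY_HARDENING_FAST
#endif

#ifndef MY_ASSERTION_RESPONSE
#    if defined(NDEBUG) && MY_HARDENING == MY_HARDENING_NONE
#        define MY_ASSERTION_RESPONSE MY_RESPONSE_IGNORE
#    else
#        define MY_ASSERTION_RESPONSE MY_RESPONSE_ENFORCE
#    endif
#endif

// Counts reported failures (only here so the sketch is easy to test).
inline int& my_failure_count()
{
    static int count = 0;
    return count;
}

// Reports a failed assertion; aborts only when the response is Enforce.
inline void my_contract_failed(const char* expr, const char* file, int line)
{
    ++my_failure_count();
    std::fprintf(stderr, "%s:%d: contract assertion failed: %s\n", file, line,
                 expr);
#if MY_ASSERTION_RESPONSE == MY_RESPONSE_ENFORCE
    std::abort();
#endif
}

#if MY_ASSERTION_RESPONSE == MY_RESPONSE_IGNORE
#    define MY_CONTRACT_ASSERT(cond) ((void)0)
#elif MY_ASSERTION_RESPONSE == MY_RESPONSE_QUICK_ENFORCE
#    define MY_CONTRACT_ASSERT(cond) ((cond) ? (void)0 : std::abort())
#else
#    define MY_CONTRACT_ASSERT(cond) \
        ((cond) ? (void)0 : my_contract_failed(#cond, __FILE__, __LINE__))
#endif

// Checked only when the hardening level is Extensive or stricter.
#if MY_HARDENING >= MY_HARDENING_EXTENSIVE
#    define MY_HARDENING_ASSERT_EXTENSIVE(cond) MY_CONTRACT_ASSERT(cond)
#else
#    define MY_HARDENING_ASSERT_EXTENSIVE(cond) ((void)0)
#endif

// Example accessor. Observe mode continues after a failure, so the function
// must remain memory safe on its own.
inline float at_or_zero(const float* data, std::size_t size, std::size_t i)
{
    MY_CONTRACT_ASSERT(i < size);
    return i < size ? data[i] : 0.0f;
}
```

With the Observe override shown here, an out-of-range call to `at_or_zero` prints a diagnostic and returns 0; removing the override line would make the same call abort under the sketch's default.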
I also changed the bounds checking in operator[] of string_view, span, and image_span to use the contract assertions. Note that this adds a tiny bit of overhead, since the default is "enforce" for release builds (previously, using OIIO_DASSERT, it did no checks for release builds). But the benchmarks seem to indicate that the perf difference is barely measurable.
I added some benchmarking that shows the bounds check adds a minute, maybe even indiscernible, overhead to an element access for a trivial span<float>. Here are benchmarks comparing raw pointer access, std::array access, span access with the new checks, and span access carefully bypassing the checks.
Linux workstation, gcc-11, on my work computer:
These are the most stable tests I have, with the least trial-to-trial variation, and show about a 1.5% speed hit on the bounds-checked span access itself, which I think will be truly un-measurable in the context of being interleaved with any other operations that you do with the data you pull from the span.
Mac Intel, Apple Clang 17, on my (old) personal laptop: (much more variable timing, probably from MacOS scheduler quirks)
You can see that here there is no obvious penalty, in fact it appears a little faster, but all within the timing uncertainty of the multiple trials, so statistically it's hard to discern any penalty.
And a couple more for good measure from our CI, but note that because these are uncontrolled machines somewhere on the GitHub cloud, the timings might not be as reliable:
Windows, MSVS 2022:
Linux, gcc-14, C++20:
MacOS ARM:
Windows with MSVS and Linux with newer g++ don't appear to show any penalty, and the bracketing of trial times indicates that maybe it's consistent enough to be meaningful? I can't think of anything I'm doing wrong here that would throw off the timing or disable the range checking on these tests.
For MacOS ARM, the span looks like it has about a 4% penalty versus raw pointers? But OTOH, span bounds-checked vs non-checked vs range-for are all the same, so maybe the speed vs raw pointer is something else entirely?
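The general shape of this kind of micro-benchmark can be sketched as below. This is not the code in span_test.cpp: `checked_view`, `keep`, and `best_time_ms` are hypothetical stand-ins, and the best-of-N timing policy is an assumption made for the sketch.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical minimal view standing in for a bounds-checked span.
struct checked_view {
    const float* data;
    std::size_t size;
    float operator[](std::size_t i) const
    {
        if (i >= size)
            std::abort();  // quick-enforce style response
        return data[i];
    }
};

// Baseline: raw pointer access, no checks.
inline double sum_raw(const float* p, std::size_t n)
{
    double t = 0;
    for (std::size_t i = 0; i < n; ++i)
        t += p[i];
    return t;
}

// Same loop through the checked operator[].
inline double sum_checked(checked_view v)
{
    double t = 0;
    for (std::size_t i = 0; i < v.size; ++i)
        t += v[i];
    return t;
}

// Writes the result to a volatile so the compiler cannot discard the loop.
inline void keep(double v)
{
    static volatile double sink;
    sink = v;
}

// Runs f several times and returns the fastest trial in milliseconds.
template<class F> double best_time_ms(F&& f, int trials = 5)
{
    double best = 1e300;
    for (int k = 0; k < trials; ++k) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best)
            best = ms;
    }
    return best;
}
```

A caller would time `keep(sum_raw(...))` and `keep(sum_checked(...))` over the same buffer and compare. Because the loop condition already implies `i < size`, an optimizer may remove the check entirely, which is one plausible reason the checked and unchecked timings can end up indistinguishable.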
Also please note that a preferred way to avoid these extra bounds checks entirely is to change an index-oriented loop like
to a range based loop:
which should be inherently safe and require no in-loop checks at all.
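A compilable sketch of the two loop styles follows. `float_view` is a hypothetical stand-in for a bounds-checked span (not OIIO's class), written only to show where the per-access check lives and that range-for never goes through it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical minimal view with a bounds-checked operator[].
struct float_view {
    const float* ptr;
    std::size_t len;

    std::size_t size() const { return len; }
    const float* begin() const { return ptr; }
    const float* end() const { return ptr + len; }

    const float& operator[](std::size_t i) const
    {
        if (i >= len)
            std::abort();  // bounds check on every indexed access
        return ptr[i];
    }
};

// Index-oriented loop: each s[i] passes through the check above
// (unless the optimizer can prove it redundant).
inline float sum_indexed(float_view s)
{
    float total = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        total += s[i];
    return total;
}

// Range-based loop: iterates from begin() to end(), never calls operator[],
// so there is no per-element check to pay for.
inline float sum_ranged(float_view s)
{
    float total = 0;
    for (const float& v : s)
        total += v;
    return total;
}
```

Both functions return the same sum for the same view; the difference is only in whether the bounds check is part of the loop body.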