permit up to 99% assume() failures #4643

ajdavis · 2026-01-18T22:01:15Z

Closes #4623.

hypothesis-python/src/hypothesis/internal/conjecture/engine.py

Liam-DeVoe

Thanks @ajdavis!

I'd love to see two tests:

- If we unconditionally assume(False), Hypothesis runs exactly INVALID_THRESHOLD_BASE test cases before erroring.
- If we assume(False) for all but n test cases, Hypothesis runs exactly INVALID_THRESHOLD_BASE + n * INVALID_PER_VALID test cases before erroring.
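To make the two requested properties concrete, here is a minimal standalone model of the stopping rule, not the engine code itself. It assumes the strict comparison `invalid > base + per_valid * valid` from the comment in the diff under review and the constants 459 and 153 from that diff; `should_stop` and `invalid_until_stop` are hypothetical helper names.

```python
# Constants as they appear in the diff under review.
INVALID_THRESHOLD_BASE = 459
INVALID_PER_VALID = 153


def should_stop(invalid: int, valid: int) -> bool:
    """Stopping rule as worded in the diff comment (strict inequality)."""
    return invalid > INVALID_THRESHOLD_BASE + INVALID_PER_VALID * valid


def invalid_until_stop(n_valid: int) -> int:
    """Count invalid cases seen before the rule fires, assuming the first
    n_valid cases are valid and every later case fails assume()."""
    valid = invalid = 0
    while not should_stop(invalid, valid):
        if valid < n_valid:
            valid += 1
        else:
            invalid += 1
    return invalid


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(n, invalid_until_stop(n))
```

In this model the invalid count grows by exactly INVALID_PER_VALID for each valid case. Whether a real test should assert the threshold itself or the threshold plus one, and whether the valid cases are included in the count, depends on where the engine performs the check, which this thread does not show.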

Liam-DeVoe · 2026-01-19T01:54:09Z

hypothesis-python/src/hypothesis/internal/conjecture/engine.py

+# Statistical thresholds for assumption satisfaction rate.
+# We want to stop when we're 99% confident the true valid rate is below 1%.
+#
+# With k valid examples, we need n invalid examples such that:
+#     P(seeing <=k valid in n+k trials | true rate = 1%) <= 1%
+#
+# For k=0: (0.99)^n <= 0.01  →  n >= ln(0.01)/ln(0.99) ~= 459
+# Each additional valid example adds ~153 to the threshold (solving the
+# cumulative binomial for subsequent k values).
+#
+# Formula: stop when invalid_examples > INVALID_THRESHOLD_BASE + INVALID_PER_VALID * valid_examples
+INVALID_THRESHOLD_BASE = 459
+INVALID_PER_VALID = 153
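
The k=0 figure in the comment above can be checked with the standard library alone. The sketch below also evaluates the cumulative binomial condition directly for small k, so the gap between successive thresholds can be inspected. It is an illustration of the stated condition, not the `_calculate_thresholds()` that was later added to the PR; `binom_cdf` and `min_invalid` are hypothetical names.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def min_invalid(k: int, rate: float = 0.01, alpha: float = 0.01) -> int:
    """Smallest n such that P(<= k valid in n + k trials | true rate) <= alpha."""
    n = 0
    while binom_cdf(k, n + k, rate) > alpha:
        n += 1
    return n


if __name__ == "__main__":
    # Closed form for k = 0: n >= ln(0.01) / ln(0.99)
    print(math.ceil(math.log(0.01) / math.log(0.99)))
    thresholds = [min_invalid(k) for k in range(5)]
    print(thresholds)
    # Gaps between successive thresholds, for comparison with INVALID_PER_VALID.
    print([b - a for a, b in zip(thresholds, thresholds[1:])])
```

The linear search is slow for large k but is enough to compare the exact thresholds against a fixed per-valid increment.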


this seems cheap enough that we should define INVALID_THRESHOLD_BASE, INVALID_PER_VALID = _calculate_thresholds() and compute it at runtime, in case we ever want to change the threshold?

Done. I generated _calculate_thresholds() with Claude, of course, and I don't understand it. But it does produce 459 and 153, and I did a quick experiment to prove to myself that those numbers are reasonable.

fwiw @Zac-HD I sat down with this formula and I'm pretty skeptical we can get a reasonable closed-form expression for INVALID_PER_VALID: https://claude.ai/share/a7c1e28e-8f88-48d2-a9ff-d8bb89461cb4.

I think INVALID_THRESHOLD_BASE is correct, but that an estimate of the increment based on vibes (and pegged to a particular confidence rate) isn't particularly useful for future us. I'd advocate for keeping the Bayesian closed form of INVALID_THRESHOLD_BASE, but stating the increment as 1 / min_valid_rate rather than a Bayesian confidence estimate.

This will be systematically more aggressive about exiting than our stated 99% confidence interval. I think this is fine; we currently had/have no confidence interval on master.
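
A minimal sketch of what that proposal could look like, assuming a 1% minimum valid rate and 99% confidence as in the diff comment. The function name `thresholds` and its parameters are hypothetical and not taken from the PR.

```python
import math


def thresholds(min_valid_rate: float = 0.01, confidence: float = 0.99) -> tuple[int, int]:
    """Return (base, per_valid) under the proposal above.

    base: closed form for zero valid examples,
          (1 - min_valid_rate)^n <= 1 - confidence.
    per_valid: simply 1 / min_valid_rate, not a confidence-based estimate.
    """
    base = math.ceil(math.log(1 - confidence) / math.log(1 - min_valid_rate))
    per_valid = round(1 / min_valid_rate)
    return base, per_valid


if __name__ == "__main__":
    print(thresholds())
```

With these defaults the increment is 100 rather than 153, which is consistent with the remark that this variant exits earlier than the stated 99% confidence would.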

(bonus / longer chat that advocates for sqrt as a better approximation. I remain skeptical and do not stand by this.)

I read both chats and I mostly followed them, though I think you're a bit more knowledgeable about stats than me. I gather you don't want to add a SciPy dependency, nor do expensive calculations after each trial to determine whether to continue, but on the other hand we're not sure about any of the suggested approximations. I'm not passionately attached to this PR. I'll leave the final decision to you two.

Zac-HD

Approving on the basis that the approximation seems fine to me, and is at minimum a huge improvement over the status quo - in addition to adjusting the actual rate.

Thanks again @ajdavis!

Liam-DeVoe · 2026-01-28T05:40:54Z

Zac got to this in the middle of my doing a Bayesian deep dive 😄: #4650. Thanks a lot for contributing here @ajdavis!

ajdavis requested review from Liam-DeVoe and Zac-HD as code owners January 18, 2026 22:01

permit up to 99% assume() failures HypothesisWorks#4623

76d3207

ajdavis force-pushed the issue-4623-filter-condition branch from 0b28145 to 76d3207 Compare January 18, 2026 22:19

fix termination

3141426

ajdavis commented Jan 19, 2026


ajdavis changed the title ~~permit up to 99% assume() failures #4623~~ permit up to 99% assume() failures Jan 19, 2026

Liam-DeVoe reviewed Jan 19, 2026


ajdavis added 3 commits January 18, 2026 21:46

calc thresholds at load time, add 2 tests

f738506

style

031bd20

response to comments

f69cad6

Zac-HD approved these changes Jan 28, 2026


Zac-HD merged commit 42126d6 into HypothesisWorks:master Jan 28, 2026
78 checks passed

Liam-DeVoe mentioned this pull request Jan 28, 2026

Adjust invalid examples stopping rule #4650

Merged
