Commit 5ff3463

zzstoatzz and claude committed
replace automation action validation eval with lease renewal crash eval
The previous eval tested automation Jinja template type mismatches, which required specialized domain knowledge the agent didn't have access to. This new eval tests diagnosing flow runs that crash due to concurrency lease renewal failure - a real user pain point (prefect#19068, prefect#18839) that can be diagnosed from first principles by reading the crash state message.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent cbad8fc commit 5ff3463

3 files changed: +79 -165 lines changed

evals/README.md

Lines changed: 1 addition & 1 deletion
@@ -57,8 +57,8 @@ Provider defaults:
 | **automations/test_create_reactive_automation** | verifies agent can create reactive automations | ✅ implemented | [#47](https://github.com/PrefectHQ/prefect-mcp-server/pull/47) |
 | **automations/test_create_proactive_automation** | verifies agent can create proactive automations | ✅ implemented | - |
 | **automations/test_debug_automation_not_firing** | verifies agent can debug why an automation didn't fire due to threshold mismatch | ✅ implemented | [#62](https://github.com/PrefectHQ/prefect-mcp-server/issues/62) |
-| **automations/test_debug_action_validation_failure** | verifies agent can identify parameter type mismatches between Jinja templates and deployment schemas | ✅ implemented | [#97](https://github.com/PrefectHQ/prefect-mcp-server/issues/97) |
 | **test_trigger_deployment_run** | verifies agent can trigger deployment runs with custom parameters | ✅ implemented | - |
+| **test_lease_renewal_crash** | verifies agent can diagnose flow runs that crashed due to concurrency lease renewal failure | ✅ implemented | [#97](https://github.com/PrefectHQ/prefect-mcp-server/issues/97) |
 | **rate_limits/test_cloud_direct** | verifies agent can diagnose rate limiting when user asks about 429 errors (Cloud) | ✅ implemented | [#46](https://github.com/PrefectHQ/prefect-mcp-server/issues/46) |
 | **rate_limits/test_cloud_no_throttling** | verifies agent correctly rules out rate limiting when no throttling occurred (Cloud) | ✅ implemented | [#46](https://github.com/PrefectHQ/prefect-mcp-server/issues/46) |
 | **rate_limits/test_cloud_correlate_logs** | verifies agent can correlate 429 warnings in flow logs with rate limit data (Cloud) | ✅ implemented | [#46](https://github.com/PrefectHQ/prefect-mcp-server/issues/46) |

evals/automations/test_debug_action_validation_failure.py

Lines changed: 0 additions & 164 deletions
This file was deleted.

evals/test_lease_renewal_crash.py

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+"""Eval for diagnosing flow runs that crash due to concurrency lease renewal failure.
+
+Based on real user issues:
+- https://github.com/PrefectHQ/prefect/issues/19068
+- https://github.com/PrefectHQ/prefect/issues/18839
+
+When a flow run holds a concurrency slot, it must periodically renew the lease.
+If renewal fails (network issues, API problems, timeout), Prefect crashes the run
+to prevent over-allocation. This is a common production issue that's hard to diagnose
+without understanding Prefect's internal lease renewal mechanism.
+"""
+
+from collections.abc import Awaitable, Callable
+from uuid import uuid4
+
+import pytest
+from prefect import flow
+from prefect.client.orchestration import PrefectClient
+from prefect.client.schemas.objects import FlowRun
+from prefect.states import Crashed
+from pydantic_ai import Agent
+
+from evals._tools.spy import ToolCallSpy
+
+LEASE_RENEWAL_ERROR = (
+    "Concurrency lease renewal failed - slots are no longer reserved. "
+    "Terminating execution to prevent over-allocation."
+)
+
+
+@pytest.fixture
+async def crashed_lease_renewal_flow_run(prefect_client: PrefectClient) -> FlowRun:
+    """Create a flow run that crashed due to lease renewal failure."""
+
+    @flow(name=f"data-pipeline-{uuid4().hex[:8]}")
+    def data_pipeline():
+        return "completed"
+
+    # Run the flow to create a flow run
+    state = data_pipeline(return_state=True)
+    flow_run = await prefect_client.read_flow_run(state.state_details.flow_run_id)
+
+    # Force to Crashed state with lease renewal error message
+    crashed_state = Crashed(message=LEASE_RENEWAL_ERROR)
+    await prefect_client.set_flow_run_state(
+        flow_run_id=flow_run.id,
+        state=crashed_state,
+        force=True,
+    )
+
+    return await prefect_client.read_flow_run(flow_run.id)
+
+
+async def test_diagnoses_lease_renewal_failure(
+    simple_agent: Agent,
+    crashed_lease_renewal_flow_run: FlowRun,
+    evaluate_response: Callable[[str, str], Awaitable[None]],
+    tool_call_spy: ToolCallSpy,
+) -> None:
+    """Test agent identifies concurrency lease renewal failure as crash cause."""
+    prompt = (
+        f"Why did my flow run '{crashed_lease_renewal_flow_run.name}' crash "
+        "unexpectedly during execution? It was running fine and then suddenly crashed."
+    )
+
+    async with simple_agent:
+        result = await simple_agent.run(prompt)
+
+    await evaluate_response(
+        "Does the agent correctly identify that the flow run crashed due to "
+        "concurrency lease renewal failure? The response should mention "
+        "'lease renewal' or 'concurrency slot' and explain that the run was "
+        "terminated because the lease could not be renewed.",
+        result.output,
+    )
+
+    # Agent must use get_flow_runs to retrieve the crash details
+    tool_call_spy.assert_tool_was_called("get_flow_runs")
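
Note on the mechanism described in the module docstring: the eval forces the `Crashed` state directly, so it never exercises an actual renewal loop. For readers unfamiliar with the pattern, the following is a minimal, standard-library-only sketch of "renew a lease periodically, stop the work if renewal fails". It is not Prefect's implementation; `hold_lease`, `renew`, `LeaseRenewalError`, and the returned state strings are hypothetical names chosen for illustration.

```python
import asyncio


class LeaseRenewalError(RuntimeError):
    """Hypothetical error type: the lease could not be renewed."""


async def hold_lease(renew, work, interval: float = 0.01) -> str:
    """Run `work` while calling `renew` every `interval` seconds.

    `renew` is an assumed async callable that returns True on success.
    Returns "COMPLETED" if the work finishes first, or "CRASHED" if a
    renewal fails while the work is still running.
    """

    async def renew_loop() -> None:
        while True:
            await asyncio.sleep(interval)
            if not await renew():
                raise LeaseRenewalError("lease renewal failed")

    work_task = asyncio.create_task(work())
    renew_task = asyncio.create_task(renew_loop())

    done, _ = await asyncio.wait(
        {work_task, renew_task}, return_when=asyncio.FIRST_COMPLETED
    )

    if work_task in done:
        # Work finished while the lease was still held: stop renewing.
        renew_task.cancel()
        await asyncio.gather(renew_task, return_exceptions=True)
        work_task.result()  # re-raise if the work itself failed
        return "COMPLETED"

    # Renewal failed first: the slot is no longer reserved, so stop the work.
    renew_task.exception()  # mark the renewal error as retrieved
    work_task.cancel()
    await asyncio.gather(work_task, return_exceptions=True)
    return "CRASHED"


async def demo() -> tuple[str, str]:
    async def long_work() -> None:
        await asyncio.sleep(0.2)

    async def always_ok() -> bool:
        return True

    async def always_fail() -> bool:
        return False

    healthy = await hold_lease(always_ok, long_work)
    broken = await hold_lease(always_fail, long_work)
    return healthy, broken
```

Running `asyncio.run(demo())` exercises both paths: one run where every renewal succeeds, and one where the first renewal fails and the work is cancelled. The second path is the situation the eval simulates with the `LEASE_RENEWAL_ERROR` message.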
