
Commit a73231f

🔄 synced local 'skyvern/' with remote 'skyvern/'
- Add termination detection to task v2 prompts (Phase 1)
- Handle termination in task v2 service logic (Phase 2)
- Add termination thought tracking (Phase 3)
- Add tests

## Summary by CodeRabbit

* **New Features**
  * Adds explicit termination support: tasks can end early when the goal is impossible, exposing a termination indicator and reason; the plan and task type may be omitted when the task is terminated or the goal is achieved.
  * Thoughts and guidance now optionally include an explanation when a goal is impossible.
  * Completion checks now surface page/extraction context and report nuanced outcomes (achieved, needs more steps, impossible).
* **Tests**
  * Added comprehensive tests covering termination behavior, decision logic, and output fields.

----

> [!IMPORTANT]
> Add termination handling to task v2, allowing early task termination when goals are impossible, with updates to prompts, service logic, and comprehensive tests.
>
> - **Behavior**:
>   - Adds termination detection to the `task_v2.j2` and `task_v2_check_completion.j2` prompts, allowing tasks to end early when goals are impossible.
>   - Implements termination handling in `task_v2_service.py`, including a `_handle_task_v2_termination()` function that creates termination thoughts and marks tasks as terminated.
>   - Updates the `ThoughtType` and `ThoughtScenario` enums in `task_v2.py` to include `termination`.
> - **Tests**:
>   - Adds unit tests in `test_task_v2.py` to cover termination behavior, including response parsing, decision logic, and thought creation.
>
> This description was created by Ellipsis for a3ffc63f5fd32acaf954738ef40f5ca3e8655389.
1 parent a503a19 commit a73231f

4 files changed, with 195 additions and 7 deletions.


skyvern/forge/prompts/skyvern/task_v2.j2

Lines changed: 34 additions & 3 deletions
```diff
@@ -6,6 +6,33 @@ You have access to the following task types to take actions:
 - loop: this task can be used to generate a list of planning sessions like this. When to use a loop task? Use loop when there are multiple parallel tasks you can do with the same goal. Each task in the loop has the same goal but with different objects/values/targets/variables. Use loop task when it's in a "breadth first search" situation where you can go through a list of values and execute the same task for each value. Examples:
 - When the goal is "Open up to 10 links from an ecomm search result page, and extract information like the price of each product.", loop task should be used to iterate through a list of links or URLs. In each iteration of the loop, the task will go to the linked page and trigger another planning session with the goal of extracting price information of the product
 - When the goal is "download 5 documents found on a page", loop task should be used to iterate through a list of document names. Each document will trigger another planning session to download the relative document
+{% if enable_termination %}
+
+You may also determine that the user goal is IMPOSSIBLE to achieve. Use termination ONLY when there is CLEAR, EXPLICIT, and UNAMBIGUOUS evidence that the goal cannot ever be accomplished. Be very conservative - when in doubt, continue trying.
+
+CRITICAL: Termination should be rare. Only terminate when the website EXPLICITLY tells you the action is impossible. Examples of when to terminate:
+- An explicit error message like "Account not found", "User does not exist", "No results found for [specific query]"
+- "Access denied" or "Unauthorized" errors after authentication was attempted
+- A 404 page that explicitly says the resource doesn't exist
+- Login failed with explicit "Invalid credentials" or "Wrong password" message (not just empty fields or validation errors)
+- "File not available", "Out of stock with no restock date", or "This item has been discontinued"
+- The website explicitly states: "This action cannot be performed" or "This feature is not available"
+
+Do NOT terminate when:
+- The page is still loading, blank, or shows a spinner
+- You need to navigate to find the right page or section
+- A captcha, verification step, or 2FA appeared (these can be solved)
+- A transient network error, timeout, or "try again later" message occurred
+- You simply haven't found the right element yet but it might exist elsewhere on the page or site
+- The task is difficult but not impossible
+- You're on a wrong page and need to navigate back
+- Search returned no results but you could try different search terms
+- A form submission failed but you could correct the input
+- You see a generic error without specific details about why it failed
+- The page structure is different than expected but might still contain the needed functionality
+
+When in doubt, DO NOT terminate. Try an alternative approach instead.
+{% endif %}
 
 MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
 
@@ -16,10 +43,14 @@ Reply in JSON format with the following keys:
 "require_extraction": bool, // True if the user goal requires information extraction. False otherwise.
 "task_history_information": str, // Think step by step. In task history, what information has been collected that's helpful and relevant to the user goal, and what information is missing if any.
 "information_extracted": optional[bool], // True if the needed information has been extracted. False if the needed information has not been extracted. If task history has no "extract" type, that means no data extraction has happened, return false. Null if the user goal does not require information extraction.
-"thoughts": str, // Think step by step. What has been done so far and what is the next reasonable mini goal a human can do foreseeably move towards the overall goal.
+"thoughts": str, // Think step by step. What has been done so far and what is the next reasonable mini goal a human can do foreseeably move towards the overall goal.{% if enable_termination %} If the goal appears impossible to achieve, explain why.{% endif %}
 "user_goal_achieved": bool, // True if the user goal has been completed, false otherwise. If the user wants to extract information and it has not been done, the user goal is not achieved.
-"plan": str, // The mini goal to achieve to move towards the user goal. DO NOT come up or hallucinate any data that's not provided in the user goal. Be accurate and precise. Return null if the user goal has been achieved.
-"task_type": str, // One of the available task types: navigate, extract, loop
+{% if enable_termination %}
+"should_terminate": bool, // True ONLY if the user goal is definitively IMPOSSIBLE to achieve. Must have explicit evidence from the page. See termination guidelines above. False if there's any chance the goal can still be achieved.
+"termination_reason": str, // If should_terminate is true, quote the EXACT error message or text from the page that proves impossibility. Be specific. Null if should_terminate is false.
+{% endif %}
+"plan": str, // The mini goal to achieve to move towards the user goal. DO NOT come up or hallucinate any data that's not provided in the user goal. Be accurate and precise. Return null if the user goal has been achieved{% if enable_termination %} or if should_terminate is true{% endif %}.
+"task_type": str, // One of the available task types: navigate, extract, loop. Null if user_goal_achieved is true{% if enable_termination %} or should_terminate is true{% endif %}.
 "loop_values": list[str], // a list of string values to iterate through for loop task. null if it's not a loop task
 "is_loop_value_link": bool, // true if the loop_values is a list of urls to go to before for each planning session inside the loop
 }
```
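The planner prompt now asks for `should_terminate` and `termination_reason` and allows `plan` and `task_type` to be null once the goal is achieved or termination is requested. As a rough sketch, a reply in that shape could be parsed as shown below. The `PlannerDecision` dataclass and the `parse_planner_response` helper are hypothetical and not part of Skyvern; the flag handling assumes that the termination keys should be ignored whenever the feature is off.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannerDecision:
    # Hypothetical container for the keys requested by task_v2.j2.
    user_goal_achieved: bool
    should_terminate: bool
    termination_reason: Optional[str]
    plan: Optional[str]
    task_type: Optional[str]


def parse_planner_response(raw: str, enable_termination: bool) -> PlannerDecision:
    """Parse the planner's JSON reply (hypothetical helper, not Skyvern's API)."""
    data = json.loads(raw)
    # Only a literal JSON true counts, mirroring the `is True` check in the planning loop.
    terminate = enable_termination and data.get("should_terminate", False) is True
    return PlannerDecision(
        user_goal_achieved=data.get("user_goal_achieved", False) is True,
        should_terminate=terminate,
        # The reason is only meaningful when termination is actually requested.
        termination_reason=data.get("termination_reason") if terminate else None,
        # Both may be null when the goal is achieved or should_terminate is true.
        plan=data.get("plan"),
        task_type=data.get("task_type"),
    )


reply = json.dumps({
    "user_goal_achieved": False,
    "should_terminate": True,
    "termination_reason": "Account not found",
    "plan": None,
    "task_type": None,
})
print(parse_planner_response(reply, enable_termination=True))
print(parse_planner_response(reply, enable_termination=False).should_terminate)
```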

skyvern/forge/prompts/skyvern/task_v2_check_completion.j2

Lines changed: 17 additions & 2 deletions
````diff
@@ -1,4 +1,4 @@
-You're to assist the user to achieve the user goal in the web. Given the user goal, the latest screenshot of the page and the mini tasks that have been completed by the user along the way, help decide whether the user goal has been achieved.
+You're to assist the user to achieve the user goal in the web. Given the user goal, the latest screenshot of the page and the mini tasks that have been completed by the user along the way, help decide whether the user goal has been achieved{% if enable_termination %}, is impossible to achieve,{% endif %} or needs more steps.
 
 Reply in JSON format with the following keys:
 {
@@ -7,9 +7,24 @@ Reply in JSON format with the following keys:
 "require_extraction": bool, // True if the user goal requires information extraction. False otherwise.
 "task_history_information": str, // Think step by step. In task history, what information has been collected that's helpful and relevant to the user goal, and what information is missing if any.
 "information_extracted": optional[bool], // True if the needed information has been extracted. False if the needed information has not been extracted (no extract task in history). Null if the user goal does not require information extraction.
-"thoughts": str, // Think step by step. Would completing the tasks in the task history be good enough to achieve the user goal? If more tasks need to be completed to achieve the goal, what would be the next task?
+"thoughts": str, // Think step by step. Would completing the tasks in the task history be good enough to achieve the user goal? If more tasks need to be completed to achieve the goal, what would be the next task?{% if enable_termination %} If the goal appears impossible, explain why.{% endif %}
 "user_goal_achieved": bool, // True if the user goal has been completed, false otherwise. If the user wants to extract information and it has not been done, the user goal is not achieved. If info extraction is not required, use the task history, assisted by the screenshot to decide if the user goal has been achieved.
+{% if enable_termination %}
+"should_terminate": bool, // True ONLY if there is CLEAR, EXPLICIT, UNAMBIGUOUS evidence that the goal is IMPOSSIBLE. The page must explicitly state the action cannot be done. False if there's ANY chance the goal can still be achieved.
+"termination_reason": str // If should_terminate is true, quote the EXACT error message or text from the page that proves impossibility. Null if should_terminate is false.
+{% endif %}
 }
+{% if enable_termination %}
+
+CRITICAL - Be very conservative about termination. Only terminate when:
+- The page shows an EXPLICIT error message stating impossibility (e.g., "Account does not exist", "Product discontinued", "Access permanently denied")
+- NOT when the page is loading, blank, or showing a generic error
+- NOT when you simply haven't found the element yet
+- NOT when a form failed but could be retried with different input
+- NOT when you're on a wrong page that could be navigated away from
+
+When in doubt, set should_terminate to false and continue trying.
+{% endif %}
 
 User goal:
 ```
````
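With the flag on, the completion prompt distinguishes three outcomes: achieved, impossible, or needs more steps. The sketch below maps a parsed completion reply onto those outcomes in the same order the service checks them: goal achieved first, then termination, otherwise another iteration. `CompletionOutcome` and `classify_completion` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Any


class CompletionOutcome(Enum):
    # Hypothetical labels for the three outcomes the completion prompt distinguishes.
    ACHIEVED = "achieved"
    IMPOSSIBLE = "impossible"
    NEEDS_MORE_STEPS = "needs_more_steps"


def classify_completion(resp: dict[str, Any], enable_termination: bool) -> CompletionOutcome:
    # Goal achievement is checked before termination, as in run_task_v2_helper.
    if resp.get("user_goal_achieved", False):
        return CompletionOutcome.ACHIEVED
    # should_terminate is only honored when the feature flag is enabled.
    if enable_termination and resp.get("should_terminate", False):
        return CompletionOutcome.IMPOSSIBLE
    # Anything else means the loop should keep going.
    return CompletionOutcome.NEEDS_MORE_STEPS


resp = {
    "user_goal_achieved": False,
    "should_terminate": True,
    "termination_reason": "Product discontinued",
}
print(classify_completion(resp, enable_termination=True))
print(classify_completion(resp, enable_termination=False))
```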

skyvern/forge/sdk/schemas/task_v2.py

Lines changed: 16 additions & 0 deletions
```diff
@@ -117,14 +117,29 @@ def deserialize_proxy_location(cls, proxy_location: ProxyLocationInput | str) ->
 
 
 class ThoughtType(StrEnum):
+    """
+    Type of thought recorded during task execution.
+
+    Note: Stored as VARCHAR in the database (not a PostgreSQL ENUM), so new values
+    can be added without database migrations. See observer_thoughts.observer_thought_type column.
+    """
+
     plan = "plan"
     metadata = "metadata"
     user_goal_check = "user_goal_check"
     internal_plan = "internal_plan"
     failure_describe = "failure_describe"
+    termination = "termination"
 
 
 class ThoughtScenario(StrEnum):
+    """
+    Scenario in which a thought was generated.
+
+    Note: Stored as VARCHAR in the database (not a PostgreSQL ENUM), so new values
+    can be added without database migrations. See observer_thoughts.observer_thought_scenario column.
+    """
+
     generate_plan = "generate_plan"
     user_goal_check = "user_goal_check"
     failure_describe = "failure_describe"
@@ -133,6 +148,7 @@ class ThoughtScenario(StrEnum):
     extract_loop_values = "extract_loop_values"
     generate_task_in_loop = "generate_task_in_loop"
     generate_task = "generate_general_task"
+    termination = "termination"
 
 
 class Thought(BaseModel):
```
class Thought(BaseModel):

skyvern/services/task_v2_service.py

Lines changed: 128 additions & 2 deletions
```diff
@@ -146,6 +146,80 @@ async def _summarize_max_steps_failure_reason(
         return ""
 
 
+async def _handle_task_v2_termination(
+    task_v2_id: str,
+    organization_id: str,
+    workflow_run_id: str,
+    workflow_id: str,
+    workflow_permanent_id: str,
+    termination_reason: str | None,
+    iteration: int,
+    source: str | None = None,
+) -> TaskV2:
+    """
+    Handle task v2 termination by creating a termination thought and marking the task as terminated.
+
+    Args:
+        task_v2_id: The task v2 ID
+        organization_id: The organization ID
+        workflow_run_id: The workflow run ID
+        workflow_id: The workflow ID
+        workflow_permanent_id: The workflow permanent ID
+        termination_reason: The reason for termination (from LLM response)
+        iteration: The current iteration number
+        source: Optional source identifier (e.g., "completion_check")
+
+    Returns:
+        The updated TaskV2 object with terminated status
+    """
+    log_message = "Task v2 should terminate"
+    if source:
+        log_message = f"Task v2 should terminate according to {source}"
+    log_message += " - goal is impossible to achieve"
+
+    LOG.info(
+        log_message,
+        iteration=iteration,
+        workflow_run_id=workflow_run_id,
+        termination_reason=termination_reason,
+    )
+
+    # Create a dedicated termination thought for UI visibility
+    termination_thought = await app.DATABASE.create_thought(
+        task_v2_id=task_v2_id,
+        organization_id=organization_id,
+        workflow_run_id=workflow_run_id,
+        workflow_id=workflow_id,
+        workflow_permanent_id=workflow_permanent_id,
+        thought_type=ThoughtType.termination,
+        thought_scenario=ThoughtScenario.termination,
+        thought=termination_reason or "Task goal is impossible to achieve",
+    )
+
+    output: dict[str, Any] = {
+        "should_terminate": True,
+        "termination_reason": termination_reason,
+        "iteration": iteration,
+    }
+    if source:
+        output["source"] = source
+
+    await app.DATABASE.update_thought(
+        thought_id=termination_thought.observer_thought_id,
+        organization_id=organization_id,
+        output=output,
+    )
+
+    task_v2 = await mark_task_v2_as_terminated(
+        task_v2_id=task_v2_id,
+        workflow_run_id=workflow_run_id,
+        organization_id=organization_id,
+        failure_reason=termination_reason or "Task goal is impossible to achieve",
+    )
+
+    return task_v2
+
+
 async def initialize_task_v2(
     organization: Organization,
     user_prompt: str,
```
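`_handle_task_v2_termination` performs three I/O steps: it creates a thought, updates that thought's output, and marks the task as terminated. The data it writes can be separated from the I/O. The sketch below is a hypothetical pure function, not part of the commit, that reproduces the log message, the thought text, the thought output, and the failure reason for a given input.

```python
from typing import Any, Optional

DEFAULT_REASON = "Task goal is impossible to achieve"


def build_termination_record(
    termination_reason: Optional[str],
    iteration: int,
    source: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper mirroring the data _handle_task_v2_termination writes."""
    log_message = "Task v2 should terminate"
    if source:
        log_message = f"Task v2 should terminate according to {source}"
    log_message += " - goal is impossible to achieve"

    # The thought output keeps the raw reason, which may be None.
    output: dict[str, Any] = {
        "should_terminate": True,
        "termination_reason": termination_reason,
        "iteration": iteration,
    }
    if source:
        output["source"] = source

    return {
        "log_message": log_message,
        # The thought text and the failure reason fall back to the same default.
        "thought": termination_reason or DEFAULT_REASON,
        "thought_output": output,
        "failure_reason": termination_reason or DEFAULT_REASON,
    }


record = build_termination_record(None, iteration=3, source="completion_check")
print(record["log_message"])
print(record["thought_output"])
```

When the model supplies no reason, the stored `output["termination_reason"]` stays `None`, while the visible thought and the task's failure reason both show the default string.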
```diff
@@ -526,6 +600,16 @@ async def run_task_v2_helper(
         current_run_id,
         properties={"organization_id": organization_id, "task_url": task_v2.url},
     )
+    enable_task_v2_termination = await app.EXPERIMENTATION_PROVIDER.is_feature_enabled_cached(
+        "ENABLE_TASK_V2_TERMINATION",
+        current_run_id,
+        properties={"organization_id": organization_id, "task_url": task_v2.url},
+    )
+    LOG.info(
+        "Task v2 termination feature flag",
+        enable_task_v2_termination=enable_task_v2_termination,
+        organization_id=organization_id,
+    )
     skyvern_context.set(
         SkyvernContext(
             organization_id=organization_id,
@@ -702,6 +786,7 @@ async def run_task_v2_helper(
             user_goal=user_prompt,
             task_history=task_history,
             local_datetime=datetime.now(context.tz_info).isoformat(),
+            enable_termination=bool(enable_task_v2_termination),
         )
         thought = await app.DATABASE.create_thought(
             task_v2_id=task_v2_id,
@@ -730,6 +815,8 @@ async def run_task_v2_helper(
         )
         # see if the user goal has achieved or not
         user_goal_achieved = task_v2_response.get("user_goal_achieved", False)
+        should_terminate = task_v2_response.get("should_terminate", False)
+        termination_reason = task_v2_response.get("termination_reason")
         observation = task_v2_response.get("page_info", "")
         thoughts: str = task_v2_response.get("thoughts", "")
         plan = task_v2_response.get("plan", "")
@@ -741,7 +828,12 @@ async def run_task_v2_helper(
             thought=thoughts,
             observation=observation,
             answer=plan,
-            output={"task_type": task_type, "user_goal_achieved": user_goal_achieved},
+            output={
+                "task_type": task_type,
+                "user_goal_achieved": user_goal_achieved,
+                "should_terminate": should_terminate,
+                "termination_reason": termination_reason,
+            },
         )
 
         if user_goal_achieved is True:
@@ -763,6 +855,19 @@ async def run_task_v2_helper(
             )
             break
 
+        # Only handle termination if the feature flag is enabled
+        if enable_task_v2_termination and should_terminate is True:
+            task_v2 = await _handle_task_v2_termination(
+                task_v2_id=task_v2_id,
+                organization_id=organization_id,
+                workflow_run_id=workflow_run_id,
+                workflow_id=workflow_id,
+                workflow_permanent_id=workflow.workflow_permanent_id,
+                termination_reason=termination_reason,
+                iteration=i,
+            )
+            return workflow, workflow_run, task_v2
+
         if not plan:
             LOG.warning("No plan found in task v2 response", task_v2_response=task_v2_response)
             continue
```
```diff
@@ -925,6 +1030,7 @@ async def run_task_v2_helper(
             user_goal=user_prompt,
             task_history=task_history,
             local_datetime=datetime.now(context.tz_info).isoformat(),
+            enable_termination=bool(enable_task_v2_termination),
         )
         thought = await app.DATABASE.create_thought(
             task_v2_id=task_v2_id,
@@ -949,12 +1055,18 @@ async def run_task_v2_helper(
             task_history=task_history,
         )
         user_goal_achieved = completion_resp.get("user_goal_achieved", False)
+        should_terminate = completion_resp.get("should_terminate", False)
+        termination_reason = completion_resp.get("termination_reason")
         thought_content = completion_resp.get("thoughts", "")
         await app.DATABASE.update_thought(
             thought_id=thought.observer_thought_id,
             organization_id=organization_id,
             thought=thought_content,
-            output={"user_goal_achieved": user_goal_achieved},
+            output={
+                "user_goal_achieved": user_goal_achieved,
+                "should_terminate": should_terminate,
+                "termination_reason": termination_reason,
+            },
         )
         if user_goal_achieved:
             LOG.info(
@@ -977,6 +1089,20 @@ async def run_task_v2_helper(
             )
             break
 
+        # Only handle termination if the feature flag is enabled
+        if enable_task_v2_termination and should_terminate:
+            task_v2 = await _handle_task_v2_termination(
+                task_v2_id=task_v2_id,
+                organization_id=organization_id,
+                workflow_run_id=workflow_run_id,
+                workflow_id=workflow_id,
+                workflow_permanent_id=workflow.workflow_permanent_id,
+                termination_reason=termination_reason,
+                iteration=i,
+                source="completion_check",
+            )
+            return workflow, workflow_run, task_v2
+
         # total step number validation
         workflow_run_tasks = await app.DATABASE.get_tasks_by_workflow_run_id(workflow_run_id=workflow_run_id)
         total_step_count = await app.DATABASE.get_total_unique_step_order_count_by_task_ids(
```
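The planning-loop branch tests `should_terminate is True`, while the completion-check branch tests plain truthiness. The two conditions agree when the model returns a JSON boolean or omits the key, but they diverge for other values. The comparison below uses hypothetical helper names that restate the two conditions. It assumes the flag value can be any object, since the code coerces it with `bool(...)` only when passing it to the template.

```python
from typing import Any


def planning_path_terminates(flag: Any, should_terminate: Any) -> bool:
    # Restates: `if enable_task_v2_termination and should_terminate is True:`
    return bool(flag and should_terminate is True)


def completion_path_terminates(flag: Any, should_terminate: Any) -> bool:
    # Restates: `if enable_task_v2_termination and should_terminate:`
    return bool(flag and should_terminate)


# Compare both conditions for a few possible parsed values, with the flag on.
for value in (True, False, None, "false", 1):
    print(
        repr(value),
        planning_path_terminates(True, value),
        completion_path_terminates(True, value),
    )
```

If a reply ever contained a quoted `"false"` or a `1`, only the completion path would terminate. Whether that can happen depends on how the LLM handler parses replies, which this commit does not show.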
