Description
Proposal summary
Currently, `Opik.get_dataset` accepts only a `name` argument. When a user wants to run only a subset of the test cases in a dataset (for example, the items with a specific tag), it would be very helpful to pass a filter string, using the same filter-string semantics as elsewhere in the SDK, that scopes which dataset items are returned and then evaluated.
Proposed function signature:
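`def get_dataset(self, name: str, filter_string: str | None = None) -> dataset.Dataset:`

Defaulting `filter_string` to `None` keeps existing calls such as `get_dataset(name)` working unchanged.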
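To make the proposed behaviour concrete, here is a minimal, client-side sketch of what passing a filter string could do to a list of dataset items. Everything in it is hypothetical: the `DatasetItem` type, the `filter_items` helper, and the tiny grammar (`=`, `!=`, `contains`, with clauses joined by `AND`) are illustrative assumptions, not Opik's API, and they may not match the exact filter syntax used by `evaluate_threads`.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


# Hypothetical stand-in for a dataset item; the real Opik item type may differ.
@dataclass
class DatasetItem:
    id: str
    data: Dict[str, Any]
    tags: List[str] = field(default_factory=list)


# One clause: <field> <operator> "<value>", e.g. tags contains "regression"
_CLAUSE = re.compile(r'^\s*(\w+)\s+(=|!=|contains)\s+"([^"]*)"\s*$')


def _compile_clause(clause: str) -> Callable[[DatasetItem], bool]:
    """Turn a single clause into a predicate over DatasetItem (hypothetical grammar)."""
    match = _CLAUSE.match(clause)
    if match is None:
        raise ValueError(f"unsupported filter clause: {clause!r}")
    name, op, value = match.groups()

    def predicate(item: DatasetItem) -> bool:
        # "tags" and "id" are item attributes; any other name is looked up in item.data.
        if name == "tags":
            actual: Any = item.tags
        elif name == "id":
            actual = item.id
        else:
            actual = item.data.get(name)
        if op == "contains":
            # Substring test for strings, membership test for lists/tuples/sets.
            return isinstance(actual, (str, list, tuple, set)) and value in actual
        if op == "=":
            return actual == value
        return actual != value

    return predicate


def filter_items(items: List[DatasetItem], filter_string: Optional[str] = None) -> List[DatasetItem]:
    """Return items matching every AND-joined clause; None or "" means no filtering."""
    if not filter_string:
        return list(items)
    # Naive split: quoted values must not themselves contain " AND ".
    predicates = [_compile_clause(c) for c in re.split(r"\s+AND\s+", filter_string)]
    return [item for item in items if all(p(item) for p in predicates)]


items = [
    DatasetItem("1", {"input": "refund request"}, ["regression"]),
    DatasetItem("2", {"input": "greeting"}, ["smoke"]),
    DatasetItem("3", {"input": "refund status"}, ["regression", "flaky"]),
]

print([i.id for i in filter_items(items, 'tags contains "regression"')])  # ['1', '3']
print([i.id for i in filter_items(items, 'tags contains "regression" AND input contains "status"')])  # ['3']
```

In the proposal the filtering would presumably happen inside `get_dataset` itself, so the caller would never need to see the items that were excluded.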
Motivation
When certain test cases fail or need re-evaluation, it is handy to re-run only those cases, especially since agent evaluations can already be long-running. The current SDK returns only the contents of each dataset item, not its metadata (such as tags), so users cannot write this filtering logic themselves. Adopting the same filtering semantics already used by `evaluate_threads` would support this workflow well.