Proposal: introduce evals to 12 factor app #65
+8
−3
This PR proposes adding evaluation tests (evals) as part of the context of an agent app.
General concept:
The main idea is that evals convey context-specific information that reflects the intended alignment of the agent.
The original 12-factor app does not name testing as a factor, but several factors implicitly support and encourage it:
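To make the idea concrete, here is a minimal sketch of evals kept as data next to the agent. Everything in it is hypothetical and for illustration only: `EvalCase`, `run_evals`, the substring check, and `stub_agent` (a stand-in for a real agent call) are assumptions, not something this PR or the 12-factor text defines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One expectation about agent behavior (hypothetical shape)."""
    name: str
    prompt: str
    must_include: str  # substring an aligned response is expected to contain


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Call the agent once per case and return pass/fail keyed by case name."""
    return {
        case.name: case.must_include.lower() in agent(case.prompt).lower()
        for case in cases
    }


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent; assumed here only so the sketch runs."""
    if "refund" in prompt.lower():
        return "I can't issue refunds directly, but I will escalate to a human."
    return "Here is what I found."


CASES = [
    EvalCase("escalates_refunds", "Please refund my order", "escalate"),
    EvalCase("answers_general", "What are your opening hours?", "found"),
]

results = run_evals(stub_agent, CASES)
```

A substring match is the simplest possible check; a real eval would likely use a more robust scoring method. The cases themselves are what describe the behavior expected of the agent.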
Factor 10: Dev/Prod Parity
This encourages keeping development, staging, and production environments as similar as possible. It implies that automated tests should run in environments that closely resemble production.
Factor 5: Build, Release, Run
The build stage is where code is compiled and, in most CI/CD pipelines, where tests run, so a pipeline that enforces testing fits naturally here.
Factor 12: Admin Processes
You could technically run test scripts as one-off admin processes.
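As a rough sketch of that last point, an eval run could be wrapped as a one-off script that reports failures and returns a process exit code. The names `summarize` and `main`, and the shape of the results dictionary (eval name mapped to pass/fail), are assumptions made for this example.

```python
import sys


def summarize(results: dict[str, bool]) -> tuple[int, list[str]]:
    """Return an exit code (0 = all passed) and the sorted names of failed evals."""
    failed = sorted(name for name, ok in results.items() if not ok)
    return (1 if failed else 0), failed


def main(results: dict[str, bool]) -> int:
    """Print a short report and return the exit code for the process."""
    code, failed = summarize(results)
    for name in failed:
        print(f"FAILED eval: {name}", file=sys.stderr)
    print(f"{len(results) - len(failed)}/{len(results)} evals passed")
    return code


# In a real one-off script the results would come from calling the agent,
# and the process would end with: sys.exit(main(results))
```

Returning a non-zero exit code on failure lets the same script be used by hand or as a gate in a pipeline.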