Release v0.4.0
This release contains several feature improvements and various bug fixes:
- mTLS support in the vLLM client
- Multi-LoRA support
- End-to-end testing against llm-d-inference-sim
- Analysis of multiple reports via the CLI tool
- New aliases for shared_prefix config fields
- Dependency updates
What's Changed
- Update attribute error when parsing billsum conversations by @rlakhtakia in #297
- feat: add percentiles configuration for request lifecycle metrics reporting by @hhk7734 in #295
- Add mTLS support in vllm client by @unicell in #302
- Add cloudbuild.yaml by @jjk-g in #306
- Fix vllm prefix metrics by @jjk-g in #309
- feat: Multilora support by @changminbark in #315
- Add end-to-end testing using llm-d-inference-sim by @diamondburned in #294
- chore: update README.md with concurrent load generation info by @changminbark in #313
- Enabling multiple report analysis using CLI tool by @SachinVarghese in #307
- Fix concurrency running higher than the configured value by @zetxqx in #320
- Add Journal of Open Source Software paper on inference-perf by @achandrasekar in #326
- Fix docs to reference std, actual field is std_dev by @jjk-g in #335
- Add Sachin to authors in JOSS paper by @achandrasekar in #334
- Add shuffle to multi-round prompt generation by @elevran in #331
- Add aliases for shared_prefix config fields by @jjk-g in #311
- Update paper with statement of need and references by @achandrasekar in #342
- Add unique random seed to worker by @yangligt2 in #340
- Update transformers by @jjk-g in #336
New Contributors
- @unicell made their first contribution in #302
- @elevran made their first contribution in #331
- @yangligt2 made their first contribution in #340
Full Changelog: v0.3.0...v0.4.0
Docker Image
quay.io/inference-perf/inference-perf:v0.4.0
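To pull the image with the standard Docker CLI:

docker pull quay.io/inference-perf/inference-perf:v0.4.0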
Python Package
pip install inference-perf==0.4.0
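Once installed, a benchmark run is started by pointing the CLI at a config file. A minimal sketch, assuming the `inference-perf` entrypoint and `--config_file` flag described in the project README (`config.yml` is a placeholder path):

inference-perf --config_file config.yml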