Release v0.4.0
This release contains several feature improvements and various bug fixes:
- mTLS support in the vLLM client
- Multi-LoRA support
- End-to-end testing against llm-d-inference-sim
- Analysis of multiple reports via the CLI tool
- New aliases for shared_prefix config fields
- Dependency updates
What's Changed
- Update attribute error when parsing billsum conversations by @rlakhtakia in #297
- feat: add percentiles configuration for request lifecycle metrics reporting by @hhk7734 in #295
- Add mTLS support in vllm client by @unicell in #302
- Add cloudbuild.yaml by @jjk-g in #306
- Fix vllm prefix metrics by @jjk-g in #309
- feat: Multilora support by @changminbark in #315
- Add end-to-end testing using llm-d-inference-sim by @diamondburned in #294
- chore: update README.md with concurrent load generation info by @changminbark in #313
- Enabling multiple report analysis using CLI tool by @SachinVarghese in #307
- Fix concurrency running higher than the configured value by @zetxqx in #320
- Add Journal of Open Source Software paper on inference-perf by @achandrasekar in #326
- Fix docs to reference std, actual field is std_dev by @jjk-g in #335
- Add Sachin to authors in JOSS paper by @achandrasekar in #334
- Add shuffle to multi-round prompt generation by @elevran in #331
- Add aliases for shared_prefix config fields by @jjk-g in #311
- Update paper with statement of need and references by @achandrasekar in #342
- Add unique random seed to worker by @yangligt2 in #340
- Update transformers by @jjk-g in #336
New Contributors
- @unicell made their first contribution in #302
- @elevran made their first contribution in #331
- @yangligt2 made their first contribution in #340
Full Changelog: v0.3.0...v0.4.0
Docker Image
quay.io/inference-perf/inference-perf:v0.4.0
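To pull the image with the standard Docker CLI:

docker pull quay.io/inference-perf/inference-perf:v0.4.0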
Python Package
pip install inference-perf==0.4.0
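Once installed, a benchmark run is started by pointing the CLI at a config file. A minimal sketch, assuming the `inference-perf` entrypoint and `--config_file` flag described in the project README (`config.yml` is a placeholder path):

inference-perf --config_file config.yml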