feat(tts): integrate voice upload API (upstream #1201) #12
Draft
marksverdhai wants to merge 2 commits into heiervang-technologies:ht from
Port the voice upload API (POST /v1/audio/voices) from upstream vllm-project#1201 into the HT branch, adapted to coexist with HT's existing streaming and audio extraction changes.

- Add upload_voice(), _load/_save_uploaded_speakers() to serving_speech
- Add POST /v1/audio/voices endpoint to api_server
- Modify GET /v1/audio/voices to include uploaded voice details
- Auto-set ref_audio for uploaded voices in Base task
- Add docs/serving/speech_api.md documentation

Note: Known upstream review issues (path traversal, metadata locking, validation bypass for built-in voices) are carried as-is for parity and will be addressed in a follow-up.

Upstream-PR: vllm-project#1201
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes security and logic issues flagged in upstream PR vllm-project#1201 review:

Security:
- Sanitize name/consent to alphanumeric/underscore/hyphen only
- Validate resolved path stays within upload directory
- Remove file_path from API responses (information disclosure)

Logic bugs:
- Base task validation now correctly requires ref_audio unless voice is specifically an uploaded voice (not just any voice name)
- _get_uploaded_audio_data raises ValueError instead of returning None when audio file is missing, preventing silent failures

Robustness:
- Atomic metadata writes via tempfile + os.replace
- File locking (fcntl.flock) on metadata.json reads and writes
- Use Path().suffix for file extension extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
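The security items in this commit (character allowlist for name/consent, Path().suffix for the extension, and a containment check on the resolved path) can be illustrated with a minimal sketch. This is not the code from the PR or from upstream: the function name `safe_voice_path`, the `name_consent.ext` filename layout, and the extension rule are assumptions made for illustration only.

```python
import re
from pathlib import Path

# Allowlist from the commit message: alphanumeric, underscore, hyphen.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


def safe_voice_path(upload_dir: Path, name: str, consent: str, filename: str) -> Path:
    """Build a storage path for an uploaded sample, rejecting unsafe input.

    Hypothetical helper: the real layout of stored files is not shown in the PR text.
    """
    for label, value in (("name", name), ("consent", consent)):
        if not _SAFE_COMPONENT.fullmatch(value):
            raise ValueError(f"{label} must contain only letters, digits, '_' or '-'")

    # Path().suffix keeps only the final extension, e.g. ".wav".
    suffix = Path(filename).suffix.lower()
    if not re.fullmatch(r"\.[a-z0-9]+", suffix):
        raise ValueError("missing or unsupported file extension")

    base = upload_dir.resolve()
    candidate = (base / f"{name}_{consent}{suffix}").resolve()
    # Defense in depth: the resolved path must still live under the upload directory.
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError("resolved path escapes the upload directory")
    return candidate
```

The allowlist alone already blocks `../` sequences; the containment check is kept as a second layer in case the filename construction changes later.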
Summary
- Port the voice upload API (POST /v1/audio/voices) from upstream vllm-project/vllm-omni#1201 into the HT branch
- Add docs/serving/speech_api.md with full API documentation

Changes
- serving_speech.py: Voice upload methods (upload_voice, _load/_save_uploaded_speakers, _get_uploaded_audio_data), uploaded speakers storage initialization, auto-set ref_audio for uploaded voices in Base task, relaxed validation for uploaded voice names
- api_server.py: POST /v1/audio/voices endpoint, enhanced GET /v1/audio/voices response with uploaded voice details
- README.md: API docs for voice upload/list endpoints
- speech_api.md: New comprehensive speech API documentation

Known Upstream Review Issues (carried as-is)
These are flagged in the upstream PR review and should be addressed in a follow-up:
- Path traversal: name/consent unsanitized in filename construction
- No locking on metadata.json: race condition on concurrent uploads
- Validation bypass: a built-in voice without ref_audio passes validation but fails downstream
- Missing audio file returns None silently
- /tmp storage not persistent across restarts

See upstream review thread for full discussion.
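For reference, the metadata.json race on concurrent uploads is usually closed with an exclusive lock around the read-modify-write plus an atomic file swap. The sketch below shows one way to do that on POSIX systems. It is an illustration, not the implementation in this PR or upstream: `update_metadata` and the sidecar `.lock` file are assumptions, and `fcntl` is unavailable on Windows.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def update_metadata(metadata_path: Path, name: str, entry: dict) -> dict:
    """Add or overwrite one speaker entry under an exclusive lock (hypothetical helper)."""
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    # Lock a sidecar file rather than metadata.json itself: os.replace swaps the
    # underlying file, so a lock held on the old file would not cover the new one.
    lock_path = metadata_path.with_name(metadata_path.name + ".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            try:
                speakers = json.loads(metadata_path.read_text())
            except FileNotFoundError:
                speakers = {}
            speakers[name] = entry

            # Write to a temp file in the same directory so os.replace stays
            # on one filesystem and is therefore atomic.
            fd, tmp_name = tempfile.mkstemp(dir=metadata_path.parent, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as tmp:
                    json.dump(speakers, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_name, metadata_path)
            except BaseException:
                # Do not leave a stray temp file behind on failure.
                if os.path.exists(tmp_name):
                    os.unlink(tmp_name)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return speakers
```

Readers that only need a consistent snapshot can take `fcntl.LOCK_SH` on the same sidecar file; because the swap is atomic, they never observe a half-written metadata.json.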
Upstream PR
vllm-project#1201
Test plan
- GET /v1/audio/voices returns both built-in and uploaded voices
- POST /v1/audio/voices uploads and persists voice samples
- Uploaded voices auto-set ref_audio in Base task requests

🤖 Generated with Claude Code
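The Base task behaviour exercised by the test plan (ref_audio is filled in automatically only when the voice is an uploaded one) can be sketched as a small pure function. Everything here is hypothetical and for illustration only: `resolve_ref_audio`, its parameters, and the `uploaded_audio` mapping are not taken from serving_speech.py; only the task name "Base" and the rule itself come from the PR text.

```python
from typing import Mapping, Optional


def resolve_ref_audio(
    task_type: str,
    voice: Optional[str],
    ref_audio: Optional[str],
    uploaded_audio: Mapping[str, Optional[str]],
) -> Optional[str]:
    """Return the ref_audio to use for a request (hypothetical helper).

    uploaded_audio maps uploaded voice names to their stored audio data;
    a value of None stands in for a missing audio file.
    """
    if task_type != "Base":
        # Other task types are passed through unchanged in this sketch.
        return ref_audio
    if ref_audio is not None:
        # An explicit ref_audio always wins.
        return ref_audio
    if voice is not None and voice in uploaded_audio:
        data = uploaded_audio[voice]
        if data is None:
            # Fail loudly instead of silently continuing without audio.
            raise ValueError(f"audio file for uploaded voice {voice!r} is missing")
        return data
    # A built-in (or unknown) voice name does not satisfy the requirement.
    raise ValueError("Base task requires ref_audio unless voice is an uploaded voice")
```

Keeping this rule in one function makes the third test-plan item straightforward to check without starting a server.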