feat(tts): integrate voice upload API (upstream #1201) by marksverdhai · Pull Request #12 · heiervang-technologies/ht-vllm-omni

marksverdhai · 2026-02-06T15:56:55Z

Summary

Port voice upload API (POST /v1/audio/voices) from upstream vllm-project/vllm-omni#1201 into the HT branch
Adapted to coexist with HT's existing streaming and audio extraction changes
Add full API documentation in docs/serving/speech_api.md

Changes

serving_speech.py: Voice upload methods (upload_voice, _load/_save_uploaded_speakers, _get_uploaded_audio_data), uploaded speakers storage initialization, auto-set ref_audio for uploaded voices in Base task, relaxed validation for uploaded voice names
api_server.py: POST /v1/audio/voices endpoint, enhanced GET /v1/audio/voices response with uploaded voice details
README.md: API docs for voice upload/list endpoints
speech_api.md: New comprehensive speech API documentation

Known Upstream Review Issues (carried as-is)

These are flagged in the upstream PR review and should be addressed in a follow-up:

Path traversal (security) — name/consent unsanitized in filename construction
No file locking on metadata.json — race condition on concurrent uploads
Base task validation bypass — built-in voice name + no ref_audio passes validation but fails downstream
Silent auto-set failure — missing uploaded audio file returns None silently
/tmp storage not persistent across restarts
File path disclosure — API response returns full server path

See upstream review thread for full discussion.
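To illustrate the path traversal item above, here is a minimal sketch of one way to constrain the filename. It is not the code from this PR or from upstream: the function name `safe_voice_path`, the `{name}_{consent}{suffix}` filename layout, and the upload directory are all hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path

# Only letters, digits, underscore and hyphen are allowed in a path component.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


def safe_voice_path(upload_dir: Path, name: str, consent: str, suffix: str) -> Path:
    """Build a storage path for an uploaded voice, rejecting traversal attempts.

    The filename layout used here is an assumption, not the PR's actual layout.
    """
    for label, value in (("name", name), ("consent", consent)):
        if not _SAFE_COMPONENT.fullmatch(value):
            raise ValueError(f"{label} must contain only letters, digits, '_' or '-'")
    if not re.fullmatch(r"\.[A-Za-z0-9]+", suffix):
        raise ValueError("unsupported file extension")

    root = upload_dir.resolve()
    candidate = (root / f"{name}_{consent}{suffix}").resolve()
    # Defence in depth: even after sanitizing, confirm the resolved path
    # is still inside the upload directory (Path.is_relative_to needs 3.9+).
    if not candidate.is_relative_to(root):
        raise ValueError("resolved path escapes the upload directory")
    return candidate
```

With this sketch, a name such as `../etc/passwd` is rejected by the character check before any path is built, and the containment check catches anything the character check might miss.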

Upstream PR

vllm-project#1201

Test plan

Verify GET /v1/audio/voices returns both built-in and uploaded voices
Verify POST /v1/audio/voices uploads and persists voice samples
Verify uploaded voice auto-sets ref_audio in Base task requests
Verify existing streaming functionality is unaffected
Address known review issues in follow-up PR
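The Base task check in the test plan depends on a simple rule: a request needs ref_audio unless its voice is one that was uploaded. A rough sketch of that rule follows, assuming the set of uploaded voice names is already loaded; the function name `check_base_task` and the example voice names are hypothetical and do not come from serving_speech.py.

```python
from typing import Optional, Set


def check_base_task(
    voice: Optional[str],
    ref_audio: Optional[str],
    uploaded_voices: Set[str],
) -> None:
    """Raise ValueError if a Base task request has no usable reference audio."""
    if ref_audio:
        # Caller supplied reference audio explicitly.
        return
    if voice is not None and voice in uploaded_voices:
        # ref_audio can be filled in from the stored upload.
        return
    # A built-in voice name alone is not enough for the Base task.
    raise ValueError("Base task requires ref_audio unless voice is an uploaded voice")
```

Under this rule a built-in voice name without ref_audio fails at validation time rather than later in the pipeline.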

🤖 Generated with Claude Code

Port the voice upload API (POST /v1/audio/voices) from upstream vllm-project#1201 into the HT branch, adapted to coexist with HT's existing streaming and audio extraction changes.

- Add upload_voice(), _load/_save_uploaded_speakers() to serving_speech
- Add POST /v1/audio/voices endpoint to api_server
- Modify GET /v1/audio/voices to include uploaded voice details
- Auto-set ref_audio for uploaded voices in Base task
- Add docs/serving/speech_api.md documentation

Note: Known upstream review issues (path traversal, metadata locking, validation bypass for built-in voices) are carried as-is for parity and will be addressed in a follow-up.

Upstream-PR: vllm-project#1201
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fixes security and logic issues flagged in upstream PR vllm-project#1201 review:

Security:
- Sanitize name/consent to alphanumeric/underscore/hyphen only
- Validate resolved path stays within upload directory
- Remove file_path from API responses (information disclosure)

Logic bugs:
- Base task validation now correctly requires ref_audio unless voice is specifically an uploaded voice (not just any voice name)
- _get_uploaded_audio_data raises ValueError instead of returning None when audio file is missing, preventing silent failures

Robustness:
- Atomic metadata writes via tempfile + os.replace
- File locking (fcntl.flock) on metadata.json reads and writes
- Use Path().suffix for file extension extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
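The robustness items combine two standard techniques: an exclusive `fcntl.flock` around the read-modify-write, and a write to a temporary file followed by `os.replace`. The sketch below shows how they can fit together. It is not the commit's code: `update_metadata` is a hypothetical helper, and it locks a sidecar `.lock` file rather than metadata.json itself, a deliberate substitution because `os.replace` swaps the file that a lock on metadata.json would be attached to. `fcntl` is POSIX-only.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def update_metadata(metadata_path: Path, name: str, entry: dict) -> dict:
    """Add or overwrite one speaker entry under an exclusive lock."""
    lock_path = metadata_path.with_suffix(".lock")  # sidecar lock file (assumption)
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            try:
                speakers = json.loads(metadata_path.read_text())
            except FileNotFoundError:
                speakers = {}
            speakers[name] = entry

            # Write to a temp file in the same directory so os.replace stays
            # on one filesystem and the swap is atomic.
            fd, tmp_name = tempfile.mkstemp(dir=metadata_path.parent, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as tmp:
                    json.dump(speakers, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_name, metadata_path)
            except BaseException:
                os.unlink(tmp_name)  # do not leave partial temp files behind
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return speakers
```

Readers that open metadata.json without the lock still see either the old or the new file in full, never a half-written one, because the replacement happens in a single rename.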

marksverdhei and others added 2 commits February 6, 2026 14:32

marksverdhai wants to merge 2 commits into heiervang-technologies:ht from marksverdhai:feat/voice-upload-api
