feat(tts): integrate voice upload API (upstream #1201) #12

Draft
marksverdhai wants to merge 2 commits into heiervang-technologies:ht from marksverdhai:feat/voice-upload-api

@marksverdhai

Summary

  • Port the voice upload API (POST /v1/audio/voices) from upstream vllm-project/vllm-omni#1201 into the HT branch
  • Adapt it to coexist with HT's existing streaming and audio extraction changes
  • Add full speech API documentation in docs/serving/speech_api.md

Changes

  • serving_speech.py: Voice upload methods (upload_voice, _load/_save_uploaded_speakers, _get_uploaded_audio_data), uploaded speakers storage initialization, auto-set ref_audio for uploaded voices in Base task, relaxed validation for uploaded voice names
  • api_server.py: POST /v1/audio/voices endpoint, enhanced GET /v1/audio/voices response with uploaded voice details
  • README.md: API docs for voice upload/list endpoints
  • speech_api.md: New comprehensive speech API documentation

Known Upstream Review Issues (carried as-is)

These are flagged in the upstream PR review and should be addressed in a follow-up:

  1. Path traversal (security) — name/consent unsanitized in filename construction
  2. No file locking on metadata.json — race condition on concurrent uploads
  3. Base task validation bypass — built-in voice name + no ref_audio passes validation but fails downstream
  4. Silent auto-set failure — missing uploaded audio file returns None silently
  5. /tmp storage not persistent across restarts
  6. File path disclosure — API response returns full server path

See upstream review thread for full discussion.

Upstream PR

vllm-project#1201

Test plan

  • Verify GET /v1/audio/voices returns both built-in and uploaded voices
  • Verify POST /v1/audio/voices uploads and persists voice samples
  • Verify uploaded voice auto-sets ref_audio in Base task requests
  • Verify existing streaming functionality is unaffected
  • Address known review issues in follow-up PR

🤖 Generated with Claude Code

marksverdhei and others added 2 commits February 6, 2026 14:32


Port the voice upload API (POST /v1/audio/voices) from upstream
vllm-project#1201 into the HT branch, adapted to coexist
with HT's existing streaming and audio extraction changes.

- Add upload_voice(), _load/_save_uploaded_speakers() to serving_speech
- Add POST /v1/audio/voices endpoint to api_server
- Modify GET /v1/audio/voices to include uploaded voice details
- Auto-set ref_audio for uploaded voices in Base task
- Add docs/serving/speech_api.md documentation

Note: Known upstream review issues (path traversal, metadata locking,
validation bypass for built-in voices) are carried as-is for parity
and will be addressed in a follow-up.

Upstream-PR: vllm-project#1201
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fixes security and logic issues flagged in the review of upstream PR vllm-project#1201:

Security:
- Sanitize name/consent to alphanumeric/underscore/hyphen only
- Validate resolved path stays within upload directory
- Remove file_path from API responses (information disclosure)
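
The two path-related fixes above can be sketched roughly as follows. This is a minimal illustration, not the code in serving_speech.py: the helper names (`sanitize_component`, `build_upload_path`), the `name_consent.ext` filename layout, and the choice to reject rather than strip disallowed characters are all assumptions.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not taken from serving_speech.py.
SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


def sanitize_component(value: str) -> str:
    """Accept only ASCII letters, digits, underscore and hyphen."""
    if not SAFE_COMPONENT.fullmatch(value):
        raise ValueError(f"invalid characters in {value!r}")
    return value


def build_upload_path(upload_dir: Path, name: str, consent: str, filename: str) -> Path:
    """Build the target path and confirm it stays inside upload_dir."""
    suffix = Path(filename).suffix  # e.g. ".wav"; empty string if there is none
    # Assumed filename layout: <name>_<consent><suffix>
    stem = f"{sanitize_component(name)}_{sanitize_component(consent)}"
    root = upload_dir.resolve()
    resolved = (root / f"{stem}{suffix}").resolve()
    # Defence in depth: even after sanitizing, verify containment.
    if not resolved.is_relative_to(root):  # requires Python 3.9+
        raise ValueError("resolved path escapes the upload directory")
    return resolved
```

Checking containment after `resolve()` also guards against symlinks inside the upload directory, which character filtering alone would not catch.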

Logic bugs:
- Base task validation now correctly requires ref_audio unless voice
  is specifically an uploaded voice (not just any voice name)
- _get_uploaded_audio_data raises ValueError instead of returning None
  when audio file is missing, preventing silent failures
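
As a rough sketch of the two logic fixes above, assuming the set of uploaded voice names is available at validation time: the function names, signatures, and the `bytes` return type here are hypothetical and may differ from the real methods.

```python
from pathlib import Path
from typing import Optional, Set


def validate_base_request(
    voice: Optional[str],
    ref_audio: Optional[str],
    uploaded_voices: Set[str],
) -> None:
    """Base task needs ref_audio unless the voice is an uploaded voice."""
    if ref_audio is not None:
        return
    if voice is not None and voice in uploaded_voices:
        return  # ref_audio is filled in later from the stored sample
    # A built-in voice name without ref_audio no longer slips through.
    raise ValueError("Base task requires ref_audio unless voice is an uploaded voice")


def get_uploaded_audio_data(audio_path: Path) -> bytes:
    """Return the stored sample, failing loudly if the file is gone."""
    if not audio_path.is_file():
        raise ValueError(f"uploaded audio file is missing: {audio_path.name}")
    return audio_path.read_bytes()
```

The error message reports only the file name, in line with the decision to keep full server paths out of API responses.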

Robustness:
- Atomic metadata writes via tempfile + os.replace
- File locking (fcntl.flock) on metadata.json reads and writes
- Use Path().suffix for file extension extraction
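
A minimal sketch of the atomic write and locking pattern described above, for POSIX systems only (`fcntl` is not available on Windows). The function names and the sidecar `.lock` file are assumptions made for this illustration; the commit locks metadata.json itself, which may be implemented differently.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def _lock_path(path: Path) -> Path:
    # Assumption: lock a sidecar file, because os.replace swaps the inode
    # of metadata.json and would leave a lock held on the old file.
    return path.with_name(path.name + ".lock")


def save_metadata(path: Path, data: dict) -> None:
    """Write JSON atomically while holding an exclusive lock."""
    with open(_lock_path(path), "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # The temp file must live in the same directory so that
            # os.replace is a same-filesystem atomic rename.
            fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as tmp:
                    json.dump(data, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_name, path)
            except BaseException:
                if os.path.exists(tmp_name):
                    os.unlink(tmp_name)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def load_metadata(path: Path) -> dict:
    """Read JSON under a shared lock; return {} if nothing is stored yet."""
    with open(_lock_path(path), "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_SH)
        try:
            if not path.exists():
                return {}
            with open(path) as f:
                return json.load(f)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Note that `flock` is advisory: it only protects against other code paths that take the same lock, so every reader and writer of the metadata file has to go through these helpers.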

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>