
Add NeMo backend for Parakeet support in server whisper #436

Open
basnijholt wants to merge 9 commits into main from feat/whisper-nemo-parakeet

@basnijholt
Owner

Summary

  • add a new nemo ASR backend for agent-cli server whisper with subprocess lifecycle/TTL compatibility
  • add support for NVIDIA Parakeet model shorthand parakeet-tdt-0.6b-v2 (resolved to nvidia/parakeet-tdt-0.6b-v2)
  • wire nemo into backend selection/factory and --download-only
  • update server whisper CLI help/docs with NeMo + Parakeet examples
  • make --backend auto switch to nemo for Parakeet models and return a clear error if Parakeet is requested with non-NeMo backends
  • add tests for NeMo model alias/download helper and CLI help visibility
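
The alias resolution and backend auto-selection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the names `NEMO_MODEL_ALIASES`, `resolve_nemo_model`, `is_parakeet_model`, and `select_backend` are hypothetical, the substring check for "parakeet" is an assumed heuristic, and the `faster-whisper` fallback for `auto` is a placeholder for whatever default the project actually picks.

```python
from __future__ import annotations

# Hypothetical alias table; only this one shorthand is named in the PR summary.
NEMO_MODEL_ALIASES = {
    "parakeet-tdt-0.6b-v2": "nvidia/parakeet-tdt-0.6b-v2",
}


def resolve_nemo_model(name: str) -> str:
    """Expand a known shorthand to its full model ID; pass other names through."""
    return NEMO_MODEL_ALIASES.get(name, name)


def is_parakeet_model(name: str) -> bool:
    """Assumed heuristic: the resolved model name contains 'parakeet'."""
    return "parakeet" in resolve_nemo_model(name).lower()


def select_backend(backend: str, model: str, default_backend: str = "faster-whisper") -> str:
    """Pick the backend, forcing nemo for Parakeet models.

    `default_backend` stands in for the project's real auto-detection logic.
    """
    if is_parakeet_model(model):
        if backend in ("auto", "nemo"):
            return "nemo"
        # Fail early with an explicit message instead of a confusing load error.
        raise ValueError(
            f"Model {model!r} is a Parakeet model and requires --backend nemo "
            f"(got {backend!r})."
        )
    return default_backend if backend == "auto" else backend
```

Under these assumptions, `select_backend("auto", "parakeet-tdt-0.6b-v2")` returns `"nemo"`, while `select_backend("faster-whisper", "parakeet-tdt-0.6b-v2")` raises `ValueError`.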

Validation

  • pre-commit hooks passed on commit (ruff, mypy, jscpd, pylint duplicate-code)
  • pytest -q tests/test_nemo_backend.py tests/test_api_integration.py::test_server_whisper_command_in_cli tests/test_server_whisper.py

@basnijholt
Owner Author

Follow-up updates pushed:

  • Added nemo-whisper optional dependency in pyproject.toml and synced:
    • agent_cli/_extras.json
    • agent_cli/_requirements/*.txt (including new nemo-whisper.txt)
    • uv.lock
  • Restored strict command gating on server whisper:
    • @requires_extras("server", "faster-whisper|mlx-whisper|whisper-transformers|nemo-whisper", "wyoming")

Why the decorator was briefly removed:

  • Initially there was no NeMo extra to satisfy requires_extras for --backend nemo, so the check failed before backend selection ran.
  • This is now fixed with a dedicated nemo-whisper extra.
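
To illustrate why the missing extra blocked `--backend nemo`, here is a minimal sketch of how a gating check with `|`-separated alternatives might be evaluated. It assumes that `a|b` means "any one of these extras is enough", which is inferred from the decorator string rather than confirmed by the source. `missing_extras` is a hypothetical helper, not the actual `requires_extras` implementation.

```python
from __future__ import annotations


def missing_extras(required: tuple[str, ...], installed: set[str]) -> list[str]:
    """Return the requirement specs that no installed extra satisfies.

    A spec such as "a|b" counts as satisfied when any alternative is installed.
    """
    missing = []
    for spec in required:
        alternatives = spec.split("|")
        if not any(alt in installed for alt in alternatives):
            missing.append(spec)
    return missing


# Before the fix: no NeMo alternative existed, so a NeMo-only setup failed the gate.
before = ("server", "faster-whisper|mlx-whisper|whisper-transformers", "wyoming")
print(missing_extras(before, {"server", "wyoming"}))
# ['faster-whisper|mlx-whisper|whisper-transformers']

# After the fix: the nemo-whisper extra satisfies the middle spec.
after = ("server", "faster-whisper|mlx-whisper|whisper-transformers|nemo-whisper", "wyoming")
print(missing_extras(after, {"server", "wyoming", "nemo-whisper"}))
# []
```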

Live tests run (real commands):

  1. Generated audio via CLI:
    • agent-cli speak 'This is a live test for parakeet backend validation.' --tts-provider wyoming --tts-wyoming-ip localhost --tts-wyoming-port 10200 --save-file /tmp/live_speak.wav --output-device-name 'Dummy Output'
    • Result: /tmp/live_speak.wav created successfully.
  2. Started dedicated NeMo server on alternate port:
    • agent-cli server whisper --backend nemo --model parakeet-tdt-0.6b-v2 --host 127.0.0.1 --port 11301 --wyoming-port 11300 --no-wyoming
  3. Transcribed generated audio through Parakeet endpoint:
    • curl -X POST http://127.0.0.1:11301/v1/audio/transcriptions -F file=@/tmp/live_speak.wav -F model=parakeet-tdt-0.6b-v2
    • Response text: This is a live test for parakeet back-in validation.
    • First request (model load): ~22.8s
    • Second request (model warm): ~0.57s
  4. Verified CLI path with file input + OpenAI-compatible ASR provider:
    • agent-cli transcribe --from-file /tmp/live_speak.wav --asr-provider openai --asr-openai-base-url http://127.0.0.1:11301/v1 --asr-openai-model parakeet-tdt-0.6b-v2 --json
    • Returned same transcript successfully.
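
For reference, the curl request in step 3 can be reproduced from Python with only the standard library. This is a sketch under assumptions: `build_multipart` and `transcribe` are hypothetical helper names, the host, port, and model are the ones used in the live test above, and the response is assumed to be JSON with a `text` field, as in OpenAI's default transcription response format.

```python
from __future__ import annotations

import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(
    fields: dict[str, str], file_field: str, filename: str, file_bytes: bytes
) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(
    wav_path: str,
    base_url: str = "http://127.0.0.1:11301/v1",
    model: str = "parakeet-tdt-0.6b-v2",
) -> str:
    """POST a WAV file to the transcription endpoint and return the transcript."""
    path = Path(wav_path)
    body, content_type = build_multipart({"model": model}, "file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumes an OpenAI-style JSON payload: {"text": "..."}.
        return json.load(response)["text"]
```

With the server from step 2 running, `transcribe("/tmp/live_speak.wav")` should return the same transcript as the curl call. The first call includes model load time, so expect it to be much slower than later ones.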

Validation:

  • pre-commit hooks passed on latest commit.
  • Targeted tests passed:
    • tests/test_nemo_backend.py
    • tests/test_api_integration.py::test_server_whisper_command_in_cli
    • tests/test_server_whisper.py

@basnijholt
Owner Author

Blocked by upstream NeMo dependency pinning discussion/update: NVIDIA-NeMo/NeMo#15438
