Add support for partial transcription prefix in the prompt #15449

Open
azziko wants to merge 3 commits into NVIDIA-NeMo:main from azziko:add-decoder-prefix

Conversation


@azziko azziko commented Feb 27, 2026

Important

The Update branch button should be pressed only on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adds support for passing a partial transcription of the current audio input as a decoding prefix. This is especially useful in streaming scenarios.

Collection: [ASR]

Changelog

  • Adds a user turn in the Canary2PromptFormatter.

Usage

The prefix can be passed as part of the input prompt to the top-level .transcribe() function. The partially transcribed part is omitted in the returned hypothesis. The user_prefix turn must be the last turn:

from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.models.aed_multitask_models import MultiTaskTranscriptionConfig
from nemo.collections.asr.parts.submodules.multitask_decoding import MultiTaskDecodingConfig
from nemo.collections.asr.models.aed_multitask_models import parse_multitask_prompt

model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
decoding_config = MultiTaskDecodingConfig()
model.change_decoding_strategy(decoding_config)

turns = [
    {
        "role": "user",
        "slots": {
            "source_lang": "<|en|>",
            "target_lang": "<|en|>",
            "task": "<|transcribe|>",
            "pnc": "<|pnc|>",
        },
    },
    {
        "role": "user_prefix",
        "slots": {
            "prefix": "Partial transcription.",
        },
    },
]

prompt = parse_multitask_prompt({"turns": turns})

config = MultiTaskTranscriptionConfig(
    batch_size=1,
    return_hypotheses=True,
    num_workers=0,
    verbose=False,
    prompt=prompt,
    enable_chunking=False,
)

output = model.transcribe("/path/to/your/audio", override_config=config)
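To illustrate the behavior described above (the forced prefix being omitted from the returned hypothesis), here is a minimal, self-contained sketch in plain Python. This is not NeMo code; in the actual PR the omission happens inside the decoding loop, where the prefix tokens seed the decoder but are dropped from the hypothesis text:

```python
# Minimal sketch (plain Python, not NeMo internals) of the prefix-omission
# behavior: the decoder is seeded with the prefix, and the hypothesis text
# returned to the caller excludes that forced prefix.
def strip_forced_prefix(decoded: str, prefix: str) -> str:
    """Drop the forced prefix from a decoded string, if it is present."""
    if decoded.startswith(prefix):
        return decoded[len(prefix):].lstrip()
    return decoded

# Hypothetical full decoder output that echoes the forced prefix:
full_text = "Partial transcription. And the model continues from here."
print(strip_forced_prefix(full_text, "Partial transcription."))
# -> And the model continues from here.
```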

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

Signed-off-by: azziko <sharipov.wdev@gmail.com>
@pzelasko
Collaborator

Thank you for a very clean usage example! Does this approach work well with the pretrained canary-v2, or did you train your own model with some modifications for streaming? If it's possible to share any numbers, I'd be curious to learn more.

Can you add the tests either to tests/collections/common/prompt_formatters/test_canary_prompt_formatter.py or create a new test_canary2_prompt_formatter.py?

@pzelasko pzelasko self-requested a review February 27, 2026 15:50
@pzelasko pzelasko self-assigned this Feb 27, 2026
azziko and others added 2 commits February 27, 2026 21:32
Signed-off-by: azziko <sharipov.wdev@gmail.com>
Signed-off-by: azziko <azziko@users.noreply.github.com>
@azziko
Author

azziko commented Feb 27, 2026

Thank you for the quick review!
I have added a set of separate unit tests for the Canary2PromptFormatter.

For my purposes and tests I have been using the pretrained canary-v2 model. My decoding parameters were as follows (let me know if you would like any specific numbers I might have missed; I will happily share them too):

    strategy: beam
    compute_hypothesis_token_set: true
    preserve_alignments: null
    confidence_cfg:
      preserve_frame_confidence: false
      preserve_token_confidence: false
      preserve_word_confidence: false
      exclude_blank: true
      aggregation: min
      tdt_include_duration: false
      method_cfg:
        name: entropy
        entropy_type: tsallis
        alpha: 0.33
        entropy_norm: exp
        temperature: DEPRECATED
    compute_langs: false
    greedy:
      temperature: null
      max_generation_delta: -1
      preserve_alignments: false
      preserve_token_confidence: false
      confidence_method_cfg:
        name: entropy
        entropy_type: tsallis
        alpha: 0.33
        entropy_norm: exp
        temperature: DEPRECATED
      n_samples: 1
    beam:
      beam_size: 5
      search_type: default
      len_pen: 1.0
      max_generation_delta: -1
      return_best_hypothesis: true
      preserve_alignments: false
      ngram_lm_model: null
      ngram_lm_alpha: 0.0
      boosting_tree:
        model_path: null
        key_phrases_file: null
        key_phrases_list: null
        key_phrase_items_list: null
        context_score: 1.0
        depth_scaling: 1.0
        unk_score: 0.0
        final_eos_score: 1.0
        score_per_phrase: 0.0
        source_lang: en
        use_triton: true
        uniform_weights: false
        use_bpe_dropout: false
        num_of_transcriptions: 5
        bpe_alpha: 0.3
      boosting_tree_alpha: 0.0
    temperature: 1.0
    return_xattn_scores: true

@pzelasko
Collaborator

Thanks. I was just wondering if you have any WER comparison to other approaches or models - I would have expected canary2 to degrade with this technique.


Development

Successfully merging this pull request may close these issues.

Manual decoding_input_ids/prefix for the decoder injection
