Add support for partial transcription prefix in the prompt #15449

Open
azziko wants to merge 3 commits into NVIDIA-NeMo:main from azziko:add-decoder-prefix

Conversation


@azziko azziko commented Feb 27, 2026

Important

The Update branch button should be pressed only on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adds support for passing a partial transcription of the current audio input as a decoding prefix. This is especially useful in streaming scenarios.

Collection: [ASR]

Changelog

  • Adds a user turn in the Canary2PromptFormatter.

Usage

The prefix can be passed as part of the input prompt to the top-level .transcribe() function. The partially transcribed part is omitted in the returned hypothesis. The user_prefix turn must be the last turn:

from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.models.aed_multitask_models import MultiTaskTranscriptionConfig
from nemo.collections.asr.parts.submodules.multitask_decoding import MultiTaskDecodingConfig
from nemo.collections.asr.models.aed_multitask_models import parse_multitask_prompt

model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
decoding_config = MultiTaskDecodingConfig()
model.change_decoding_strategy(decoding_config)

turns = [
    {
        "role": "user",
        "slots": {
            "source_lang": "<|en|>",
            "target_lang": "<|en|>",
            "task": "<|transcribe|>",
            "pnc": "<|pnc|>",
        },
    },
    {
        "role": "user_prefix",
        "slots": {
            "prefix": "Partial transcription.",
        },
    },
]

prompt = parse_multitask_prompt({"turns": turns})

config = MultiTaskTranscriptionConfig(
    batch_size=1,
    return_hypotheses=True,
    num_workers=0,
    verbose=False,
    prompt=prompt,
    enable_chunking=False,
)

output = model.transcribe("/path/to/your/audio", override_config=config)
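To illustrate the behavior described above (the forced prefix being omitted from the returned hypothesis), here is a minimal, self-contained sketch in plain Python. This is not NeMo code; in the actual PR the omission happens inside the decoding loop, where the prefix tokens seed the decoder but are dropped from the hypothesis text:

```python
# Minimal sketch (plain Python, not NeMo internals) of the prefix-omission
# behavior: the decoder is seeded with the prefix, and the hypothesis text
# returned to the caller excludes that forced prefix.
def strip_forced_prefix(decoded: str, prefix: str) -> str:
    """Drop the forced prefix from a decoded string, if it is present."""
    if decoded.startswith(prefix):
        return decoded[len(prefix):].lstrip()
    return decoded

# Hypothetical full decoder output that echoes the forced prefix:
full_text = "Partial transcription. And the model continues from here."
print(strip_forced_prefix(full_text, "Partial transcription."))
# -> And the model continues from here.
```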

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

Signed-off-by: azziko <sharipov.wdev@gmail.com>
@pzelasko
Collaborator

Thank you for a very clean usage example! Does this approach work well with the pretrained canary-v2, or did you train your own model with some modifications for streaming? If it's possible to share any numbers, I'd be curious to learn more.

Can you add the tests either to tests/collections/common/prompt_formatters/test_canary_prompt_formatter.py or create a new test_canary2_prompt_formatter.py?

@pzelasko pzelasko self-requested a review February 27, 2026 15:50
@pzelasko pzelasko self-assigned this Feb 27, 2026
azziko and others added 2 commits February 27, 2026 21:32
Signed-off-by: azziko <sharipov.wdev@gmail.com>
Signed-off-by: azziko <azziko@users.noreply.github.com>
@azziko
Author

azziko commented Feb 27, 2026

Thank you for the quick review!
I have added a set of separate unit tests for the Canary2PromptFormatter.

For my purposes and tests I have been using the pretrained canary-v2 model. My decoding parameters were as follows (let me know if you would like any specific numbers I might have missed; I will happily share them too):

    strategy: beam
    compute_hypothesis_token_set: true
    preserve_alignments: null
    confidence_cfg:
      preserve_frame_confidence: false
      preserve_token_confidence: false
      preserve_word_confidence: false
      exclude_blank: true
      aggregation: min
      tdt_include_duration: false
      method_cfg:
        name: entropy
        entropy_type: tsallis
        alpha: 0.33
        entropy_norm: exp
        temperature: DEPRECATED
    compute_langs: false
    greedy:
      temperature: null
      max_generation_delta: -1
      preserve_alignments: false
      preserve_token_confidence: false
      confidence_method_cfg:
        name: entropy
        entropy_type: tsallis
        alpha: 0.33
        entropy_norm: exp
        temperature: DEPRECATED
      n_samples: 1
    beam:
      beam_size: 5
      search_type: default
      len_pen: 1.0
      max_generation_delta: -1
      return_best_hypothesis: true
      preserve_alignments: false
      ngram_lm_model: null
      ngram_lm_alpha: 0.0
      boosting_tree:
        model_path: null
        key_phrases_file: null
        key_phrases_list: null
        key_phrase_items_list: null
        context_score: 1.0
        depth_scaling: 1.0
        unk_score: 0.0
        final_eos_score: 1.0
        score_per_phrase: 0.0
        source_lang: en
        use_triton: true
        uniform_weights: false
        use_bpe_dropout: false
        num_of_transcriptions: 5
        bpe_alpha: 0.3
      boosting_tree_alpha: 0.0
    temperature: 1.0
    return_xattn_scores: true

@pzelasko
Collaborator

Thanks. I was just wondering if you have any WER comparison to other approaches or models - I would have expected canary2 to degrade with this technique.


Development

Successfully merging this pull request may close these issues.

Manual decoding_input_ids/prefix for the decoder injection
