
Nemotron cache-aware streaming model fails to generalize to out-of-distribution data after training #15418

@hassan-webm

Description


I trained the Nemotron streaming model for Russian on the Golos dataset, and it shows a large WER gap between in-domain validation data and the FLEURS Russian test set, despite never being trained on FLEURS-style data.

| Eval set | WER |
| --- | --- |
| In-domain validation | 7.5% |
| FLEURS ru test | 27% |
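For context on how the gap above is measured: WER is the word-level edit distance between reference and hypothesis, normalized by reference length. A minimal sketch (the transcripts are hypothetical placeholders; in practice the hypotheses would come from the model's `greedy_batch` decoding, and a library such as jiwer or NeMo's own WER utilities would normally be used):

```python
# Minimal word error rate (WER) sketch: Levenshtein distance over words,
# normalized by the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))   # perfect match -> 0.0
print(wer("the cat sat", "the cats sat"))  # 1 substitution over 3 words
```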

For reference, a non-streaming Parakeet model finetuned on the same Russian data generalizes well to FLEURS without any read-speech training data, suggesting the generalization gap is at least partially a streaming-specific problem.

Setup:

- Base model: `nvidia/nemotron-speech-streaming-en-0.6b`
- Finetuned on: Golos dataset
- `att_context_size`: default (unchanged)
- Decoding: `greedy_batch`, cache-aware streaming pipeline

Any help with this will be really appreciated.
@nithinraok @KunalDhawan
