I fine-tuned the Nemotron streaming model for Russian on the Golos dataset, and it shows a large WER gap between in-domain validation data and the FLEURS Russian test set, even though it was never trained on FLEURS-style data.
| Eval set       | WER  |
| -------------- | ---- |
| In-domain val  | 7.5% |
| FLEURS ru test | 27%  |
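For clarity on how I'm scoring: the numbers above use the standard word-level WER (Levenshtein distance over whitespace-split words, divided by reference length). A minimal self-contained sketch of that metric (not the exact NeMo scoring code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("я иду домой сейчас", "я иду домой"))  # 1 deletion / 4 words = 0.25
```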
For reference, a non-streaming Parakeet model finetuned on the same Russian data generalizes well to FLEURS without any read-speech training data, suggesting the generalization gap is at least partially a streaming-specific problem.
Setup:
- Base model: nvidia/nemotron-speech-streaming-en-0.6b
- Finetuned on: Golos dataset
- att_context_size: default (unchanged)
- Decoding: greedy_batch, cache-aware streaming pipeline
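In case it matters for diagnosis: I left the attention context at whatever the base checkpoint ships with. My understanding (an assumption based on NeMo's cache-aware streaming FastConformer configs, not verified against this exact checkpoint) is that the relevant knob sits under the encoder config, roughly like:

```yaml
# Hypothetical excerpt in the style of NeMo cache-aware streaming configs;
# exact keys/values for nemotron-speech-streaming-en-0.6b are an assumption.
model:
  encoder:
    att_context_style: chunked_limited
    att_context_size: [70, 13]   # [left context, right context] in frames
```

If the right context is the streaming-specific factor behind the FLEURS gap, would evaluating with a larger right context (or with the offline/full-context mode, if the checkpoint supports it) be a sensible ablation?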
Any help with this would be really appreciated.
@nithinraok @KunalDhawan