@mingewang
**Parakeet v2 vs v3 `max_duration` differences**

Key changes:

Why v3 handles longer:

Adjusting for your needs:

```yaml
model:
  preprocessor:
    max_duration: 30        # Increase for longer audio
  # If OOM, try:
  encoder:
    subsampling_factor: 8   # More aggressive
```

Memory considerations:

```python
# Estimate memory per second
memory_per_sec = model_params * 4 / 1e9   # GB
max_safe = available_gpu_memory / memory_per_sec
```

For very long audio:

Config for long-form:

```yaml
trainer:
  precision: bf16             # Save memory
  accumulate_grad_batches: 4
```

We train ASR models at RevolutionAI. What's your target audio duration?
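Pulling the suggestions above together, one possible fine-tuning sketch for longer utterances might look like the following. This is a sketch only: the `bucketing_strategy` and `bucketing_batch_size` keys follow NeMo-style ASR dataset configs but apply to tarred/bucketed datasets, and the exact key names and values should be checked against your NeMo version's `train_ds` schema.

```yaml
# Sketch: smaller per-step batch + gradient accumulation + bf16 for longer clips.
# Key names are assumptions modeled on NeMo ASR configs; verify against your version.
model:
  train_ds:
    max_duration: 30
    batch_size: 8                          # smaller per-step batch to fit longer clips
    bucketing_strategy: synced_randomized  # requires a tarred, bucketed dataset
trainer:
  precision: bf16
  accumulate_grad_batches: 4               # effective batch size of 32
```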
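To make the sizing heuristic above concrete, here is a worked example. The numbers are assumptions: roughly 600M parameters (Parakeet v3 is about 0.6B) and an 80 GB A100, and the formula itself is only a rough 4-bytes-per-parameter heuristic, not an exact activation-memory model.

```python
# Worked example of the rough sizing heuristic above.
# Assumptions: ~600M parameters, 80 GB of GPU memory (A100 80GB).
model_params = 600e6          # parameter count (assumed, not from the model card)
available_gpu_memory = 80.0   # GB

memory_per_sec = model_params * 4 / 1e9   # GB per second of audio (fp32 heuristic)
max_safe = available_gpu_memory / memory_per_sec

print(f"{memory_per_sec:.1f} GB per second of audio")   # 2.4 GB
print(f"~{max_safe:.0f} s of audio per batch before the heuristic predicts OOM")
```

Under these assumptions the heuristic lands at roughly 33 seconds of audio per batch, which is consistent with OOM appearing once `max_duration` is pushed toward 30 seconds at a non-trivial batch size.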
Hi,
I noticed a difference in model_config.yaml between the Parakeet v2 and Parakeet v3 models:
Parakeet v2:

```yaml
train_ds:
  max_duration: 40
validation_ds:
  max_duration: 40
```

Parakeet v3:

```yaml
train_ds:
  max_duration: 10
validation_ds:
  max_duration: 30
```
Does this mean that Parakeet v3 was trained with training samples truncated to 10 seconds, while Parakeet v2 was trained with up to 40-second samples?
If so, what was the motivation for reducing the training max_duration in v3?
When I try to increase train_ds.max_duration for Parakeet v3 to 30 seconds, I easily run into CUDA OOM errors on an A100 (80GB).
Are there recommended settings (batch size, gradient accumulation, bucketing strategy, etc.) to safely train v3 with longer utterances?
Any guidance on best practices for handling longer audio when fine-tuning Parakeet v3 would be appreciated.
Thanks!