Canary AST (speech translation) vs. ASR (speech recognition) training: great question!

Short answer: AST-only training CAN work, but joint training is usually better.

Why:

  1. Shared encoder benefits

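    • ASR supervision teaches the encoder robust acoustic features
    • AST reuses those same features, since both tasks share the encoder
    • Joint training therefore usually yields a stronger encoder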
  2. When AST-only works:

    • A pre-trained encoder (e.g. from an ASR checkpoint) already exists
    • Fine-tuning for a specific domain
    • Limited compute budget
  3. When joint is better:

    • Training from scratch
    • Multiple target languages
    • Maximum accuracy needed
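The rules of thumb above can be condensed into a small decision helper. This is a hypothetical sketch: the function name and its inputs are illustrative and are not part of NeMo or Canary. It covers the pre-trained-encoder, multi-language and accuracy criteria from the lists.

```python
def choose_training_mode(
    has_pretrained_encoder: bool,
    num_target_languages: int,
    need_max_accuracy: bool,
) -> str:
    """Return "ast" or "joint" following the rules of thumb above (hypothetical helper)."""
    # Training from scratch: the encoder has to learn acoustics, so ASR supervision helps.
    if not has_pretrained_encoder:
        return "joint"
    # Several target languages, or accuracy is the top priority: joint training pays off.
    if num_target_languages > 1 or need_max_accuracy:
        return "joint"
    # Pre-trained encoder plus a single-target fine-tune: AST-only is usually enough.
    return "ast"


print(choose_training_mode(True, 1, False))   # ast
print(choose_training_mode(False, 1, False))  # joint
```

When the cases conflict (for example, a pre-trained encoder but several target languages), this sketch favours joint training; that tie-break is an assumption, not something stated in the answer.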

Config for AST-only:

model:
  task: "ast"  # Speech translation (AST) only
  freeze_encoder: false  # Or true if pre-trained
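What `freeze_encoder: true` is meant to achieve can be sketched without any framework: only parameters outside the encoder are handed to the optimizer. The helper name and the `encoder.` name prefix are assumptions for illustration; real parameter names depend on the model.

```python
from typing import List


def trainable_parameter_names(param_names: List[str], freeze_encoder: bool) -> List[str]:
    """Return the names of parameters the optimizer should update (illustrative sketch).

    Assumes encoder parameters are named with an "encoder." prefix.
    """
    if not freeze_encoder:
        # Full fine-tuning: every parameter stays trainable.
        return list(param_names)
    # Frozen encoder: keep only decoder/head parameters.
    return [name for name in param_names if not name.startswith("encoder.")]


names = ["encoder.layers.0.weight", "decoder.layers.0.weight", "decoder.head.weight"]
print(trainable_parameter_names(names, freeze_encoder=True))
# ['decoder.layers.0.weight', 'decoder.head.weight']
```

In PyTorch the usual equivalent is calling `requires_grad_(False)` on the encoder module so its weights receive no gradient updates.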

Config for joint:

model:
  tasks: ["asr", "ast"]
  task_weights:
    asr: 0.5
    ast: 0.5
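A common way to apply `task_weights` is a weighted sum of the per-task losses, i.e. L = 0.5 · L_asr + 0.5 · L_ast for the values above. This is a minimal sketch assuming the trainer combines losses this way; the function name is hypothetical, and some setups instead mix tasks by sampling training data in these proportions.

```python
from typing import Mapping


def joint_loss(losses: Mapping[str, float], task_weights: Mapping[str, float]) -> float:
    """Weighted sum of per-task losses (hypothetical helper mirroring task_weights)."""
    missing = set(task_weights) - set(losses)
    if missing:
        raise KeyError(f"no loss reported for task(s): {sorted(missing)}")
    return sum(weight * losses[task] for task, weight in task_weights.items())


# With the config above: 0.5 * 2.0 + 0.5 * 4.0
print(joint_loss({"asr": 2.0, "ast": 4.0}, {"asr": 0.5, "ast": 0.5}))  # 3.0
```

Keeping the weights summing to 1 keeps the combined loss on the same scale as the individual losses; raising the `asr` weight shifts emphasis toward recognition.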

Recommendation:
If you have an ASR checkpo…

Answer selected by athaze
Category: Q&A · Labels: none yet · 2 participants