Conversation

@kashif (Contributor) commented Jan 25, 2026

No description provided.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f86b90c1b1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

$$\mathcal{L}_{\text{KD}}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}}\left[\sum_{t=1}^{T} P_{\phi}(x_t \mid x_{<t}) \log P_{\theta}(x_t \mid x_{<t})\right]. $$ {#eq:knowledge_distillation}
This is sometimes called **offline distillation** because the teacher generates data independently of the student. An alternative is **on-policy distillation**, where the student samples its own outputs and receives feedback from the teacher, which connects distillation to reinforcement learning through the direction of KL divergence minimization [@agarwal2024policy]. See Chapter 16 for details on on-policy distillation and its relationship to RLHF.

P2: Point on-policy distillation reference to correct chapter

This sentence directs readers to Chapter 16 for on-policy distillation details, but Chapter 16 is the evaluation chapter and contains no on-policy distillation content, while the new section lives in Chapter 15. Readers following this cross-reference will end up in the wrong chapter and miss the material entirely. Consider updating the reference to the chapter that actually introduces on-policy distillation.

Useful? React with 👍 / 👎.
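As an aside, the offline distillation loss in the hunk quoted above reduces to a short PyTorch sketch, assuming `student_logits` and `teacher_logits` are `[batch, seq, vocab]` logits computed on the same teacher-generated sequences (the function name and shapes are illustrative, not from the PR):

```python
import torch
import torch.nn.functional as F

def offline_kd_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor) -> torch.Tensor:
    """Cross-entropy of the student against the full teacher distribution."""
    # P_phi(x_t | x_<t): teacher next-token distribution; no gradient
    # flows into the teacher.
    teacher_probs = F.softmax(teacher_logits.detach(), dim=-1)
    # log P_theta(x_t | x_<t): student log-probabilities.
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # Sum over the vocabulary, then average over tokens and batch
    # (the equation sums over t; averaging only rescales the loss).
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```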

@natolambert (Owner) commented

@kashif I appreciate the PR. I'm kind of torn about it and how I'd make edits (while being busy), so I may end up sitting on it for a bit 🙏 until I have inspiration

@kashif (Contributor, Author) commented Jan 27, 2026

no worries!
