Question about PPO and DPO dataset formats
#7000
Replies: 1 comment
Can a dataset in this format be trained successfully? I also want to train on multimodal DPO data.
Reminder
System Info
My understanding of the DPO data format is as follows: it only contains chosen and rejected responses.

I have a few questions:
1. Where does the score mentioned in PPO come into play in this data?
2. Does training a reward model only require chosen and rejected data?
3. Does the rating in this dataset mean a score? If so, what processing is needed to convert it into a format supported by LLaMA-Factory?
https://huggingface.co/datasets/MMInstruction/VLFeedback?row=0
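Not an official answer, but regarding question 3: VLFeedback stores several model completions per prompt, each annotated with per-aspect ratings, and a common heuristic for building DPO pairs is to take the highest-rated completion as chosen and the lowest-rated as rejected. The sketch below assumes field names (`prompt`, `completions`, `annotations`, `Rating`, `response`) that should be verified against the actual dataset, and an alpaca-style `instruction`/`chosen`/`rejected` output layout that should be checked against the LLaMA-Factory data docs:

```python
# Hypothetical sketch: turn one VLFeedback-style record (several rated
# completions for the same prompt) into a DPO preference pair.
# All field names here are assumptions -- verify against the real schema.

def to_preference_pair(record):
    """Pick the highest-rated completion as 'chosen', lowest as 'rejected'."""
    def score(completion):
        # Average the per-aspect ratings (e.g. helpfulness, faithfulness).
        ratings = [int(a["Rating"]) for a in completion["annotations"].values()]
        return sum(ratings) / len(ratings)

    ranked = sorted(record["completions"], key=score)
    return {
        "instruction": record["prompt"],
        "chosen": ranked[-1]["response"],   # best-scored answer
        "rejected": ranked[0]["response"],  # worst-scored answer
    }

# Toy record with two completions, each rated on two aspects.
record = {
    "prompt": "Describe the image.",
    "completions": [
        {"response": "A cat.", "annotations": {
            "Helpfulness": {"Rating": "3"}, "Faithfulness": {"Rating": "2"}}},
        {"response": "A tabby cat sleeping on a sofa.", "annotations": {
            "Helpfulness": {"Rating": "5"}, "Faithfulness": {"Rating": "5"}}},
    ],
}

pair = to_preference_pair(record)
print(pair["chosen"])    # the higher-rated completion
print(pair["rejected"])  # the lower-rated completion
```

With more than two completions you could also emit every (higher, lower) rating pair instead of just best-vs-worst, which yields more preference pairs per prompt at the cost of noisier contrasts.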
Reproduction
I'm a newbie and completely confused.
Others
No response