Question about PPO and DPO dataset formats
#7000
Replies: 1 comment
Can a dataset in this format be trained successfully? I also want to train on multimodal DPO data.
Reminder
System Info
My understanding of the DPO data format is as follows: it only contains chosen and rejected responses.

I have a few questions:
1. Where does the score mentioned in PPO come into play in this data?
2. Does training a reward model only require chosen and rejected data?
3. Does the rating in this dataset mean a score? If so, what processing is needed to convert it into a format supported by LLaMA-Factory?
https://huggingface.co/datasets/MMInstruction/VLFeedback?row=0
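Not an official answer, but regarding question 3: VLFeedback stores several model completions per prompt, each annotated with per-aspect ratings, and a common heuristic for building DPO pairs is to take the highest-rated completion as chosen and the lowest-rated as rejected. The sketch below assumes field names (`prompt`, `completions`, `annotations`, `Rating`, `response`) that should be verified against the actual dataset, and an alpaca-style `instruction`/`chosen`/`rejected` output layout that should be checked against the LLaMA-Factory data docs:

```python
# Hypothetical sketch: turn one VLFeedback-style record (several rated
# completions for the same prompt) into a DPO preference pair.
# All field names here are assumptions -- verify against the real schema.

def to_preference_pair(record):
    """Pick the highest-rated completion as 'chosen', lowest as 'rejected'."""
    def score(completion):
        # Average the per-aspect ratings (e.g. helpfulness, faithfulness).
        ratings = [int(a["Rating"]) for a in completion["annotations"].values()]
        return sum(ratings) / len(ratings)

    ranked = sorted(record["completions"], key=score)
    return {
        "instruction": record["prompt"],
        "chosen": ranked[-1]["response"],   # best-scored answer
        "rejected": ranked[0]["response"],  # worst-scored answer
    }

# Toy record with two completions, each rated on two aspects.
record = {
    "prompt": "Describe the image.",
    "completions": [
        {"response": "A cat.", "annotations": {
            "Helpfulness": {"Rating": "3"}, "Faithfulness": {"Rating": "2"}}},
        {"response": "A tabby cat sleeping on a sofa.", "annotations": {
            "Helpfulness": {"Rating": "5"}, "Faithfulness": {"Rating": "5"}}},
    ],
}

pair = to_preference_pair(record)
print(pair["chosen"])    # the higher-rated completion
print(pair["rejected"])  # the lower-rated completion
```

With more than two completions you could also emit every (higher, lower) rating pair instead of just best-vs-worst, which yields more preference pairs per prompt at the cost of noisier contrasts.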
Reproduction
I'm a newbie and completely confused.
Others
No response