Commit b4a47aa

Uppercase the ratio term R_t in PPO objective (#201)
1 parent f2adb7c commit b4a47aa

1 file changed: +1 -1 lines changed

chapters/11-policy-gradients.md

Lines changed: 1 addition & 1 deletion
@@ -287,7 +287,7 @@ $$
 J(\theta)
 =
 \mathbb{E}_{t}\left[
-\min\left(r_t(\theta)A_t,\ \text{clip}(r_t(\theta),1-\varepsilon,1+\varepsilon)A_t\right)
+\min\left(R_t(\theta)A_t,\ \text{clip}(R_t(\theta),1-\varepsilon,1+\varepsilon)A_t\right)
 \right],
 \qquad
 R_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\text{old}}}(a_t\mid s_t)}.
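
For readers checking the renamed symbol against an implementation, the clipped objective in this hunk can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `ppo_clip_objective`, its arguments, and the default `eps=0.2` are hypothetical and do not come from the chapter or this commit. The expectation over t is approximated by a batch mean, and R_t(θ) is computed from log-probabilities.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Batch-mean estimate of J(theta) from the formula in the diff.

    logp_new[t]   : log pi_theta(a_t | s_t)      (hypothetical input)
    logp_old[t]   : log pi_theta_old(a_t | s_t)  (hypothetical input)
    advantages[t] : A_t
    eps           : clipping range epsilon (0.2 is an assumed default)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # R_t(theta) = pi_theta / pi_theta_old, computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # clip(R_t, 1 - eps, 1 + eps)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # min(R_t * A_t, clip(R_t) * A_t)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two timesteps: the ratio is 1.5 with A_t = 1, then 0.5 with A_t = -1.
value = ppo_clip_objective(
    logp_new=[math.log(0.6), math.log(0.2)],
    logp_old=[math.log(0.4), math.log(0.4)],
    advantages=[1.0, -1.0],
)
```

With eps = 0.2 the first term is capped at 1.2 * 1 and the second evaluates to 0.8 * (-1), because the min always keeps the more pessimistic of the two products.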
