Why take the mean over all sampled actions in multi-outcome sampling? #7

https://github.com/EricSteinberger/Deep-CFR/blob/master/DeepCFR/workers/la/sampling_algorithms/MultiOutcomeSampler.py



Since `aprx_imm_reg` is computed for every action and put into the buffer without being summed up, I don't understand why it is scaled like this:
`aprx_imm_reg *= legal_action_mask / n_actions_to_smpl`


I think my confusion comes from not understanding the formula in the docstring below (`v~(I) = * p(a) * |A(I)|`), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum of all samples of that state since we add
        v~(I) = * p(a) * |A(I)|. Since we sample multiple actions on each traverser node, we have to average over
        their returns like: v~(I) * Sum_a=0_N (v~(I|a) * p(a) * ||A(I)|| / N).
"""
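
One possible reading of that docstring, written as a minimal sketch and not taken from the repository: if actions are sampled uniformly (here with replacement, which is a simplifying assumption), each sampled return is importance-weighted by `p(a) * |A(I)|`, and averaging over the N samples keeps the estimate of the state value unbiased, whereas summing would inflate it by a factor of N. The function name, the action values, and the strategy below are hypothetical and only serve to check that arithmetic; whether the same argument explains the scaling of `aprx_imm_reg` is exactly the open question.

```python
from itertools import product


def sampled_estimates(action_values, strategy, n_samples):
    """Yield (mean_estimate, sum_estimate) for every equally likely draw of
    n_samples actions, sampled uniformly with replacement (an assumption)."""
    n_actions = len(action_values)
    for draw in product(range(n_actions), repeat=n_samples):
        # Importance weight under uniform sampling: p(a) / q(a) = p(a) * |A(I)|.
        weighted = [action_values[a] * strategy[a] * n_actions for a in draw]
        yield sum(weighted) / n_samples, sum(weighted)


action_values = [1.0, -2.0, 0.5]  # hypothetical v~(I|a) for three actions
strategy = [0.2, 0.5, 0.3]        # hypothetical p(a), sums to 1
n_samples = 2                     # hypothetical N

# Exact state value: sum_a p(a) * v~(I|a).
exact = sum(v * p for v, p in zip(action_values, strategy))

# Enumerate all equally likely draws and average the two estimators.
draws = list(sampled_estimates(action_values, strategy, n_samples))
expected_mean_estimate = sum(m for m, _ in draws) / len(draws)
expected_sum_estimate = sum(s for _, s in draws) / len(draws)

print(exact, expected_mean_estimate, expected_sum_estimate)
```

In this toy setting the expectation of the averaged estimator matches the exact value, while the summed estimator is off by the factor N.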

Is there any reference for it?


Thanks a lot.



