This repository contains the implementation for the paper "Common Benchmarks Undervalue the Generalization Power of Programmatic Policies". For more information, visit the project page. It serves as an umbrella for all code and experimental results related to our research. The work is organized into four dedicated submodules, each focusing on a specific environment or set of experiments:
- SparsePolicies: Core implementations and experiments related to the Karel, SparseMaze, Cartpole, Quad, and ParallelPark environments.
- SparsePolicies_Torcs: Experiments related to the Torcs environment. This repository uses a Dockerized version (also Apptainer for Compute Canada) of the Torcs server.
- SparsePolicies_ParallelPark: Additional experiments for the ParallelPark environment.
- FunSearch: Uses the FunSearch approach to find programmatic policies for the SparseMaze environment.
After cloning this super-project, initialize and update all submodules with:

    git submodule update --init --recursive

Each submodule can then be accessed and run according to its own documentation and setup instructions.
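To confirm that the submodules were actually populated, one option is to inspect the output of `git submodule status`, which prefixes each uninitialized submodule with a `-`. The sketch below is not part of this repository: the helper name `uninitialized_submodules` is hypothetical, and it assumes `awk` is available and that submodule paths contain no spaces.

```shell
# Hypothetical helper (not part of this repository): reads `git submodule status`
# lines on stdin and prints the path of every submodule that is not initialized.
# A leading "-" on a status line marks an uninitialized submodule.
uninitialized_submodules() {
    # Field 1 is the prefixed commit hash, field 2 is the submodule path.
    awk '/^-/ { print $2 }'
}

# Run the check only when the current directory is inside a git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    git submodule status --recursive | uninitialized_submodules
fi
```

If the check prints nothing, every submodule has been initialized. Any path it does print can be fixed by re-running `git submodule update --init --recursive`.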
@misc{rajabpour2025commonbenchmarksundervaluegeneralization,
title={Common Benchmarks Undervalue the Generalization Power of Programmatic Policies},
author={Amirhossein Rajabpour and Kiarash Aghakasiri and Sandra Zilles and Levi H. S. Lelis},
year={2025},
eprint={2506.14162},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2506.14162},
}
Amirhossein Rajabpour, Kiarash Aghakasiri, Sandra Zilles, Levi Lelis