AI4LIFE-GROUP

29 repositories

med-safety-bench
Public
MedSafetyBench: Evaluating and Improving the Medical Safety of LLMs, NeurIPS 2024
Python
•
MIT License
•4•40•0•0•Updated Dec 4, 2025
temporal-saes
Public
Codebase for Temporal SAEs paper
Python
•
Apache License 2.0
•1•3•0•0•Updated Nov 14, 2025
sae_robustness
Public
Python
•
MIT License
•1•3•0•0•Updated Oct 5, 2025
SpLiCE
Public
Sparse Linear Concept Embeddings
Python
•
Apache License 2.0
•10•129•2•0•Updated Mar 27, 2025
RLHF_Trust
Public
Python
•0•3•1•0•Updated Dec 21, 2024
interp_interv
Public
Code for "Towards Unifying Interpretability and Control: Evaluation via Intervention"
Python
•0•2•0•0•Updated Nov 8, 2024
rocerf_code
Public
Source code for ROCERF
Jupyter Notebook
•
MIT License
•0•0•0•0•Updated Sep 2, 2024
OpenXAI
Public
OpenXAI: Towards a Transparent Evaluation of Model Explanations
benchmark leaderboard reproducibility interpretability explainable-ai explainability
JavaScript
•
MIT License
•45•252•7•1•Updated Aug 17, 2024
LLM_Explainer
Public
Code for paper: Are Large Language Models Post Hoc Explainers?
interpretability xai explainability large-language-models llm
Jupyter Notebook
•
MIT License
•5•34•1•0•Updated Jul 22, 2024
average-case-robustness
Public
Characterizing Data Point Vulnerability via Average-Case Robustness, UAI 2024
robustness adversarial-robustness robustness-verification randomized-smoothing multivariate-normal-distribution
Python
•
MIT License
•0•0•0•0•Updated May 7, 2024
disagreement-problem
Public
The Disagreement Problem in Explainable ML, TMLR 2025
Jupyter Notebook
•
MIT License
•0•2•0•0•Updated Apr 16, 2024
fair-unlearning
Public
Fair Machine Unlearning: Data Removal while Mitigating Disparities
Python
•
Apache License 2.0
•2•3•0•0•Updated Feb 15, 2024
DiET
Public
Code for "Discriminative Feature Attributions via Distractor Erasure Tuning"
Python
•1•2•0•0•Updated Dec 12, 2023
amplify
Public
Python
•0•1•0•0•Updated Nov 27, 2023
Balanced_Recourse
Public
Jupyter Notebook
•0•0•0•0•Updated Nov 7, 2023
lcnn
Public
Low Curvature Neural Networks (NeurIPS 2022)
Python
•0•0•0•0•Updated Nov 6, 2023
ProbabilisticallyRobustRecourse
Public
"Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness". M. Pawelczyk, T. Datta, J. v.d Heuvel, G. Kasneci, H. Lakkaraju. In…
Python
•
MIT License
•0•0•0•0•Updated Oct 19, 2023
CounterfactualDistanceAttack
Public
"On the Privacy Risks of Algorithmic Recourse". Martin Pawelczyk, Himabindu Lakkaraju* and Seth Neel*. In International Conference on Artificial Intelligence an…
Jupyter Notebook
•0•0•0•0•Updated Oct 19, 2023
In-Context-Unlearning
Public
"In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; arXiv preprint: arXiv:2310.07579; 2023.
Jupyter Notebook
•0•0•0•0•Updated Oct 19, 2023
robust-grads
Public
Code for https://arxiv.org/abs/2306.06716
Python
•0•1•0•0•Updated Jun 22, 2023
UAI22_DataPoisoningAttacksonOff-PolicyPolicyEvaluationMethods_RL
Public
DOPE: Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
Python
•0•0•0•0•Updated May 9, 2023
GraphXAI
Public
GraphXAI: Resource to support the development and evaluation of GNN explainers
Python
•
MIT License
•35•1•0•0•Updated Mar 18, 2023
lfa
Public
Local function approximation (LFA) framework, NeurIPS 2022
function-approximation interpretability explainable-ai explainable-ml explainability faithful-explanation
Python
•4•5•0•0•Updated Feb 6, 2023
arxiv-latex-cleaner
Public
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Python
•
Apache License 2.0
•382•0•0•0•Updated Sep 21, 2022
ROAR
Public
Jupyter Notebook
•2•5•0•0•Updated Jan 26, 2022
unified_representation
Public
Python
•
MIT License
•0•1•0•0•Updated Dec 17, 2021
fair_ranking_effectiveness_on_outcomes
Public
AIES 2021 Paper: Does Fair Ranking Improve Minority Outcomes?
Jupyter Notebook
•0•1•0•0•Updated Dec 5, 2021
rise-against-distribution-shift
Public
Code base for robust learning for an intersection of causal and adversarial shifts
Python
•0•3•0•0•Updated Nov 25, 2021
nifty
Public
Code for paper https://arxiv.org/abs/2102.13186
Python
•
MIT License
•13•1•0•0•Updated Apr 3, 2021