A toolkit for generating and validating ambiguous emotional utterances with corresponding audio representations.
This project builds a dataset of emotionally ambiguous sentences: the same text can be read with different emotions depending on the speaker's tone. The pipeline:
- Generates ambiguous utterances
- Filters candidates for quality
- Creates emotional responses to these utterances
- Generates audio files with different emotional tones
- Validates the generated audio using multiple emotion recognition models
Requirements:
- Python 3.8+
- CUDA-capable GPU
- Hugging Face API key
- OpenAI API key (for GPT-4o validation)
- Gemini API key (for Gemini validation)
- F5-TTS for audio generation
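A quick way to confirm the environment is ready (run after installing dependencies and exporting the API keys shown below; a minimal sketch assuming `torch` is pulled in via requirements.txt):

```python
import os

import torch

# Keys the pipeline scripts read from the environment (see setup below).
REQUIRED_KEYS = ["HF_ACCESS_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY"]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    raise SystemExit("Missing API keys: " + ", ".join(missing))
if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected")
print("Environment OK:", torch.cuda.get_device_name(0))
```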
Clone the repository and install dependencies:

```bash
git clone <repository-url>
cd emotional-counterfactual-data
pip install -r requirements.txt
```

Set the required API keys:
```bash
export HF_ACCESS_KEY="your_huggingface_token"
export OPENAI_API_KEY="your_openai_key"
export GEMINI_API_KEY="your_gemini_key"
```

Generate ambiguous utterances:

```bash
python generate_ambig.py
```

Creates utterances.jsonl with emotionally ambiguous sentences.
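To eyeball the output, the file can be read with a few lines of Python; the record fields used here (`text`, `emotions`) are assumptions about the schema, so adjust them to match the actual file:

```python
import json

# Peek at the first few generated utterances. The "text" and "emotions"
# field names are assumed for illustration; check the real schema.
with open("utterances.jsonl") as f:
    for i, line in enumerate(f):
        if i >= 5:
            break
        record = json.loads(line)
        print(record.get("text"), "->", record.get("emotions"))
```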
Filter the candidates for quality:

```bash
python refilter_candidates.py
```

Produces filtered_utterances.jsonl with high-quality examples.
Generate emotional responses:

```bash
python generate_responses.py
```

Creates responses.jsonl with appropriate responses for each emotion.
Reduce to unique items:

```bash
bash filter_to_unique.sh
```

Generates unique_sentences.txt and unique_responses.jsonl.
Audio generation requires reference audio files in a references directory, one clip per speaker and emotion:

```
references/man/[emotion].wav
references/woman/[emotion].wav
```
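Before synthesis, a short check can confirm every expected reference clip exists; the emotion list below is illustrative and should be replaced with the emotions the pipeline actually targets:

```python
from pathlib import Path

# Illustrative emotion set; substitute the emotions your pipeline uses.
EMOTIONS = ["angry", "happy", "sad", "neutral"]
SPEAKERS = ["man", "woman"]

missing = [
    str(path)
    for speaker in SPEAKERS
    for emotion in EMOTIONS
    if not (path := Path("references") / speaker / f"{emotion}.wav").exists()
]
if missing:
    raise SystemExit("Missing reference audio:\n" + "\n".join(missing))
print("All reference clips found.")
```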
Generate the audio:

```bash
python generate_audio.py
```

Creates audio files in the generated_audio directory.
Validate the generated audio:

```bash
python validate_responses.py
```

Checks each generated clip with multiple emotion recognition models (DiVA, Qwen2, Gemini, GPT-4o).
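Each validation run boils down to one question per clip and model: does the predicted emotion match the intended one? Below is a hypothetical per-clip record (field names and values are illustrative, not the script's actual schema) and the check applied to it:

```python
import json

# A hypothetical prediction record; all fields and values are illustrative.
record = {
    "audio": "generated_audio/0001_happy_woman.wav",
    "intended": "happy",
    "predictions": {"diva": "happy", "qwen2": "surprise",
                    "gemini": "happy", "gpt4o": "happy"},
}

# Per-model correctness for this clip.
matches = {m: p == record["intended"] for m, p in record["predictions"].items()}
print(json.dumps(matches, indent=2))
```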
Compute statistics:

```bash
python compute_stats.py
```

Analyzes model performance and generates the final filtered dataset.
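The per-model statistics reduce to accuracy of predicted emotion against intended emotion. A minimal sketch, assuming a predictions file shaped like the hypothetical record above (the file name and field names are assumptions, not the script's actual interface):

```python
import json
from collections import Counter

# Aggregate per-model accuracy; "predictions.jsonl", "intended", and
# "predictions" are assumed names for illustration only.
correct, total = Counter(), 0
with open("predictions.jsonl") as f:
    for line in f:
        row = json.loads(line)
        total += 1
        for model, pred in row["predictions"].items():
            correct[model] += pred == row["intended"]

for model, n in sorted(correct.items()):
    print(f"{model}: {n / total:.1%}")
```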
The final dataset contains:
- Original sentences with their intended emotions
- Audio files with different emotional tones
- Model predictions for each audio sample
- Statistical analysis of model performance