Add support for InternLM2 model architecture #1958
base: main
Conversation
This commit adds support for exporting InternLM2 models to ONNX format.

Key changes:
- Add InternLM2Model class in src/python/py/models/builders/internlm.py
- Register the InternLM2ForCausalLM architecture in builder.py
- Implement grouped/interleaved QKV weight splitting for GQA
- Map InternLM2-specific attribute names to base model equivalents (see the mapping sketch below)
- Add documentation and an example in examples/python/internlm2/

InternLM2 uses a Llama-based architecture with grouped query attention and a unique grouped/interleaved QKV weight layout. The implementation correctly handles this layout during weight extraction.

Tested with:
- InternLM2-1.8B (FP32, INT4 RTN, INT4 AWQ)
- Model generates coherent text and valid code

Model hub: https://huggingface.co/internlm/internlm2-1_8b
Paper: https://arxiv.org/abs/2403.17297
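For reference, the attribute-name mapping could look roughly like the dictionary below. The names on the left are InternLM2's published module names; the mapping itself is illustrative and may not match the exact structure used in internlm.py.

```python
# Illustrative mapping from InternLM2 module names to their Llama-style
# equivalents; the actual lookup in internlm.py may be structured differently.
INTERNLM2_TO_LLAMA = {
    "model.tok_embeddings": "model.embed_tokens",
    "attention.wqkv": "self_attn.{q,k,v}_proj",  # fused weight, split during export
    "attention.wo": "self_attn.o_proj",
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w3": "mlp.up_proj",
    "feed_forward.w2": "mlp.down_proj",
    "attention_norm": "input_layernorm",
    "ffn_norm": "post_attention_layernorm",
    "output": "lm_head",
}
```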
@amdrajeevp1 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
```python
def __init__(self, config, io_dtype, onnx_dtype, ep, cache_dir, extra_options):
    super().__init__(config, io_dtype, onnx_dtype, ep, cache_dir, extra_options)
    # InternLM2 is based on the Llama architecture, so report "LlamaForCausalLM"
    # as the model_type for GenAI compatibility
    self.model_type = "LlamaForCausalLM"
```
Code scanning / CodeQL warning: Overwriting attribute in super-class or sub-class (Model)
Thanks for your contribution! Can you also make the following additions for InternLM in alphabetical order?
- Add comprehensive MULTI_SIZE_SUPPORT.md documenting 1.8B/7B/20B compatibility
- Add export scripts for InternLM2-7B (Bash and PowerShell)
- Update README with a model size comparison table
- Add hardware requirements and performance estimates
- Include GPU export examples for the 7B model

The implementation is architecture-based and works with all InternLM2 sizes:
- Dynamically reads config parameters (heads, layers, dimensions); see the sketch after this list
- Adaptive weight splitting based on GQA ratios
- No hardcoded model sizes

Tested: InternLM2-1.8B
Compatible: InternLM2-7B, InternLM2-20B, all Chat variants
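As a rough illustration of the config-driven approach (a sketch, not the PR's actual code), the size-dependent quantities can be derived from the checkpoint's Hugging Face config rather than hardcoded:

```python
from transformers import AutoConfig

# Derive GQA parameters from the checkpoint's config instead of
# hardcoding per-size values (illustrative sketch).
config = AutoConfig.from_pretrained("internlm/internlm2-1_8b", trust_remote_code=True)

num_heads = config.num_attention_heads
num_kv_heads = config.num_key_value_heads
head_dim = config.hidden_size // num_heads
q_per_kv = num_heads // num_kv_heads  # query heads sharing each KV head
```

The same code works unchanged for the 7B and 20B checkpoints, since only the config values differ between sizes.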
- Merge MULTI_SIZE_SUPPORT.md into README.md for a single comprehensive guide
- Remove export_7b.ps1 and export_7b.sh scripts (examples already in README)
- Streamline documentation structure
- All export commands and multi-size information now in one place
Summary
This commit adds support for exporting InternLM2 models to ONNX format.
Key changes:
- Add InternLM2Model class in src/python/py/models/builders/internlm.py
- Register the InternLM2ForCausalLM architecture in builder.py
- Implement grouped/interleaved QKV weight splitting for GQA
- Map InternLM2-specific attribute names to base model equivalents
- Add documentation and an example in examples/python/internlm2/
Architecture Details
InternLM2 uses a Llama-based architecture with grouped query attention and a unique grouped/interleaved QKV weight layout. The implementation correctly handles this layout during weight extraction.
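A minimal sketch of the grouped/interleaved split, assuming the layout described above (per KV head: q_per_kv query heads, then one key head, then one value head). The function name split_wqkv is illustrative, not necessarily the helper used in this PR.

```python
import torch

def split_wqkv(wqkv: torch.Tensor, num_kv_heads: int, q_per_kv: int, head_dim: int):
    """Split a fused InternLM2-style wqkv weight into Q, K, V projections."""
    hidden_size = wqkv.shape[-1]
    # Group the rows as (kv_head, q_per_kv queries + 1 key + 1 value, head_dim).
    w = wqkv.view(num_kv_heads, q_per_kv + 2, head_dim, hidden_size)
    q = w[:, :q_per_kv].reshape(-1, hidden_size)     # (num_heads * head_dim, hidden)
    k = w[:, q_per_kv].reshape(-1, hidden_size)      # (num_kv_heads * head_dim, hidden)
    v = w[:, q_per_kv + 1].reshape(-1, hidden_size)  # (num_kv_heads * head_dim, hidden)
    return q, k, v
```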
Tested with InternLM2-1.8B:
- FP32, INT4 RTN, and INT4 AWQ precisions
- Generates coherent text and valid code (a smoke-test sketch follows)
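The text-generation check can be reproduced with a short script. A sketch, assuming the current onnxruntime-genai Python API (method names vary across versions) and a hypothetical output path:

```python
import onnxruntime_genai as og

# Hypothetical path to the exported INT4 model directory.
model = og.Model("./internlm2-1_8b-int4")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("def fibonacci(n):"))
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```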
References
- Model hub: https://huggingface.co/internlm/internlm2-1_8b
- Paper: https://arxiv.org/abs/2403.17297