C++ Binding for Magika
Motivation
I needed fast, native file type detection for my terminal editor written in C++. While Magika has excellent Python and Rust implementations, there wasn't a C++ binding available. Rather than shell out to the CLI or embed Python, I built native C++ bindings using ONNX Runtime.
What I've Built
A complete C++ implementation that:
- ✅ Matches Python accuracy - Uses the same ONNX model with identical feature extraction
- ✅ High performance - ~5ms inference, 1000+ files/sec throughput
- ✅ Clean API - PIMPL pattern, exception-safe, follows modern C++ practices
- ✅ Full CLI - Feature parity with Rust CLI (recursive, JSON output, colors, etc.)
- ✅ Cross-platform - Tested on Linux and macOS (Windows planned)
- ✅ Well documented - Comprehensive README, API docs, and examples
- ✅ Production ready - Already integrated into my editor
Implementation Highlights
Architecture:
- ONNX Runtime C++ API for inference
- Feature extraction ported from Python (lstrip/rstrip, padding, block_size)
- PIMPL pattern to hide ONNX dependencies from public headers
- CMake build system with proper install targets
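For readers unfamiliar with the feature extraction step, here is a rough sketch of what "lstrip/rstrip, padding, block_size" can look like in C++. It is a simplified illustration, not the code from the linked repository: the names `extract_beg` and `extract_end` are hypothetical, and the padding token and sizes are passed in as parameters because the real values come from the model configuration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

// Same byte set that Python's bytes.strip() removes by default.
inline bool is_ws(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' || c == '\f';
}

// Start-of-file features: look at the first block_size bytes, drop leading
// whitespace, keep up to `size` bytes, then pad on the right.
std::vector<int32_t> extract_beg(const std::string& content, std::size_t block_size,
                                 std::size_t size, int32_t padding) {
    const std::size_t limit = std::min(block_size, content.size());
    std::size_t start = 0;
    while (start < limit && is_ws(static_cast<unsigned char>(content[start]))) ++start;

    std::vector<int32_t> out;
    out.reserve(size);
    for (std::size_t i = start; i < limit && out.size() < size; ++i)
        out.push_back(static_cast<unsigned char>(content[i]));
    out.resize(size, padding);  // fill the tail with the padding token
    return out;
}

// End-of-file features: look at the last block_size bytes, drop trailing
// whitespace, keep the last `size` bytes, then pad on the left.
std::vector<int32_t> extract_end(const std::string& content, std::size_t block_size,
                                 std::size_t size, int32_t padding) {
    const std::size_t begin = content.size() > block_size ? content.size() - block_size : 0;
    std::size_t end = content.size();
    while (end > begin && is_ws(static_cast<unsigned char>(content[end - 1]))) --end;

    const std::size_t kept = std::min(size, end - begin);
    std::vector<int32_t> out(size - kept, padding);  // leading padding tokens
    out.reserve(size);
    for (std::size_t i = end - kept; i < end; ++i)
        out.push_back(static_cast<unsigned char>(content[i]));
    return out;
}

}  // namespace sketch
```

The padding token is an `int32_t` outside the 0-255 byte range so the model can tell padding apart from real content; which value to use is an assumption left to the caller here.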
Key Files:
cpp/
├── lib/ # Core library
│ ├── include/magika/ # Public headers
│ │ ├── magika.hpp
│ │ └── types.hpp
│ └── src/ # Implementation
│ └── magika.cpp
├── cli/ # Command-line tool
│ └── main.cpp
├── examples/ # Usage examples
└── README.md # Full documentation
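The split between public headers and `src/` is where the PIMPL pattern pays off: the header only forward-declares the implementation, so users never see ONNX Runtime includes. A minimal sketch of that shape follows, assuming C++14 or later. `Detector`, `Impl`, and `model_dir()` are placeholder names for illustration, not the declarations in the linked repository.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- what a public header could contain (no inference-library includes) ----
namespace sketch {

class Detector {
public:
    explicit Detector(std::string model_dir);
    ~Detector();  // must be defined where Impl is a complete type
    Detector(Detector&&) noexcept;
    Detector& operator=(Detector&&) noexcept;

    const std::string& model_dir() const;

private:
    struct Impl;                  // opaque to users of the header
    std::unique_ptr<Impl> impl_;  // owns the hidden state
};

}  // namespace sketch

// ---- what the matching .cpp could contain ----
namespace sketch {

struct Detector::Impl {
    std::string model_dir;
    // A real implementation would keep its inference session objects here.
};

Detector::Detector(std::string model_dir)
    : impl_(std::make_unique<Impl>(Impl{std::move(model_dir)})) {}
Detector::~Detector() = default;
Detector::Detector(Detector&&) noexcept = default;
Detector& Detector::operator=(Detector&&) noexcept = default;

const std::string& Detector::model_dir() const { return impl_->model_dir; }

}  // namespace sketch
```

Declaring the destructor in the header and defaulting it in the source file is what lets `std::unique_ptr` hold an incomplete type in the public interface.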
Testing:
- Validated against the Python implementation on tests_data/
- Matches detection results for all test files
- Handles edge cases (empty files, binary files, large files)
Example Usage
Library:
#include <iostream>
#include <magika/magika.hpp>

int main() {
    // Load the model from a local directory, then classify one file
    magika::Magika detector("/path/to/models/standard_v3_3");
    auto result = detector.identify_path("test.py");
    std::cout << "Type: " << result.content_type << "\n"   // "python"
              << "MIME: " << result.mime_type << "\n"      // "text/x-python"
              << "Group: " << result.group << "\n"         // "code"
              << "Confidence: " << result.score << "\n";   // 0.998
}
CLI:
$ build/cli/magika -r ~/projects --json
142 python
89 javascript
56 cpp
23 markdown
Integration Example
Already working in my terminal editor:
// Auto-detect file type on load
auto result = detector.identify_path(filename);
if (result.content_type == "python") {
    enable_python_highlighting();
}
Questions for Maintainers
Before submitting a full PR, I'd appreciate guidance on:
- Interest Level: Would you accept a C++ binding as an official part of Magika?
- Dependency Management: How should ONNX Runtime be handled?
  - Current: User provides path via CMake (-DONNXRUNTIME_DIR=...)
  - Alternative: FetchContent to auto-download
  - Alternative: System package (apt-get, brew, vcpkg)
- Project Structure:
  - Should it be in a cpp/ directory (like python/, js/)?
  - Or separate repo initially?
- CI/Testing:
  - Add GitHub Actions for C++ builds (Linux, macOS, Windows)?
  - Integration tests comparing against Python output?
  - What coverage is expected?
- API Design: Any changes needed to match project conventions?
  - Current: Magika class with identify_path(), identify_bytes()
  - Exception-based error handling vs. Result types?
- Documentation:
  - Is the current README sufficient?
  - Should API docs be generated (Doxygen)?
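To make the error-handling question concrete, here is a sketch of the two API shapes side by side, assuming C++17 for `std::variant`. Everything in it is hypothetical: `Detection`, `Error`, `Result`, `identify_checked`, `identify_or_throw`, and the stub classification logic are illustrative names, not part of the current binding.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

namespace sketch {

struct Detection {
    std::string content_type;
    double score;
};

struct Error {
    std::string message;
};

// Result-type style: the caller has to inspect which alternative it holds.
using Result = std::variant<Detection, Error>;

// Stand-in for real detection so the sketch stays self-contained.
Result identify_checked(const std::string& path) {
    if (path.empty()) return Error{"empty path"};
    return Detection{"unknown", 0.0};
}

// Exception style: a thin wrapper that turns the error alternative into a throw.
Detection identify_or_throw(const std::string& path) {
    Result r = identify_checked(path);
    if (const auto* err = std::get_if<Error>(&r)) throw std::runtime_error(err->message);
    return std::get<Detection>(std::move(r));
}

}  // namespace sketch
```

Layering the throwing function on top of the checked one would let the library offer both styles without duplicating logic. On toolchains with C++23, `std::expected` would be a closer fit than `std::variant` for the result type.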
Why This Adds Value
- Performance-critical applications: Game engines, embedded systems, high-throughput servers
- Native integration: Editors (Vim, Emacs plugins), file managers, backup tools
- Ecosystem gap: Go and JavaScript already have bindings, and C++ is a natural fit alongside Rust
Preview
Working code available at: https://github.com/moisnx/magika/tree/main/cpp
Next Steps
If you're interested, I can:
- Clean up any remaining issues based on feedback
- Add comprehensive CI workflows
- Submit a formal PR with detailed changelog
- Write integration guide for common C++ build systems
Looking forward to your thoughts! Happy to hop on a call to discuss if helpful.