ModelGarden-QNN-LiteRT Android Chat

A premium on-device LLM chat application for Android, powered by Google LiteRT (formerly TensorFlow Lite). It supports multiple Small Language Models (SLMs), including Gemma 3n and Qwen 3.

Gemma 3n is Google's latest family of models enabling efficient AI on everyday devices.

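"E2B": Roughly 2 billion effective parameters (the "E" stands for "effective").
Performance: Stated as 30-50+ tokens/sec on modern mobile processors; the Samsung S24 Ultra run in the Benchmarks section measured ~16 tokens/sec.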

Qwen 3 0.6B is a compact, efficient model from Alibaba's Qwen team, used here in the LiteRT build published by the LiteRT Community.

Ultra-Lightweight: Only 0.6 Billion parameters.
High Speed: Extremely fast on-device inference suitable for instant chat.

🚀 Features

Multi-Model Support: Switch between Gemma 3n (Int4) and Qwen 3 0.6B (Int4) on the fly using the toolbar spinner.
Built-in Benchmarking: Real-time display of Time To First Token (TTFT), Generation Speed (tokens/sec), and response length.
Secure Downloads:
- Models are downloaded directly within the app.
- Hugging Face Token support for accessing gated/private models.
LiteRT-LM Engine: Latest Google AI Edge runtime with robust fallback (GPU -> CPU) to ensure stability across devices.
Modern Premium UI:
- Deep Blue & Soft Gray aesthetic.
- Streaming responses with performance metrics.
- Custom vector avatars and markdown support.

📊 Benchmarks (Samsung S24 Ultra)

| Metric | Qwen 3 0.6B (Int4) | Gemma 3n (Int4) |
| --- | --- | --- |
| Time To First Token | ~690 ms | ~630 ms |
| Generation Speed | ~28 tokens/sec | ~16 tokens/sec |
| Use Case | Quick Chat, Speed | Depth, Reasoning |
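
The two numeric metrics in the table can be derived from per-token timestamps. The sketch below is one way to do that, not the app's actual code: `GenerationStats` and its methods are hypothetical names, and it assumes generation speed is measured from the first token onward so that TTFT is not counted twice.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not taken from the app's source. */
final class GenerationStats {
    private final long startNanos;
    private long firstTokenNanos = -1;
    private long lastTokenNanos = -1;
    private int tokenCount = 0;

    /** startNanos is the moment the prompt was submitted (e.g. System.nanoTime()). */
    GenerationStats(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Call once per streamed token with a monotonic timestamp. */
    void onToken(long nowNanos) {
        if (firstTokenNanos < 0) {
            firstTokenNanos = nowNanos;
        }
        lastTokenNanos = nowNanos;
        tokenCount++;
    }

    /** Time To First Token in milliseconds, or -1 if no token has arrived yet. */
    long ttftMillis() {
        return firstTokenNanos < 0
                ? -1
                : TimeUnit.NANOSECONDS.toMillis(firstTokenNanos - startNanos);
    }

    /** Tokens after the first, divided by the time elapsed since the first token. */
    double tokensPerSecond() {
        if (tokenCount < 2) {
            return 0.0;
        }
        double seconds = (lastTokenNanos - firstTokenNanos) / 1e9;
        return seconds > 0 ? (tokenCount - 1) / seconds : 0.0;
    }
}
```

In a streaming callback, `onToken(System.nanoTime())` would be called for each token, and both values read once the response completes.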

🛠️ Setup & Installation

Prerequisites

Android Studio Ladybug (or newer).
Android Device (Android 10+ recommended).
~2GB free storage.

1. Clone the Repository

git clone https://github.com/carrycooldude/ModelGarden-QNN-LiteRT.git
cd ModelGarden-QNN-LiteRT

2. Build & Install

Open the project in Android Studio and run it on a connected device, or install a debug build from the command line:

./gradlew installDebug

3. Usage

Launch the App: The app will check for the default model (Gemma 3n).
Download on Device: If the model is missing, the app will attempt to download it automatically.
- Note: If you see a 401/403/404 error, click the menu (three dots) -> Set HF Token and enter your Hugging Face API token.
Switch Models: Use the dropdown in the top bar to try Qwen 3.
Benchmark: Watch the green text above the input bar to see how fast your device runs!
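
For the token step above, an authenticated Hugging Face download can be sketched as follows. This is an illustration under stated assumptions, not the app's implementation: `HfDownload` is a hypothetical class, the repo id and file name are caller-supplied placeholders, and the file is assumed to live on the repository's `main` revision.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for illustration; not taken from the app's source. */
final class HfDownload {
    /** Hugging Face serves raw files at /{repo}/resolve/{revision}/{file}. */
    static String resolveUrl(String repoId, String fileName) {
        return "https://huggingface.co/" + repoId + "/resolve/main/" + fileName;
    }

    /** The statuses for which the usage note suggests setting an HF token. */
    static boolean needsToken(int status) {
        return status == 401 || status == 403 || status == 404;
    }

    /** Prepares (but does not send) a request, adding the token when one is set. */
    static HttpURLConnection open(String url, String token) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (token != null && !token.trim().isEmpty()) {
            conn.setRequestProperty("Authorization", "Bearer " + token.trim());
        }
        return conn;
    }
}
```

A caller would check `needsToken(conn.getResponseCode())` after connecting and, if it returns true, prompt for the token rather than retrying blindly.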

⚠️ Notes on Hardware Acceleration

The app attempts to use GPU delegates by default.
If GPU initialization fails (common with some model architectures on specific SoCs), it automatically falls back to CPU, which is slower but more compatible.
QNN (NPU) support is experimental and depends on specific device binaries.
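
The GPU-to-CPU fallback described above follows a common "try in order" pattern. The sketch below shows that pattern in isolation; every name in it (`BackendFallback`, `Backend`, `createWithFallback`) is hypothetical, and the factory stands in for whatever engine constructor the runtime provides. It is not the LiteRT-LM API.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical names throughout; this is not the LiteRT-LM API. */
final class BackendFallback {
    enum Backend { GPU, CPU }

    /** Tries each backend in order and returns the first engine that initializes. */
    static <E> E createWithFallback(List<Backend> order, Function<Backend, E> factory) {
        RuntimeException lastError = null;
        for (Backend backend : order) {
            try {
                return factory.apply(backend);
            } catch (RuntimeException e) {
                // Remember why this backend failed, then try the next one.
                lastError = e;
            }
        }
        throw new IllegalStateException("No backend could be initialized", lastError);
    }
}
```

Passing the order as a list keeps the policy in one place: an experimental backend could be added to the front of the list without touching the loop.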

🎥 Demo

Gemma3n-Video.mp4

📜 License

Apache 2.0
