13 files changed, +1597 -2 lines changed
@@ -6,7 +6,7 @@ Tutorials
66
77This section walks through tutorials to get you started on quantizing models.
88
9- AIMET is packed with out-of-the-box quantization techniques to studing detailed quantization impact of each layer.
9+ AIMET is packed with out-of-the-box quantization techniques for studying the detailed quantization impact of each layer.
1010
1111This section walks you through using out-of-the-box techniques to get a model with best-in-class accuracy, and
1212how to take it further with advanced techniques depending on your use case.
@@ -17,6 +17,7 @@ how you take this further ahead with advanced techniques depending on your use c
1717
1818 Quantization Workflow <quantization_workflow >
1919 Quantization Simulation <quantsim >
20+ Quantization Recipes for LLMs <quantization_recipe >
2021 Example Notebooks <notebooks >
2122 Running Quantized Models on-device <on_target_inference >
2223 Debugging Guide <debugging_guidelines >
1+ meta-llama/Llama-3.2-1B-Instruct
2+ ================================
3+
4+ Precision settings:
5+
6+ - Weights: INT4, except for:
7+ - ``LM Head ``: INT8
8+ - Activations: INT16, except for:
9+ - ``KV Cache ``: INT8
10+
11+ Hyperparameters:
12+
13+ - AdaScale: ``num_batches=128 ``, ``num_iterations=2048 ``
14+ - SequentialMSE: ``num_batches=20 ``
15+ - Calibration: ``num_batches=20 ``
16+
17+
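To make the precision settings above concrete, here is a minimal sketch in plain PyTorch (not the AIMET API) of what symmetric per-channel (PCQ-style) weight quantize-dequantize looks like numerically. The bit-widths mirror the recipe (INT4 for most weights, INT8 for the LM head); the tensor shapes and clipping choices are illustrative only, and the activation/KV-cache side is not shown.

.. code-block:: python

    import torch

    def per_channel_qdq(weight: torch.Tensor, bitwidth: int = 4) -> torch.Tensor:
        """Symmetric per-channel quantize-dequantize along the output-channel dim."""
        qmax = 2 ** (bitwidth - 1) - 1                                 # 7 for INT4, 127 for INT8
        # One scale per output channel, derived from that channel's max magnitude.
        scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)  # integer grid
        return q * scale                                               # dequantized (fake-quant) weights

    w = torch.randn(2048, 2048)                # hypothetical weight matrix
    w_int4 = per_channel_qdq(w, bitwidth=4)    # INT4, as used for most layers
    w_int8 = per_channel_qdq(w, bitwidth=8)    # INT8, as used for the LM head
    print((w - w_int4).abs().mean(), (w - w_int8).abs().mean())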
18+ .. list-table ::
19+ :widths: 50 18 18 3 3 5 3
20+ :header-rows: 1
21+
22+ * - Technique
23+ - Quantized With
24+ - Evaluated On
25+ - PPL
26+ - MMLU
27+ - Time (hh:mm:ss)
28+ - CUDA Memory (GB)
29+ * - FP32
30+ - N/A
31+ - Both
32+ - 12.14
33+ - 46.06
34+ - 00:00:14
35+ - 6.34
36+ * - PCQ + SpinQuant + AdaScale
37+ - ``aimet-torch ``
38+ - ``aimet-onnx ``
39+ - 13.67
40+ - 42.25
41+ - 02:31:06
42+ - 20.89
43+ * - PCQ + SpinQuant + AdaScale
44+ - ``aimet-onnx ``
45+ - ``aimet-onnx ``
46+ - 13.68
47+ - 41.82
48+ - 01:53:17
49+ - 46.38
50+ * - LPBQ + SequentialMSE
51+ - ``aimet-torch ``
52+ - ``aimet-onnx ``
53+ - 14.07
54+ - 43.09
55+ - 00:44:38
56+ - 28.52
57+ * - LPBQ + SequentialMSE
58+ - ``aimet-onnx ``
59+ - ``aimet-onnx ``
60+ - 13.84
61+ - 43.53
62+ - 00:20:44
63+ - 34.79
1+ meta-llama/Llama-3.2-3B-Instruct
2+ ================================
3+
4+ Precision settings:
5+
6+ - Weights: INT4, except for:
7+ - ``LM Head ``: INT8
8+ - Activations: INT16, except for:
9+ - ``KV Cache ``: INT8
10+
11+ Hyperparameters:
12+
13+ - AdaScale: ``num_batches=128 ``, ``num_iterations=1024 ``
14+ - SequentialMSE: ``num_batches=20 ``
15+ - Calibration: ``num_batches=20 ``
16+
17+
18+ .. list-table ::
19+ :widths: 50 18 18 3 3 5 3
20+ :header-rows: 1
21+
22+ * - Technique
23+ - Quantized With
24+ - Evaluated On
25+ - PPL
26+ - MMLU
27+ - Time (hh:mm:ss)
28+ - CUDA Memory (GB)
29+ * - FP32
30+ - N/A
31+ - Both
32+ - 10.13
33+ - 60.74
34+ - 00:00:10
35+ - 13.90
36+ * - PCQ + SpinQuant + AdaScale
37+ - ``aimet-torch ``
38+ - ``aimet-onnx ``
39+ - 11.01
40+ - 58.09
41+ - 06:35:22
42+ - 41.24
43+ * - PCQ + AdaScale
44+ - ``aimet-onnx ``
45+ - ``aimet-onnx ``
46+ - 11.14
47+ - 56.79
48+ - 04:49:36
49+ - 47.35
50+ * - LPBQ + SequentialMSE
51+ - ``aimet-torch ``
52+ - ``aimet-onnx ``
53+ - 10.69
54+ - 59.08
55+ - 02:41:44
56+ - 51.11
57+ * - LPBQ + SequentialMSE
58+ - ``aimet-onnx ``
59+ - ``aimet-onnx ``
60+ - 10.55
61+ - 59.29
62+ - 01:13:12
63+ - 59.41
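The SpinQuant component in the rows above is a rotation-based technique: an orthogonal rotation is folded into adjacent weights so that the network's function is unchanged while weight and activation outliers are spread out before quantization. The sketch below demonstrates only the function-preserving fold, using a random orthogonal matrix; SpinQuant itself selects or learns the rotations, which is not shown here.

.. code-block:: python

    import torch

    torch.manual_seed(0)
    d_in, d_out = 512, 512
    x = torch.randn(8, d_in)                          # a batch of activations
    linear = torch.nn.Linear(d_in, d_out, bias=False)

    # Random orthogonal rotation (Q factor of a Gaussian matrix).
    rot, _ = torch.linalg.qr(torch.randn(d_in, d_in))

    # Fold the rotation into the producer of x and into this layer's weight.
    x_rot = x @ rot                                   # what the previous layer would now emit
    w_rot = linear.weight @ rot                       # rotated weight, W' = W R

    y_ref = linear(x)                                 # original output
    y_rot = x_rot @ w_rot.T                           # (x R)(W R)^T = x R R^T W^T = x W^T
    print(torch.allclose(y_ref, y_rot, atol=1e-4))    # True: the function is preserved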
1+ microsoft/Phi-3.5-mini-instruct
2+ ===============================
3+
4+ Precision settings:
5+
6+ - Weights: INT4, except for:
7+ - ``LM Head ``: INT8
8+ - Activations: INT16, except for:
9+ - ``KV Cache ``: INT8
10+
11+ Hyperparameters:
12+
13+ - AdaScale: ``num_batches=128 ``, ``num_iterations=256 ``
14+ - SequentialMSE: ``num_batches=20 ``
15+ - Calibration: ``num_batches=20 ``
16+
17+
18+ .. list-table ::
19+ :widths: 50 18 18 3 3 5 3
20+ :header-rows: 1
21+
22+ * - Technique
23+ - Quantized With
24+ - Evaluated On
25+ - PPL
26+ - MMLU
27+ - Time (hh:mm:ss)
28+ - CUDA Memory (GB)
29+ * - FP32
30+ - N/A
31+ - Both
32+ - 5.77
33+ - 68.89
34+ - 00:00:08
35+ - 16.17
36+ * - PCQ + SpinQuant + AdaScale
37+ - ``aimet-torch ``
38+ - ``aimet-onnx ``
39+ - 6.58
40+ - 62.62
41+ - 04:16:53
42+ - 48.03
43+ * - PCQ + SpinQuant + AdaScale
44+ - ``aimet-onnx ``
45+ - ``aimet-onnx ``
46+ - 6.50
47+ - 62.51
48+ - 01:51:43
49+ - 61.85
50+ * - LPBQ + SequentialMSE
51+ - ``aimet-torch ``
52+ - ``aimet-onnx ``
53+ - 6.45
54+ - 64.63
55+ - 02:03:41
56+ - 37.64
57+ * - LPBQ + SequentialMSE
58+ - ``aimet-onnx ``
59+ - ``aimet-onnx ``
60+ - 6.41
61+ - 63.90
62+ - 01:32:36
63+ - 75.62
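SequentialMSE (``num_batches=20`` in these recipes) chooses weight-quantizer scales layer by layer so that each quantized layer's output stays close to its float output in the mean-squared-error sense. The grid search below is a toy, single-layer illustration of that idea, not AIMET's implementation; the candidate grid, layer size, and calibration batch are made up.

.. code-block:: python

    import torch

    def qdq(w, scale, bitwidth=4):
        qmax = 2 ** (bitwidth - 1) - 1
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    torch.manual_seed(0)
    linear = torch.nn.Linear(256, 256, bias=False)
    x = torch.randn(64, 256)                            # calibration batch
    y_ref = linear(x)                                   # float reference output

    w = linear.weight
    max_scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7  # min-max INT4 scale

    best_mse, best_frac = float("inf"), None
    for frac in torch.linspace(0.5, 1.0, 20):           # shrink the scale, score output MSE
        y_q = x @ qdq(w, frac * max_scale).T
        mse = torch.mean((y_ref - y_q) ** 2).item()
        if mse < best_mse:
            best_mse, best_frac = mse, float(frac)

    print(f"best scale fraction: {best_frac:.2f}, output MSE: {best_mse:.6f}")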
1+ Qwen/Qwen2.5-0.5B-Instruct
2+ ==========================
3+
4+ Precision settings:
5+
6+ - Weights: INT4, except for:
7+ - ``LM Head ``: INT8
8+ - Activations: INT16
9+
10+ Hyperparameters:
11+
12+ - AdaScale: ``num_batches=128 ``, ``num_iterations=2048 ``
13+ - SequentialMSE: ``num_batches=20 ``
14+ - Calibration: ``num_batches=20 ``
15+
16+
17+ .. list-table ::
18+ :widths: 50 18 18 3 3 5 3
19+ :header-rows: 1
20+
21+ * - Technique
22+ - Quantized With
23+ - Evaluated On
24+ - PPL
25+ - MMLU
26+ - Time (hh:mm:ss)
27+ - CUDA Memory (GB)
28+ * - FP32
29+ - N/A
30+ - Both
31+ - 13.14
32+ - 46.30
33+ - 00:00:13
34+ - 3.68
35+ * - PCQ + SpinQuant + AdaScale
36+ - ``aimet-torch ``
37+ - ``aimet-onnx ``
38+ - 13.89
39+ - 44.19
40+ - 03:19:37
41+ - 13.37
42+ * - PCQ + SpinQuant + AdaScale
43+ - ``aimet-onnx ``
44+ - ``aimet-onnx ``
45+ - 13.82
46+ - 42.65
47+ - 01:16:54
48+ - 34.01
49+ * - LPBQ + SequentialMSE
50+ - ``aimet-torch ``
51+ - ``aimet-onnx ``
52+ - 15.32
53+ - 42.33
54+ - 00:22:39
55+ - 14.25
56+ * - LPBQ + SequentialMSE
57+ - ``aimet-onnx ``
58+ - ``aimet-onnx ``
59+ - 15.30
60+ - 43.26
61+ - 00:11:33
62+ - 20.43
1+ Qwen/Qwen2.5-1.5B-Instruct
2+ ==========================
3+
4+ Precision settings:
5+
6+ - Weights: INT4, except for:
7+ - ``LM Head ``: INT8
8+ - Activations: INT16
9+
10+ Hyperparameters:
11+
12+ - AdaScale: ``num_batches=128 ``, ``num_iterations=1024 ``
13+ - SequentialMSE: ``num_batches=20 ``
14+ - Calibration: ``num_batches=20 ``
15+
16+
17+ .. list-table ::
18+ :widths: 50 18 18 3 3 5 3
19+ :header-rows: 1
20+
21+ * - Technique
22+ - Quantized With
23+ - Evaluated On
24+ - PPL
25+ - MMLU
26+ - Time (hh:mm:ss)
27+ - CUDA Memory (GB)
28+ * - FP32
29+ - N/A
30+ - Both
31+ - 12.41
32+ - 54.65
33+ - 00:00:10
34+ - 7.78
35+ * - PCQ + SpinQuant + AdaScale
36+ - ``aimet-torch ``
37+ - ``aimet-onnx ``
38+ - 13.57
39+ - 49.81
40+ - 03:03:17
41+ - 22.62
42+ * - PCQ + SpinQuant + AdaScale
43+ - ``aimet-onnx ``
44+ - ``aimet-onnx ``
45+ - 13.35
46+ - 50.27
47+ - 02:13:33
48+ - 42.97
49+ * - LPBQ + SequentialMSE
50+ - ``aimet-torch ``
51+ - ``aimet-onnx ``
52+ - 14.86
53+ - 49.25
54+ - 01:07:43
55+ - 26.01
56+ * - LPBQ + SequentialMSE
57+ - ``aimet-onnx ``
58+ - ``aimet-onnx ``
59+ - 14.33
60+ - 49.97
61+ - 00:37:52
62+ - 34.40
1+ Qwen/Qwen3-4B
2+ =============
3+
4+ Precision settings:
5+
6+ - Weights: INT4, except for:
7+ - ``LM Head ``: INT8
8+ - Activations: INT16, except for:
9+ - ``KV Cache ``: INT8
10+
11+ Hyperparameters:
12+
13+ - AdaScale: ``num_batches=128 ``, ``num_iterations=512 ``
14+ - SequentialMSE: ``num_batches=20 ``
15+ - Calibration: ``num_batches=20 ``
16+
17+
18+ .. list-table ::
19+ :widths: 50 18 18 3 3 5 3
20+ :header-rows: 1
21+
22+ * - Technique
23+ - Quantized With
24+ - Evaluated On
25+ - PPL
26+ - MMLU
27+ - Time (hh:mm:ss)
28+ - CUDA Memory (GB)
29+ * - FP32
30+ - N/A
31+ - Both
32+ - 12.41
33+ - 70.06
34+ - 00:00:10
35+ - 17.02
36+ * - PCQ + SpinQuant + AdaScale
37+ - ``aimet-torch ``
38+ - ``aimet-onnx ``
39+ - 13.85
40+ - 65.07
41+ - 06:41:32
42+ - 47.71
43+ * - PCQ + AdaScale
44+ - ``aimet-onnx ``
45+ - ``aimet-onnx ``
46+ - 13.79
47+ - 62.33
48+ - 04:34:22
49+ - 71.30
50+ * - LPBQ + SequentialMSE
51+ - ``aimet-torch ``
52+ - ``aimet-onnx ``
53+ - 13.10
54+ - 65.66
55+ - 02:41:48
56+ - 39.42
57+ * - LPBQ + SequentialMSE
58+ - ``aimet-onnx ``
59+ - ``aimet-onnx ``
60+ - 12.77
61+ - 65.36
62+ - 01:35:29
63+ - 63.61
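For reference, perplexity (the PPL column) is the exponential of the average token-level negative log-likelihood over an evaluation corpus. The sketch below shows a simplified, non-overlapping-window computation using Hugging Face ``transformers``; the checkpoint, window length, and evaluation text are placeholders, and the numbers in the tables above come from the project's own evaluation flow, not this script.

.. code-block:: python

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.2-1B-Instruct"       # any causal LM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).eval()

    @torch.no_grad()
    def perplexity(text: str, window: int = 2048) -> float:
        ids = tokenizer(text, return_tensors="pt").input_ids
        total_nll, total_tokens = 0.0, 0
        for start in range(0, ids.size(1) - 1, window):
            chunk = ids[:, start:start + window]
            if chunk.size(1) < 2:
                break
            out = model(chunk, labels=chunk)            # loss = mean NLL over the chunk
            n = chunk.size(1) - 1                       # number of predicted tokens
            total_nll += out.loss.item() * n
            total_tokens += n
        return float(torch.exp(torch.tensor(total_nll / total_tokens)))

    print(perplexity("Some held-out evaluation text. " * 100))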