Commit 6561f0e

Mehta, Hitarth (quic-hitameht) authored and committed

Add initial draft for LLM quantization recipes

Signed-off-by: Hitarth Mehta <quic_hitameht@quicinc.com>
Co-authored-by: Hitarth Mehta <quic_hitameht@quicinc.com>
1 parent 577f804 commit 6561f0e

File tree

13 files changed: +1597 -2 lines changed

Docs/tutorials/index.rst

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@ Tutorials

   This section walks through tutorials to get you started on quantizing models.

 - AIMET is packed with out-of-the-box quantization techniques to studing detailed quantization impact of each layer.
 + AIMET is packed with out-of-the-box quantization techniques for studying the detailed quantization impact of each layer.

   This section walks you through how to use these out-of-the-box techniques to get a model with best-in-class accuracy, and
   how to take this further with advanced techniques depending on your use case.

@@ -17,6 +17,7 @@ how you take this further ahead with advanced techniques depending on your use c

    Quantization Workflow <quantization_workflow>
    Quantization Simulation <quantsim>
 +  Quantization Recipes for LLMs <quantization_recipe>
    Example Notebooks <notebooks>
    Running Quantized Models on-device <on_target_inference>
    Debugging Guide <debugging_guidelines>
Lines changed: 63 additions & 0 deletions
meta-llama/Llama-3.2-1B-Instruct
================================

Precision settings:

- Weights: INT4, except for:

  - ``LM Head``: INT8

- Activations: INT16, except for:

  - ``KV Cache``: INT8

Hyperparameters:

- AdaScale: ``num_batches=128``, ``num_iterations=2048``
- SequentialMSE: ``num_batches=20``
- Calibration: ``num_batches=20``

.. list-table::
   :widths: 50 18 18 3 3 5 3
   :header-rows: 1

   * - Technique
     - Quantized With
     - Evaluated On
     - PPL
     - MMLU
     - Time (hh:mm:ss)
     - CUDA (GB)
   * - FP32
     - N/A
     - Both
     - 12.14
     - 46.06
     - 00:00:14
     - 6.34
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-torch``
     - ``aimet-onnx``
     - 13.67
     - 42.25
     - 02:31:06
     - 20.89
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 13.68
     - 41.82
     - 01:53:17
     - 46.38
   * - LPBQ + SequentialMSE
     - ``aimet-torch``
     - ``aimet-onnx``
     - 14.07
     - 43.09
     - 00:44:38
     - 28.52
   * - LPBQ + SequentialMSE
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 13.84
     - 43.53
     - 00:20:44
     - 34.79
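The PCQ rows above use per-channel quantization: each output channel of a weight matrix receives its own scale, so a single large channel does not inflate the rounding error of the others. A minimal pure-Python sketch of that idea (an illustration of the concept only, not AIMET's implementation):

```python
def quantize_per_channel_int4(weights):
    """Quantize each row (output channel) to INT4 [-8, 7] with its own
    symmetric scale, so one large channel does not hurt the others."""
    qmin, qmax = -8, 7
    quantized, scales = [], []
    for row in weights:
        scale = max(abs(v) for v in row) / qmax or 1.0
        quantized.append([min(qmax, max(qmin, round(v / scale))) for v in row])
        scales.append(scale)
    return quantized, scales

def dequantize(quantized, scales):
    """Map INT4 codes back to floats using each row's scale."""
    return [[v * s for v in row] for row, s in zip(quantized, scales)]

w = [[0.1, -0.05, 0.07],   # small-magnitude channel
     [2.0, -1.5, 0.5]]     # large-magnitude channel
q, scales = quantize_per_channel_int4(w)
w_hat = dequantize(q, scales)
```

With a single per-tensor scale, the small-magnitude channel would collapse to one or two codes; per-channel scales keep its rounding error proportional to its own range.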
Lines changed: 63 additions & 0 deletions
meta-llama/Llama-3.2-3B-Instruct
================================

Precision settings:

- Weights: INT4, except for:

  - ``LM Head``: INT8

- Activations: INT16, except for:

  - ``KV Cache``: INT8

Hyperparameters:

- AdaScale: ``num_batches=128``, ``num_iterations=1024``
- SequentialMSE: ``num_batches=20``
- Calibration: ``num_batches=20``

.. list-table::
   :widths: 50 18 18 3 3 5 3
   :header-rows: 1

   * - Technique
     - Quantized With
     - Evaluated On
     - PPL
     - MMLU
     - Time (hh:mm:ss)
     - CUDA (GB)
   * - FP32
     - N/A
     - Both
     - 10.13
     - 60.74
     - 00:00:10
     - 13.90
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-torch``
     - ``aimet-onnx``
     - 11.01
     - 58.09
     - 06:35:22
     - 41.24
   * - PCQ + AdaScale
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 11.14
     - 56.79
     - 04:49:36
     - 47.35
   * - LPBQ + SequentialMSE
     - ``aimet-torch``
     - ``aimet-onnx``
     - 10.69
     - 59.08
     - 02:41:44
     - 51.11
   * - LPBQ + SequentialMSE
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 10.55
     - 59.29
     - 01:13:12
     - 59.41
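The PPL column is perplexity, where lower is better: the exponential of the average per-token negative log-likelihood. A self-contained sketch of the metric itself (the evaluation harness behind these numbers is not shown in this commit):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).
    A model that guesses uniformly over V tokens scores PPL = V."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning every token probability 1/8 scores PPL = 8.
uniform_ppl = perplexity([math.log(1.0 / 8.0)] * 10)
```

So the jump from 10.13 (FP32) to 11.01 after quantization means the quantized model is, on average, about as uncertain as choosing among one extra equally likely token.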
Lines changed: 63 additions & 0 deletions
microsoft/Phi-3.5-mini-instruct
===============================

Precision settings:

- Weights: INT4, except for:

  - ``LM Head``: INT8

- Activations: INT16, except for:

  - ``KV Cache``: INT8

Hyperparameters:

- AdaScale: ``num_batches=128``, ``num_iterations=256``
- SequentialMSE: ``num_batches=20``
- Calibration: ``num_batches=20``

.. list-table::
   :widths: 50 18 18 3 3 5 3
   :header-rows: 1

   * - Technique
     - Quantized With
     - Evaluated On
     - PPL
     - MMLU
     - Time (hh:mm:ss)
     - CUDA (GB)
   * - FP32
     - N/A
     - Both
     - 5.77
     - 68.89
     - 00:00:08
     - 16.17
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-torch``
     - ``aimet-onnx``
     - 6.58
     - 62.62
     - 04:16:53
     - 48.03
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 6.50
     - 62.51
     - 01:51:43
     - 61.85
   * - LPBQ + SequentialMSE
     - ``aimet-torch``
     - ``aimet-onnx``
     - 6.45
     - 64.63
     - 02:03:41
     - 37.64
   * - LPBQ + SequentialMSE
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 6.41
     - 63.90
     - 01:32:36
     - 75.62
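The LPBQ rows rest on blockwise weight quantization: each block within a channel gets its own scale, which shrinks rounding error when magnitudes vary along the channel (the real LPBQ scheme additionally constrains block scales to low-bit integer multiples of a per-channel scale; the sketch below shows only the plain blockwise idea and is not AIMET's code):

```python
def quantize_blockwise(row, block_size=4, qmax=7):
    """Split a weight row into blocks and give each block its own
    symmetric INT4 scale. Returns a list of (codes, scale) pairs."""
    blocks = []
    for i in range(0, len(row), block_size):
        block = row[i:i + block_size]
        scale = max(abs(v) for v in block) / qmax or 1.0
        codes = [min(qmax, max(-qmax - 1, round(v / scale))) for v in block]
        blocks.append((codes, scale))
    return blocks

def dequantize_blockwise(blocks):
    """Concatenate the dequantized blocks back into one row."""
    return [c * s for codes, s in blocks for c in codes]

row = [0.01, -0.02, 0.015, 0.005,   # small-magnitude block
       1.0, -0.8, 0.6, 0.9]         # large-magnitude block
row_hat = dequantize_blockwise(quantize_blockwise(row))
```

The small-magnitude block receives a much finer scale than the large one, which a single per-channel scale could not provide.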
Lines changed: 62 additions & 0 deletions
Qwen/Qwen2.5-0.5B-Instruct
==========================

Precision settings:

- Weights: INT4, except for:

  - ``LM Head``: INT8

- Activations: INT16

Hyperparameters:

- AdaScale: ``num_batches=128``, ``num_iterations=2048``
- SequentialMSE: ``num_batches=20``
- Calibration: ``num_batches=20``

.. list-table::
   :widths: 50 18 18 3 3 5 3
   :header-rows: 1

   * - Technique
     - Quantized With
     - Evaluated On
     - PPL
     - MMLU
     - Time (hh:mm:ss)
     - CUDA (GB)
   * - FP32
     - N/A
     - Both
     - 13.14
     - 46.30
     - 00:00:13
     - 3.68
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-torch``
     - ``aimet-onnx``
     - 13.89
     - 44.19
     - 03:19:37
     - 13.37
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 13.82
     - 42.65
     - 01:16:54
     - 34.01
   * - LPBQ + SequentialMSE
     - ``aimet-torch``
     - ``aimet-onnx``
     - 15.32
     - 42.33
     - 00:22:39
     - 14.25
   * - LPBQ + SequentialMSE
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 15.30
     - 43.26
     - 00:11:33
     - 20.43
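The ``Calibration: num_batches=20`` setting means the range (encoding) of each INT16 activation quantizer is derived from statistics collected over 20 batches. A minimal min-max sketch of what such a calibrator computes (illustrative only; AIMET's calibration schemes are more sophisticated than raw min-max):

```python
def calibrate_minmax_int16(batches):
    """Derive an asymmetric INT16 (scale, offset) encoding from the
    min/max values observed across calibration batches."""
    lo = min(v for batch in batches for v in batch)
    hi = max(v for batch in batches for v in batch)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep zero exactly representable
    scale = (hi - lo) / 65535 or 1.0      # 2**16 - 1 steps
    offset = round(-lo / scale)
    return scale, offset

def quantize_act(x, scale, offset):
    """Map a float activation to an unsigned INT16 code."""
    return max(0, min(65535, round(x / scale) + offset))

scale, offset = calibrate_minmax_int16([[0.0, 1.0, 0.25], [-1.0, 0.5]])
```

Too few calibration batches risks missing the true activation range; too many mostly adds runtime, which is one reason the recipe fixes this at 20 batches.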
Lines changed: 62 additions & 0 deletions
Qwen/Qwen2.5-1.5B-Instruct
==========================

Precision settings:

- Weights: INT4, except for:

  - ``LM Head``: INT8

- Activations: INT16

Hyperparameters:

- AdaScale: ``num_batches=128``, ``num_iterations=1024``
- SequentialMSE: ``num_batches=20``
- Calibration: ``num_batches=20``

.. list-table::
   :widths: 50 18 18 3 3 5 3
   :header-rows: 1

   * - Technique
     - Quantized With
     - Evaluated On
     - PPL
     - MMLU
     - Time (hh:mm:ss)
     - CUDA (GB)
   * - FP32
     - N/A
     - Both
     - 12.41
     - 54.65
     - 00:00:10
     - 7.78
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-torch``
     - ``aimet-onnx``
     - 13.57
     - 49.81
     - 03:03:17
     - 22.62
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 13.35
     - 50.27
     - 02:13:33
     - 42.97
   * - LPBQ + SequentialMSE
     - ``aimet-torch``
     - ``aimet-onnx``
     - 14.86
     - 49.25
     - 01:07:43
     - 26.01
   * - LPBQ + SequentialMSE
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 14.33
     - 49.97
     - 00:37:52
     - 34.40
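SpinQuant, which appears in the PCQ rows, folds orthogonal rotations into the weights so outliers are spread across channels before quantization while the network's function is unchanged: for an orthogonal R, (W2 R^T)(R W1) = W2 W1. A toy 2x2 demonstration of that invariance (a sketch of the principle, not AIMET's implementation, which learns the rotations):

```python
import math

def matmul(a, b):
    """Naive matrix multiply, sufficient for the 2x2 toy example."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# A 2x2 rotation by 45 degrees and its transpose; R is orthogonal.
t = math.pi / 4
r = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
rt = [[r[0][0], r[1][0]], [r[0][1], r[1][1]]]

w1 = [[1.0, 2.0], [3.0, 4.0]]      # first linear layer
w2 = [[0.5, -1.0], [2.0, 0.25]]    # second linear layer

# Folding R into W1 and R^T into W2 leaves the composed map unchanged,
# but the rotated weights can have a flatter value distribution that
# quantizes with less error.
rotated = matmul(matmul(w2, rt), matmul(r, w1))
plain = matmul(w2, w1)
```

Because the composed map is identical, the rotation is "free" at inference time once folded into the weights; only the quantization error changes.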
Lines changed: 63 additions & 0 deletions
Qwen/Qwen3-4B
=============

Precision settings:

- Weights: INT4, except for:

  - ``LM Head``: INT8

- Activations: INT16, except for:

  - ``KV Cache``: INT8

Hyperparameters:

- AdaScale: ``num_batches=128``, ``num_iterations=512``
- SequentialMSE: ``num_batches=20``
- Calibration: ``num_batches=20``

.. list-table::
   :widths: 50 18 18 3 3 5 3
   :header-rows: 1

   * - Technique
     - Quantized With
     - Evaluated On
     - PPL
     - MMLU
     - Time (hh:mm:ss)
     - CUDA (GB)
   * - FP32
     - N/A
     - Both
     - 12.41
     - 70.06
     - 00:00:10
     - 17.02
   * - PCQ + SpinQuant + AdaScale
     - ``aimet-torch``
     - ``aimet-onnx``
     - 13.85
     - 65.07
     - 06:41:32
     - 47.71
   * - PCQ + AdaScale
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 13.79
     - 62.33
     - 04:34:22
     - 71.30
   * - LPBQ + SequentialMSE
     - ``aimet-torch``
     - ``aimet-onnx``
     - 13.10
     - 65.66
     - 02:41:48
     - 39.42
   * - LPBQ + SequentialMSE
     - ``aimet-onnx``
     - ``aimet-onnx``
     - 12.77
     - 65.36
     - 01:35:29
     - 63.61
