Commit e6a1170
Remove triton optimization config, causing error for multi gpu inference (#2079)
When running Triton inference for the SID & phishing detection pipelines with multiple GPUs on `nvcr.io/nvidia/morpheus/morpheus-tritonserver-models:24.11`, inference fails with a segmentation fault. The TRT optimization block in the models' `config.pbtxt` causes `tritonserver:24.11` to fail with the following error. This PR removes that block so the pipelines run when all GPUs are selected for inference.
> 2024-12-09 23:24:38.378753895 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
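The scraped diff below does not preserve the removed lines; for context, the TensorRT execution-accelerator section of a Triton `config.pbtxt` for an ONNX model typically looks like the following (parameter values here are illustrative, not copied from the Morpheus repo):

```protobuf
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        # Illustrative parameters; the actual Morpheus configs may differ
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "1073741824" }
      }
    ]
  }
}
```

Deleting this block leaves the model on ONNX Runtime's default CUDA execution provider, avoiding the TensorRT EP enqueue failure seen on multi-GPU setups.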
Closes #2028
## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.
Authors:
- Tad ZeMicheal (https://github.com/tzemicheal)
- David Gardner (https://github.com/dagardner-nv)
Approvers:
- https://github.com/hsin-c
- David Gardner (https://github.com/dagardner-nv)
URL: #2079
1 parent e3a4bf1 commit e6a1170
File tree
2 files changed: +0 −14 lines
- models/triton-model-repo
  - phishing-bert-onnx
  - sid-minibert-onnx
Diff: lines 31–37 removed from the `config.pbtxt` of each of `phishing-bert-onnx` and `sid-minibert-onnx` (7 lines each; the diff text itself was not captured).