Commit e1b2b59

Updated docs for multi-rank limitation
1 parent 9e2a1f9 commit e1b2b59


projects/rocprofiler-compute/docs/how-to/profile/mode.rst

Lines changed: 75 additions & 21 deletions
@@ -1232,26 +1232,6 @@ to your output directory. The following example is run on the host `amd-ryzen`:
     ├── roofline.csv
     └── sysinfo.csv

-.. note::
-   For profiling multi-rank workloads with MPI communication, use ``--iteration-multiplexing`` and do not turn on PC sampling with `-b 21`.
-
-.. warning::
-   MPI launchers (``mpirun``, ``mpiexec``, ``srun``, ``orterun``) must wrap the
-   ``rocprof-compute`` command, not appear after ``--``. The following is **incorrect**:
-
-   .. code-block:: shell-session
-
-      $ rocprof-compute profile --name my_app -- mpirun -n 4 ./my_application # WRONG
-
-   Instead, use the correct form where the MPI launcher wraps ``rocprof-compute``:
-
-   .. code-block:: shell-session
-
-      $ mpirun -n 4 rocprof-compute profile --name my_app -- ./my_application # CORRECT
-
-   If you use an MPI launcher after ``--``, an error will be raised with guidance
-   on the correct usage.
-
 ROCm Compute Profiler supports the following libraries, APIs and job schedulers:

 * OpenMPI

@@ -1270,4 +1250,78 @@ specify the output directory as follows:

 .. code-block:: shell-session

-   $ mpirun -n 4 rocprof-compute profile --output-directory /tmp/mpi_profile/%env{MY_MPI_RANK}% -- ./my_mpi_application
+   $ mpirun -n 4 rocprof-compute profile --output-directory /tmp/mpi_profile/%env{MY_MPI_RANK}% -- ./my_mpi_application
+
+Limitations and Recommendations
+-------------------------------
+
+When profiling multi-rank applications, be aware of the following limitations:
+
+**MPI Launcher Placement**
+
+MPI launchers (``mpirun``, ``mpiexec``, ``srun``, ``orterun``) must wrap the
+``rocprof-compute`` command, not appear after ``--``. The following is **incorrect**:
+
+.. code-block:: shell-session
+
+   $ rocprof-compute profile --name my_app -- mpirun -n 4 ./my_application # WRONG
+
+Instead, use the correct form where the MPI launcher wraps ``rocprof-compute``:
+
+.. code-block:: shell-session
+
+   $ mpirun -n 4 rocprof-compute profile --name my_app -- ./my_application # CORRECT
+
+If you use an MPI launcher after ``--``, an error will be raised with guidance
+on the correct usage.
+
+**Application Replay Mode (Default)**
+
+By default, ROCm Compute Profiler uses application replay mode, which runs the
+workload multiple times to collect all performance counters. This mode fails
+for MPI applications because running the application multiple times results in
+multiple ``MPI_Init`` and ``MPI_Finalize`` calls, which is not permitted by the
+MPI specification.
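
A minimal sketch of the default invocation described above, with no single-pass flag, so application replay is selected. It reuses the 4-rank ``mpirun`` launch and application name from the surrounding examples and is the form that fails for MPI workloads:

   $ mpirun -n 4 rocprof-compute profile --name my_mpi_app -- ./my_mpi_app   # default application replay: the workload is re-run for each counter pass
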
+
+**PC Sampling**
+
+PC sampling (block 21) may fail to collect data for multi-rank applications with
+MPI communication due to synchronization requirements.
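
As a sketch of the combination this paragraph warns about, here PC sampling is requested with ``-b 21`` (the form used in the note removed above) under the same multi-rank launch:

   $ mpirun -n 4 rocprof-compute profile --name my_mpi_app -b 21 -- ./my_mpi_app   # PC sampling (block 21); may fail to collect data for MPI workloads
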
+
+**Recommended Single-Pass Modes**
+
+For multi-rank applications with MPI communication, use one of these single-pass
+profiling modes:
+
+* ``--iteration-multiplexing``: Collects all counters in a single application run
+  by distributing counter collection across kernel dispatches. Recommended for
+  applications with sufficient kernel dispatch counts.
+
+  .. code-block:: shell-session
+
+     $ mpirun -n 4 rocprof-compute profile --name my_mpi_app --iteration-multiplexing -- ./my_mpi_app
+
+* ``--block <N>``: Profiles only specific metric block(s), reducing the number of
+  counters collected to fit in a single pass.
+
+  .. code-block:: shell-session
+
+     $ mpirun -n 4 rocprof-compute profile --name my_mpi_app --block 0 -- ./my_mpi_app
+
+* ``--set <name>``: Profiles a predefined counter set that fits in a single pass.
+
+  .. code-block:: shell-session
+
+     $ mpirun -n 4 rocprof-compute profile --name my_mpi_app --set compute_thruput_util -- ./my_mpi_app
+
+**Multi-Node Profiling**
+
+When profiling across multiple nodes, ensure that:
+
+* Output directories are accessible from all nodes (shared filesystem), or
+* Use node-specific output directories with ``%hostname%`` placeholder
+
+.. code-block:: shell-session
+
+   $ mpirun -n 8 --hostfile hosts.txt rocprof-compute profile \
+        --output-directory /shared/profiles/%hostname%/%rank% -- ./my_mpi_app
