@ethan-puyaubreau ethan-puyaubreau commented Aug 15, 2025

This PR introduces the foundational infrastructure for energy profiling tools in Kokkos:

Features:

  • Timing Infrastructure: Timing system with TimingInfo structure and region tracking
  • State Management: Thread-safe EnergyProfilerState singleton for managing active/completed timing regions
  • Region Types: Support for ParallelFor, ParallelReduce, ParallelScan, DeepCopy, and UserRegion profiling
  • CSV Export: timing_export module for exporting timing data to CSV files with summary statistics
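As a rough illustration of the features listed above, a timing record and its CSV export might look like the sketch below. Only `name`, `type`, and `start_time` appear in the snippets under review; the `id` and `end_time` fields, the helper names, and the column layout are assumptions made for this example, not the PR's actual code.

```cpp
#include <chrono>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Region kinds named in the PR description.
enum class RegionType { ParallelFor, ParallelReduce, ParallelScan, DeepCopy, UserRegion };

// Hypothetical record layout; id and end_time are assumed fields.
struct TimingInfo {
  using Clock = std::chrono::high_resolution_clock;
  std::string name;
  RegionType type = RegionType::UserRegion;
  std::uint64_t id = 0;
  Clock::time_point start_time{};
  Clock::time_point end_time{};

  // Elapsed time of the region in nanoseconds.
  std::int64_t duration_ns() const {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end_time - start_time).count();
  }
};

// Human-readable label for a region type (labels are illustrative).
inline const char* region_type_name(RegionType t) {
  switch (t) {
    case RegionType::ParallelFor: return "parallel_for";
    case RegionType::ParallelReduce: return "parallel_reduce";
    case RegionType::ParallelScan: return "parallel_scan";
    case RegionType::DeepCopy: return "deep_copy";
    case RegionType::UserRegion: return "user_region";
  }
  return "unknown";
}

// Quote a CSV field and double any embedded quotes, since kernel
// names may contain commas.
inline std::string csv_quote(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"') out += '"';
    out += c;
  }
  out += '"';
  return out;
}

// One header line followed by one row per completed region.
inline std::string to_csv(const std::vector<TimingInfo>& regions) {
  std::ostringstream out;
  out << "name,type,id,duration_ns\n";
  for (const auto& r : regions) {
    out << csv_quote(r.name) << ',' << region_type_name(r.type) << ','
        << r.id << ',' << r.duration_ns() << '\n';
  }
  return out.str();
}
```

Returning the CSV as a string keeps the formatting logic testable without touching the file system; writing it to a file is then a single stream insertion.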

Implementation:

  • Main Library: kp_energy_profiler - Core profiling functionality
  • Utilities: timing_utils.hpp/cpp - State management and helper functions
  • Export: timing_export.hpp/cpp - CSV export and summary generation
  • No External Dependencies: Self-contained implementation using only standard C++ libraries

Architecture:
This infrastructure provides a clean separation between timing collection, state management, and data export, making it easy for future energy monitoring providers to integrate their functionality. The design supports both synchronous and asynchronous profiling patterns (if combined with #300).
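The state-management layer described here could be sketched as a function-local static singleton guarding its containers with a mutex. The accessor names `get_mutex()` and `get_active_regions()` match the snippets under review; everything else (the class name `ProfilerState`, the `Region` stand-in, the use of `std::deque`, and `complete_last_region()`) is a hypothetical simplification.

```cpp
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the PR's TimingInfo.
struct Region {
  std::string name;
};

class ProfilerState {
 public:
  // Function-local static: initialization is thread-safe since C++11.
  static ProfilerState& instance() {
    static ProfilerState state;
    return state;
  }
  ProfilerState(const ProfilerState&) = delete;
  ProfilerState& operator=(const ProfilerState&) = delete;

  std::mutex& get_mutex() { return mutex_; }
  std::deque<Region>& get_active_regions() { return active_; }
  std::deque<Region>& get_completed_regions() { return completed_; }

 private:
  ProfilerState() = default;
  std::mutex mutex_;
  std::deque<Region> active_;
  std::deque<Region> completed_;
};

// Moves the most recently started region to the completed list.
// Returns false if no region is active.
inline bool complete_last_region() {
  auto& state = ProfilerState::instance();
  std::lock_guard<std::mutex> lock(state.get_mutex());
  auto& active = state.get_active_regions();
  if (active.empty()) return false;
  state.get_completed_regions().push_back(std::move(active.back()));
  active.pop_back();
  return true;
}
```

Exposing the mutex and containers separately, as the reviewed code does, leaves locking to the caller; an export module would take the same lock before reading the completed regions.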

Contributor

@JBludau JBludau left a comment

first pass

@ethan-puyaubreau
Contributor Author

@dalg24 @masterleinad Hello! I would appreciate a review of this PR, since it provides the baseline building blocks needed by the daemon system and the energy measurement tools (#300, which also needs review).

Contributor

@JBludau JBludau left a comment

I think we are getting there

Comment on lines 92 to 94
// Stack-based timing for robust region/kernel tracking
void start_region(const std::string& name, RegionType type, uint64_t id = 0);
void end_region();
Member

Fine, but note that if you can't correlate the start and end of a region via some identifier, it won't be thread-safe.
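To illustrate the concern: with a purely stack-based `end_region()`, two threads that interleave their start and end calls can close each other's regions. One possible alternative, sketched here under the assumption that the caller can supply the same identifier at both ends, is to key active regions by that identifier. The `RegionTracker` class and its members are hypothetical, not part of the PR.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

class RegionTracker {
 public:
  using Clock = std::chrono::steady_clock;

  // Record the start of the region identified by id.
  void start_region(std::uint64_t id, const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_[id] = Entry{name, Clock::now()};
  }

  // Close the region with this id. Returns the elapsed time, or
  // std::nullopt if no such region is active.
  std::optional<Clock::duration> end_region(std::uint64_t id) {
    const auto now = Clock::now();  // read the clock before waiting on the lock
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = active_.find(id);
    if (it == active_.end()) return std::nullopt;
    const auto elapsed = now - it->second.start;
    active_.erase(it);
    return elapsed;
  }

  std::size_t active_count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return active_.size();
  }

 private:
  struct Entry {
    std::string name;
    Clock::time_point start;
  };
  std::mutex mutex_;
  std::unordered_map<std::uint64_t, Entry> active_;
};
```

Because lookup is by identifier, regions may end in any order, regardless of which thread started them.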

@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from a9c54ce to 5d9082f Compare August 26, 2025 18:45
# - Tool interface definitions
# - Basic kernel timer tool

add_subdirectory(kokkos-tools)
\ No newline at end of file
Member

This red circle means "no newline at end of file".
Please fix it by adding a newline character.
Same comment everywhere.

Comment on lines 20 to 23
namespace KokkosTools {
namespace EnergyProfiler {

std::string generate_prefix() {
Member

Suggested change
- namespace KokkosTools {
- namespace EnergyProfiler {
- std::string generate_prefix() {
+ std::string KokkosTools::EnergyProfiler::generate_prefix() {

Comment on lines 21 to 22
namespace KokkosTools {
namespace EnergyProfiler {
Member

Suggested change
- namespace KokkosTools {
- namespace EnergyProfiler {
+ namespace KokkosTools::EnergyProfiler {
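The two namespace suggestions in this review combine as in the minimal sketch below: a C++17 nested namespace definition for the declaration (as it might appear in a header) and a fully qualified name for the definition (as it might appear in the source file). The function body is a placeholder, not the PR's implementation.

```cpp
#include <string>

// Declaration, using a C++17 nested namespace definition.
namespace KokkosTools::EnergyProfiler {
std::string generate_prefix();
}  // namespace KokkosTools::EnergyProfiler

// Definition by qualified name, with no enclosing namespace block.
// This only compiles if a matching declaration already exists, so a
// signature mismatch is diagnosed instead of silently declaring a
// new overload.
std::string KokkosTools::EnergyProfiler::generate_prefix() {
  return "energy_profiler";  // placeholder value
}
```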

@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch 2 times, most recently from 6fb522b to f2f0606 Compare August 29, 2025 00:01
@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from 5fbccdf to 2629fad Compare August 29, 2025 14:44
@ethan-puyaubreau ethan-puyaubreau changed the title from "Energy profiling tools: baseline infrastructure and timer system" to "Energy profiling tools: Core infrastructure with timing tool and export capabilities" Aug 29, 2025
std::lock_guard<std::mutex> lock(state.get_mutex());
state.get_active_regions().push_back(region);
} catch (const std::exception& e) {
std::cerr << "Error in start_region: " << e.what() << std::endl;
Contributor

Hmm, AFAIK that is not how exceptions are supposed to be used, especially since you are not handling the exception, only printing it without rethrowing. So this would not lead to an abort.

If any of the functions in the try block throws, you are silencing that without restoring a valid state from which the program can continue. I would remove the try-catch.
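A minimal sketch of the suggested shape, with the try-catch removed so that errors propagate to the caller. The `State` struct and the empty-name check are hypothetical and exist only to make the example self-contained.

```cpp
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified profiler state.
struct State {
  std::mutex mutex;
  std::vector<std::string> active;
};

// No try/catch: if anything throws, the exception reaches the caller.
// std::lock_guard releases the mutex during stack unwinding, and
// push_back at the end of a vector leaves it unchanged on failure,
// so no manual cleanup is needed.
void start_region(State& state, const std::string& name) {
  if (name.empty()) {
    throw std::invalid_argument("region name must not be empty");  // illustrative check
  }
  std::lock_guard<std::mutex> lock(state.mutex);
  state.active.push_back(name);
}
```

If the exception is never caught further up, the program terminates, which makes the failure visible instead of continuing with a region that was never recorded.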

TimingInfo region;
region.name = name;
region.type = type;
region.start_time = std::chrono::high_resolution_clock::now();
Contributor

When starting a region, you should take the timestamp at the very end and then use it to update the last region in the deque. This way you don't measure the construction or the lock.

Contributor

As I am writing this: It of course does not hold for nested regions ... but that is too much for now
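The timestamp-last suggestion from this thread could look roughly like the sketch below. The `State` struct and the choice of `std::deque` are assumptions; only `name` and `start_time` are taken from the snippet under review.

```cpp
#include <chrono>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

using Clock = std::chrono::high_resolution_clock;

struct TimingInfo {
  std::string name;
  Clock::time_point start_time{};
};

// Hypothetical, simplified profiler state.
struct State {
  std::mutex mutex;
  std::deque<TimingInfo> active;
};

void start_region(State& state, std::string name) {
  TimingInfo region;
  region.name = std::move(name);
  std::lock_guard<std::mutex> lock(state.mutex);
  state.active.push_back(std::move(region));
  // Read the clock last, so that constructing the record, acquiring
  // the lock, and inserting into the deque are not part of the
  // measured interval.
  state.active.back().start_time = Clock::now();
}
```

As the follow-up comment notes, this does not help with nested regions: the bookkeeping for an inner region still falls inside the interval of the enclosing one.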

@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from d1bae2d to 049a99f Compare August 29, 2025 17:46
@ethan-puyaubreau ethan-puyaubreau force-pushed the feature/energy-profiler-infrastructure branch from 049a99f to 6fcf78e Compare August 29, 2025 18:25