-
Notifications
You must be signed in to change notification settings - Fork 69
Energy profiling tools: Add Daemon class for periodic task execution #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from 14 commits
3b8efed
d47feef
5b58534
9458838
f01b82e
4502851
75fa7d4
67a8030
626f777
646bf35
d26e43d
008bfcb
a25f2b7
9dd32a0
da422c5
04e6724
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| //@HEADER | ||
| // ************************************************************************ | ||
| // | ||
| // Kokkos v. 4.0 | ||
| // Copyright (2022) National Technology & Engineering | ||
| // Solutions of Sandia, LLC (NTESS). | ||
| // | ||
| // Under the terms of Contract DE-NA0003525 with NTESS, | ||
| // the U.S. Government retains certain rights in this software. | ||
| // | ||
| // Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://kokkos.org/LICENSE for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //@HEADER | ||
|
|
||
| #include "daemon.hpp" | ||
| #include <stdexcept> | ||
| #include <thread> | ||
|
|
||
| namespace KokkosTools::EnergyProfiler { | ||
| void Daemon::start() { | ||
| if (!running_) { | ||
| running_ = true; | ||
| thread_ = std::thread(&Daemon::run, this); | ||
| } | ||
| } | ||
|
|
||
| void Daemon::stop() { | ||
| if (running_) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are the semantics here? Should we require that it's been running?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean, it is not threadsafe, so we are limited with the current impl. But if we care we can either go to jthread or at least use atomics for the status
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FWIW, I think C++20
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently the tools don't have a minimum requirement for 20, right? Since most of the other tools are not threadsafe, I would say that is a larger discussion. |
||
| running_ = false; | ||
| thread_.join(); | ||
| } | ||
| } | ||
|
|
||
| void Daemon::run() { | ||
| while (running_) { | ||
| auto next_run = std::chrono::high_resolution_clock::now() + interval_; | ||
| func_(); | ||
| std::this_thread::sleep_until(next_run); | ||
|
Comment on lines
+38
to
+40
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just wondering about use cases where e.g. the time that the function takes is longer than the sampling interval.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We never ment this to be able to run a function at an interval lower than its own execution time. That would be a much more complex daemon.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. Just thinking that in terms of strategy for the PR, it may make sense to clarify whether the PR is going for a general deamon, or for a deamon that works for the energy use case. Perhaps focusing on the energy use case would be helpful because it would give a good framework to think about issues like the choice of the tick for the duration, or this issue of the sampling frequency vs the duration of the API call. If you narrow the focus to a daemon specific to the energy use case, the daemon could be placed into the That being said, I do think it would make sense to think about what happens when the API call takes longer than the sampling interval. Perhaps not systematically. But e.g. in the case where for whatever random reason, the API call happens to take a bit longer than expected.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. First, let me preface by saying this work by @ethan-puyaubreau has been valuable in terms of capabilities for energy profiling and monitoring as well as ideas for software engineering that apply to other tools in Kokkos Tools. With that, to hone in on your bigger picture question on a generic daemon class: Assuming I have understood your use of the term Daemon as a real-time independent monitoring service, a key reason for my Kokkos Tools PR #291 for the Kokkos Tools LDMS connector is enabling this daemon-like functionality via a production-level performance monitoring software tool, called LDMS (Lightweight Distributed Monitoring Service). The motivation for it is that the Kokkos Tools LDMS connector can eventually integrated with - or more specifically be a parent tools library to - any other Kokkos Tools child/base library, e.g., nvtx-connector, variorium-connector. A variant of this LDMS connector had been used for the last couple years in production - it has been closed-source until this PR #291. (By the way, PR #291 is ready for review - feel free to provide suggestions for improvements and/or let me know if you have questions on it.)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I moved it into the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! |
||
| } | ||
| } | ||
| } // namespace EnergyProfiler | ||
| } // namespace KokkosTools | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| //@HEADER | ||
| // ************************************************************************ | ||
| // | ||
| // Kokkos v. 4.0 | ||
| // Copyright (2022) National Technology & Engineering | ||
| // Solutions of Sandia, LLC (NTESS). | ||
| // | ||
| // Under the terms of Contract DE-NA0003525 with NTESS, | ||
| // the U.S. Government retains certain rights in this software. | ||
| // | ||
| // Part of Kokkos, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://kokkos.org/LICENSE for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //@HEADER | ||
|
|
||
| #ifndef KOKKOSP_ENERGY_PROFILER_DAEMON_HPP | ||
| #define KOKKOSP_ENERGY_PROFILER_DAEMON_HPP | ||
|
|
||
| #include <chrono> | ||
| #include <functional> | ||
| #include <thread> | ||
|
|
||
| namespace KokkosTools::EnergyProfiler { | ||
| class Daemon { | ||
| public: | ||
| Daemon(std::function<void()> func, const std::chrono::duration& interval) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One quick note is I don't see a default value for the time interval of the Daemon but maybe you could have one. If you do, I think a reasonable one would be 1 second. This would at least makes sense from the point of view of #291. Maybe there are other use cases and situations where you want it to be shorter (or longer).
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I think the user explicitly specifying a duration is a good practice here. It is a core setting of the daemon that should be thought about explicitly.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, got it - yes, I agree. We want this Daemon class to be generic beyond the energy profiling use case and/or LDMS use case, and it should indeed be thought out by users explicitly based on the scenario. |
||
| : interval_(interval), func_(std::move(func)){}; | ||
|
|
||
| void start(); | ||
| void stop(); | ||
| bool is_running() const { return running_; } | ||
| std::thread& get_thread() { return thread_; } | ||
JBludau marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| private: | ||
| void run(); | ||
| std::chrono::duration interval_; | ||
| bool running_{false}; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there an issue of thread safety?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes. As stated in a previous comment, we could just switch to jthread and https://en.cppreference.com/w/cpp/thread/jthread/request_stop.html once we have c++20 |
||
| std::function<void()> func_; | ||
| std::thread thread_; | ||
| }; | ||
| } // namespace EnergyProfiler | ||
| } // namespace KokkosTools | ||
| #endif | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the semantics of
start()if it's already running? Should it break? Should it be restarted?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should just do nothing if it is already running. Multiple start calls should be fine since in our current use case the global system time is recorded. If someone wants to use it differently I would opt we add a
restart()functionality.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but as described below ... this is not threadsafe anyway