JosuaCarl commented Jan 19, 2026

As described in #6710, the reported %cpu can exceed 100% per core. This update addresses that by no longer using /proc/stat as the source for the globally elapsed ticks, relying instead on starttime in /proc/<pid>/stat as a global clock.
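
For readers unfamiliar with the fields involved, here is a minimal sketch of how a per-process CPU percentage can be derived from /proc/<pid>/stat alone. It is not the code from this PR: the helper names `stat_fields` and `cpu_permille` are hypothetical, and the current time in ticks (`now_ticks`) is assumed to be supplied by the caller, since this thread does not spell out how the PR obtains it. The field positions (utime = 14, stime = 15, starttime = 22) are those documented in proc(5).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only, not taken from the PR.

# Print "utime stime starttime" (fields 14, 15 and 22) from one line of
# /proc/<pid>/stat.
stat_fields() {
    local line=$1
    # Field 2 (comm) may contain spaces or parentheses, so drop everything
    # up to the last ") " before splitting on whitespace.
    local rest=${line##*) }
    local -a f
    read -r -a f <<< "$rest"
    # f[0] is field 3 (state), so field N lives at index N-3.
    echo "${f[11]} ${f[12]} ${f[19]}"
}

# CPU usage over the process lifetime in tenths of a percent of one core,
# using integer arithmetic only. All arguments are in clock ticks.
cpu_permille() {
    local utime=$1 stime=$2 starttime=$3 now_ticks=$4
    local elapsed=$(( now_ticks - starttime ))
    (( elapsed > 0 )) || { echo 0; return; }
    echo $(( 1000 * (utime + stime) / elapsed ))
}

# Made-up sample line: utime=150, stime=50, starttime=5000.
sample='1234 (my (odd) cmd) S 1 1234 1234 0 -1 4194304 100 0 0 0 150 50 0 0 20 0 1 0 5000 1000000 200'
read -r utime stime starttime <<< "$(stat_fields "$sample")"
cpu_permille "$utime" "$stime" "$starttime" 5400   # 200 busy ticks over 400 elapsed
```

Because CPU time and elapsed time are both expressed in the same tick unit, the ratio does not depend on the system-wide counters in /proc/stat. A value above 1000 here would mean more than one core was busy on average.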

…pid>/stat`

Signed-off-by: Josua Carl <josua.carl@uni-tuebingen.de>
@JosuaCarl JosuaCarl mentioned this pull request Jan 19, 2026
Documentation: Fixed comments to be flagged for removal
Signed-off-by: Josua Carl <josua.carl@uni-tuebingen.de>
Signed-off-by: Josua Carl <josua.carl@uni-tuebingen.de>
Member

@pditommaso pditommaso left a comment

Hi @JosuaCarl, thank you for working on this fix.

I'd like to ask if you could narrow the scope of this change to focus specifically on the CPU calculation fix, rather than rewriting the entire logic structure. The current PR introduces several structural changes:

  • A new nested function nxf_observe_task_stats() with nameref variables
  • Reordering of the existing flow and variable declarations
  • Moving the trap statement and other logic around

While the underlying fix for the CPU calculation (using starttime from /proc/<pid>/stat instead of /proc/stat) makes sense, the extensive restructuring makes this change hard to test and validate across the wide variety of (legacy) systems where Nextflow runs.

Could you please refactor this PR to make only the minimal changes necessary to fix the CPU calculation bug? This would help us:

  1. Review the actual fix more easily
  2. Reduce the risk of introducing regressions on different Linux distributions and versions
  3. Make it easier to bisect if issues arise later

Thanks!

…nested functions and preserving execution order

Signed-off-by: Josua Carl <josua.carl@uni-tuebingen.de>
@JosuaCarl
Author

JosuaCarl commented Jan 21, 2026

Hi @pditommaso, regarding your remarks:
The adapted implementation of nxf_trace_linux ...

  • ... does not use a nested function anymore
    • instead, it subtracts a sampled "before" state from the "after" stats of the process (like other stats, e.g. IO)
  • ... preserves the command flow with one exception
    • I moved all the sampling steps directly after the task execution. This ensures that other commands (such as the CPU usage calculation) do not affect the sampled values.
  • ... has trap at its previous position.
    • I think it could be moved, because it is another command that is executed during the sampling period and therefore affects the reported resource usage. I saw no impact from moving it around, but I am also unsure what it does, so this is based on trial and error rather than on understanding.
  • ... uses the old names wherever a value already existed
    • only introducing new verbose names for new variables
  • ... has some more comments to explain the execution logic
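
The before/after subtraction and the sampling order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `cpu_usage_between` is a hypothetical name, and each sample is assumed to be a pair "cpu_ticks clock_ticks" taken from the same tick-based clock.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not taken from the PR.

# Given two samples of the form "cpu_ticks clock_ticks", print the CPU usage
# between them in tenths of a percent of one core.
cpu_usage_between() {
    local before=$1 after=$2
    local cpu0 clk0 cpu1 clk1
    read -r cpu0 clk0 <<< "$before"
    read -r cpu1 clk1 <<< "$after"
    local d_cpu=$(( cpu1 - cpu0 )) d_clk=$(( clk1 - clk0 ))
    (( d_clk > 0 )) || { echo 0; return; }
    echo $(( 1000 * d_cpu / d_clk ))
}

before="200 5000"   # made-up sample taken just before the task starts
# ... the task would run here ...
after="1000 5400"   # made-up sample taken immediately after the task ends,
                    # before any calculation or reporting work is done
cpu_usage_between "$before" "$after"   # 800 busy ticks over 400 elapsed
```

Taking the "after" sample before any arithmetic keeps the cost of the calculation itself out of the measured interval, which is the motivation given for moving the sampling steps.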

@pditommaso
Member

Not sure we can accept this in its current form.

@JosuaCarl
Author

Ok, why exactly? If you want me to address anything else or feel like your requested changes were not addressed appropriately, please don't hesitate to say so.

Development

Successfully merging this pull request may close these issues.

%cpu / cpus > 100%

3 participants