task: execute gdb on cores to extract backtrace and locals #2111
base: main
Conversation
This seems like a good idea.
rzarzynski left a comment:
I very much like the idea! It addresses a common headache: deploying a binary environment fully compatible with Teuthology (ceph-debug-docker.sh doesn't always work, sorry) just to run the gdb command for a backtrace.
I think there is also demand for injecting other gdb commands, more specific to a particular investigation. I wonder whether having a facility to inject further commands through a task's yaml would be helpful. Customizing teuthology code is also doable, but harder.
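As a rough illustration only (not code from this PR): extra commands taken from a hypothetical 'extra_gdb_commands' list in the task's yaml could simply be appended as additional '-ex' arguments to the batch invocation.

def build_gdb_args(dump_program, dump, extra_commands=None):
    # Base batch-mode invocation; pagination off so output is not truncated.
    args = ['gdb', '--batch', '-ex', 'set pagination 0',
            '-ex', 'thread apply all bt full']
    # Append any user-supplied commands, e.g. from a hypothetical
    # 'extra_gdb_commands' list in the task's yaml (not part of this PR).
    for cmd in (extra_commands or []):
        args += ['-ex', cmd]
    args += [dump_program, dump]
    return args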
Signed-off-by: Jose J Palacios-Perez <perezjos@uk.ibm.com>
Force-pushed from 44a3be8 to c8c6d71.
teuthology/task/internal/__init__.py
Outdated
log.info(f'Getting backtrace from core {dump} ...')
with open(gdb_output_path, 'w') as gdb_out:
    gdb_proc = subprocess.Popen(
        ['gdb', '--batch', '-ex', 'set pagination 0',
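The diff is truncated here. For reference, a minimal self-contained sketch of this pattern could look like the following; the exact '-ex' commands and names beyond those visible above are assumptions, not the PR's actual code.

import subprocess

def run_gdb_on_core(dump_program, dump, gdb_output_path):
    # Run gdb non-interactively against the crashing binary and its core,
    # writing the full per-thread backtrace (with locals) to a text file.
    with open(gdb_output_path, 'w') as gdb_out:
        gdb_proc = subprocess.Popen(
            ['gdb', '--batch', '-ex', 'set pagination 0',
             '-ex', 'thread apply all bt full',
             dump_program, dump],
            stdout=gdb_out, stderr=subprocess.STDOUT)
        gdb_proc.communicate()
    return gdb_proc.returncode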
This looks like it introduces another dependency for the teuthology worker. Also, what is going to happen when teuthology is run in a non-Linux environment where gdb is not installed, for example macOS?
@kshtsk Precisely, I wanted to ask about the dependencies: where is the correct place to ensure the gdb package for the distro is installed? How can we make this functionality optional, e.g. when using a debug build?
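One way to keep the dependency soft, sketched here only as an idea and not as what the PR currently does, is to probe for gdb at runtime and skip the extraction when it is missing (e.g. on macOS workers):

import logging
import shutil

log = logging.getLogger(__name__)

def gdb_available():
    # shutil.which() returns None when gdb is not on PATH, e.g. on macOS
    # workers or minimal installs without debug tooling.
    return shutil.which('gdb') is not None

# Hypothetical call site: skip extraction rather than fail the job.
# if not gdb_available():
#     log.warning('gdb not found; skipping coredump backtrace extraction')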
teuthology/task/internal/__init__.py
Outdated
raise RuntimeError('Stale jobs detected, aborting.')


def get_backtraces_from_coredumps(coredump_path, dump_path, dump_program, dump):
Here and in the other method, could you please use type annotations for the arguments and the return value?
Also, a description of the arguments in the docstring would be great to have.
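A sketch of what that could look like for the function above; the parameter meanings and the return type are guesses from the names, not taken from the PR.

def get_backtraces_from_coredumps(coredump_path: str,
                                  dump_path: str,
                                  dump_program: str,
                                  dump: str) -> None:
    """
    Run gdb against a single core file and store its backtrace.

    :param coredump_path: directory on the node where core files are collected
    :param dump_path: path where the gdb output should be written
    :param dump_program: binary that produced the core (passed to gdb)
    :param dump: the core file to inspect
    """
    ...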
I think the comment in …
If you're looking for the place where the …
Mmh, I'm getting the following failure when attempting to test: …
These are my remotes in my local checkout: …
I pushed like this: …
All pass: …
The unit tests are failing.
Precisely, yes (assuming a worker node runs on the remote machine -- e.g. smithi -- that originates the core when a Linux process aborts).
OK, will look at that.
Yes, we would like to use it on the worker (remote) nodes that originate the coredump (e.g. smithi machines), ideally when running debug builds. Does this mean that the functionality of this PR is better suited somewhere else (e.g. ceph/qa) instead?
The teuthology worker machine is the local node where the dispatcher and supervisor are running; nodes like smithi are the remote test targets. The code in this PR currently executes gdb locally (via Popen). In order to run gdb on the remote (i.e. test target) nodes you might need to call …
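For reference, a sketch of running gdb on the test node itself through teuthology's remote-execution layer, assuming the usual remote.run(args=...) interface; the gdb command details are illustrative, not the PR's code.

from io import BytesIO

def get_remote_backtrace(remote, dump_program, dump):
    # Execute gdb on the remote (e.g. smithi) node that owns the core,
    # instead of on the local worker, and capture its stdout.
    out = BytesIO()
    remote.run(
        args=['gdb', '--batch', '-ex', 'set pagination 0',
              '-ex', 'thread apply all bt full',
              dump_program, dump],
        stdout=out,
        check_status=False)
    return out.getvalue().decode(errors='replace')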
OK, rewriting the approach: …
Signed-off-by: Jose J Palacios-Perez <perezjos@uk.ibm.com>


It would be useful to run gdb on coredump files on the machine that originated them, to extract their backtrace and locals, and to pack the output as part of the tarball.
scrape.py can then use this information as a quick pointer to the issue, which might save time when examining failures where cores have been dropped. This might require running gdb on the remote machine, so the task also needs to ensure that gdb is installed, which could be restricted to debug builds only.
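A small sketch of how the gdb output could be placed next to the core so that whatever already archives the coredump directory picks it up for the tarball; the file naming is a hypothetical choice, not part of the PR.

def gdb_output_path_for(core_path: str) -> str:
    # Place the backtrace next to the core (e.g. core.1234 -> core.1234.gdb.txt)
    # so the existing archiving of the coredump directory includes it too.
    return core_path + '.gdb.txt'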