
@ahamboeck

Title: fix: Blackwell (RTX 5090) support for environment setup and JIT loading

Summary

This PR provides a fix for running on Blackwell architecture (RTX 5090) with CUDA 12.8.1.

It builds on #166 (which addressed JIT import failures for lib3dgrt and lib3dgut), extends the same fix to lib_mcmc_cc, and resolves a build isolation blocker in the installation script that prevents environment setup on newer systems.

Related Issues

Environment

  • GPU: RTX 5090
  • OS: Ubuntu 22.04
  • CUDA: 12.8.1
  • Install: ./install_env.sh

Root Cause & Proposed Fixes

1. Build Isolation Failures

Issue: During ./install_env.sh, pip creates an isolated build environment by default. On Blackwell systems with CUDA 12.8.1, these isolated environments often fail to correctly link against the specific system PyTorch/CUDA headers.
Fix: Added --no-build-isolation to the pip install commands in install_env.sh to force usage of the pre-configured host environment.
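For illustration, the fix amounts to passing pip's `--no-build-isolation` option so that packages with build-time PyTorch/CUDA dependencies compile against the `torch` already installed in the host environment. The sketch below only builds such a command line from Python. The helper `pip_install_cmd` and the `requirements.txt` name are hypothetical and are not taken from install_env.sh.

```python
import sys
from typing import List


def pip_install_cmd(requirements: str, build_isolation: bool = False) -> List[str]:
    """Build a pip command line that installs a requirements file.

    With build_isolation=False, pip gets --no-build-isolation, so build
    backends see the packages already installed in the current environment
    (e.g. a CUDA-enabled torch) instead of a fresh, isolated copy.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if not build_isolation:
        cmd.append("--no-build-isolation")
    cmd += ["-r", requirements]
    return cmd


if __name__ == "__main__":
    # Hypothetical file name; pass the list to subprocess.run(cmd, check=True)
    # to actually perform the install.
    print(" ".join(pip_install_cmd("requirements.txt")))
```

Note that with isolation disabled, pip no longer installs build dependencies on its own, so torch and the build tooling must already be present in the environment before this step runs.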

2. JIT Module Loading Failure

Issue: As identified in #165, while torch.utils.cpp_extension.load() successfully compiles the JIT extensions, a subsequent import by name (e.g., import lib3dgrt_cc) fails with ModuleNotFoundError.
Fix:

  • Updated setup_3dgrt(), setup_3dgut(), and setup_mcmc() to return the module object directly from jit.load().
  • Modified the loaders in tracer.py and relevant files to assign the plugin directly from that returned object.
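The symptom can be reproduced without CUDA. Assuming recent PyTorch releases load a compiled extension via `importlib.util.spec_from_file_location` and `exec_module` without registering it in `sys.modules`, and that the build directory is not on `sys.path`, a later `import <name>` has nothing to find, while the object returned by the loader works fine. The sketch below mimics this with a plain `.py` file standing in for the compiled library; `fake_jit_load` and the module name `lib3dgrt_cc_demo` are hypothetical.

```python
import importlib
import importlib.util
import os
import tempfile


def fake_jit_load(name: str, build_dir: str):
    """Load a module from build_dir the way a JIT loader might.

    The module is executed and returned, but it is NOT added to sys.modules
    and build_dir is not on sys.path, so `import name` cannot find it later.
    """
    path = os.path.join(build_dir, f"{name}.py")
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo(name: str = "lib3dgrt_cc_demo"):
    with tempfile.TemporaryDirectory() as build_dir:
        # Stand-in for the compiled extension produced by the JIT build.
        with open(os.path.join(build_dir, f"{name}.py"), "w") as f:
            f.write("def trace():\n    return 'ok'\n")

        # Fixed pattern: keep and use the returned module object.
        plugin = fake_jit_load(name, build_dir)

        # Old pattern: import the freshly built module by name afterwards.
        try:
            importlib.import_module(name)
            by_name = "imported"
        except ModuleNotFoundError:
            by_name = "ModuleNotFoundError"

        return plugin.trace(), by_name


if __name__ == "__main__":
    print(demo())  # prints ('ok', 'ModuleNotFoundError')
```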

Steps to Reproduce (Before Fix)

  1. Run ./install_env.sh on a Blackwell system (observe build failures).
  2. Attempt training: python train.py --config-name apps/nerf_synthetic_3dgrt.yaml ...
  3. Observe ModuleNotFoundError: No module named 'lib3dgrt_cc' despite successful compilation logs.

Copilot AI review requested due to automatic review settings December 26, 2025 17:49

Copilot AI left a comment


Pull request overview

This PR addresses JIT compilation and module loading failures on Blackwell (RTX 5090) architecture with CUDA 12.8.1. The fix involves two key changes: disabling pip build isolation during installation to ensure proper linking against system PyTorch/CUDA headers, and refactoring JIT module setup functions to return module objects directly rather than relying on post-compilation imports by name.

  • Modified installation script to use --no-build-isolation flag for requirements installation
  • Refactored all JIT setup functions (setup_3dgrt, setup_3dgut, setup_playground, setup_mcmc, setup_gui) to return the compiled module object
  • Updated corresponding loader functions in tracer and strategy files to use the returned module object directly
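A minimal sketch of the caller-side pattern, assuming each loader keeps a lazily initialized cache of compiled modules. `load_plugin`, the `_plugins` cache, and the stand-in `setup_3dgrt` below are hypothetical; in the real code the setup function would compile the sources and return the result of `torch.utils.cpp_extension.load()`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict

# Cache of compiled modules, keyed by extension name (hypothetical layout).
_plugins: Dict[str, Any] = {}


def load_plugin(name: str, setup_fn: Callable[[], Any]) -> Any:
    """Return the JIT-compiled module for `name`, building it at most once.

    The object returned by setup_fn is stored and used directly; there is no
    follow-up `import <name>` that could raise ModuleNotFoundError.
    """
    if name not in _plugins:
        _plugins[name] = setup_fn()
    return _plugins[name]


def setup_3dgrt() -> Any:
    # Stand-in for the real setup function, which would compile the sources and
    # `return jit.load(name="lib3dgrt_cc", ...)` instead of discarding the result.
    return SimpleNamespace(name="lib3dgrt_cc")


if __name__ == "__main__":
    tdgrt = load_plugin("lib3dgrt_cc", setup_3dgrt)
    print(tdgrt.name)  # prints lib3dgrt_cc
```

Caching by name keeps repeated loader calls from triggering another compile or load of the same extension.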

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Summary per file:

  • install_env.sh: Added --no-build-isolation flag to requirements installation to force usage of pre-configured host environment
  • threedgrt_tracer/setup_3dgrt.py: Modified to return the compiled tdgrt module object
  • threedgrt_tracer/tracer.py: Updated loader to assign module directly from setup_3dgrt return value
  • threedgut_tracer/setup_3dgut.py: Modified to return the compiled tdgut module object
  • threedgut_tracer/tracer.py: Updated loader to assign module directly from setup_3dgut return value
  • threedgrut_playground/setup_playground.py: Modified to return the compiled playground_lib module object
  • threedgrut_playground/tracer.py: Updated loader to assign module directly from setup_playground return value
  • threedgrut/strategy/src/setup_mcmc.py: Modified to return the compiled gaussian_mcmc module object
  • threedgrut/strategy/mcmc.py: Updated loader to assign module directly from setup_mcmc return value
  • threedgrut/gui/setup_gui.py: Modified to return the compiled gui_module module object


Comment on lines +38 to +45
```python
gui_module = jit.load(
    name="lib3dgrut_gui_cc",
    sources=source_paths,
    extra_cflags=cflags,
    extra_cuda_cflags=cuda_cflags,
    extra_include_paths=include_paths,
)
return gui_module
```

Copilot AI Dec 26, 2025


The setup_gui() function now returns the gui_module, but the caller in ps_extension.py (line 31) has not been updated to use this return value. After calling setup_gui(), it still attempts to import lib3dgrut_gui_cc by name (line 32), which will fail with the same ModuleNotFoundError this PR is addressing. The ps_extension.py file should be updated to match the pattern used in other files: tdgui = setup_gui() instead of setup_gui() followed by import lib3dgrut_gui_cc as tdgui.

@yapkap

yapkap commented Jan 7, 2026

I implemented these changes manually. They work for the base config and the MCMC config, but a similar import error still appears when using selective_adam as the optimizer (lib_optimizers_cc).
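Both remaining cases (lib3dgrut_gui_cc in ps_extension.py and lib_optimizers_cc for selective_adam) look like the same pattern: a setup call followed by an import by name. A small audit script can list such imports so each can be switched to the returned module object. This is a sketch assuming the JIT-built modules share a `_cc` suffix; the helper `find_cc_imports` is hypothetical.

```python
import ast
from typing import List, Tuple


def find_cc_imports(source: str, suffix: str = "_cc") -> List[Tuple[int, str]]:
    """Return sorted (line number, module name) pairs for imports of modules
    whose name ends in `suffix`.

    Handles `import x_cc as y`, `from . import x_cc`, and `from x_cc import y`.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[-1].endswith(suffix):
                    hits.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[-1].endswith(suffix):
                hits.append((node.lineno, node.module))
            else:
                for alias in node.names:
                    if alias.name.endswith(suffix):
                        hits.append((node.lineno, alias.name))
    return sorted(hits)


if __name__ == "__main__":
    import pathlib
    import sys

    # Scan a source tree (default: current directory) for name-based imports.
    root = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in sorted(root.rglob("*.py")):
        try:
            found = find_cc_imports(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that do not parse
        for lineno, name in found:
            print(f"{path}:{lineno}: import of {name}")
```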

