-
-
Notifications
You must be signed in to change notification settings - Fork 92
Add self-supervised 3D-Var-based AI data assimilation #194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
hey @jacobbieker Can you review this |
jacobbieker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't include AI-generated code or documentation.
Also, this PR is way too large, it needs to be split up.
|
|
||
|
|
||
| class SelfAttentionLayer(nn.Module): | ||
| """Self-attention layer for processing point cloud features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split these just doc changes into a separate PR, this is a huge PR that is hard to review and has a lot of unrelated changes. Do that, and then I can review just the implementation.
| return H | ||
|
|
||
|
|
||
| def generate_synthetic_data(batch_size=32, grid_size=(10, 10), num_channels=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generation for testing shouldn't be in the model file, only in the unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These model training loops, etc. stuff shouldn't be in the /models/ directory, but in a subdirectory, like /models/data_assimilation/ or something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These feel like more generic things for training that shouldn't be in a model PR. If you want to add more visualizations, feel free, but that should be separate.
graph_weather/models/evaluation.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this evaluation should be included in this PR, a lot of the computation already exists somewhere in the repo, and is more generally useful. Please refactor or remove.
| return analysis | ||
|
|
||
|
|
||
| class SimpleDataAssimilationModel(nn.Module): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer just the model implementation from the paper, not a second, simpler one for this.
|
|
||
| self.network = nn.Sequential(*layers) | ||
|
|
||
| def forward(self, background, observations): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this repo, we want the models to take in points and output points, so they are all compatible. Please make this work with the same kind of interface as the GraphWeatherForecaster, or GenCast implementations.
| return sample | ||
|
|
||
|
|
||
| def create_synthetic_assimilation_dataset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be needed, but needs changes. The models being added should be compatible with taking in points and lat/lon locations, and so should this one. This might make it not necessary to have this.
for more information, see https://pre-commit.ci
|
Sure @jacobbieker |
|
Closing this as #196 is a duplicate |
Pull Request
Description
This PR introduces a self-supervised AI-based data assimilation prototype
The implementation replaces supervised learning with a physics-based 3D-Var cost function, allowing the model to learn directly from:
Sparse/noisy observations
Scope
Focused on demonstrating the feasibility and correctness of the self-supervised assimilation approach
Intended as an experimental framework, not a full operational replacement
Checklist
Implements the 3D-Var objective as the training loss
Supports both forecast-based and cold-start first-guess states
Modular design for experimenting with different observation operators and error assumptions
Suitable as a foundation for extending to real-world variables