
@SOHAMPAL23

Pull Request

  • Add self-supervised AI-based data assimilation using 3D-Var loss

Description

  • This PR introduces a self-supervised AI-based data assimilation prototype

  • The implementation replaces supervised learning with a physics-based 3D-Var cost function, allowing the model to learn directly from:

  • Sparse/noisy observations
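
For reference, the 3D-Var objective mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the code in this PR. It assumes diagonal background and observation error covariances (`sigma_b`, `sigma_o`) and an observation operator that simply selects state entries (`obs_indices`); all of these names are hypothetical.

```python
from typing import Sequence


def three_d_var_cost(
    analysis: Sequence[float],
    background: Sequence[float],
    observations: Sequence[float],
    obs_indices: Sequence[int],
    sigma_b: float = 1.0,
    sigma_o: float = 0.5,
) -> float:
    """J(x) = 0.5*(x-xb)^T B^-1 (x-xb) + 0.5*(y-Hx)^T R^-1 (y-Hx).

    Assumption (illustrative only): B = sigma_b^2 * I, R = sigma_o^2 * I,
    and H picks the state entries listed in obs_indices.
    """
    if len(analysis) != len(background):
        raise ValueError("analysis and background must have the same length")
    if len(observations) != len(obs_indices):
        raise ValueError("one index is required per observation")

    # Background term: penalizes departure from the first guess.
    j_b = 0.5 * sum((x - xb) ** 2 for x, xb in zip(analysis, background)) / sigma_b**2
    # Observation term: penalizes misfit at the observed entries only.
    j_o = 0.5 * sum(
        (y - analysis[i]) ** 2 for y, i in zip(observations, obs_indices)
    ) / sigma_o**2
    return j_b + j_o


background = [1.0, 2.0, 3.0]
observations = [2.5]  # a single observation ...
obs_indices = [1]     # ... of the second state entry

# Cost of leaving the background unchanged: only the observation term is non-zero.
cost_at_background = three_d_var_cost(background, background, observations, obs_indices)
# Cost of matching the observation exactly: only the background term is non-zero.
cost_at_observation = three_d_var_cost([1.0, 2.5, 3.0], background, observations, obs_indices)
```

In a self-supervised setup, the network output would play the role of `analysis` and this scalar would be minimized as the training loss, so no ground-truth analysis is required.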

Scope

  • Focused on demonstrating the feasibility and correctness of the self-supervised assimilation approach

  • Intended as an experimental framework, not a full operational replacement

Checklist

  • Implements the 3D-Var objective as the training loss

  • Supports both forecast-based and cold-start first-guess states

  • Modular design for experimenting with different observation operators and error assumptions

  • Suitable as a foundation for extending to real-world variables

@SOHAMPAL23 marked this pull request as draft on January 5, 2026, 15:47
@SOHAMPAL23 marked this pull request as ready for review on January 6, 2026, 09:07
@SOHAMPAL23
Author

Hey @jacobbieker, can you review this?

@jacobbieker self-requested a review on January 6, 2026, 09:18
Member

@jacobbieker left a comment

Please don't include AI-generated code or documentation.

Also, this PR is way too large; it needs to be split up.



class SelfAttentionLayer(nn.Module):
"""Self-attention layer for processing point cloud features.
Member

Please split these doc-only changes into a separate PR. This is a huge PR that is hard to review and has a lot of unrelated changes. Once that is done, I can review just the implementation.

return H


def generate_synthetic_data(batch_size=32, grid_size=(10, 10), num_channels=1):
Member

This test-data generation shouldn't be in the model file; it belongs only in the unit tests.

Member

The model training loops and related code shouldn't sit directly in the /models/ directory, but in a subdirectory such as /models/data_assimilation/ or something similar.

Member

These feel like generic training utilities that shouldn't be in a model PR. If you want to add more visualizations, feel free, but that should be a separate PR.

Member

I don't think this evaluation should be included in this PR. A lot of the computation already exists elsewhere in the repo and is more generally useful. Please refactor or remove.

return analysis


class SimpleDataAssimilationModel(nn.Module):
Member

I would prefer just the model implementation from the paper, not a second, simpler one for this.


self.network = nn.Sequential(*layers)

def forward(self, background, observations):
Member

In this repo, we want the models to take in points and output points, so they are all compatible. Please make this work with the same kind of interface as the GraphWeatherForecaster or GenCast implementations.
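
To illustrate the request, here is a rough sketch of what a points-in, points-out assimilation step could look like. It is plain Python and does not reproduce the GraphWeatherForecaster or GenCast interfaces; `PointAssimilator`, its arguments, and the scalar error standard deviations are hypothetical. It also swaps the learned model for the closed-form 3D-Var minimizer, which exists when the error covariances are diagonal and each observation sits exactly on a state point.

```python
from typing import Dict, List, Sequence, Tuple

LatLon = Tuple[float, float]


class PointAssimilator:
    """Hypothetical points-in, points-out analysis step (not the repo's API)."""

    def __init__(self, lat_lons: Sequence[LatLon], sigma_b: float = 1.0, sigma_o: float = 0.5):
        self.lat_lons = list(lat_lons)
        # Weight given to the observation increment for scalar variances:
        # sigma_b^2 / (sigma_b^2 + sigma_o^2).
        self.gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)

    def forward(self, background: Sequence[float], observations: Dict[LatLon, float]) -> List[float]:
        """Take one background value per point, return one analysis value per point."""
        if len(background) != len(self.lat_lons):
            raise ValueError("expected one background value per point")
        analysis = []
        for point, xb in zip(self.lat_lons, background):
            if point in observations:
                # Observed point: move the background toward the observation.
                analysis.append(xb + self.gain * (observations[point] - xb))
            else:
                # Unobserved point: with diagonal B the background is kept as is.
                analysis.append(xb)
        return analysis


lat_lons = [(50.0, 0.0), (50.0, 1.0), (51.0, 0.0)]
model = PointAssimilator(lat_lons)
analysis = model.forward([1.0, 2.0, 3.0], {(50.0, 1.0): 2.5})
```

The output has one value per input point, in the same order as `lat_lons`, which is the property that keeps such a component interchangeable with other point-based models.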

return sample


def create_synthetic_assimilation_dataset(
Member

Remove

Member

This might be needed, but it needs changes. The models being added should be compatible with taking in points and lat/lon locations, and so should this one. That might make this unnecessary.

@SOHAMPAL23
Author

Sure @jacobbieker, I'm working on splitting the PR and will make the other changes in the new PR.

@jacobbieker
Member

Closing this as #196 is a duplicate
