-
Notifications
You must be signed in to change notification settings - Fork 20
Notebook unittests for Tellseq B and Tellseq C notebooks #358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notebook unittests for Tellseq B and Tellseq C notebooks #358
Conversation
…les to reflect change in add_controls output
…n CI vs local test output
…ady updated to reflect change in add_controls output
…ting point differences in CI vs local test output
Pull Request Test Coverage Report for Build 20936646009Details
💛 - Coveralls |
antgonza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor question.
| "outputs": [], | ||
| "source": [ | ||
| "# format INPUT_DNA_KEY column to avoid floating point weirdness, then write out\n", | ||
| "plate_df[INPUT_DNA_KEY] = plate_df[INPUT_DNA_KEY].map(lambda x: f\"{x:.10g}\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An excellent question! I have two answers:
- I asked chatgpt how to handle having floating point numbers expressed weirdly in file output (like 0.1 being written out as 0.09999999999999) and it suggested formatting the dataframe write out, saying "%.10g gives up to 10 significant digits and avoids lots of trailing noise; it’s a good general default for “human-ish” TSV."
- I implemented that and then looked at what changed in the output file, and what I saw was that the changes were cutting off digits I did not believe in the first place, like 7.499350000000001 becoming 7.49935, 7.504912999999999 becoming 7.504913, 7.4984139999999995 becoming 7.498414, and so forth.
That said, if you feel that it should be other, I am definitely not married to this. It does seem to matter what the particular data column is (based, I assume on how it is calculated) whether it has a bunch of meaningful digits or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation, sounds good to me.
add_controlsoutputadd_controlsoutputThis PR was an absolute beast because of differences in last-digit floating point representations in the output files created on my development machine versus those created in the github CI. To reign this in, I also made the following changes: