

Collaborator

@AmandaBirmingham AmandaBirmingham commented Jan 12, 2026

  • switch tellseq B notebook to use the string constant MINIPICO_LIB_CONC_KEY instead of a string literal to reference the dataframe column name
  • modify tellseq B notebook to support unit tests
  • add unit test for tellseq B notebook main path
  • update tellseq B test files to reflect change in add_controls output
  • modify tellseq C notebook to support unit tests
  • add unit test for tellseq C notebook main path
  • update tellseq C test files to reflect change in add_controls output

This PR was an absolute beast because of differences in the last-digit floating point representations in the output files created on my development machine versus those created in GitHub CI. To rein this in, I also made the following changes:

  • modify number of digits for INPUT_DNA_KEY column in dataframe output file of tellseq A notebook
  • modify number of digits for MINIPICO_LIB_CONC_KEY column in dataframe output file of tellseq B notebook
  • update tellseq A, tellseq B, and tellseq C dataframe test files to reflect digit limits
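A minimal sketch of the kind of last-digit drift described above, and of how formatting to a fixed number of significant digits absorbs it. The PR does not say what caused the dev-vs-CI differences; summation order is used here only as one plausible illustration, and `to_sig_digits` is a hypothetical helper, not a function from the repository.

```python
# Illustrative sketch only (not the PR's code). Two mathematically equal
# computations can differ in their last floating-point digit; formatting
# with 10 significant digits, as f"{x:.10g}" does in the notebooks,
# makes both write out identically.

def to_sig_digits(x: float, digits: int = 10) -> str:
    """Hypothetical helper: format x with at most `digits` significant digits.

    The 'g' presentation type also drops trailing zeros.
    """
    return f"{x:.{digits}g}"

# Floating-point addition is not associative, so evaluation order
# changes the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(repr(left_to_right), repr(right_to_left))
print(left_to_right == right_to_left)
print(to_sig_digits(left_to_right), to_sig_digits(right_to_left))
```

The raw values compare unequal, but both format to the same string, so a text comparison of the output files no longer depends on which evaluation order a given machine happened to use.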

@coveralls

Pull Request Test Coverage Report for Build 20936646009

Details

  • 18 of 20 (90.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 92.743%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| notebooks/tests/test_tellseq_B_concentration_estimation.py | 9 | 10 | 90.0% |
| notebooks/tests/test_tellseq_C_equal_volume_pooling.py | 9 | 10 | 90.0% |
Totals
Change from base Build 20929950507: 0.1%
Covered Lines: 6607
Relevant Lines: 7124

💛 - Coveralls

Collaborator

@antgonza antgonza left a comment


Just one minor question.

"outputs": [],
"source": [
"# format INPUT_DNA_KEY column to avoid floating point weirdness, then write out\n",
"plate_df[INPUT_DNA_KEY] = plate_df[INPUT_DNA_KEY].map(lambda x: f\"{x:.10g}\")\n",
Collaborator


Out of curiosity, why 10?

Collaborator Author

@AmandaBirmingham AmandaBirmingham Jan 12, 2026


An excellent question! I have two answers:

  1. I asked ChatGPT how to handle floating point numbers being expressed weirdly in file output (like 0.1 being written out as 0.09999999999999), and it suggested formatting the dataframe values on write-out, saying "%.10g gives up to 10 significant digits and avoids lots of trailing noise; it’s a good general default for “human-ish” TSV."
  2. I implemented that and then looked at what changed in the output file, and what I saw was that the changes were cutting off digits I did not believe in the first place, like 7.499350000000001 becoming 7.49935, 7.504912999999999 becoming 7.504913, 7.4984139999999995 becoming 7.498414, and so forth.

That said, if you feel it should be something else, I am definitely not married to this. Whether a column has a bunch of meaningful digits does seem to depend on the particular data column (based, I assume, on how it is calculated).
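The three examples in point 2 can be checked directly in plain Python. This sketch also shows two caveats of `.10g` worth keeping in mind for TSV output: values with more than 10 genuinely meaningful significant digits get rounded, and very small (or very large) values switch to exponent notation. The extra sample values are made up for illustration.

```python
# Check the examples from the discussion: 10 significant digits removes
# the trailing representation noise but keeps the digits that matter.
examples = {
    7.499350000000001: "7.49935",
    7.504912999999999: "7.504913",
    7.4984139999999995: "7.498414",
}

for raw, expected in examples.items():
    formatted = f"{raw:.10g}"
    print(f"{raw!r:>20} -> {formatted}")
    assert formatted == expected

# Caveat 1: a value with more than 10 meaningful significant digits is rounded.
print(f"{123.456789012345:.10g}")  # 123.456789

# Caveat 2: 'g' switches to exponent notation for small exponents,
# so downstream readers of the TSV must accept values like this.
print(f"{0.00001234:.10g}")  # 1.234e-05
```

As an alternative design (not what this PR does), pandas' `DataFrame.to_csv` accepts `float_format="%.10g"`, which applies the same formatting to every float column at write time without turning the column into strings in the dataframe. The per-column `map` used in the notebook limits the change to just the one column, which fits the observation that the right number of digits depends on the column.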

Collaborator


Thank you for the explanation, sounds good to me.

@antgonza antgonza merged commit 5c4fc69 into biocore:master Jan 12, 2026
2 checks passed
@AmandaBirmingham AmandaBirmingham deleted the notebook_unittests_tellseqB_and_tellseqC branch January 12, 2026 23:15

3 participants