Manually patched GDCquery_clinic bug -- incorrect sample set downloaded for TARGET-NBL

I had existing code that downloaded clinical data from the [TARGET-NBL](https://portal.gdc.cancer.gov/projects/TARGET-NBL) cohort using the [GDCquery_clinic function](https://github.com/BioinformaticsFMRP/TCGAbiolinks/blob/master/R/clinical.R), downloading the 842 cases that have clinical data. The code is:

`GDCquery_clinic(project = "TARGET-NBL", type = "clinical")`

**Previously, it worked perfectly, but months later I ran the exact same script again and it downloaded a different set of 842 cases**, which only partially overlapped with the original set. After days of digging I found the issue:

For some reason, on line 241 of `clinical.R`, there is an if statement that tests whether the project is a TCGA project:

`if (grepl("TCGA",project)){`

And only if this evaluates to TRUE, part of the case filter for the API request URL will include selecting only cases where the field `files.data_category` is equal to "Clinical" (lines 246-247). However, my project was TARGET-NBL, so the else block will instead execute, which leaves out this case filter in the API request URL. Without this filter, there are over 1100 cases to choose from (looking at all cases, whether or not they have "Clinical" data).

Importantly, though, on line 233, we set up the "size" (case count) filter, which will set `size=842` in the API request URL. Therefore, when we call the function on TARGET-NBL, it still specifies that we must retrieve only 842 cases, so it just arbitrarily selects 842 of the 1100 cases. This is why the code downloaded an oddly different set of 842 cases the second time.
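To make the difference concrete, here is a minimal sketch of the two request URLs. It is not the TCGAbiolinks code: `build_cases_url` and its arguments are hypothetical names, and the endpoint and filter layout are my reconstruction of a GDC `/cases` query from the description above.

```r
# Hypothetical helper (not part of TCGAbiolinks): builds a GDC /cases URL.
# clinical_only = TRUE mimics the if-true (TCGA) branch,
# clinical_only = FALSE mimics the else branch that TARGET-NBL falls into.
build_cases_url <- function(project, size, clinical_only) {
  project_filter <- sprintf(
    '{"op":"in","content":{"field":"cases.project.project_id","value":["%s"]}}',
    project
  )
  clinical_filter <-
    '{"op":"in","content":{"field":"files.data_category","value":["Clinical"]}}'

  # Only the if-true branch adds the files.data_category clause
  clauses <- if (clinical_only) c(project_filter, clinical_filter) else project_filter
  filters <- sprintf('{"op":"and","content":[%s]}', paste(clauses, collapse = ","))

  paste0(
    "https://api.gdc.cancer.gov/cases?",
    "size=", size,
    "&filters=", utils::URLencode(filters, reserved = TRUE)
  )
}

# Same size in both requests, but only one restricts to clinical cases
url_without <- build_cases_url("TARGET-NBL", 842, clinical_only = FALSE)
url_with    <- build_cases_url("TARGET-NBL", 842, clinical_only = TRUE)
```

Both URLs carry `size=842`, but only `url_with` narrows the pool to cases that have "Clinical" files, so `url_without` asks for 842 cases out of the larger, unfiltered pool.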

**The patch I used was just to get rid of the if statement on line 241 and execute the code in the if-true block always.** I'm not sure if there was ever a rationale for this if-else code block, but at least for TARGET projects, we need to execute the if-true block that adds the "files.data_category" case filter.
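If you want to check whether two runs returned the same cases, comparing the case identifiers is enough. A rough sketch follows; `compare_case_sets` is a hypothetical helper, the IDs are made up, and I am assuming you pull the identifiers from a column such as `submitter_id` in the data frame that `GDCquery_clinic` returns.

```r
# Hypothetical helper: summarises how two sets of case IDs differ
compare_case_sets <- function(old_ids, new_ids) {
  list(
    shared    = intersect(old_ids, new_ids),  # cases present in both runs
    only_old  = setdiff(old_ids, new_ids),    # cases that disappeared
    only_new  = setdiff(new_ids, old_ids),    # cases that newly appeared
    identical = setequal(old_ids, new_ids)    # TRUE only if both runs match
  )
}

# Made-up IDs for illustration; in practice use e.g. old_df$submitter_id
run1 <- c("CASE-A", "CASE-B", "CASE-C")
run2 <- c("CASE-B", "CASE-C", "CASE-D")
cmp <- compare_case_sets(run1, run2)
```

With the patch applied, repeated runs should give `identical = TRUE`, assuming the set of cases with clinical data on the GDC has not changed between runs.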
