How to input data into target sheet from a source without a unique ID

Meredith Rhodes
Meredith Rhodes ✭✭✭✭
edited 02/22/24 in Formulas and Functions

For context, I am mining public clinicaltrials.gov records to associate unique trial records with results publications. There are often multiple results publications linked to an individual trial record.

I have a sheet of my institutional records, each record has its own row and is uniquely identified by a national clinical trial number (NCT number).

I have a report that lists all results publications (each publication has a unique ID called the PMID) attached to each trial record. So my list looks like this (red cells are duplicate NCT numbers):


Before I studied the above report, I set up a data shuttle to populate a Results PMID cell (text format) based on the unique NCT number. This of course only grabs the *first PMID* associated with the trial record, and skips all of the others.

Trial records will have anywhere from 0-20+ results PMIDs associated with them.

I'm trying to work through a process to move all associated results PMIDs into the relevant trial row on my sheet.

Do I import the PMIDs into a dropdown list in the Results PMID column? Is there a way to run a data shuttle to select all relevant PMIDs for a trial record?

Is there a best practice for handling this sort of scenario?

Thank you!

Meredith

Meredith Rhodes, PhD

ClinicalTrials.gov Specialist

UW School of Medicine & Public Health

UW Clinical Trials Institute

mkrhodes@clinicaltrials.wisc.edu

Answers

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭

    What if you created another column in the source and combined the NCT and PMID? Would that create a unique data point on every row?

  • Meredith Rhodes
    Meredith Rhodes ✭✭✭✭

    Hi @Paul Newcome -

    I'm not sure I follow your line of thinking. The Unique ID on my target sheet is the NCT number. If I combine the NCT + PMID into a single column on the source sheet, what would be the process for connecting it to my target sheet?

    Right now the column that I'd like to have all relevant PMIDs in on my target sheet is a text field - and I COULD just add a string of PMIDs into one cell there (although this doesn't seem like a great way to ever leverage this data). I could somehow store the data in a drop down that I can multi-select from - as long as I can automate that somehow.

    I COULD give up on trying to force this data into my primary sheet and make a sheet with parent / child rows to document all PMIDs associated with each trial record. This could be a possible way to work your idea... I can automate a checkbox via cross sheet formula on the primary sheet if a trial record has any PMID linked to it, and curate publications elsewhere.

    Maybe this is the way forward. I'm still very much in excel thinking over here.

    Meredith

    Meredith Rhodes, PhD

    ClinicalTrials.gov Specialist

    UW School of Medicine & Public Health

    UW Clinical Trials Institute

    mkrhodes@clinicaltrials.wisc.edu

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭

    Right. Your target sheet has unique IDs of the NTC number, but you have multiple PMIDs you want associated to each.


    If you create a unique ID in the source data of NTC + PMID, you would be able to use data shuttle to pull every row over into Smartsheet. Once you get every row into Smartsheet, you can slice and dice the data as needed.

  • Meredith Rhodes
    Meredith Rhodes ✭✭✭✭

    @Paul Newcome -

    Following up on this query. I did what you suggested above to create a unique id pairing between each NCT number and the respective PMIDs linked to it. (NCT + PMID - see below)

    I would like to collect all PMIDs associated with an NCT number from the sheet pictured above on a different sheet, so I can list details about each PMID (publication) in its own row.

    Is it possible to create a formula on my target sheet that transposes an unknown number (between 0~20) of PMIDs from the source sheet into a column on the target sheet? For more context, I'm inspired to build a clinical trial transparency report card, and I'm mining a couple of data sheets to collect and 'grade' specific responsible practices. I want the above PMIDs to show up like this (where I just typed them in below):

    I can keep this always at the bottom of the template to account for a large number of entries.

    So my follow up is in 2 parts: What is the formula to collect the PMIDs from the source sheet? Index / collect? and can I make this formula grab all PMIDs as above?

    Thank you!

    Meredith

    Meredith Rhodes, PhD

    ClinicalTrials.gov Specialist

    UW School of Medicine & Public Health

    UW Clinical Trials Institute

    mkrhodes@clinicaltrials.wisc.edu

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭

    You would need to manually populate some numbers in a separate column (can be hidden after setting up and called "Number" in this example). Starting with 1 you would go on down to accommodate however many you think you will need max and then a few more just in case. If needed, we can walk through setting up a "not enough rows" alert after we get this initial pull working.


    Once you get the Number column set, you would use

    =IFERROR(INDEX(COLLECT({PMID Column}, {NCT Column}, @cell = [Responsible Practice]$1), Number@row), "")


    From there you would dragfill this on down the column until you have populated this formula for all of the Number rows.

  • Meredith Rhodes
    Meredith Rhodes ✭✭✭✭

    Thanks @Paul Newcome - I will play around with this.

    I was wondering - can I set up a growing logic statement to collect the next PMID?

    Something like this (I'm about to paraphrase a formula here):

    In the first cell / row that I want the first PMID: INDEX / MATCH (or collect) a PMID associated with that NCT number

    In the cell below it (where I want it to look for another matching PMID): IF INDEX / MATCH = (the above cell), skip it (is this a CHAR(10)?) and return the next matching PMID - or if there is no other than stop and return nothing.

    In the cell below this one (where I want it to look for another matching PMID): IF/AND INDEX / MATCH = either of the above cells, then skip it and return the next matching PMID - or if there is no other than stop and return nothing.

    And so on. If I figure out the max number of PMIDs I currently have associated with one record - I can build this many formulas - a new one in each cell below the last.

    Is that crazy?

    Meredith

    Meredith Rhodes, PhD

    ClinicalTrials.gov Specialist

    UW School of Medicine & Public Health

    UW Clinical Trials Institute

    mkrhodes@clinicaltrials.wisc.edu

  • Meredith Rhodes
    Meredith Rhodes ✭✭✭✭

    WHOA @Paul Newcome. Your solution is far simpler than mine 😳 This worked like a charm!

    Meredith Rhodes, PhD

    ClinicalTrials.gov Specialist

    UW School of Medicine & Public Health

    UW Clinical Trials Institute

    mkrhodes@clinicaltrials.wisc.edu

  • Paul Newcome
    Paul Newcome ✭✭✭✭✭✭

    @Meredith Rhodes The INDEX/COLLECT combined with the Number column does the incremental indexing, and the IFERROR leaves the row blank if there aren't that many.

Help Article Resources

Want to practice working with formulas directly in Smartsheet?

Check out the Formula Handbook template!