Hello all,
I'm looking to build a formula that will return the digital object identifier (DOI) string from a paper citation. The problem is that sometimes it is the last piece of the citation, sometimes there is additional text after it, and sometimes there isn't a DOI to return at all.
I've been playing around with a few formulas that I've found in the community here, with mixed success. Here is the data that I'm looking at. I'd like to put a formula into the DOI column that extracts the text following "doi: " (returning N/A if it isn't present) and stops before ". Epub", ". eCollection", ". Erratum", ". No abstract", or ". Print". I would also like to exclude the final "." so that the result matches the example below.
=RIGHT(citation@row, LEN(citation@row) - FIND(" doi:", citation@row) - 4)
seems to return everything after " doi:" if it exists, but it doesn't drop any of the text that follows the DOI string itself.
=LEFT([DOI formula]@row, (FIND(". Epub ", [DOI formula]@row) - 1))
will exclude everything after ". Epub" if I combine it with the result of the first formula (another column I've labeled [DOI formula]).
I'm having trouble setting up the right logic to consider all possible text after the doi string.
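To make the logic I'm after concrete, here is a rough sketch in Python rather than Smartsheet syntax. It is only meant to pin down the expected behaviour, not to suggest how the formula should be written. The function name `extract_doi`, the `TERMINATORS` list name, and the sample citations are made up for illustration, and it assumes that when more than one trailing marker appears, the earliest one is where the DOI ends.

```python
# Illustration only: a hypothetical helper, not a Smartsheet function.
# The trailing markers come from the list in the post above.
TERMINATORS = (". Epub", ". eCollection", ". Erratum", ". No abstract", ". Print")


def extract_doi(citation: str) -> str:
    """Return the text after "doi: ", trimmed of trailing markers, or "N/A"."""
    marker = "doi: "
    start = citation.find(marker)
    if start == -1:
        # No DOI in this citation.
        return "N/A"

    rest = citation[start + len(marker):]

    # Assumption: cut at the earliest trailing marker, if any is present.
    cut = len(rest)
    for terminator in TERMINATORS:
        pos = rest.find(terminator)
        if pos != -1:
            cut = min(cut, pos)

    doi = rest[:cut].strip()

    # Drop a single final "." when the DOI is the last piece of the citation.
    if doi.endswith("."):
        doi = doi[:-1]
    return doi


# Made-up sample citations covering the three situations described above.
print(extract_doi("Smith J. A title. J Example. 2020;1(2):3-4. doi: 10.1000/example.123. Epub 2020 Jan 1."))
print(extract_doi("Smith J. A title. J Example. 2020;1(2):3-4. doi: 10.1000/example.123."))
print(extract_doi("Smith J. A title. J Example. 2020;1(2):3-4."))
```

The first two calls should both give 10.1000/example.123 and the third should give N/A. A Smartsheet formula that behaves the same way on those three shapes of citation is what I'm hoping for.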
I've attempted to explain this to the AI generator - with no luck. So I'm either not a great AI communicator - or this one is for the humans out there.
Anyone super skilled in this sort of text extraction?
Thank you!
Meredith