I have a huge spreadsheet column filled with data like this:
Abutilon abutiloides (Jacq.) Garcke ex Hochr.
Abutilon lignosum (Cav.) G. Don
Abietinella abietina (Hedw.) Fleisch.
Hypnum L. abietinum Hedw.
Thuidium abietinum (Hedw.) Schimp.
Abronia alpina Brandegee
Abies Rogers 2009 alba Mill.
Abies amabilis (Douglas ex Loudon) Douglas ex Forbes
I want to delete all the extraneous text, leaving me with a list of scientific names, like this:
Abutilon abutiloides
Abutilon lignosum
Abietinella abietina
Hypnum abietinum
Thuidium abietinum
Abronia alpina
Abies alba
Abies amabilis
This regular expression deletes parentheses and their contents:
\([^()]*\)
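For anyone who wants to try this step outside the spreadsheet, here is a minimal Python sketch. It assumes the parentheses in the pattern are escaped (`\(` and `\)`), since unescaped parentheses would form a capturing group instead of matching literal brackets. The function name `strip_parentheticals` is made up for illustration.

```python
import re

# Escaped parentheses match a literal "(" and ")";
# [^()]* matches everything between them (no nesting).
PAREN_GROUP = re.compile(r"\([^()]*\)")

def strip_parentheticals(line: str) -> str:
    """Remove each non-nested parenthesized group from one line."""
    return PAREN_GROUP.sub("", line)

print(repr(strip_parentheticals("Abutilon lignosum (Cav.) G. Don")))
# 'Abutilon lignosum  G. Don'
```

Note that the spaces on either side of the deleted group both remain, so the result contains a double space.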
The next step is to delete every string beginning with a capital letter preceded by a space. In other words, things like these should be deleted:
[space]L.
[space]Rogers
[space]Williams2009
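To make the rule concrete, here is a rough Python sketch of it exactly as stated: a literal space, one capital letter, then everything up to the next space. The pattern and the name `drop_capitalized_tokens` are my own assumptions for illustration, not a tested solution for the whole file.

```python
import re

# A literal space, one capital letter, then any run of non-space characters.
CAPITALIZED_TOKEN = re.compile(r" [A-Z]\S*")

def drop_capitalized_tokens(line: str) -> str:
    """Delete every space-preceded token that starts with a capital letter."""
    return CAPITALIZED_TOKEN.sub("", line)

for line in ["Hypnum L. abietinum Hedw.", "Abies Rogers 2009 alba Mill."]:
    print(drop_capitalized_tokens(line))
# Hypnum abietinum
# Abies 2009 alba
```

The second result shows a limit of the rule as worded: a year that stands on its own after a space does not begin with a capital letter, so it is left behind. A lowercase "ex" between author names would survive in the same way.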
ChatGPT gave me the following regexes. The first one blew up in my face. The others are a little better, but they still don’t do the job.
\s[A-Z][^\s]*
(\b[A-Z][a-z]+ [a-z]+)(?: [A-Z][^\s]*)+
(\b[A-Z][a-z]+ [a-z]+)(?: [A-Z][^\s])
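One possible reason the first pattern misbehaves (a guess, since it depends on the tool being used): `\s` matches line breaks as well as spaces, so the capitalized genus at the start of each following line also counts as "preceded by whitespace". The Python sketch below compares it with a literal-space variant on two sample lines joined by a newline.

```python
import re

two_lines = "Abronia alpina Brandegee\nAbies amabilis"

# \s also matches the newline, so "\nAbies" is treated as a deletable token.
with_whitespace_class = re.sub(r"\s[A-Z][^\s]*", "", two_lines)

# A literal space cannot match the newline, so the next line's genus survives.
with_literal_space = re.sub(r" [A-Z][^\s]*", "", two_lines)

print(repr(with_whitespace_class))  # 'Abronia alpina amabilis'
print(repr(with_literal_space))     # 'Abronia alpina\nAbies amabilis'
```

With `\s`, the two records collapse into one line and the second genus is lost; with a literal space, each line keeps its own genus.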
Can anyone give me a regex that will purge my file of all strings beginning with capital letters and preceded by a space? Thanks!