Sublime Forum

**DaGreatGatsby** · January 13, 2016, 8:58am

Sorry in advance if this is in the wrong section.

I’m currently working on gene annotations, and in order to submit them for alignment I can have “>gi etc.” only once per line. However, the files that I’m working on have more than one “>gi etc.” on a single line. For example:

>gi|566047797|gb|AHC51682.1| anthranilate synthase subunit II [Sulfolobus acidocaldarius SUSAZ]
MQWGCKITDITLIIDNYDSFVYNIAQLIGELGSYPIVIRNDEITLKGVERLNPDRIVISPGPGSPDKKEDIGIVNDVIKY
LGRRIPILGICLGHQAIGFTFGTVIRRAKTIYHGKISKIVHSSSSTLYDGIPNRFEATRYHSLVIDNVKEPLVIDAYSEE
DKEIMGVHHIEYRIYGVQFHPESVGTGYGRRIFYNFLNKV
>gi|499287957|ref|WP_010979247.1| anthranilate synthase subunit II [Sulfolobus tokodaii] >gi|15921491|ref|NP_377160.1| anthranilate synthase component II [Sulfolobus tokodaii str. 7] >gi|15622277|dbj|BAB66269.1| anthranilate synthase component II [Sulfolobus tokodaii str. 7]
MDLTLIIDNYDSFVYNIAQIVGELGSYPIVIRNDEITVKGVERINPDRIIISPGPGTPEKKEDIGIVIDVIKQFGRRIPI
LGICLGHQAIGYAFGAKIRKARRVFHGKISKINIINNVALYDGLPKQIEATRYHSLVIDDVKEPLIIDAYSLEDNEIMAI
HHSELRIYGVQFHPESVGTPLGKRIIYNFLNKV

I would like Sublime to find and erase the text that is in red. Notice that there is a space before the second “>gi etc.”, so I’ve tried Find and Replace with " >.+$", but it doesn’t work. Can somebody tell me how to do this properly? Thanks in advance.

**qgates** · January 13, 2016, 8:58am

First you need to make sure regex search/replace is switched on. With the find/replace panel open, ensure the button marked ‘.*’ is pushed; it’s at the far left with ‘Aa’ to its right. If regex is on, Sublime will highlight matches as you type expressions into the find box and syntax-highlight them too. Handy!

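Now, about your regex: " >.+$" matches from the first space followed by “>” through to the end of the line, so it only swallows the whole line if the line itself starts with a space. If it isn’t matching anything at all, regex mode being switched off is the likely culprit. Either way, if I read you right, you want the first >gi.* kept and any subsequent ones on the same line discarded?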

Use:

Find: (>gi.+?) >gi.+$
Replace: $1

The () part captures the >gi.* you want to keep, and the next part matches the rest of the line. The replacement $1 inserts whatever the () captured. The ? inside the () tells the regex engine to be lazy instead of greedy; otherwise the () will swallow every >gi.* entry except the last one (x-1 of x), which isn’t what you want. Try it without the ? and you’ll see.
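If you want to see the lazy/greedy difference outside Sublime, here is a minimal sketch using Python’s `re` module. I’m assuming it behaves the same as Sublime’s regex engine for this particular pattern. Note that Python writes the backreference as `\1` rather than `$1`, and needs `re.MULTILINE` for `$` to match at the end of each line. The header line is a shortened, made-up stand-in for the real one.

```python
import re

# Shortened, hypothetical header line modelled on the example in the question.
header = (">gi|499287957|ref|WP_010979247.1| protein A [Org one] "
          ">gi|15921491|ref|NP_377160.1| protein A [Org two] "
          ">gi|15622277|dbj|BAB66269.1| protein A [Org three]")

# Lazy capture: group 1 stops at the first " >gi" that follows it.
lazy = re.sub(r"(>gi.+?) >gi.+$", r"\1", header, flags=re.MULTILINE)

# Greedy capture: group 1 runs on to the last " >gi" on the line instead.
greedy = re.sub(r"(>gi.+) >gi.+$", r"\1", header, flags=re.MULTILINE)

print(lazy)    # only the first >gi entry remains
print(greedy)  # the first two >gi entries remain
```

With three entries on the line, the lazy version keeps one and the greedy version keeps two, which is the “x-1” behaviour mentioned above.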

Incidentally, the find/replace above matches the whole header line and replaces it with the part we want to keep. There is also a way to match only the part we want to delete and replace it with an empty string, but I find the first approach easier in these situations.
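For completeness, the delete-only variant could look roughly like this. It is again sketched in Python’s `re` rather than Sublime, on a tiny made-up FASTA snippet, so treat it as an illustration and not a tested recipe for your real files. The pattern matches from the first space-plus-“>gi|” to the end of the line and replaces that with nothing.

```python
import re

# Tiny, hypothetical FASTA snippet: the second header carries an extra >gi entry.
fasta = """>gi|1|gb|A.1| protein [Org one]
MQWGCKITDITLIID
>gi|2|ref|B.1| protein [Org two] >gi|3|ref|C.1| protein [Org two str. 7]
MDLTLIIDNYDSFVY
"""

# Match only the part to delete: a space, then ">gi|", then the rest of the line.
# re.MULTILINE makes $ match at the end of every line, not just the end of the text.
cleaned = re.sub(r" >gi\|.*$", "", fasta, flags=re.MULTILINE)

print(cleaned)
```

In Sublime the equivalent would presumably be Find: ` >gi\|.*$` (with the leading space) and an empty Replace field, with regex mode on.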

You’re welcome

How Do I Find and Replace Lines That Have " >" in Them?