Sublime Forum

"find" a character in the second column

#1

Okay, I have a two column text file of names. The first column contains the name in “last name, first name” format while the second contains “first name last name”.

Somehow, probably me typing, I seem to end up with a comma in the second column - there shouldn’t be any. I can’t look generally for commas since there should always be one in the first column.

Can someone please tell me how I can look for a specific character, in this case a comma but possibly others, only in the second column?

I tried to search here but for the life of me I couldn’t come up with a reasonable search.

0 Likes

#2

Assuming the second column is left-aligned to a given column/character position, you could select the second column and then use find/replace constrained to the selection.

Column select by dragging middle mouse button (often mousewheel), or with keyboard press Ctrl+Alt+Down at the left column, then Shift+End to select (all Windows keys).

:slight_smile:

0 Likes

#3

Thanks for the response, M. Gates. Well, that works. Yes, it’s tab delimited at position 30 so I can get there easily enough.

Now, do you have a quicker way to select the second column in all 500,000 lines? Ctl+Alt+Dwn works, just takes forever to get from the top of the file to the bottom.

I’m a newbie at using Sublime, if that’s not obvious. That file just outgrew Notepad++ which started doing some really weird things to my data.

0 Likes

#4

For a widespread operation in a big file, you’re probably better off trying to craft a regular expression that matches directly in the second column without having to select it first or things will slow to a crawl (along with taking a long time to select the data ;)).

An example would be something like the following:

  • Find: ^([^\t]*\t[^\t]*),
  • Replace: \1
  • Note: there is a space character following the \1 above).

Basically:

  • From the start of the line, match 0 or more characters that aren’t tabs, followed by a tab
  • Now the match has consumed the entire first column
  • Match 0 or more characters that aren’t tabs, followed by a comma

The result would be to match everything from the start of the line up until the first comma seen in the second column, capturing all of the preceding text so that it can be inserted back with a space following it instead of a comma.

Since we know that tab characters separate columns of data, commas in the first column are skipped because the match has to see a tab before it continues on; from there any comma we see prior to a tab must be in the second column, and we can’t see into the third column or beyond because that would require a tab but the second part of the match excludes tab characters.

If you have examples with multiple commas in the second column you would have to run the operation more than once to find them all. Also, although this worked for me in a small contrived data set, I would recommend that you make sure you can reverse the operation if things go weird. :slight_smile:

1 Like

#5

Yes, selecting big columns in Sublime can be a PITA. However, there is a way:

  1. Normally highlight all the lines with the mouse/kb
  2. Press Ctrl+Shift+L to turn each line into a separate selection
  3. Hit the home key to lose your selections and leave your multiple cursors all aligned at the left
  4. Cursor across to the desired column
  5. Hit Shift+End to select to the end of line for each cursor.

Dunno how that’s going to perform with 500k lines though. Tbh, Sublime isn’t great at heavy lifting with column operations, mainly because it doesn’t “know” column selection as such. Multi cursors are used to get you there, but the idea isn’t great when operating on huge files or doing complex column manipulations. Sometimes I still kick back to my old editor, Crisp, for doing such things.

:slight_smile:

0 Likes

#6

@OdatNurd, I use regex some with a few other searches, it didn’t dawn on me to use them here … duh! I’m playing with regex101.com to test some of what I’ve been doing and it seems as if the php variety of regex is not what sublime uses.

Using what you suggested would work if the line I was looking at had a comma in the second column- most of them don’t so it wraps around to the comma it finds in the next line in the first column.

Hmmm … back to the drawing board.

0 Likes

#7

I believe that Sublime uses the boost library for it’s regex searching in this case, which is a PCRE library.

Wrapping didn’t happen in my test example, but apparently I completely glossed over your rather explicit mention of the file being two columns.

What I posted above works with 3 or more columns because it’s rooted at the start of the line and would need to pass a tab character to escape the second column.

In a two column example, you need to tack on some matching to “eat” the rest of the line:

  • Find: ^([^\t]*\t[^\t]*),([^\t]*$)
  • Replace: \1 \2

Probably if you know for sure your second column doesn’t contain a tab, just a simple (.*$) after the comma is simpler.

0 Likes

#8

@Odatnurd This is a list of authors names, optionally there may be 3rd / 4th / ??? columns depending on the various ways they choose to identify themselves.

This rendition of the regex works !!! man, you have no idea how much time you’ve saved me (and at 71 I have to watch the minutes, there aren’t all that many left!!)

All I really want to do is find the commas; sometimes their presence indicates other problems I want to clean up so simply finding them is enough. As I said, this rendition works beautifully for that! Thanks!!!

1 Like