Sublime Forum

**janus000** · October 10, 2016, 9:14am

This regex works just fine inside regex101.com, but won’t work inside sublime. Are some of these characters unsupported? This regex is supposed to highlight the 2nd word on every single line of a document.

**addons_zz** · October 5, 2016, 6:51pm

What is your regex? Could you link it like this:

https://regex101.com/r/tF2hV5/3

**janus000** · October 5, 2016, 7:19pm

Sorry about that!

**kingkeith** · October 5, 2016, 7:24pm

but your regex ^(?:\S+\s+){1}(\S+).*$ matches the whole line… ST doesn’t highlight matches that participated in groups separately/differently to the rest of the match…

why not just use ^\s*\S+\s+\K(\S+)? although your definition of “word” leaves something to be desired
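For anyone who wants to see the difference between "the whole match" and "the group" outside of Sublime, here is a minimal Python sketch. It assumes made-up sample text, and it only covers the first regex, since Python's built-in `re` module does not support `\K`.

```python
import re

# Made-up sample text, one line per record.
text = "alpha beta gamma\ndelta epsilon zeta"

# The original pattern: skip one word, capture the next one,
# then consume the rest of the line.
pattern = re.compile(r"^(?:\S+\s+){1}(\S+).*$", re.MULTILINE)

# group(0) is the whole match, which is what an editor highlights.
whole_matches = [m.group(0) for m in pattern.finditer(text)]
# group(1) is only the captured second word.
second_words = [m.group(1) for m in pattern.finditer(text)]

print(whole_matches)
print(second_words)
```

The whole match covers each full line, while only group 1 holds the second word, which is why a highlight based on the whole match selects everything.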

**janus000** · October 5, 2016, 7:26pm

Works perfect man, thanks! And I’m a noob at regex. I was following a youtube guide a while ago, but stopped because it started delving into python stuff.

**janus000** · October 5, 2016, 7:28pm

Oh sorry about this… Just noticed I can’t change the nth word that’s selected. I don’t want it specifically for the 2nd word, but rather I want the ability to switch it by putting a number in.

**kingkeith** · October 5, 2016, 7:35pm

oh I see, I wondered why the {1} was there

^\s*(\S+\s+){1}\K(\S+)

**janus000** · October 5, 2016, 7:35pm

Thanks again! Any video guides you’d recommend? I don’t do so well with this abstract programming stuff. ;x

**kingkeith** · October 5, 2016, 7:37pm

although now that we aren’t anchoring on the end of the line, the \s can match newlines I think…

probably better off using:

^[\t ]*(\S+[\t ]+){1}\K(\S+)
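The newline issue can be checked with a small Python sketch. The sample text is made up, and a capture group stands in for `\K` because Python's `re` module does not support it.

```python
import re

# Made-up text: the first line has only one word.
text = "solo\nfirst second third"

# \s also matches "\n", so the match can spill onto the next line.
loose = re.compile(r"^\s*\S+\s+(\S+)", re.MULTILINE)
# [\t ] matches only tabs and spaces, so each match stays on one line.
strict = re.compile(r"^[\t ]*\S+[\t ]+(\S+)", re.MULTILINE)

loose_words = [m.group(1) for m in loose.finditer(text)]
strict_words = [m.group(1) for m in strict.finditer(text)]

print(loose_words)
print(strict_words)
```

With `\s`, the one-word line borrows the first word of the following line; with `[\t ]`, that line is simply skipped and the next line yields its real second word.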

no problem happy to help

unfortunately I am not a fan of videos, they are often too slow for me - I like to read; somebody else may be able to suggest something though

**janus000** · October 5, 2016, 8:16pm

I’m back once again, sorry. Is there a way to preserve the line that the words are on? I need this regex to copy and paste text into an excel sheet. There’s 5 columns of data, 300 rows. The rows vary from 3-5 words in length. Here’s an example of what the data “should” look like:

word word word
word word word word word
word word word
word word word
word word word word
word word word word
word word word

But instead when I copy the columns into excel, it looks like this:

word word word word word
word word word word
word word word word
word word word
word word word
word word word
word word word

**janus000** · October 6, 2016, 1:40pm

Is this doable with regex? Currently looking through video tutorials, but no luck so far.

**kingkeith** · October 6, 2016, 2:01pm

I’m not quite sure I understand what you are trying to do here. Is it something like this?

You have text in ST, and you are selecting, say the second word on each line.
Then you copy and paste it into Excel into column 1.

Then you want to select, say the third word on each line in ST, but some lines only have 2 words.
So you copy and paste into Excel into column 2 and the rows in column 1 don’t always line up with column 2 the same way they did in ST?

sounds like you would need to always select a newline character if the nth word doesn’t exist?

**janus000** · October 6, 2016, 1:49pm

This sounds correct, yes. I’ve been pulling my hair out trying to figure it out. It seems like I’d need to group the 2nd word of each line into 1 group, and then group every other word + white space into a non capturing group, and then finally group the newlines/linebreaks/whatever the proper name is into a 3rd group, and then when I copy all the highlighted text, it’ll keep only the 2nd words, and maintain the line that they were on when I paste into excel.

**kingkeith** · October 6, 2016, 2:01pm

this should work:

^([\t ]*(\S+[\t ]+){1}\K(\S+)|(?!(?2)).*$\K\n)

you can change the {1} as before, to pick a different “nth word”, where n here represents the number of words to skip, before then picking the next word
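If the goal is only to get an aligned column into Excel, the same idea can be sketched outside Sublime in Python. This is not the regex above, just a rough equivalent: the function name `nth_word_column` and the sample text are hypothetical, and a capture group replaces `\K`.

```python
import re

def nth_word_column(text, skip):
    """Return one entry per line: the word after `skip` words,
    or an empty string when the line is too short."""
    # Same building blocks as the Sublime pattern, minus \K.
    pattern = re.compile(r"^[\t ]*(?:\S+[\t ]+){%d}(\S+)" % skip)
    column = []
    for line in text.splitlines():
        m = pattern.match(line)
        # Keep a blank placeholder so the rows stay lined up.
        column.append(m.group(1) if m else "")
    return column

# Hypothetical sample: the middle line has no third word.
sample = "a b c\nd e\nf g h i"
print("\n".join(nth_word_column(sample, 2)))
```

Because every input line produces exactly one output line, pasting the result into a spreadsheet column keeps each word on its original row.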

**janus000** · October 6, 2016, 2:00pm

You’re a saint, thanks! I’m trying to learn this stuff 'cause it’s super useful, I just suck with abstract concepts, and tutorials often use lots of jargon/unfamiliar vocabulary.

**janus000** · October 6, 2016, 4:40pm

Quick question (still going through this tutorial): I saw you include \t which is a tab stop, but I’m not quite sure I understand its purpose. I know the definition (tab stop = where the cursor stops after you press tab, usually 8 characters), but I still don’t understand how that applies here?

Edit: I replaced \t with \S, and the regex still appears to work?
Edit 2: also, I got rid of the space after \t and it also doesn’t affect anything? what’s its purpose?
Edit 3: removed the brackets around \s in capturing group 2, no change. what’s the brackets’ purpose there?
Edit4: same thing for \S inside capturing group one at the very beginning. I know what character classes are, I just don’t know why it’s being used here?
Edit 5: noticed that when set to capture the first word {0}, it only captures one later, so clearly that stuff was useful, just not sure how it did what it did.
Edit 6: I get why you put the negative lookahead (?!(?2)) so that you could deducate group two, but doesn’t \K already remove previously consumed characters? Also, I don’t get how the | or operation ever triggers? Isn’t it only supposed to be one or the other, not both?

**addons_zz** · October 6, 2016, 5:21pm

See this website: http://www.rexegg.com/regex-quickstart.html

On that page for starters, there is a table. Browsing on links and searching there, you may find better. If something is not clear there, search on google.

**kingkeith** · October 7, 2016, 9:04am

dude, too many questions! haven’t you ever heard the general programmers’ advice “if it ain’t broke, don’t fix it”?

if there are no tab characters in your file, then removing \t will have no effect.

`^([\t ]*(\S+[\t ]+){1}\K(\S+)|(?!(?2)).*$\K\n)`

Options: Case insensitive; Exact spacing; Dot doesn’t match line breaks; ^$ match at line breaks; Default line breaks; Numbered capture; Allow duplicate names; Greedy quantifiers; Allow zero-length matches

Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed) ^
Match the regex below and capture its match into backreference number 1 ([\t ]*(\S+[\t ]+){1}\K(\S+)|(?!(?2)).*$\K\n)
- Match this alternative (attempting the next alternative only if this one fails) [\t ]*(\S+[\t ]+){1}\K(\S+)
  - Match a single character present in the list below [\t ]*
    - Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
    - The tab character \t
    - The literal character “ ”
  - Match the regex below and capture its match into backreference number 2 (\S+[\t ]+){1}
    - Exactly once (meaningless quantifier) {1}
    - Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) \S+
      - Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
    - Match a single character present in the list below [\t ]+
      - Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
      - The tab character \t
      - The literal character “ ”
  - Keep the text matched so far out of the overall regex match \K
  - Match the regex below and capture its match into backreference number 3 (\S+)
    - Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) \S+
      - Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
- Or match this alternative (the entire group fails if this one fails to match) (?!(?2)).*$\K\n
  - Assert that it is impossible to match the regex below starting at this position (negative lookahead) (?!(?2))
    - Match the regex inside capturing group number 2 (subroutine call; restore capturing groups upon exit; do not try further permutations of the call if the overall regex fails) (?2)
  - Match any single character that is NOT a line break character (line feed) .*
    - Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
  - Assert position at the end of a line (at the end of the string or before a line break character) (line feed) $
  - Keep the text matched so far out of the overall regex match \K
  - Match the line feed character \n

Explanation created with RegexBuddy
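On the question of how the `|` ever triggers: the alternation is tried again at the start of every line, so different lines can take different branches. A simplified Python sketch of that behaviour follows. It assumes made-up text and deliberately drops the lookahead and `\K` (which Python's `re` module does not support), so it is not the exact regex explained above.

```python
import re

# Made-up text: the middle line has only one word.
text = "one two three\nsingle\nfour five"

# First branch: skip one word and capture the next.
# Second branch: tried only if the first fails; takes the whole line.
pattern = re.compile(r"^(?:[\t ]*\S+[\t ]+(\S+)|(.*)$)", re.MULTILINE)

branches = []
for m in pattern.finditer(text):
    if m.group(1) is not None:
        branches.append(("first", m.group(1)))
    else:
        branches.append(("second", m.group(2)))

print(branches)
```

Each individual match uses exactly one branch, but across the whole document both branches get used: the first on lines that have a second word, the second on lines that do not.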

[Solved] Regex won't work inside sublime?