This regex works just fine inside regex101.com, but won’t work inside sublime. Are some of these characters unsupported? This regex is supposed to highlight the 2nd word on every single line of a document.
[Solved] Regex won't work inside sublime?
but your regex ^(?:\S+\s+){1}(\S+).*$ matches the whole line… ST doesn’t highlight the text captured by a group any differently from the rest of the match…
why not just use ^\s*\S+\s+\K(\S+) ? although your definition of “word” leaves something to be desired
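To make the "matches the whole line" point concrete, here is a minimal sketch in Python. It assumes Python's standard `re` module, which is not the engine ST uses and has no `\K`, so the capture group is read directly instead. The sample line is made up for illustration.

```python
import re

# Hypothetical sample line, not taken from the original poster's file.
line = "alpha beta gamma"

# The regex from the question: the overall match spans the whole line,
# and only group 1 holds the second word.
m = re.search(r"^(?:\S+\s+){1}(\S+).*$", line)

whole_match = m.group(0)   # what an editor would highlight
second_word = m.group(1)   # what the poster actually wanted
span_of_word = m.span(1)   # character offsets of the second word
```

In an engine that supports `\K`, dropping everything before the word from the overall match has the same effect as reading group 1 here.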
Works perfect man, thanks! And I’m a noob at regex. I was following a youtube guide a while ago, but stopped because it started delving into python stuff.
Oh sorry about this… Just noticed I can’t change the nth word that’s selected. I don’t want it specifically for the 2nd word, but rather I want the ability to switch it by putting a number in.
Thanks again! Any video guides you’d recommend? I don’t do so well with this abstract programming stuff. ;x
although now that we aren’t anchoring on the end of the line, the \s can match newlines I think…
probably better off using:
^[\t ]*(\S+[\t ]+){1}\K(\S+)
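A small sketch of the newline problem, again assuming Python's `re` module as a stand-in (no `\K`, so the word is read from a capture group, and the inner group is made non-capturing). The two-line sample text is invented.

```python
import re

# Line 1 has only one word, line 2 has two.
text = "one\ntwo three"

# With \s the separator may be a newline, so the "second word" of
# line 1 is taken from line 2.
loose = re.findall(r"^\s*\S+\s+(\S+)", text, re.M)

# With [\t ] the separator must stay on the same line, so line 1
# simply produces no match.
strict = re.findall(r"^[\t ]*(?:\S+[\t ]+){1}(\S+)", text, re.M)

print(loose)
print(strict)
```

The loose version reports `two`, which is really the first word of the next line; the strict version only reports `three`.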
no problem happy to help
unfortunately I am not a fan of videos, they are often too slow for me - I like to read. somebody else may be able to suggest something though
I’m back once again, sorry. Is there a way to preserve the line that the words are on? I need this regex to copy paste text into an excel sheet. There are 5 columns of data, 300 rows. The rows vary from 3-5 words in length. Here’s an example of what the data “should” look like:
word word word
word word word word word
word word word
word word word
word word word word
word word word word
word word word
But instead when I copy the columns into excel, it looks like this:
word word word word word
word word word word
word word word word
word word word
word word word
word word word
word word word
Is this doable with regex? Currently looking through video tutorials, but no luck so far.
I’m not quite sure I understand what you are trying to do here. Is it something like this?
You have text in ST, and you are selecting, say the second word on each line.
Then you copy and paste it into Excel into column 1.
Then you want to select, say the third word on each line in ST, but some lines only have 2 words.
So you copy and paste into Excel into column 2 and the rows in column 1 don’t always line up with column 2 the same way they did in ST?
sounds like you would need to always select a newline character if the nth word doesn’t exist?
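The misalignment described above can be reproduced with a short sketch (Python's `re` module assumed, invented three-line sample): lines that lack the nth word produce no match at all, so the list of matches is shorter than the list of lines.

```python
import re

# Hypothetical data: the middle line has no third word.
text = "a b c\nd e\nf g h"

# Skip two words, capture the third.
third_words = re.findall(r"^[\t ]*(?:\S+[\t ]+){2}(\S+)", text, re.M)

line_count = len(text.splitlines())
match_count = len(third_words)

print(third_words, line_count, match_count)
```

Pasting those two matches into a column would put `h` on row 2 even though it came from line 3.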
This sounds correct, yes. I’ve been pulling my hair out trying to figure it out. It seems like I’d need to group the 2nd word of each line into 1 group, and then group every other word + white space into a non capturing group, and then finally group the newlines/linebreaks/whatever the proper name is into a 3rd group, and then when I copy all the highlighted text, it’ll keep only the 2nd words, and maintain the line that they were on when I paste into excel.
this should work:
^([\t ]*(\S+[\t ]+){1}\K(\S+)|(?!(?2)).*$\K\n)
you can change the {1} as before, to pick a different “nth word”, where n here represents the number of words to skip, before then picking the next word
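For anyone who would rather script the column extraction than select it in the editor, here is a rough Python equivalent of the same idea. It is only a sketch: `nth_word_column` and its `skip` parameter are names made up for this example, and it works line by line instead of using `\K`, which Python's `re` does not have.

```python
import re

def nth_word_column(text, skip):
    """Return one cell per line: the word after `skip` words, or "" if absent."""
    # Same shape as the regex above, with the {1} made configurable.
    pattern = re.compile(r"^[\t ]*(?:\S+[\t ]+){%d}(\S+)" % skip)
    cells = []
    for line in text.splitlines():
        m = pattern.match(line)
        # An empty cell keeps the row aligned when the word is missing.
        cells.append(m.group(1) if m else "")
    return cells

sample = "a b c\nd e\nf g h"
first_column = nth_word_column(sample, 0)
third_column = nth_word_column(sample, 2)

# One row per input line, ready to paste into a spreadsheet column.
pasteable = "\n".join(third_column)
```

The empty string plays the role of the lone newline that the editor regex selects for short lines.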
You’re a saint, thanks! I’m trying to learn this stuff 'cause it’s super useful, I just suck with abstract concepts, and tutorials often use lots of jargon/unfamiliar vocabulary.
Quick question (still going through this tutorial): I saw you include \t which is a tab stop, but I’m not quite sure I understand its purpose. I know the definition (tab stop = where the cursor stops after you press tab, usually 8 characters), but I still don’t understand how that applies here?
Edit: I replaced \t with \S, and the regex still appears to work?
Edit 2: also, I got rid of the space after \t and it also doesn’t affect anything? what’s its purpose?
Edit 3: removed the brackets around \s in capturing group 2, no change. what’s the brackets’ purpose there?
Edit 4: same thing for \S inside capturing group one at the very beginning. I know what character classes are, I just don’t know why it’s being used here?
Edit 5: noticed that when set to capture the first word {0}, it only captures one later, so clearly that stuff was useful, just not sure how it did what it did.
Edit 6: I get why you put the negative lookahead (?!(?2)) so that you could deducate group two, but doesn’t \K already remove previously consumed characters? Also, I don’t get how the | or operation ever triggers? Isn’t it only supposed to be one or the other, not both?
See this website: http://www.rexegg.com/regex-quickstart.html
On that page for starters, there is a table. Browsing on links and searching there, you may find better. If something is not clear there, search on google.
dude, too many questions! haven’t you ever heard the general programmers’ advice “if it ain’t broke, don’t fix it”?
if there are no tab characters in your file, then removing \t will have no effect.
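A quick way to see when the \t does matter, sketched with Python's `re` module and an invented tab-separated line: once a tab sits between two words, the space-only version stops matching.

```python
import re

# Hypothetical line where the first separator is a tab, not a space.
line = "a\tb c"

# Separator class allows tab or space: the second word is found.
with_tab = re.findall(r"^[\t ]*(?:\S+[\t ]+){1}(\S+)", line)

# Separator is space only: \S+ stops at the tab and " +" cannot match it.
space_only = re.findall(r"^ *(?:\S+ +){1}(\S+)", line)

print(with_tab)
print(space_only)
```

With no tabs in the input, both patterns behave the same, which is why removing \t appeared to change nothing.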
^([\t ]*(\S+[\t ]+){1}\K(\S+)|(?!(?2)).*$\K\n)
Options: Case insensitive; Exact spacing; Dot doesn’t match line breaks; ^$ match at line breaks; Default line breaks; Numbered capture; Allow duplicate names; Greedy quantifiers; Allow zero-length matches
- Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed): ^
- Match the regex below and capture its match into backreference number 1: ([\t ]*(\S+[\t ]+){1}\K(\S+)|(?!(?2)).*$\K\n)
  - Match this alternative (attempting the next alternative only if this one fails): [\t ]*(\S+[\t ]+){1}\K(\S+)
    - Match a single character present in the list below: [\t ]*
      - Between zero and unlimited times, as many times as possible, giving back as needed (greedy): *
      - The tab character: \t
      - The literal character “ ”
    - Match the regex below and capture its match into backreference number 2: (\S+[\t ]+){1}
      - Exactly once (meaningless quantifier): {1}
      - Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed): \S+
        - Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
      - Match a single character present in the list below: [\t ]+
        - Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
        - The tab character: \t
        - The literal character “ ”
    - Keep the text matched so far out of the overall regex match: \K
    - Match the regex below and capture its match into backreference number 3: (\S+)
      - Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed): \S+
        - Between one and unlimited times, as many times as possible, giving back as needed (greedy): +
  - Or match this alternative (the entire group fails if this one fails to match): (?!(?2)).*$\K\n
    - Assert that it is impossible to match the regex below starting at this position (negative lookahead): (?!(?2))
      - Match the regex inside capturing group number 2 (subroutine call; restore capturing groups upon exit; do not try further permutations of the call if the overall regex fails): (?2)
    - Match any single character that is NOT a line break character (line feed): .*
      - Between zero and unlimited times, as many times as possible, giving back as needed (greedy): *
    - Assert position at the end of a line (at the end of the string or before a line break character) (line feed): $
    - Keep the text matched so far out of the overall regex match: \K
    - Match the line feed character: \n
Explanation created with RegexBuddy
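On the "how does the | ever trigger" question: the two alternatives are tried separately at the start of every line, so each line ends up using one branch or the other, never both. Below is a rough emulation in Python's `re` module, which supports neither \K nor the (?2) subroutine call. As a workaround in this sketch, named groups (`word` and `newline`, both made-up names) stand in for what \K would leave selected, and the group 2 pattern is written out again inside the lookahead.

```python
import re

# Emulation of the two alternatives; not the exact regex ST runs.
pattern = re.compile(
    r"^(?:[\t ]*(?:\S+[\t ]+){1}(?P<word>\S+)"   # branch 1: the 2nd word exists
    r"|(?!\S+[\t ]+).*$(?P<newline>\n))",        # branch 2: it does not
    re.M,
)

# Hypothetical sample: the middle line has only one word.
text = "a b c\nd\ne f"

branches = []
selected = []
for m in pattern.finditer(text):
    if m.group("word") is not None:
        branches.append("word")
        selected.append(m.group("word"))
    else:
        branches.append("newline")
        selected.append(m.group("newline"))

print(branches)
```

Each line reports exactly one branch: lines with a second word take the first alternative, and the one-word line falls through to the second and contributes only its line break.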