Sublime Forum

[solved] REGEX: How can I select 3 consecutive lines that contain different words?

#1

bla bla
bla bla
WORD_1 something
WORD_2 baby girl
WORD_3 mercedes
bla bla

Well, the simple method is to use the | operator.
Like this: (^.*)WORD_1.*$|(^.*)WORD_2.*$|(^.*)WORD_3.*$ (but this matches even if the lines are not consecutive)

Does anyone know a short regex that selects strictly those 3 consecutive lines?

I tried with +, but it doesn’t work: (^.*)(WORD_1+WORD_2+WORD_3).*$

#2

The | operator is for alternation; it means “match this or that”. In your example that means you’re telling it to match any line that contains WORD_1, WORD_2 or WORD_3. Not only does it match them even if they’re not consecutive, it also matches them if they’re out of order or appear multiple times.
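
To see the alternation behaviour outside Sublime, here is a small Python sketch. It assumes Python's `re` module with `re.MULTILINE` as a stand-in for Sublime's line-based `^` and `$`, and the sample text is made up for illustration:

```python
import re

# Pattern from post #1; re.MULTILINE makes ^ and $ work per line.
ALTERNATION = re.compile(
    r"(^.*)WORD_1.*$|(^.*)WORD_2.*$|(^.*)WORD_3.*$", re.MULTILINE
)

# Hypothetical sample: the three words are out of order and not adjacent.
scattered = (
    "WORD_3 mercedes\n"
    "bla bla\n"
    "WORD_1 something\n"
    "bla bla\n"
    "WORD_2 baby girl\n"
)

# Each line is matched on its own, whatever its neighbours are.
matched_lines = [m.group(0) for m in ALTERNATION.finditer(scattered)]
print(matched_lines)
```

Every line that contains one of the words is reported as a separate match, which is why this pattern cannot enforce "three lines in a row".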

The + operator means “match the thing that comes before me one or more times” (as opposed to * which means “match the thing that comes before me zero or more times”). Here “the thing” is only the single character right before it, so WORD_1+ matches WORD_1 followed by any number of extra 1s. In your example you’re telling it to find WORD_1, WORD_2 and WORD_3 all together (with nothing in between them) and in that order, which never happens when they sit on separate lines.
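
A quick way to check what + actually repeats is to try the inner part of the pattern in Python. This is only a sketch using Python's `re` module, and the test strings are invented for the purpose:

```python
import re

# Inner part of the pattern from post #1.
PLUS_PATTERN = re.compile(r"WORD_1+WORD_2+WORD_3")

def fully_matches(text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return PLUS_PATTERN.fullmatch(text) is not None

# The words glued together match.
glued = fully_matches("WORD_1WORD_2WORD_3")
# + repeats only the last character, so extra digits are accepted.
repeated_digit = fully_matches("WORD_111WORD_22WORD_3")
# Repeating a whole word is not accepted.
repeated_word = fully_matches("WORD_1WORD_1WORD_2WORD_3")
# Anything between the words (here a newline) breaks the match.
separate_lines = fully_matches("WORD_1\nWORD_2\nWORD_3")

print(glued, repeated_digit, repeated_word, separate_lines)
```

To repeat a whole word you would need a group, for example `(?:WORD_1)+`, but that still would not help with words on separate lines.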

Something like this will do what you want (more or less, see below):

(?s)(^.*)WORD_1.*(^.*)WORD_2.*(^.*)WORD_3(.*)^

This works because the (?s) part is telling the regex engine that the . character should also match a newline character because normally it does not. The rest of the example is similar to what you tried originally.
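
The effect of (?s) on the dot can be sketched in Python, whose `re` module supports the same inline flag. The two-line sample string is an assumption for illustration:

```python
import re

text = "WORD_1 something\nWORD_2 baby girl"

# Without (?s) the dot stops at the newline, so .* ends with the first line.
default_dot = re.search(r"WORD_1.*", text).group(0)

# With (?s) the dot also matches the newline, so .* runs to the end of the text.
dotall_dot = re.search(r"(?s)WORD_1.*", text).group(0)

print(repr(default_dot))
print(repr(dotall_dot))
```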

Notice that we no longer include any $ characters because they don’t help anymore; we are telling the system that the . matches a newline, so a greedy .* runs straight past the end of the line instead of stopping where the $ used to anchor it.

The regex ends with the ^ character for this reason. If you leave it off, the regex will match everything else to the end of the file. Adding in the ^ at the end stops the match when it gets to the start of the next line.

This means that if this sequence appears as the last 3 lines in the file, the regex will no longer match.

In theory your original example would work if you removed the | characters and added (?m) to the start of the regex to tell the regex engine that it should treat the input as multiple lines. However, Sublime doesn’t seem to honor that, so it doesn’t work here.

#3

Or alternatively, this also works (although I would swear I tried it first and it did not):

(^.*)WORD_1.*\n(^.*)WORD_2.*\n(^.*)WORD_3(.*)\n

This just matches the newlines instead of using a $ to match the end of the line, which is a lot cleaner. Also it works regardless of the position of the three lines in the file.
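
For anyone who wants to test this pattern outside Sublime, here is a minimal Python sketch. It assumes `re.MULTILINE` so that ^ matches at each line start, as it does in Sublime's search, and the second sample text is made up to show a non-consecutive case:

```python
import re

# Pattern from this post; . never crosses a line, \n steps to the next one.
CONSECUTIVE = re.compile(
    r"(^.*)WORD_1.*\n(^.*)WORD_2.*\n(^.*)WORD_3(.*)\n", re.MULTILINE
)

# Sample from post #1 (with a trailing newline).
sample = (
    "bla bla\n"
    "bla bla\n"
    "WORD_1 something\n"
    "WORD_2 baby girl\n"
    "WORD_3 mercedes\n"
    "bla bla\n"
)

# Hypothetical counter-example: a plain line sits between WORD_1 and WORD_2.
separated = (
    "WORD_1 something\n"
    "bla bla\n"
    "WORD_2 baby girl\n"
    "WORD_3 mercedes\n"
)

block = CONSECUTIVE.search(sample)
selected = block.group(0) if block else None
no_block = CONSECUTIVE.search(separated)

print(repr(selected))
print(no_block)
```

Only the three adjacent lines are selected in the first text, and the second text gives no match at all because the lines are not consecutive.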

#4

I modified your first regex a little bit and now it works beautifully, thanks

(?s)WORD_1.*(?s)WORD_2.*(?s)WORD_3(.*)^

#5

I’m not sure if it’s true, but I’ve read that it’s better to use [^\n] instead of the dot (.). It does the exact same thing (unless you specify DOTALL or something).

#6

Hello, I don’t get it. Where should I use [^\n] in the example below?

#7

Everywhere there’s a dot.

. means any character except \n, the same as [^\n]

(?s)WORD_1[^\n]*(?s)WORD_2[^\n]*(?s)WORD_3([^\n]*)^

But don’t worry, because, as @OdatNurd said, the (?s) means that the dot matches anything… It was just a bit of extra info :)

#8

Your regex doesn’t work for me. :(

#9

Yes, because [^\n] matches anything except a \n, and your regex has (?s), which means that you want the . to match everything, newlines included.

"(?s)." => anything
"." => "[^\n]" => anything except "\n"

But you want to match the \n.
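
Here is a rough Python sketch of that difference. The two patterns are simplified versions of the ones in this thread (a single (?s) at the start, no trailing ^), so treat them as an illustration and not as the exact regexes from the posts:

```python
import re

sample = (
    "WORD_1 something\n"
    "WORD_2 baby girl\n"
    "WORD_3 mercedes\n"
    "bla bla\n"
)

# With (?s) the dot crosses newlines, so the three words can be on different lines.
with_dot = re.search(r"(?s)WORD_1.*WORD_2.*WORD_3", sample)

# (?s) changes only the dot; [^\n] still refuses to match a newline,
# so the pattern can never get from one line to the next.
with_class = re.search(r"(?s)WORD_1[^\n]*WORD_2[^\n]*WORD_3", sample)

dot_result = with_dot.group(0) if with_dot else None
print(repr(dot_result))
print(with_class)
```

So swapping . for [^\n] only makes sense in a regex that does not rely on (?s) to jump across lines.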

Sorry for bringing confusion… Just ignore my messages in this topic, it’s not important.

#10

good to know. Thank you.
