Sublime Forum

Regex: Finds words repeated in multiple lines before “|” and after

#1

I have these lines of regex expressions, each separated by | into the form Regex_A|Regex_B:

(?s)((^.*)(<div class="entry-excerpt">)|(<!-- //.entry -->)(.*$))
(?s)((^.*)(<ul class="smallThumb-mainList">)|(<div class="navig">)(.*$))
(?s)((^.*)(word_2)|(<!-- //.entry -->)(.*$))
(?s)((^.*)(word_2)|(<!-- //.ambro34 -->)(.*$))

I want to find all the words/regexes that are repeated before the | and all those that are repeated after the |.

I tried a regex, but it doesn’t work very well: (?m)(.*)^(.*)\|(.*)(?=.*\1)

Help, please.

#2

I don’t exactly understand what you want without a text excerpt. You should try backreferences (\1), though.

#3

Hello bytie, I want a regex so that, after search and replace, only one instance of each repeated word remains.

Here is a simple example:

Word_1 | Word_2
Word_3 | Word_2
Word_4 | Word_5
Word_4 | Word_6

In this case, Word_4 and Word_2 are repeated, so after the search I want only these to remain.

This is the output I expect: Word_4 | Word_2

#4

My apologies, I just noticed that you already know about backreferences.

If you’re using ST3, you could get an idea from something like (\w+)\s*\|\s*(\w+)\n(\1\s*\|\s*(\w+)|(\w+)\s*\|\s*\2). It parses a chunk of two lines and matches only if the two adjacent lines share the same 1st or 2nd word.
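
To see what that pattern actually picks up, here is a rough sketch that runs it outside Sublime with Python's `re` module, on the Word_1..Word_6 sample from post #3. The function name `adjacent_repeats` and the "left"/"right" labels are made up for illustration, and Python's regex flavor may not behave identically to the one in ST3.

```python
import re

# Pattern from the post above: two adjacent "left | right" lines that share
# either the left word (backreference \1) or the right word (backreference \2).
PAIR = re.compile(r"(\w+)\s*\|\s*(\w+)\n(\1\s*\|\s*(\w+)|(\w+)\s*\|\s*\2)")


def adjacent_repeats(text):
    """Return a (side, word) tuple for each matched pair of adjacent lines."""
    found = []
    for m in PAIR.finditer(text):
        if m.group(4) is not None:
            # First alternative matched: the left word repeats on the next line.
            found.append(("left", m.group(1)))
        else:
            # Second alternative matched: the right word repeats on the next line.
            found.append(("right", m.group(2)))
    return found


sample = (
    "Word_1 | Word_2\n"
    "Word_3 | Word_2\n"
    "Word_4 | Word_5\n"
    "Word_4 | Word_6\n"
)

print(adjacent_repeats(sample))
# [('right', 'Word_2'), ('left', 'Word_4')]
```

Keep in mind this only catches repeats on neighbouring lines; if the two lines with the same word are separated by another line, the pattern finds nothing.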

If you’re looking for a regex that filters out the words that don’t repeat anywhere in the whole text (in one pass), I think it is impossible: the regex would somehow have to “know” across the whole text when to put a word in storage and when to ignore or discard it, and regexes aren’t that smart; the variables (capture groups) are reserved in advance when you write the regex…
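
If stepping outside the regex is acceptable, the "storage" part is easy to do in a short script. The following is only a sketch of that idea, not a Sublime feature: it assumes each line has one | separator, splits on the first |, and counts how often each side occurs. The name `repeated_words` is hypothetical.

```python
from collections import Counter


def repeated_words(text):
    """Return (left_repeats, right_repeats), each in first-seen order."""
    pairs = []
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines or lines without a separator
        left, right = line.split("|", 1)  # split on the first "|" only
        pairs.append((left.strip(), right.strip()))

    # Count occurrences separately for the part before and after the "|".
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)

    # Keep only the entries seen more than once on their side.
    left_repeats = [w for w, n in left_counts.items() if n > 1]
    right_repeats = [w for w, n in right_counts.items() if n > 1]
    return left_repeats, right_repeats


sample = (
    "Word_1 | Word_2\n"
    "Word_3 | Word_2\n"
    "Word_4 | Word_5\n"
    "Word_4 | Word_6\n"
)

left, right = repeated_words(sample)
print(left, right)
# ['Word_4'] ['Word_2']
```

Unlike the two-line regex, this does not care whether the repeated lines are adjacent, which matches the expected output Word_4 | Word_2 from post #3.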
