Sublime Forum

Inconsistent regex issue

#1

Hello,

I have a file that looks like this:

WU AND+EIA exalted (exalted+Slipstream) / enchantments (Curses+enchantments) / artifact (artifact+artifact) / graveyard (Disturb+Pressure) / control (control+control)
UB DHM+HLW control (control+control)
BR EIA+RAK crats (crats+crats)
RG HLW+AND stompy (buddy/Journey+4+ damage) / tempo (Journey+Keys)
GW RAK+DHM go-wide (go-wide+go-wide)
WB LDO+SOU plus counters (Renown+Assault) / artifact (Equipment+Civilized) / aggro (Renown+aggro) / control (control+control)
BG SOJ+TTR prowess (Spellbound/Roar+Heroic) / graveyard (graveyard+graveyard) / big mana (Revelation+Colossal)
GU SOU+KSV ramp (ramp+ramp)
UR TTR+LDO prowess (Heroic+prowess)
RW KSV+SOJ stompy (tappy+Revelation) / tapping (tapping+Convoke) / tempo (Tunnels+flyers) / aggro (aggro+aggro) / go-wide (go-wide+go-wide)

WU AND+EIA exalted (exalted+Slipstream) / enchantments (Curses+enchantments) / artifact (artifact+artifact) / graveyard (Disturb+Pressure) / control (control+control)
UB DHM+HLW control (control+control)
BR EIA+RAK crats (crats+crats)
RG HLW+AND stompy (buddy/Journey+4+ damage) / tempo (Journey+Keys)
GW RAK+DHM go-wide (go-wide+go-wide)
WB LDO+TTR plus counters (Renown+power>base) / tempo (Renown/flyers+flyers) / Equipment (Equipment+power>base) / aggro (Renown+power>base) / control (control+control)
BG SOJ+SOU graveyard (graveyard+graveyard)
GU TTR+KSV prowess (Heroic+spell in combat) / ramp (ramp+ramp)
UR SOU+LDO artifact (Civilized+Equipment)
RW KSV+SOJ stompy (tappy+Revelation) / tapping (tapping+Convoke) / tempo (Tunnels+flyers) / aggro (aggro+aggro) / go-wide (go-wide+go-wide)

...and so on, for about a million lines.

I am using regex to find and delete any ten-line entry that contains certain text. It works for some search strings, but for others the search is aborted to prevent ‘eternal matching’.
Working example:
(?<=\n\n)(.+?\n){2}BR EIA\+HLW.+?\n(.+?\n){7}\n
Failing example:
(?<=\n\n)(.+?\n){2}BR AND\+KSV.+?\n(.+?\n){7}\n

What I’m doing is entering the regex, then pressing ‘Find All’. The first example finds about 1000 entries; the second is stopped and finds nothing.

a. Why would this be happening, if the only change is a few fixed chars?
b. Is there something I could do with the regex to make it better?

Thank you for any answers.

#2

Neither pattern matches anything in your sample, which makes it hard to debug for you.

#3

I was mostly asking for an abstract answer. I am sure both example regexes would work on a sample of just two entries. I can upload the entire file later if it’s needed. I wasn’t certain anyone would want to download it. It’s like 70 MB iirc, can’t check right now.

#4

People need a minimal reproducible case, not a 70 MB file. By building a minimal reproducible sample, you will probably find the cause yourself.

#5

I have been doing the same on other files without this issue for a week. For me, this is the minimal reproducible case unfortunately. I really wouldn’t know what to do to reduce it. That’s why I’m here.

#6

I wonder if removing the ? after each .+ would make any difference; the lazy quantifiers serve no purpose when dot can’t match newlines. Perhaps they cause the state machine to blow up or something?
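The equivalence can be checked outside the editor with a short script. This is only a sketch: it uses Python's re module, which is a different engine from the one Sublime Text uses, so it can show that the lazy and greedy forms select the same text but says nothing about Sublime's matching limits. The make_entry helper and the filler lines are made up for illustration.

```python
import re


def make_entry(third_line):
    """Build a hypothetical ten-line entry followed by a blank line."""
    lines = [f"XX AAA+BBB filler {i}" for i in range(10)]
    lines[2] = third_line
    return "\n".join(lines) + "\n\n"


# Leading blank lines so the lookbehind can succeed at the first entry.
text = (
    "\n\n"
    + make_entry("BR EIA+HLW crats (crats+crats)")
    + make_entry("BR EIA+RAK crats (crats+crats)")
)

# Original form with lazy quantifiers.
lazy = re.compile(r"(?<=\n\n)(.+?\n){2}BR EIA\+HLW.+?\n(.+?\n){7}\n")
# Same pattern with the ? removed after each .+
greedy = re.compile(r"(?<=\n\n)(.+\n){2}BR EIA\+HLW.+\n(.+\n){7}\n")

lazy_spans = [m.span() for m in lazy.finditer(text)]
greedy_spans = [m.span() for m in greedy.finditer(text)]
print(lazy_spans == greedy_spans)
```

Because each .+ must stop at the line break that follows it either way, both forms select exactly the same spans here; only the amount of work the engine does to get there can differ.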

#7

That optimization has allowed a few more queries to work, but as of yet, not all of them. Weirdly, some of those that started working after the optimization yielded fewer results than the ones before. Perhaps the problem is in how sparse the results are?
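One way to test the sparsity idea would be to count how often each pair appears on the third line of an entry, then compare the counts for queries that work against those that fail. A rough sketch, assuming entries are separated by one blank line and the third line starts with two space-separated tokens such as "BR EIA+RAK"; the three-line sample entries below are shortened from the file excerpt and the function name is made up.

```python
from collections import Counter


def third_line_keys(text):
    """Yield the first two tokens of the third line of each entry."""
    for entry in text.split("\n\n"):
        lines = entry.strip("\n").split("\n")
        if len(lines) >= 3:
            yield " ".join(lines[2].split()[:2])


# Shortened, illustrative entries (the real ones have ten lines each).
sample = (
    "WU AND+EIA exalted (exalted+Slipstream)\n"
    "UB DHM+HLW control (control+control)\n"
    "BR EIA+RAK crats (crats+crats)\n"
    "\n"
    "WU AND+EIA exalted (exalted+Slipstream)\n"
    "UB DHM+HLW control (control+control)\n"
    "BR EIA+RAK crats (crats+crats)\n"
    "\n"
)

counts = Counter(third_line_keys(sample))
for key, n in counts.most_common():
    print(key, n)
```

A pair with a count of zero, or a very low one, would mean the search has to scan most of the file between hits.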
