Sublime Forum

Regex, numeric, laziness issue [solved]

#1

Hello dear forum,

I’m evaluating Sublime Text and wanted to use some regular expressions.

They have worked really well so far, except for one expression that is driving me crazy.

I don’t claim that my regex is fully valid, but the behaviour looks really strange. I’m using Sublime Text 3 build 3083 on a Mac.

data sample:

<Object><field>mytextdataA</field></Object><Object><field>mytextdataB</field></Object><Object><field>mynumericdata1235</field></Object>

When using this regex:

<Object>.+?mytextdataA.+?</Object>

The following is selected, which looks fine to me:

<Object><field>mytextdataA</field></Object>

Now, if I want to match the numeric data with this regex:

<Object>.+?123.+?</Object>

the whole line is selected:

<Object><field>mytextdataA</field></Object><Object><field>mytextdataB</field></Object><Object><field>mynumericdata1235</field></Object>

Is it a known issue that the lazy quantifier stops working as soon as numeric values are nearby?

Is it a mistake on my side? Is it a problem in the regex parser?

Any advice is more than welcome :wink:


#2

As a workaround, I tried:

<Object>.{0,}?123.{0,}?</Object>
<Object>.*?123.*?</Object>

Nothing gives the proper selection so far. I’ve checked the Boost Perl regex documentation, without luck.

Anyway, I found something interesting: it’s not linked to numeric values. Using this:

<Object><field>mynumericdata1235</field></Object><Object><field>mytextdataB</field></Object><Object><field>mytextdataA</field></Object>

as input data allows the numeric data to be selected. It’s apparently linked to how the first literal is backtracked. I guess it’s a user error, but I can’t find a proper way around it.


#3

Let’s simplify things, to see better what’s going on. Suppose the text that you’re searching is “xxay”. If you use the regular expression “x.+?y” there are two ways that the match can succeed. It can match the entire source text, with the “.+?” part matching the second and third characters, “xa”, or it can match the last three characters, ignoring the first “x”, with the “.+?” part matching only the “a”. That’s all: non-greedy matching only applies within a particular match. Once a match has succeeded it does not provide a rule for rejecting that match because some other match has a shorter substring for that particular part. The rule for choosing between valid matches is simply to choose the left-most one. So “x.+?y” matches all of “xxay”, because that’s the match that starts at the leftmost position in the string being searched.

EDIT: the practical consequence of this is that the search starts at the left end of the string being searched, and if a match is found, that’s the end of the search. If no match is found it tries again, starting at the second character in the string being searched, and so on.
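The search procedure described above can be modelled in a few lines. This is only a sketch of the observable behaviour, written with Python's `re` module rather than the Boost engine that Sublime Text uses; the helper name `leftmost_match` is made up for illustration.

```python
import re

def leftmost_match(pattern, text):
    """Try each start position from left to right and stop at the first success."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # match() with a pos argument only attempts a match beginning at `start`.
        m = compiled.match(text, start)
        if m:
            return start, m.group(0)
    return None

# The lazy quantifier only shortens the match *within* the attempt at index 0;
# it never causes the engine to prefer a later starting position.
result = leftmost_match(r"x.+?y", "xxay")
print(result)
```

Because the attempt at index 0 succeeds, the loop never gets to try index 1, which is why the shorter "xay" is not chosen.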

To prevent matching “xxay” here, all that’s required is to eliminate “x” from the possibility of matching. The regular expression for doing that would be “x[^x]+?y”; that says to match a letter “x” followed by one or more characters that are not “x”, followed by a “y”. So that would match the last three characters in “xxay”, and not all four.
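A quick way to check the difference between the two patterns, again using Python's `re` as a stand-in for the Boost engine (for these simple constructs the two should behave the same, but that is an assumption):

```python
import re

text = "xxay"

# Lazy wildcard: the attempt at index 0 succeeds, so all four characters match.
lazy = re.search(r"x.+?y", text)

# Negated class: the middle part may not contain "x", so the attempt at
# index 0 fails and the engine moves on to index 1.
negated = re.search(r"x[^x]+?y", text)

print(lazy.group(0), lazy.start())
print(negated.group(0), negated.start())
```

A negated character class only excludes single characters, so it cannot exclude a multi-character string such as an opening tag; that case needs a lookahead.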

To prevent swallowing up an “<Object>” inside the wildcard you need a negative assertion. I don’t use complex regular expressions much, so I can’t guarantee that this is right, but something along the lines of “<Object>(?!.*<Object>.*).+?123.+?</Object>”. The “(?!” opens the negative assertion, and the “)” closes it. It says that the match fails if there’s an “<Object>” between the “<Object>” and the “</Object>”. I think…

EDIT (As FichteFoll says): “<Object>(?!.*<Object>).+?123.+?</Object>” is sufficient.


#4

Why don’t you search for the field tags instead of the Object tags? Because you need to select the Object tags too?
In that case, why don’t you try being a bit more specific, like the following:

yoursearch</field

Edit: just noticed pete’s post. Well :^)


#5

pete already explained it well enough, but you actually only need this:

<Object>(?!.*<Object>).+?123.+?</Object>

(?!...) is a negative look-ahead. It basically means here that the first .+? match should not contain the string “<Object>”.
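For anyone who wants to verify this outside the editor, here is a small check against the sample line from the first post. It uses Python's `re` rather than Boost, so treat it as a sketch. One caveat: the look-ahead as written rejects any “<Object>” that has another “<Object>” later on the same line, so it can only match the last object on a line. The `reordered` string and the `tempered` pattern below are not from this thread; they are an illustrative assumption showing one way to handle a target that is not the last object.

```python
import re

# Sample line from the first post.
data = (
    "<Object><field>mytextdataA</field></Object>"
    "<Object><field>mytextdataB</field></Object>"
    "<Object><field>mynumericdata1235</field></Object>"
)

plain = re.search(r"<Object>.+?123.+?</Object>", data)
guarded = re.search(r"<Object>(?!.*<Object>).+?123.+?</Object>", data)

print(plain.group(0) == data)  # the unguarded pattern swallows the whole line
print(guarded.group(0))        # only the object containing 123

# Hypothetical reordering: the numeric object sits in the middle.
reordered = (
    "<Object><field>mytextdataA</field></Object>"
    "<Object><field>mynumericdata1235</field></Object>"
    "<Object><field>mytextdataB</field></Object>"
)

# The look-ahead only passes at the last <Object>, which has no 123: no match.
guarded_reordered = re.search(r"<Object>(?!.*<Object>).+?123.+?</Object>", reordered)

# Alternative (assumption, not from the thread): check at every character
# that a new <Object> does not start there, so the first wildcard cannot
# cross into the next object.
tempered = re.search(r"<Object>(?:(?!<Object>).)+?123.+?</Object>", reordered)

print(guarded_reordered)
print(tempered.group(0))
```

The per-character look-ahead is slower than the single look-ahead, but it does not depend on where the wanted object sits on the line.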


#6

You are all wonderful.

Negative lookahead was the key here; it’s fully functional in all cases.

I stopped searching because it was late, and I’m glad I did, as this morning the solution was ready to use… impressive forum! You saved my day :wink:

Hope I will be able to return the favor soon…

Regards.
