Sublime Forum

Performance of zero-width assertions in sublime-syntax regexes

#1

The implementation of Sublime’s internal regexes seems pretty straightforward, but I’m curious how zero-width assertions (^, $, \b, lookahead) are implemented. In particular, \b would seem to require a limited form of lookbehind. Many syntaxes use \b freely even when a less stringent assertion would do. For instance, in the JavaScript syntax:

- match: \bimport\b

This pattern is seen over and over. But in most cases, it could be replaced by:

- match: import(?!\w)

This is a looser assertion that, depending on the implementation, could be more performant. I haven’t been able to get good benchmarks yet.
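To make "looser" concrete, here is a small sketch of where the two patterns differ. It uses Python's `re` module purely as a stand-in, which is an assumption for illustration: it shows matching behavior only and says nothing about how Sublime's own engine performs. The sample strings are made up.

```python
import re

# Python's `re` is a stand-in here (assumption); Sublime's engine may differ.
strict = re.compile(r"\bimport\b")    # boundary required on both sides
loose = re.compile(r"import(?!\w)")   # only checks the character that follows

# Hypothetical sample inputs chosen to show where the patterns disagree.
samples = ["import x", "reimport x", "imports x", "x.import"]

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

# Map each sample to (strict result, loose result).
results = {s: (matches(strict, s), matches(loose, s)) for s in samples}

for text, (s, l) in results.items():
    print(f"{text!r}: strict={s} loose={l}")
```

The only sample where they disagree is "reimport x": the looser pattern matches inside the longer identifier, so the replacement is only safe in contexts where a preceding word character cannot occur.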

Should we expect there to be a cost to unnecessary use of \b, compared to (?!\w) or no assertion at all? If so, this may be an opportunity to improve (however slightly) the performance of built-in syntaxes.

#2

In my experience, \b is more performant than a negative lookahead. \b before a word doesn’t need a lookbehind because the regex engine tracks whether the end of the previous match was a word character or not.
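As a rough illustration of that idea, a scanner can evaluate \b in a single left-to-right pass by remembering whether the previous character was a word character. This is a hypothetical sketch, not Sublime's actual engine code; `is_word` and `word_boundaries` are names invented for the example, and the word-character definition is simplified.

```python
def is_word(ch):
    # Simplified notion of a "word character" (assumption): letters, digits, underscore.
    return ch.isalnum() or ch == "_"

def word_boundaries(text):
    # Collect every position where a word-boundary assertion would hold,
    # carrying one flag forward instead of re-reading the previous character.
    positions = []
    prev_is_word = False  # the start of the text is treated as non-word
    for i, ch in enumerate(text):
        cur_is_word = is_word(ch)
        if cur_is_word != prev_is_word:
            positions.append(i)  # word/non-word transition: boundary before index i
        prev_is_word = cur_is_word
    if prev_is_word:
        positions.append(len(text))  # text ends on a word character
    return positions

print(word_boundaries("a import b"))
```

Under this scheme the boundary check costs one boolean comparison per position, which is at least consistent with \b being cheap.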

#3

> in my experience, \b is more performant than a negative lookahead.

Do you mean specifically in Sublime’s custom regex engine?

> the regex engine tracks whether the end of the previous match was a word character or not.

Interesting! I had not heard this; is it documented somewhere?
