Sublime Forum

Dev Build 3103: "custom regex engine" and problem with \x escape

#1

These two items in the changelog do not sufficiently explain what the changes are:

  • Added a custom regex engine that matches multiple regexps in parallel, for faster file loading and indexing
  • Improved Unicode support, including combining character rendering, character classification in regex searches, and case insensitivity in Goto Anything matching

Does the custom regex engine have any documentation?

It certainly has not improved Unicode support for me. I am now unable to use the hexadecimal escape \x and that’s destroyed most of my ability to work with Unicode.

  1. When I use \x, the search doesn’t find anything, and I get an error message (here I’m trying to find the letter “a” with its hexadecimal number 61):

     Hexadecimal escape sequence was invalid. The error occurred while parsing the regular expression: ‘>>>HERE<<<\x{61}’. in regular expression \x{61}.

  2. I tried using \X, but the search now treats {61} as if it were the interval operator, and \X as if it were “any character.” The search highlights the text from my cursor to the 61st character after it:

Most of my job involves regular expression searches and non-Roman alphabets, and not being able to search for Unicode characters with hexadecimal escapes has made my job essentially impossible.

0 Likes

#2

Try \x61. Sublime is either based on Oniguruma or uses it, so its rules should apply.

\X{61}, BTW, does what’s intended: \X matches any Unicode character, and the number in braces acts as a quantifier.

0 Likes

#3

ST used Boost’s PCRE-style regex engine in previous builds. I believe this is still basically the case, but even if it weren’t, it should be using the same syntax. \X is mentioned there, but I don’t understand exactly why it would produce the result you showed, although it also should not produce what you expected.

0 Likes

#4

Note that he has “Highlight matches” turned off (the button next to the input field).

1 Like

#5

For two-character hex numbers, \x61 and the like do work as expected.

However, three- and four-character hex codes do not.

This example has an en dash, hex 2013, between the numerals 138 and 39. Searching for the en dash using \x2013 didn’t find the literal en dash; it found a space (the two-digit hex 20) followed by the literal characters 1 and 3:

Being able to search for two-character hex codes definitely improves my life, but the General Punctuation code block (including very common characters like quotation marks and dashes) and most non-Roman alphabets have 3- and 4-character codes, and I still haven’t figured out how to find those with the hex escape.
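For anyone who wants to reproduce the effect outside Sublime, here is a minimal sketch using Python's `re` module. This is an assumption-laden stand-in: Python's engine is not the one Sublime uses, but it also reads an unbraced \x as exactly two hex digits, so it shows the same "space followed by 13" result. The sample strings are made up for illustration.

```python
import re

text = "138\u201339"     # "138–39" with an en dash (U+2013) in the middle
lookalike = "138 1339"   # a space (U+0020) followed by the characters "13"

# Python's re reads \x as exactly two hex digits, so this pattern means:
# U+0020 (space), then the literal characters "1" and "3".
two_digit = re.compile(r"\x2013")

dash_hit = two_digit.search(text)        # no match: the text has no " 13"
space_hit = two_digit.search(lookalike)  # matches the space plus "13"

# In Python's own syntax (not Sublime's), a four-digit code is written \u2013.
en_dash = re.compile(r"\u2013")
dash_found = en_dash.search(text)

print(dash_hit)
print(repr(space_hit.group()))
print(dash_found.span())
```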

0 Likes

#6

Try with \x{2013} on build 3104. I can’t right now.

0 Likes

#7

There was a bug with boost regex that caused braced hex character escapes, e.g. \x{61} or \x{2013}, to not function properly. This was fixed in 3104.

Without braces, \x is always followed by exactly two hex digits, hence the behavior you were seeing, @aparker. Any other number of hex digits must be enclosed in {}.
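To make the rule concrete, here is a rough Python sketch of how such escapes could be decoded. `expand_hex_escapes` is a hypothetical helper written for this post, not anything from Sublime Text or Boost, and it only models the two cases described above (braced with any number of hex digits, unbraced with exactly two).

```python
import re

# Braced form: \x{...} with one or more hex digits.
# Unbraced form: \x followed by exactly two hex digits.
_HEX_ESCAPE = re.compile(r"\\x(?:\{([0-9A-Fa-f]+)\}|([0-9A-Fa-f]{2}))")

def expand_hex_escapes(pattern: str) -> str:
    """Replace \\x escapes in `pattern` with the characters they denote.

    Illustrative only: escaped backslashes (\\\\x61) and code points above
    U+10FFFF are not handled.
    """
    def replace(match: re.Match) -> str:
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))

    return _HEX_ESCAPE.sub(replace, pattern)

unbraced = expand_hex_escapes(r"\x2013")   # \x20 is consumed, "13" stays literal
braced = expand_hex_escapes(r"\x{2013}")   # the whole 2013 is one code point

print(repr(unbraced))
print(repr(braced))
```

Under this reading, \x2013 decodes to a space plus "13", while \x{2013} decodes to the single en dash character.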

Just as a note, the sregex engine is only used for syntax highlighting, not in the Find functionality.

1 Like

#8

YES! The braced hex escape \x{} does work in 3104.
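In case it helps anyone else who works with non-Roman text, a small Python sketch for generating the braced form from characters you can paste or type. `braced_hex_escape` is a hypothetical name of my own; the output is just text to copy into the Find field.

```python
def braced_hex_escape(text: str) -> str:
    """Return a \\x{...} escape for every code point in `text`."""
    return "".join(f"\\x{{{ord(ch):X}}}" for ch in text)

en_dash_pattern = braced_hex_escape("\u2013")    # en dash
quotes_pattern = braced_hex_escape("\u201C\u201D")  # curly double quotes

print(en_dash_pattern)
print(quotes_pattern)
```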

Thank you, I can do my job again :)

0 Likes