The problem with looking ahead over line boundaries is that it would interfere with optimizations that I’m guessing Sublime uses to speed up highlighting.
When you modify a view, you have to re-highlight it. But we want the highlighter to work instantly, as fast as the user can type with no delay. To make this faster, we can exploit the fact that most changes to a view (such as typing a single character) will leave most of the highlighting unchanged.
To begin with, whenever we parse the view, we’ll cache the parser state for the beginning of each line. (The parser state is simply the context stack, which is an array of pointers.) Then, when the view is modified, we don’t start parsing from the beginning of the view – we start at the first modified line using the cached parser state for the beginning of that line. Then, after we have finished highlighting the modified region, we can look for a chance to stop highlighting early. We can stop if we are at the beginning of a line that is after the modified part and our current parser state is the same as the cached parser state for that line.
This way, we only have to re-parse the lines that are actually affected by the modification. If we’re editing a 100-line view and only one line changes, we literally save 99% of the work. The drawback to this approach is that it only works if the highlighting for each line does not depend on subsequent lines. That is, it only works if you can’t look over line boundaries. Remove that restriction and you can’t perform this optimization anymore.
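To make the caching and early-exit idea concrete, here is a minimal sketch in Python. It is not Sublime's actual implementation, which I can only guess at. The names `parse_line`, `Highlighter` and `replace_line` are invented for illustration, the "syntax" is a toy that only tracks /* ... */ comments, and edits are assumed to replace the text of one existing line without adding or removing lines.

```python
from typing import List, Tuple

# The parser state is the context stack, stored as an immutable tuple
# so that two states can be compared with ==.
State = Tuple[str, ...]


def parse_line(state: State, line: str) -> State:
    """Toy line parser: only tracks whether we are inside a /* ... */ comment.

    Returns the parser state at the end of the line.
    """
    i = 0
    while i < len(line):
        if state and state[-1] == "comment":
            end = line.find("*/", i)
            if end == -1:
                return state          # comment continues onto the next line
            state = state[:-1]        # pop the comment context
            i = end + 2
        else:
            start = line.find("/*", i)
            if start == -1:
                return state          # nothing of interest on the rest of the line
            state = state + ("comment",)  # push the comment context
            i = start + 2
    return state


class Highlighter:
    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        # start_states[i] is the cached parser state at the beginning of line i.
        self.start_states: List[State] = []
        state: State = ()
        for line in self.lines:
            self.start_states.append(state)
            state = parse_line(state, line)

    def replace_line(self, index: int, text: str) -> int:
        """Replace one line, re-parse as little as possible, return lines parsed."""
        self.lines[index] = text
        state = self.start_states[index]  # resume from the cached state
        parsed = 0
        i = index
        while i < len(self.lines):
            state = parse_line(state, self.lines[i])
            parsed += 1
            i += 1
            if i == len(self.lines):
                break
            if state == self.start_states[i]:
                break  # same state as before the edit: the rest is unchanged
            self.start_states[i] = state
        return parsed
```

With a four-line view, replacing one ordinary line with another re-parses a single line, while typing an unterminated /* forces a re-parse of everything after it, because every later cached state changes.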
I’ve thought about ways to allow limited lookahead while salvaging the optimization. One approach would be to record for each line the last preceding line that affects it. Then, rather than starting at the beginning of the modified line, you would start at the line it points to. To do this, the regexp engine would have to mark the latest line it sees with the index of the line it started on. It would be possible during syntax compilation to set a flag for any regexp that could cross a line boundary, so it might not be so bad. This approach would have the advantage of being transparent to syntax authors. Of course, if multiline regexps were abused, performance could be affected.
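A rough sketch of that bookkeeping, again in Python with invented names (`new_dependencies`, `record_match`, `restart_line`). One liberty taken here: the sketch marks every line a match touched, not only the latest one, so that a lookup for any line in the span is a single array read. Treat that as an assumption of the sketch, not part of the idea itself.

```python
from typing import List


def new_dependencies(line_count: int) -> List[int]:
    """Initially every line depends only on itself."""
    return list(range(line_count))


def record_match(depends_on: List[int], start_line: int, last_line_seen: int) -> None:
    """Called (hypothetically) by the regexp engine after a match attempt.

    start_line is the line the match started on, last_line_seen is the
    furthest line it looked at. Every line in that span now depends on
    start_line, unless it already depends on an even earlier line.
    """
    last = min(last_line_seen, len(depends_on) - 1)
    for line in range(start_line, last + 1):
        if start_line < depends_on[line]:
            depends_on[line] = start_line


def restart_line(depends_on: List[int], modified_line: int) -> int:
    """The line at which re-parsing has to begin after modified_line changes."""
    return depends_on[modified_line]
```

If no regexp ever crosses a line boundary, `restart_line` always returns the modified line itself, and the behavior is the same as the plain per-line cache. The dependency entries for re-parsed lines would also have to be reset before parsing them again; the sketch leaves that out.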
Personally, my dream is an enhancement that allows Sublime to tell the difference between these two:
(foo);
(foo) => bar;
No amount of lookahead will solve this problem, because foo could be arbitrarily-complicated code that could be interpreted ambiguously as an expression or a formal parameter declaration. In order to parse this, we’d need nondeterminism.