Sublime Forum

Sublime-syntax add support for peeking/doing lookahead to the next line

#1

After trying for a few days to figure out how to write a syntax for a language that a) is context-sensitive and b) uses significant indentation to delimit blocks of code, I’ve come to the conclusion that this is currently impossible, or at least beyond my capabilities.

The task would be made possible, though, if we could do lookahead to the next line. Let the fake syntax (??x) mean “the next line starts with x”; using this in a match would let me know simultaneously whether I should expect more, less or the same indentation and whether I need to add a new context (which one would depend on the ending of the current line).

Currently, I can’t find a way, at the start of a line, to know both the ending of the previous line and the indent difference between the current line and the previous one. I can either know the indentation difference (via push and pop backreferences) or know the ending of the previous line (via setting a context at the end of the line), but not both.

Note: alternatively, I could handle it by being able to peek from the start of the previous line, if that is technically simpler to implement.
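To make the requirement concrete, here is a minimal Python sketch, written outside Sublime's syntax engine, of the two facts I would like to have at the start of each line. The function name `line_facts` and the sample input are purely illustrative assumptions; nothing like this exists in sublime-syntax today.

```python
from typing import List, Optional, Tuple


def line_facts(lines: List[str]) -> List[Tuple[int, Optional[str]]]:
    """For each line, return (indent change vs. the previous line,
    last non-blank character of the previous line)."""
    facts = []
    prev_indent = 0
    prev_end: Optional[str] = None
    for line in lines:
        stripped = line.rstrip()
        # Width of the leading whitespace on this line.
        indent = len(stripped) - len(stripped.lstrip())
        facts.append((indent - prev_indent, prev_end))
        prev_indent = indent
        prev_end = stripped[-1] if stripped else None
    return facts


# Hypothetical input in an indentation-sensitive language.
sample = ["if x:", "    y = 1", "z = 2"]
print(line_facts(sample))
```

A syntax definition can currently get at one element of each pair, but not both at the same moment.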

0 Likes

#2

For your precise use case I don’t think you need additional features. I had to implement something similar: if you capture the indentation when you push a context, you can then use a backreference and a negative lookahead to know when to pop that context.

0 Likes

#3

Could you provide an example? I can’t use lookbehind of indefinite length, so I don’t really know how to do it without that.

I have a context where I capture the initial space. Then, to actually process the line contents, I have to either set another context, losing the backreference, or push another context, but then I can’t pop it and immediately set another…

Currently I’m doing the brute-force thing of code-generating syntaxes (mysyntax_1tab, mysyntax_2tab, and so on), but this surely is not a good way to do it.

Could you share the code you used to accomplish that?

0 Likes

#4

I cannot share the whole thing, but I think just one context should be good enough for you.

Basically, I can have a description entry: everything indented more deeply than the keyword is part of this context, and as soon as something is at the same or a lower indentation level, I pop out.

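  description:
    # Group 1 captures the indentation of the line holding the keyword.
    # The colon is captured as group 5 so that capture 5 below has a target.
    - match: '^(\s*)(?:(\w+)(\.))?(description)\s*(:)'
      captures:
        2: variable.other.rif
        3: keyword.operator.rif
        4: keyword.language.rif
        5: keyword.operator.rif
      push:
        - meta_scope: meta.description.rif
        # Pop as soon as a line is not indented more deeply than the keyword line.
        - match: '^(?!\1\s+)'
          pop: true
        - match: '.*'
          scope: string.unquoted.rif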
0 Likes

#5

Thank you very much, but sadly, that doesn’t suit me. In my case, the line may contain, right up to its end, recursive structures that need colouring. That’s why I mentioned that I need to either push or set a second context for the end of the line. And since the choice of context to push for the next line depends on the contents of the first line (parsed not by the context that captured the initial indentation, but by the second one), it seems I can’t use the backreference trick.

0 Likes

#6

I also have the case where this context is included in other contexts that use the exact same mechanism, and I don’t see any issue.

0 Likes

#7

Weird. When I use a context to match, say, (\s+)(?=<something else>) and then set another context to parse the rest of the line, I can no longer use \1 to backreference the indentation.

0 Likes

#8

I do not use set in my case, just push; this might be the issue.

0 Likes

#9

We’ve discussed some ideas about how to solve the “next line” problem with syntax definitions, but don’t have anything to share yet.

0 Likes

#10

Actually my problem could also be solved by being able to peek back from the start of the current line, which I guess would be less cumbersome to implement.

0 Likes

#11

I also would like an option where the entire document is treated like a single line and/or have access to the stack of rules. This would help when trying to highlight syntax where components could span multiple lines. I think it could also permit style checking and linting.

0 Likes

#12

The problem with looking ahead over line boundaries is that it would interfere with optimizations that I’m guessing Sublime uses to speed up highlighting.

When you modify a view, you have to re-highlight it. But we want the highlighter to work instantly, as fast as the user can type with no delay. To make this faster, we can exploit the fact that most changes to a view (such as typing a single character) will leave most of the highlighting unchanged.

To begin with, whenever we parse the view, we’ll cache the parser state for the beginning of each line. (The parser state is simply the context stack, which is an array of pointers.) Then, when the view is modified, we don’t start parsing from the beginning of the view – we start at the first modified line using the cached parser state for the beginning of that line. Then, after we have finished highlighting the modified region, we can look for a chance to stop highlighting early. We can stop if we are at the beginning of a line that is after the modified part and our current parser state is the same as the cached parser state for that line.

This way, we only have to re-parse the lines that are actually affected by the modification. If we’re editing a 100-line view and only one line changes, we literally save 99% of the work. The drawback to this approach is that it only works if the highlighting for each line does not depend on subsequent lines. That is, it only works if you can’t look over line boundaries. Remove that restriction and you can’t perform this optimization anymore.
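To illustrate, here is a minimal Python sketch of the caching scheme described above. It is only a guess at the general technique, not Sublime's actual implementation: the toy `parse_line` function (which only tracks `/* ... */` comments) and the `IncrementalHighlighter` class are hypothetical names invented for this example.

```python
from typing import List, Tuple

State = Tuple[str, ...]  # the context stack, innermost context last


def parse_line(state: State, line: str) -> State:
    """Toy per-line parser: only tracks whether we are inside /* ... */."""
    i = 0
    while i < len(line):
        if state[-1] == "comment":
            end = line.find("*/", i)
            if end == -1:
                break
            state = state[:-1]  # pop the comment context
            i = end + 2
        else:
            start = line.find("/*", i)
            if start == -1:
                break
            state = state + ("comment",)  # push the comment context
            i = start + 2
    return state


class IncrementalHighlighter:
    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        # cache[i] is the parser state at the beginning of line i;
        # cache[len(lines)] is the state after the last line.
        self.cache: List[State] = [("main",)]
        for line in self.lines:
            self.cache.append(parse_line(self.cache[-1], line))

    def replace_line(self, index: int, text: str) -> int:
        """Replace one line and return how many lines were re-parsed."""
        self.lines[index] = text
        state = self.cache[index]  # resume from the cached state
        reparsed = 0
        for i in range(index, len(self.lines)):
            state = parse_line(state, self.lines[i])
            reparsed += 1
            if state == self.cache[i + 1]:
                break  # the next line starts in the same state as before
            self.cache[i + 1] = state
        return reparsed


lines = ["int a;", "int b;", "/* c */", "int d;"] * 25  # 100 lines
h = IncrementalHighlighter(lines)
print(h.replace_line(10, "int z;"))   # an edit that leaves the state unchanged
print(h.replace_line(10, "/* open"))  # an edit that opens a comment
```

In this toy setup, the first edit re-parses a single line out of 100, while the second has to continue until the next `*/` brings the state back in line with the cache.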

I’ve thought about ways to allow limited lookahead while salvaging the optimization. One approach would be to record for each line the last preceding line that affects it. Then, rather than starting at the beginning of the modified line, you would start at the line it points to. To do this, the regexp engine would have to mark the latest line it sees with the index of the line it started on. It would be possible during syntax compilation to set a flag for any regexp that could cross a line boundary, so it might not be so bad. This approach would have the advantage of being transparent to syntax authors. Of course, if multiline regexps were abused, performance could be affected.
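A rough Python sketch of that bookkeeping, under the assumption that the regexp engine can report, for each line, how many following lines it inspected while parsing it. The function name `restart_points` and the `lookahead` list are hypothetical; this only shows the pointer computation, not a real engine.

```python
from typing import List


def restart_points(lookahead: List[int]) -> List[int]:
    """lookahead[i] is how many following lines the matcher inspected
    while parsing line i (an assumed engine-provided value).

    Returns start, where start[j] is the earliest line whose parse
    inspected line j, or j itself if no earlier line looked that far."""
    n = len(lookahead)
    start = list(range(n))
    for i, extra in enumerate(lookahead):
        last_seen = min(i + extra, n - 1)
        for j in range(i + 1, last_seen + 1):
            start[j] = min(start[j], i)
    return start


# Line 1 looked two lines ahead; line 4 looked one line ahead.
print(restart_points([0, 2, 0, 0, 1, 0]))
```

When line j is modified, re-parsing would begin at `start[j]` instead of at j, and the early-stop check on the cached parser state would otherwise stay the same.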

Personally, my dream is an enhancement that allows Sublime to tell the difference between these two:

(foo);
(foo) => bar;

No amount of lookahead will solve this problem, because foo could be arbitrarily-complicated code that could be interpreted ambiguously as an expression or a formal parameter declaration. In order to parse this, we’d need nondeterminism.

1 Like