Sublime Forum

.sublime-syntax match "" bug?

#1

I get an unexpected “skip” or “black hole” after a match: $\n rule when it is followed by a match: '' rule.

What changes do I need to make to the syntax definition below to make this work? I know (?=\S) would overcome the problem, but I don’t want to use (?=\S) because I need to track whitespace in this grammar.

Here is my test.

Text to highlight:
A[\n]
B[\n]

Expected result:
A: format is testval1.Alpha_1
B: format is testval2.Alpha_2

Actual results:
A: format is testval1.Alpha_1
B: format is text.plain.Alpha_3; the problem is that the Alpha_2 match [B] doesn’t appear to be attempted, though the read-position should be exactly before the “B” upon the entry into Alpha_2.

Here is the syntax def:

main:
  - include: Alpha_1

Alpha_1:
  - match: '(A)'
    scope: testval1.Alpha_1
  - match: '($\n)'
    scope: text.whitespace.Alpha_1
  - match: ''
    set: Alpha_2

Alpha_2:
  - match: '(B)'
    scope: testval2.Alpha_2
  - match: ''
    set: Alpha_3

Alpha_3:
  - match: (.)
    scope: text.plain.Alpha_3

#2

The - match: '' rule acts very much like an else in C/JavaScript. Hence Alpha_3 is set as soon as anything other than (B) is encountered while Alpha_2 is on the stack, which includes whitespace and end of line.

Alpha_2 is set on the stack as soon as anything other than (A) or ($\n) is encountered, which also includes whitespace.

Hence Alpha_2 is skipped as soon as you reach whitespace.

But in the end one main question arises: Why do you want to use different contexts, if you just want to scope A and B differently?

Might the following be your solution?

main:
  - match: '$\n'
    scope: text.whitespace.Alpha_1
  - match: 'A'
    scope: testval1.Alpha_1
  - match: 'B'
    scope: testval2.Alpha_2
  - match: \S
    scope: text.plain.Alpha_3

#3

First, thank you for taking the time to look at my problem!

To answer your question, I need to separate the contexts because they are in fact different contexts. The whole semantics of the problem I am trying to solve are not visible here. I have simplified it to the base case, but that still requires the Alpha_? contexts to be separate.

My question still remains. If the scope/state change is applied based on the left-most match or, in the case of equivalent matches, the top-most match, then something is wrong.

After the A and $\n are consumed in Alpha_1, the read head should be on the “B” and the state machine at Alpha_2, which should pick up from there. I am suggesting that the rules as stated in the doc are not being applied correctly, because the “B” is at the read-head but failing to match.

You mention whitespace, but there is no whitespace between the A$\n and the “B”. I am stuck because the match ‘’ is firing instead of the “B” when the read-head should be at the “B”.

Can you explain to me why the match “B” is being ignored when the read-head has just finished processing the “A$\n” and the state machine is in Alpha_2? The next character is the “B” at that point.

Thanks again for your engagement on this important matter to me! :)


#4

Matching stops when nothing more can be matched, not when all the text on the line has been matched.

Therefore, the lexer will be in state Alpha_3 by the end of the first line, before the ‘B’ has been encountered.
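To make that concrete, here is the grammar from post #1 again with a step-by-step trace in comments. This is only a sketch of how the explanation above plays out on the two-line input; the position numbers are my own annotation, and the tie-breaking rule (the earlier rule wins when two rules match at the same position) is the one described in post #3.

```yaml
# Input line 1 is "A\n" (positions 0..2), input line 2 is "B\n".
main:
  - include: Alpha_1

Alpha_1:
  # Line 1, pos 0: '(A)' and '' both match here; the earlier rule wins.
  - match: '(A)'
    scope: testval1.Alpha_1
  # Line 1, pos 1: '($\n)' and '' both match here; the earlier rule wins.
  - match: '($\n)'
    scope: text.whitespace.Alpha_1
  # Line 1, pos 2 (end of line): nothing is left to consume, but the empty
  # pattern still matches, so the context switches to Alpha_2 right now.
  - match: ''
    set: Alpha_2

Alpha_2:
  # Still line 1, pos 2: there is no "B" on this line, so this cannot match.
  - match: '(B)'
    scope: testval2.Alpha_2
  # Still line 1, pos 2: the empty pattern matches again, so the context
  # switches to Alpha_3 before line 2 is ever looked at.
  - match: ''
    set: Alpha_3

Alpha_3:
  # Line 2, pos 0: the lexer is already in Alpha_3, so "B" is scoped here.
  - match: (.)
    scope: text.plain.Alpha_3
```

The point of the trace is that both empty matches fire at the end of line 1, which is why "B" never gets a chance in Alpha_2.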


#5

These two statements, the first from post #1 and the second from post #3,

I know (?=\S) would overcome the problem, but I don’t want to use (?=\S) because I need to track whitespace in this grammar.

and

You mention whitespace, but there is no whitespace between the A$\n and the “B”.

are somewhat contradictory, so it is hard to guess what you expect.

If there is no whitespace between (A) and $\n, and both must match to move on to Alpha_2, then, guessing from the available information, you would need to change Alpha_1 to something like

Alpha_1:
  - match: (A)($\n)
    captures:
      1: testval1.Alpha_1
      2: text.whitespace.Alpha_1
    set: Alpha_2

Otherwise anything in Alpha_1 not exactly matching (A) or ($\n) causes the lexer to immediately fall through to Alpha_3 due to the - match: '' rules in Alpha_1 and Alpha_2.

Can you explain to me why the match “B” is being ignored when the read-head has just finished processing the “A$\n”

You expect the lexer to “read ahead” through the whole document and thus see (B) once (A)($\n) has been consumed. But this expectation is wrong.

ST splits the document into lines and then runs the lexer against each single line.

This means all your rules are applied to the current line until nothing more matches.

As (B) is on the next line, the lexer doesn’t see it once it is done with (A)($\n), but - match: '' always matches. Hence you fall through to Alpha_3.

So you need to tell the lexer to look for (B) or any other line start.

main:
  - include: Alpha_1

Alpha_1:
  - match: (A)
    scope: testval1.Alpha_1
  - match: $\n
    scope: text.whitespace.Alpha_1
    set: Alpha_2

Alpha_2:
  - match: (B)
    scope: testval2.Alpha_2
  - match: ^   # line does not start with (B), go on to Alpha_3
    set: Alpha_3

Alpha_3:
  - match: (.)
    scope: text.plain.Alpha_3
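
For anyone who wants to try this, the same contexts can be wrapped into a complete .sublime-syntax file. This is a minimal sketch: the name, scope and file_extensions values are hypothetical placeholders and not from this thread; only the contexts are taken from the fix above.

```yaml
%YAML 1.2
---
# Hypothetical header values, chosen only so the file can be loaded.
name: Alpha Demo
file_extensions: [alphademo]
scope: text.alpha-demo

contexts:
  main:
    - include: Alpha_1

  Alpha_1:
    - match: 'A'
      scope: testval1.Alpha_1
    # The newline is consumed and the context is switched in the same rule,
    # so no separate empty match is needed at the end of the line.
    - match: '$\n'
      scope: text.whitespace.Alpha_1
      set: Alpha_2

  Alpha_2:
    - match: 'B'
      scope: testval2.Alpha_2
    # Fallback for the next line: '^' and 'B' can both match at the start
    # of a line, and the earlier rule wins, so this only fires when the
    # line does not start with "B".
    - match: '^'
      set: Alpha_3

  Alpha_3:
    - match: '.'
      scope: text.plain.Alpha_3
```

With the two-line input from post #1, "A" should get testval1.Alpha_1 and "B" should get testval2.Alpha_2, which matches the expected result listed there.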

#6

Thank you. You helped me greatly! This was the key insight I was missing. Between your comments and DEATHAXE’s excellent discussion I understand what is happening in my particular case.


#7

Use of the line-start (^) was the other piece of the puzzle. Thank you DEATHAXE, I greatly appreciate your investment in my problem!
