Sublime Forum

Syntax branch help: Common + short vs Rare + long

#1

I’m writing a syntax definition where I need to branch between two contexts, A and B. Context A is much more common, but usually wouldn’t span more than 5-10 lines. Context B is much rarer, but could theoretically be as long as the user wants. An extra wrinkle is that Context A looks just like Context B until the parser sees an equal sign, at which point it’s unambiguously A.

I’m trying to balance two things the docs say about the branch construct: for performance, the more common branch should come first, but the parser also won’t backtrack more than 128 lines.

Here’s what I’m currently thinking:

  • If I match A, then B:
    • If the expression is actually A (the common case), it’ll parse as normal
    • If the expression is actually B, it’ll parse the whole expression before failing, then reparse with B. If the expression happens to be longer than 128 lines, it will not backtrack (does this mean it’ll just stay parsed as A?)
  • If I match B, then A:
    • If the expression is actually A (the common case), it’ll fail pretty quickly when it sees the equal sign (within 5-10 lines, unless the user is intentionally doing something weird) and reparse with A
    • If the expression is actually B, it’ll parse as normal

This breakdown makes me think trying B first is better, but I’m not sure how bad the performance will be, since A is much more common (around 97% of the time).

#2

If a branch doesn’t fail within 128 lines, it never fails. So if A is tried first, it could pop once an equal sign is matched, but when should it fail? If there’s no such indicator, B must be tried first and must fail the branch when an equal sign is matched.
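As a rough illustration of the B-then-A ordering described above, a minimal sketch might look like the following. Everything specific here is an assumption made for the example and does not come from the actual syntax: the context names (`expr-a`, `expr-b`), the branch point name (`a-or-b`), the scope names, and the `;` that ends an expression.

```yaml
%YAML 1.2
---
# Hypothetical sketch: all names, scopes and the ';' terminator are assumptions.
name: Branch Sketch (B then A)
scope: source.branch-sketch
version: 2

contexts:
  main:
    # Open a branch at the first non-blank character of an expression.
    - match: (?=\S)
      branch_point: a-or-b
      branch:
        - expr-b   # tried first, even though it is the rarer case
        - expr-a   # tried only after expr-b has failed

  expr-b:
    - meta_scope: meta.expr-b.branch-sketch
    # An equal sign proves this is really A: rewind to the branch point.
    - match: '='
      fail: a-or-b
    - match: ';'
      pop: true

  expr-a:
    - meta_scope: meta.expr-a.branch-sketch
    - match: '='
      scope: keyword.operator.assignment.branch-sketch
    - match: ';'
      pop: true
```

With this ordering, the only rewind happens when the equal sign is reached, which for a short A should be well inside the 128-line limit.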

If context A is normally short enough, performance shouldn’t matter too much.

Only when more than one solution is possible, meaning both A-then-B and B-then-A can be implemented, would you choose to try the more common one first.

#3

Let’s assume that A will fail when it reaches an empty line (^$). So it is possible to do A-B, but in the pathological case of B being 200 lines long, it would erroneously parse as A.
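To make the A-then-B variant concrete, here is a minimal sketch under that assumption. The context names, the branch point name `a-or-b`, the scope name, and the idea that every expression ends at an empty line are all hypothetical choices for the example.

```yaml
%YAML 1.2
---
# Hypothetical sketch: all names and the empty-line terminator are assumptions.
name: Branch Sketch (A then B)
scope: source.branch-sketch
version: 2

contexts:
  main:
    - match: (?=\S)
      branch_point: a-or-b
      branch:
        - expr-a   # the common case is tried first
        - expr-b   # fallback once expr-a has failed

  expr-a:
    # An empty line before any equal sign: give up on A and retry as B.
    # If the branch point is more than 128 lines back, the rewind does
    # not happen and the text stays parsed as A.
    - match: ^$
      fail: a-or-b
    - match: '='
      scope: keyword.operator.assignment.branch-sketch
      set: expr-a-value

  expr-a-value:
    # A is confirmed; finish at the empty line.
    - match: ^$
      pop: true

  expr-b:
    - match: ^$
      pop: true
```

Here the rewind distance is the full length of B, which is exactly what makes the 200-line case a problem.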

Quoting #2: “If context A is normally short enough, performance shouldn’t matter too much.”

Are you saying that if A is short enough (or more specifically, if B fails soon enough), B-A is fine even though A is more common?

#4

Sure, why not. If the other way round is likely to highlight incorrectly, options are limited. I’d always prefer a robust solution.

You can easily create some benchmark files with lots of expressions of type A or B and compare the results of Syntax Test - Performance. I do this very often, with files of about 5k to 100k lines of code, when I am in doubt about which solution performs best.
