I’m writing a syntax definition where I need to branch on two contexts, A and B. Context A is much more common, but usually wouldn’t span more than 5-10 lines. Context B is much rarer, but could theoretically be as long as the user wants. An extra wrinkle is that Context A looks just like Context B until it sees an equal sign, at which point it’s unambiguously A.
I’m trying to reconcile two things the docs say about the performance of the branch construct: that the more common branch should come first, and that it won’t backtrack more than 128 lines.
Here’s what I’m currently thinking:
- If I match A, then B:
  - If the expression is actually A (the common case), it’ll parse as normal.
  - If the expression is actually B, it’ll parse the whole expression before failing, then reparse with B. If the expression happens to be longer than 128 lines, it will not backtrack (does this mean it’ll just stay parsed as A?).
- If I match B, then A:
  - If the expression is actually A (the common case), it’ll fail pretty quickly when it sees the equal sign (within 5-10 lines, unless the user is intentionally doing something weird) and reparse with A.
  - If the expression is actually B, it’ll parse as normal.
This breakdown makes me think parsing B first is better, but I’m not sure how bad performance will be, since A is much more common (about 97% of the time) and every A would then be parsed twice up to its equal sign.
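
To put rough numbers on the trade-off, here’s a back-of-the-envelope sketch in Python. It is only a toy model, not a measurement: it assumes the cost of a failed branch is proportional to the number of lines parsed and then thrown away, it ignores the 128-line limit entirely, and the function name and all inputs are made up for illustration.

```python
def expected_wasted_lines(p_a, lines_to_equals, b_length):
    """Expected lines parsed and then discarded per expression.

    p_a             -- assumed fraction of expressions that are really A
    lines_to_equals -- assumed lines scanned before an A reaches its equal sign
    b_length        -- assumed typical length of a B expression, in lines

    Returns (a_first, b_first), one value per branch ordering.
    """
    p_b = 1.0 - p_a
    # A tried first: only a real B causes a reparse, and (per the breakdown
    # above) only after the whole B expression has been scanned.
    a_first = p_b * b_length
    # B tried first: every real A causes a reparse once the equal sign is seen.
    b_first = p_a * lines_to_equals
    return a_first, b_first


if __name__ == "__main__":
    a_first, b_first = expected_wasted_lines(0.97, 5, 50)
    print(f"A first: {a_first:.2f} wasted lines per expression")
    print(f"B first: {b_first:.2f} wasted lines per expression")
```

With A at 97%, 5 lines to the equal sign, and a guessed B length of 50 lines, this gives about 1.5 wasted lines per expression for A first versus about 4.85 for B first. That only covers reparse cost, though; it says nothing about what happens to a B longer than 128 lines when A is tried first, which is the part I’m actually unsure about.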