(I apologize in advance to the extent that I'm telling you things you already know.)
One way to do this would be to let each context take a parameter. You would have a stack of [context, parameter] pairs; or, equivalently, a stack of contexts and a stack of parameters. Parameters could be (and usually would be) empty. Then, the matching rules of the context could depend upon the parameter. (This might be exactly what you're suggesting.)
This is basically how the current system works, where captures are a sort of hidden parameter that the child context can access. But it's not properly implemented, probably because this is legacy tmLanguage behavior that I suspect was actually a bug to begin with.
This approach is incompatible with the Sublime strategy of (I assume) precompiling all of a context's regexps into a DFA. Changing the parameter would require recompiling the context. I don't know how long it takes to compile a single context; maybe it's not that bad, and maybe the vast majority of the work can be cached anyway. This depends on the internals of the custom regexp system. Dynamic recompilation might be fast enough.
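As a rough illustration of the caching idea, here is a small Python sketch. It uses Python's `re` module purely as a stand-in for Sublime's regexp compiler, whose internals I don't know. The names `compile_context` and `TAG_BODY` are made up for this example, and the `{{$name}}` placeholder follows the hypothetical syntax used in this post.

```python
import re
from functools import lru_cache

# Hypothetical context: a tuple of pattern templates that may mention {{$name}}.
TAG_BODY = (r"</({{$tag_name}})>", r"</(\w+)>")


@lru_cache(maxsize=256)
def compile_context(templates, params):
    """Compile one context for one parameter record.

    `params` is a tuple of (name, value) pairs so that it is hashable.
    Python's `re` stands in for the real regexp compiler (an assumption).
    """
    compiled = []
    for template in templates:
        pattern = template
        for name, value in params:
            # Escape the value so that it is matched literally.
            pattern = pattern.replace("{{$" + name + "}}", re.escape(value))
        compiled.append(re.compile(pattern))
    return tuple(compiled)


first = compile_context(TAG_BODY, (("tag_name", "div"),))
again = compile_context(TAG_BODY, (("tag_name", "div"),))   # served from the cache
other = compile_context(TAG_BODY, (("tag_name", "span"),))  # compiled separately
```

With a cache like this, a context is only recompiled the first time a given parameter value is seen, so a document that reuses the same few tag names would pay the compilation cost once per name.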
The reason this isn't a problem today is that backreferences automatically trigger the legacy regexp engine. A new, proper parameter implementation could do the same. The old engine is relatively slow, but there's no compilation time. It may be faster than dynamic recompilation (not to mention easier on memory).
Any such implementation could be tremendously useful for quite a few languages. Heredocs are a great example, but you could also use it for tag matching in HTML/XML/JSX and possibly even to handle whitespace in Python. Wouldn't that be something!
Here's an example I've come up with. It features with_parameters, which tells the engine to set a parameter on the pushed context. For XML tag matching:
contexts:
  main:
    - match: <(\w+)>
      scope: open-tag.xml
      push:
        - with_parameters:
            tag_name: \1
        - meta_scope: meta.tag.xml
        - include: tag-body
  tag-body:
    - match: </({{$tag_name}})>
      scope: close-tag.xml
      pop: true
    - match: </(\w+)>
      scope: invalid.illegal.unmatched
    - include: main
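To show what the XML grammar above is meant to do, here is a minimal Python simulation. It is not how Sublime would run it, just a sketch of the intended behavior: the hypothetical `scope_tags` function keeps a plain list as the parameter stack and reports which scope each tag would receive.

```python
import re

TAG = re.compile(r"<(/?)(\w+)>")


def scope_tags(text):
    """Return (tag, scope) pairs as the sketched grammar would assign them."""
    stack = []   # one tag_name parameter per pushed tag-body context
    result = []
    for m in TAG.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            # main: an open tag pushes tag-body with tag_name set to the capture
            stack.append(name)
            result.append((m.group(0), "open-tag.xml"))
        elif stack and stack[-1] == name:
            # tag-body: the close tag equals {{$tag_name}}, so pop
            stack.pop()
            result.append((m.group(0), "close-tag.xml"))
        elif stack:
            # tag-body: any other close tag is flagged, and nothing is popped
            result.append((m.group(0), "invalid.illegal.unmatched"))
        # With an empty stack we are in main, which has no rule for close tags.
    return result


scopes = scope_tags("<a><b></a></b></a>")
```

For the input `<a><b></a></b></a>`, the first `</a>` is flagged as unmatched because the innermost open tag is `b`, while the later `</b>` and `</a>` close normally.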
For Python indentation:
contexts:
  main:
    - match: ''
      push:
        - with_parameters:
            indent: ''
        - include: block
  block:
    - match: ^\s*$ # do nothing, empty line
    - match: ^({{$indent}}(?:\s+)) # increase indent
      push:
        - with_parameters:
            indent: \1
        - meta_scope: meta.block.python
        - include: block
    - match: ^{{$indent}}(?=\S) # do nothing, same indent
    - match: ^ # decrease indent
      pop: true
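The indentation rules above can be simulated in a few lines of Python. This is only a sketch of the intended semantics, with a hypothetical `block_depths` helper: the list `stack` plays the role of the `indent` parameters, and the depth reported for each line is the number of nested `block` contexts beyond the outermost one.

```python
def block_depths(source):
    """Return (depth, stripped_line) for each non-empty line of `source`."""
    stack = [""]   # indent parameter of each active block; outermost is ''
    depths = []
    for line in source.splitlines():
        if not line.strip():
            continue   # empty line: do nothing
        indent = line[: len(line) - len(line.lstrip())]
        # Decrease indent: pop until the current block's indent is a prefix.
        while not indent.startswith(stack[-1]):
            stack.pop()
        # Increase indent: push a block whose parameter is the full indent.
        if len(indent) > len(stack[-1]):
            stack.append(indent)
        depths.append((len(stack) - 1, line.strip()))
    return depths


example = "if x:\n    y = 1\n\n    if z:\n        w = 2\nv = 3\n"
depths = block_depths(example)
```

Because the outermost indent is the empty string, which is a prefix of everything, the popping loop always stops before the stack is empty.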
The best implementation would probably be a separate parameter stack. Every time a context is pushed onto the context stack, a parameter record (or null) would be pushed onto the parameter stack. If a context refers to a parameter using {{$param}}, that context would either use the old engine or be dynamically compiled. It would look up the value starting from the top of the stack. (If no value was found, perhaps an unmatchable value should be substituted.) This way, the parameter records are nice and immutable. In the usual case where a context neither pushes nor uses any parameters, everything would work as it does now, and the only overhead would be pushing a null value onto a stack. In cases where the new behavior is used, the parameter record overhead should hopefully be small, and the context matching performance should be at least as fast as the current backreference system (or faster if dynamic recompilation works).
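Here is a minimal Python sketch of that parameter stack, under a few assumptions of mine: records are read-only mappings, `None` stands for "no parameters", parameter values are matched literally, and `(?!)` serves as the unmatchable substitute. The class and method names are hypothetical.

```python
import re
from types import MappingProxyType

UNMATCHABLE = r"(?!)"   # an empty negative lookahead never matches


class ParameterStack:
    """One entry per pushed context: a read-only record, or None."""

    def __init__(self):
        self._records = []

    def push(self, record=None):
        # Freeze a copy so that records stay immutable once pushed.
        frozen = None if record is None else MappingProxyType(dict(record))
        self._records.append(frozen)

    def pop(self):
        return self._records.pop()

    def lookup(self, name):
        # Search from the top of the stack downwards.
        for record in reversed(self._records):
            if record is not None and name in record:
                return record[name]
        return None

    def expand(self, template):
        # Replace each {{$name}} with its escaped value, or with UNMATCHABLE.
        def substitute(m):
            value = self.lookup(m.group(1))
            return UNMATCHABLE if value is None else re.escape(value)

        return re.sub(r"\{\{\$(\w+)\}\}", substitute, template)


stack = ParameterStack()
stack.push()                       # main: no parameters
stack.push({"tag_name": "div"})    # pushed by <div>
stack.push()                       # a nested context without parameters
pattern = stack.expand(r"</({{$tag_name}})>")
missing = stack.expand(r"</({{$other}})>")
```

In this sketch the common case costs one `append(None)` per push, and `lookup` only runs for contexts whose patterns actually mention a parameter.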
I haven't done the math, but offhand, I think this would augment Sublime's parser to handle a variety of context-sensitive languages. It would be orthogonal to other means of improving the parser, such as adding nondeterminism.