…how do I remember how many things I have on the stack?
Well, by understanding and thinking like ST’s lexer - the part of the syntax engine which parses the text and creates the tokens using the patterns from a syntax definition.
I personally build mental routes to track what happens next when applying a push, set or pop operation.
It is probably too much to explain here, but basically most of it is about the strategy for designing the structure of a syntax definition. You can do all sorts of weird things, ending in uncontrollable spaghetti. That’s what early sublime-syntax definitions looked like.
Eventually, however, once you’ve written a few for different languages, you’ll recognize a kind of standard scheme that most of them follow.
Keep it as simple as possible so you can follow the tracks. That’s one reason why I’d suggest sticking to push, set and pop: 1 whenever possible. In rare cases pop: 2 might help, but anything else can and should be avoided, as it increases complexity and easily makes you lose track of what’s happening.
Given that I am still using main (to match “cat”) after overpopping, that must mean u can’t pop main right? So can I just overpop if I don’t know?
Yes, ST has a safeguard to prevent popping main.
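To try this yourself, here is a minimal sketch of a complete syntax file. The name, scope and context names are made up for illustration, and it assumes a Sublime Text 4 build where pop accepts an integer count. Going by the behaviour described above, the closing parenthesis asks for more contexts than are on the stack, yet cat should still be matched by main afterwards.

```yaml
%YAML 1.2
---
# Hypothetical demo syntax; name and scopes are invented for this example.
name: Overpop Demo
scope: source.overpop-demo

contexts:
  main:
    - match: cat
      scope: keyword.other.overpop-demo
    - match: \(
      push: parens          # stack is now: main, parens

  parens:
    - match: \)
      pop: 5                # only parens can be removed; main is expected to stay
```

Relying on this is still not a great habit. Knowing the real depth keeps contexts reusable.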
… push a dozen times in one slew of operations and ‘set’ one time u just need to ‘pop: 1’ I guess.
Not sure I understand that correctly.
In the following example you’d add foo-body and bar-body to the stack. Using set: baz-body replaces bar-body, so two contexts still need to be removed to return to main immediately.
main:
  - match: foo
    push: foo-body # adds foo-body to stack
foo-body:
  - match: bar
    push: bar-body # adds bar-body to stack
  - match: (?=\S)
    pop: 1
bar-body:
  - match: baz
    set: baz-body # replaces bar-body by baz-body
  - match: (?=\S)
    pop: 2
baz-body:
  - match: ;
    pop: 2
You can replace set by push and increase the pop count to get the same result.
main:
  - match: foo
    push: foo-body # adds foo-body to stack
foo-body:
  - match: bar
    push: bar-body # adds bar-body to stack
  - match: (?=\S)
    pop: 1
bar-body:
  - match: baz
    push: baz-body # adds baz-body to stack
  - match: (?=\S)
    pop: 2 # from here, 2 contexts need to be popped to return to main immediately
baz-body:
  - match: ;
    pop: 3 # from here, 3 contexts need to be popped to return to main immediately
But generally speaking, this is a design I’d try to avoid, as juggling different pop: n values complicates the syntax structure and decreases re-usability of contexts.
A common scheme would be to use push...set...set...set...set...pop: 1.
main:
  - match: foo
    push: foo-body # adds foo-body to stack
foo-body:
  - match: bar
    set: bar-body # replaces foo-body by bar-body
  - match: (?=\S)
    pop: 1
bar-body:
  - match: baz
    set: baz-body # replaces bar-body by baz-body
  - match: (?=\S)
    pop: 1
baz-body:
  - match: ;
    pop: 1
Modern syntaxes also push multiple contexts onto the stack at once and pop them one after another to achieve the same, but with maximum re-usability of the contained contexts.
main:
  - match: foo
    push:
      - baz-body # continue here if `bar-body` was popped
      - bar-body # continue here if `foo-body` was popped
      - foo-body # first one being evaluated
foo-body:
  - match: bar
    pop: 1
  - match: (?=\S)
    pop: 1
bar-body:
  - match: baz
    pop: 1
  - match: (?=\S)
    pop: 1
baz-body:
  - match: ;
    pop: 1
Which strategy is best depends on how detailed the meta scopes to be applied are, but that’s another lesson.
What does capture do? My current understanding is it matches each match’s capture groups to specific scope? So I can colour each specific part of a regex.
Correct. Matches of parenthesized groups (<pattern>) are added to a list of results. Each item can be assigned a separate scope using the captures keyword followed by the index of the group.
Note, however: if a pattern contains foo (bar)* baz and the text contains foo bar bar bar baz, only the first bar would be addressed. All following ones are not assigned a scope.
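If every bar needs a scope, one way around this is to stop capturing the repetition and push a context that matches one bar at a time. A rough sketch, with made-up context and scope names:

```yaml
contexts:
  main:
    - match: foo
      scope: keyword.other.foo.demo
      push: foo-args        # handle whatever follows foo in its own context

  foo-args:
    - match: bar
      scope: variable.parameter.demo   # applied to each bar individually
    - match: baz
      scope: keyword.other.baz.demo
      pop: 1                # baz ends the construct
    - match: (?=\S)
      pop: 1                # anything unexpected: bail out without consuming it
```

Each pattern stays trivial, and the number of bar occurrences no longer matters.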
Using your example, # is scoped meta.preprocessor and include is scoped keyword.control.include.
- match: ^\s*(#)\s*\b(include)\b
  captures:
    1: meta.preprocessor.c++
    2: keyword.control.include.c++
Note: the scope names are odd here. # is punctuation, and meta.preprocessor should be applied to the whole line. So the following would make more sense:
- match: ^\s*(#)\s*\b(include)\b
  scope: meta.preprocessor.c++
  captures:
    1: punctuation.definition.preprocessor.c++
    2: keyword.control.include.c++
Or even the following, if we want to skip leading whitespace:
- match: ^\s*((#)\s*\b(include)\b)
  captures:
    1: meta.preprocessor.c++
    2: punctuation.definition.preprocessor.c++
    3: keyword.control.include.c++
On the corona sdk syntax file it seems to be this series of captures that controls the ability to colourize the parameters, but it seems to be broken or incomplete.
Looks very much like a relic of the old TextMate age, when pushing contexts was not possible. Back in those days, syntax definitions used very complex regular expressions to try to find all sorts of language constructs.
Forget them, they are doomed to fail!
Instead, try to understand the language in detail, read the specs if there are any, and use regular expressions that are as simple as possible in match statements, using the power of push/set/pop to apply specialized sets of simple patterns to parse your code.
The closer a syntax definition follows the official syntax specs, the better the chances of maintaining correct highlighting under all circumstances.
And it’s way more maintainable, even if the number of contexts can be large.
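As a rough illustration of that approach applied to the include example, the sketch below keeps each match pattern small and lets a pushed context handle the rest of the line. The context name and the string scopes are assumptions made for this example, not taken from the official C++ syntax.

```yaml
contexts:
  main:
    - match: ^\s*(#)\s*(include)\b
      captures:
        1: punctuation.definition.preprocessor.c++
        2: keyword.control.include.c++
      push: include-body

  include-body:
    # meta_scope covers all text matched while this context is on the stack,
    # including the match that pushed it
    - meta_scope: meta.preprocessor.include.c++
    - match: '"[^"\n]*"'
      scope: string.quoted.double.include.c++
    - match: '<[^>\n]*>'
      scope: string.quoted.other.include.c++
    - match: $
      pop: 1                # a preprocessor line ends at the end of the line
```

Parameters, comments or line continuations can later be added as further small patterns in include-body without touching the rule in main.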