I’ve finally achieved the goal. It wasn’t simple (actually, it was a nightmare), but thanks to your advice I managed it.
The problem was indeed tied to unconsumed whitespace: the `else_POP` and `immediate_POP` tricks for forcing your way out of a stacked context can cause a premature pop if there is still some whitespace floating around that isn’t captured by the various included contexts. In the end, I had to ensure that the pattern that sets the context on the stack also eats up (and discards) any trailing whitespace.
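To make the failure mode concrete: according to the sublime-syntax documentation, when several rules in a context can match, the leftmost match wins, and ties at the same position go to the rule listed first. An empty `match: ''` therefore always beats a rule whose match only starts after some leftover whitespace, which is exactly a premature pop. The actual helper definitions aren't shown in this post, so the sketch below assumes the conventional ones; the `EXAMPLE` keyword and the `example_body` context are hypothetical, used only to show a push rule that eats its trailing whitespace.

```yaml
# Conventional helper definitions (assumed; the real ones aren't shown).
immediate_POP:
  # An empty pattern matches at the current position, so it pops at once
  # unless an earlier-listed rule in the including context matches right here.
  - match: ''
    pop: true

else_POP:
  # Include this last: it pops at the next non-whitespace character unless
  # another rule matches earlier on the line, or at that same position
  # while being listed before it.
  - match: (?=\S)
    pop: true

some_statement:
  # Hypothetical rule: the trailing \s* eats any whitespace after the
  # keyword, so the pushed context starts right at the next token.
  - match: (?i)\b(EXAMPLE)\b\s*
    captures:
      1: keyword.other.alan
    push: example_body
```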
Another problem was the `END EVERY` construct, which has two optional trailing elements (an identifier and a dot terminator). To prevent stray scopes floating about, I had to implement an extra check for an `END EVERY` statement followed only by whitespace (i.e., neither an ID nor a terminator).
Here is the final code showing how I managed it (in case it turns out to be a useful reference for others with similar problems):
```yaml
class:
  - match: (?i)\bEVERY\b
    scope: storage.type.class.alan
    set: [class_body, class_identifier]

class_body:
  - meta_scope: meta.class.alan
  - include: class_head
  - include: class_tail

class_head:
  - match: (?i)\bIsA\b
    scope: storage.modifier.extends
    push: inherited_class_identifier
  # TODO: inheritance

class_tail:
  # ===========================
  # END EVERY => no ID & no `.`
  # ===========================
  # When END EVERY is followed by neither an ID nor a dot, we must capture it
  # separately to avoid stray scopes after it...
  - match: (?i)\b(END\s*EVERY)\b(?=\s*$)
    captures:
      1: keyword.control.alan
    pop: true
  # ==========================
  # END EVERY => ID and/or `.`
  # ==========================
  - match: (?i)\b(END\s*EVERY)\b\s* # <= must consume all whitespace!
    captures:
      1: keyword.control.alan
    set:
      - meta_content_scope: meta.class.alan
      - include: class_tail_identifier
      - include: terminator
      - include: force_POP

terminator:
  - match: '\.'
    scope: punctuation.terminator.alan
```
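The code above references `class_identifier`, `inherited_class_identifier`, `class_tail_identifier` and `force_POP` without showing them. Here is a minimal sketch of what they might look like; the identifier regex is an assumption (not ALAN's full identifier grammar), and `force_POP` is assumed to be an unconditional pop, like the usual `immediate_POP`.

```yaml
# Hypothetical sketches; the identifier pattern is an assumption.
class_identifier:            # popping variant, set on top by `class`
  - match: (?i)\b([a-z][a-z0-9_]*)\b\s*
    captures:
      1: entity.name.class.alan         # group 1 only: whitespace stays unscoped
    pop: true
  - match: (?=\S)                       # no identifier: leave without consuming
    pop: true

inherited_class_identifier:  # popping variant, pushed after IsA
  - match: (?i)\b([a-z][a-z0-9_]*)\b\s*
    captures:
      1: entity.other.inherited-class.alan
    pop: true
  - match: (?=\S)
    pop: true

class_tail_identifier:       # popless variant, included after END EVERY
  - match: (?i)\b([a-z][a-z0-9_]*)\b\s*
    captures:
      1: entity.name.class.alan

force_POP:                   # assumed: pop unconditionally
  - match: ''
    pop: true
```

Each identifier rule uses `captures` rather than `scope`, so the trailing whitespace it consumes never inherits the identifier's scope.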
The lessons I’ve learned from tackling this problem are:
- Always beware of any leftover whitespace from regex match patterns:
  - Try to consume trailing whitespace by capturing it with a discarded group.
  - `captures` is better than `scope` because it lets you add extra discarded groups for testing whether leading/trailing whitespace is a problem.
- Lookaheads are your best friends when it comes to handling optional syntax elements at the end of a `meta` scope.
- While working on a syntax’s contexts:
  - Annotate the stack level in side comments (it’s so easy to lose track of how deep in the stack each context and included statement is).
  - For reusable syntax-element contexts, consider creating both a popless version and one that pops out of the stack (sometimes you might be including them, other times pushing/setting them).
- In the absence of a context-stack debugger:
  - Add some arbitrary label to scopes so you can track in the highlighted code which context is active (when there are variants). For example, by using `keyword.control.NONE.alan` and `keyword.control.IdOrDOT.alan` I was able to uncover the leftover-whitespace problem that was causing the wrong context to be used.
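A sketch that puts several of these lessons together. The names `class_tail_rest` and `terminator_POP`, and the commented-out debug scope for the whitespace group, are hypothetical; the stack comments only show the relative change, since the depth at which `class_body` sits isn't shown here.

```yaml
# Hypothetical rules illustrating the lessons above.
class_tail:                  # included in class_body; stack: [..., class_body]
  # Lookahead: fire only when nothing but whitespace follows on this line.
  - match: (?i)\b(END\s*EVERY)\b(?=\s*$)
    captures:
      1: keyword.control.NONE.alan      # debug label: "no ID, no dot" rule
    pop: true                           # stack: [...]
  # Group 2 holds the trailing whitespace and normally gets no scope.
  - match: (?i)\b(END\s*EVERY)\b(\s*)
    captures:
      1: keyword.control.IdOrDOT.alan   # debug label: "ID and/or dot" rule
      # 2: invalid.illegal.alan         # uncomment to give the whitespace a visible scope
    set: class_tail_rest                # stack: [..., class_tail_rest]

# Popless variant: meant to be included; the including context decides when to pop.
terminator:
  - match: '\.'
    scope: punctuation.terminator.alan

# Popping variant of the same rule: meant to be pushed or set.
terminator_POP:
  - match: '\.'
    scope: punctuation.terminator.alan
    pop: true
  - match: (?=\S)                       # anything else: leave the context
    pop: true
```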
I learned these small lessons the hard way, by running in circles for hours because I wasn’t mentally tracking the stack levels correctly. I also struggled a lot with the `include` vs `push` vs `set` choices, trying to adapt the contexts to my own liking and to pre-existing reusable contexts, which turned out to be a very bad approach.
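A minimal sketch contrasting the three, using hypothetical names (`statement`, `EXAMPLE`, `example_rest`) and assuming `statement` is itself a pushed context rather than `main`, since `set` replaces whichever context is currently on top of the stack:

```yaml
# Hypothetical context contrasting include, push and set.
statement:                   # stack: [main, statement]
  # include: splices terminator's rules in here; the stack is untouched.
  - include: terminator
  # push: class_body goes on top; statement resumes once class_body pops.
  - match: (?i)\bEVERY\b
    scope: storage.type.class.alan
    push: class_body         # stack: [main, statement, class_body]
  # set: statement is popped and example_rest takes its place.
  - match: (?i)\bEXAMPLE\b
    scope: keyword.other.alan
    set: example_rest        # stack: [main, example_rest]
```

A list pushes several contexts at once, and the last one listed ends up on top: in the `class` context above, `set: [class_body, class_identifier]` replaces the current context with `class_body` and then puts `class_identifier` over it, so the identifier is matched first.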
When you start dealing with lots of reusable contexts and nested contexts, it can quickly become a complex task to keep a clear mental picture of what is actually going on at the parser level. Unfortunately, we can’t escape the unpleasant task of mentally tracking what the regex patterns are capturing, consuming and discarding, and how the various stacked contexts loop until they pop out.
If ST had a way to expose the syntax parser’s stack state, its queues, and some debug info about the text being processed and the regex matches and failures, it would be much easier to trace where our custom syntaxes fail.
Is there any chance that (somewhere in the future) ST might also ship with a command-line tool to debug syntax definitions? A console app that takes a syntax file and a source test file as input and spits out two files: a scoped version of the source file (an XML-like document tree) and a log file listing all the inner workings of the parser engine. This would be an invaluable tool both for learning how to build syntaxes and for fixing problems.