I recommend starting off with comments and literals, then moving on to keywords. Put all three of those into separate contexts, include the latter two (literals and keywords) in `main`, and include the former (comments) in `prototype`. Go from there. Next, start thinking about stateful constructs (e.g. `class Foo extends Bar`) and give them higher-priority handling while retaining the single-keyword fallback (and make sure you always have the `- match: (?=\S)` escape, paired with `pop: true`, on your stateful contexts!). Be sure to push shared regular expressions into variables. Think about which identifiers you want to scope (if any). Think about which symbols make sense to index (hint: it is conventional to index only top-level, first-class things like classes and functions). Layer meta-scoping on top of all of this. Finally, think about auto-indentation rules and which meta-scopes you’re going to need to make them work.
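A minimal sketch of that progression, for a hypothetical language called "Example". The language, its keywords, the file extension, and every `.example` scope name are assumptions chosen for illustration, not taken from any core syntax. It keeps comments in `prototype`, literals and keywords in `main`, lists the stateful `class` construct ahead of the keyword fallback, shares the identifier regex through a variable, and gives every stateful context the `(?=\S)` escape.

```yaml
%YAML 1.2
---
# Sketch for a hypothetical "Example" language. The language, its keywords,
# and every `.example` scope name are illustrative assumptions.
name: Example
file_extensions: [example]
scope: source.example

variables:
  # Shared regex, reused below as {{ident}}.
  ident: '[A-Za-z_][A-Za-z0-9_]*'

contexts:
  # Included into every context that doesn't opt out, so comments work anywhere.
  prototype:
    - include: comments

  main:
    # Stateful construct first: when two patterns match at the same position,
    # the earlier one wins, so `class` is handled here before the fallback.
    - include: class-declaration
    - include: literals
    - include: keywords

  comments:
    - match: '//'
      scope: punctuation.definition.comment.example
      push:
        - meta_include_prototype: false
        - meta_scope: comment.line.double-slash.example
        - match: $\n?
          pop: true

  literals:
    - match: '\b\d+\b'
      scope: constant.numeric.example
    - match: '"'
      scope: punctuation.definition.string.begin.example
      push:
        - meta_include_prototype: false   # no comments inside strings
        - meta_scope: string.quoted.double.example
        - match: '\\.'
          scope: constant.character.escape.example
        - match: '"'
          scope: punctuation.definition.string.end.example
          pop: true

  class-declaration:
    - match: '\bclass\b'
      scope: storage.type.class.example
      push: class-name

  class-name:
    - meta_scope: meta.class.example
    - match: '{{ident}}'
      # entity.name.* scopes are the usual candidates for symbol indexing.
      scope: entity.name.class.example
      set: class-extends
    # Escape hatch: pop on anything unexpected (e.g. half-typed code).
    - match: (?=\S)
      pop: true

  class-extends:
    - meta_scope: meta.class.example
    - match: '\bextends\b'
      scope: storage.modifier.extends.example
      set: class-parent
    - match: (?=\S)
      pop: true

  class-parent:
    - meta_scope: meta.class.example
    - match: '{{ident}}'
      scope: entity.other.inherited-class.example
      pop: true
    - match: (?=\S)
      pop: true

  keywords:
    # Single-keyword fallback for keywords outside a complete construct.
    - match: '\b(?:class|extends|if|else|while|return)\b'
      scope: keyword.other.example
```

With this sketch, `class Foo extends Bar` should scope `Foo` as `entity.name.class.example` and `Bar` as `entity.other.inherited-class.example`, all under `meta.class.example`. A half-typed `class 123` pops straight out of `class-name` at the digit, and `main` then scopes `123` as a number instead of leaving a context stuck on the stack.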
Critically, write tests. For everything. Constantly. When you want to make a new construct work, put it in the test file first, even if you haven’t written the scope assertions yet, and then make it work from there. The test files are one of the best parts of defining a syntax for Sublime. Make sure to write tests for invalid and partially valid code as well. Remember that it’s an editor: people are typing (present tense) code into it, code that isn’t perfectly parseable on a character-by-character basis, and you need to make that experience seamless.
…and don’t forget the `- match: (?=\S)` escape hatch! Ever.
Edit: Just as an aside… a lot of the core syntaxes were accreted more than designed. Partly this is because these languages have a lot of corner cases that were caught progressively over time, and partly it is because some of them are only mostly rewritten derivations of older tmLanguage definitions. It also didn’t help that many of these core syntaxes were rewritten in parallel with efforts to nail down conventional scopes, and even the core sublime-syntax functionality itself, so some of them contain ideas and structures that no longer make sense now that the ecosystem has evolved and matured. It’s definitely possible to write a well-structured, clear, systematic language mode; it’s just that most (all?) of the core modes aren’t, largely for historical reasons.