Sublime Forum

Sublime-Syntax From Scratch: Tips/Tricks?

#1

What tips, tricks and general guidelines would you give to someone who has never dabbled in .sublime-syntax?

I’ve read https://www.sublimetext.com/docs/3/syntax.html and https://www.sublimetext.com/docs/3/scope_naming.html, taken a peek at how C, C++, Ruby for example were implemented, but truth be told this is a bit over my head.

To me these syntax files seemed hacked together with no real rhyme or reason to them, on a tinker with it until it JustWerks™ basis.

What is the correct way to create a syntax efficiently? E.g.:

  1. Think of every possible valid way of combining the specific language’s constructs
  2. Shove them in a syntax_test file
  3. Handle variables
  4. Handle operators
  5. Handle functions
  6. Handle classes
  7. etc etc etc

?

#2

I recommend starting off with comments and literals, then moving on to keywords. Put all three into separate contexts, then include literals and keywords in main and comments in prototype. Go from there. Start thinking about stateful constructs (e.g. class Foo extends Bar) and provide higher-priority handling for them while retaining the single-keyword fallback (and ensure that you always have the - match: (?=\S) escape on your stateful contexts!). Be sure to factor shared regular expressions out into variables. Think about what identifiers you want to scope (if any). Think about what symbols make sense to index (hint: it is conventional to only index top-level, first-class things like classes and functions). Overlay meta-scoping on top of all of this. Finally, think about auto-indentation rules and what meta-scopes you’re going to need to make that work.
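A minimal sketch of the layout described above, for a made-up language. The language name, keywords, scope names, and context names are all assumptions chosen for illustration, not taken from any shipped syntax:

```yaml
%YAML 1.2
---
# Hypothetical toy language: every name and scope here is illustrative only.
name: Toy
file_extensions: [toy]
scope: source.toy

variables:
  # Shared regular expression, referenced below as {{ident}}
  ident: '[A-Za-z_][A-Za-z0-9_]*'

contexts:
  # Rules in prototype are included automatically in every other context.
  prototype:
    - include: comments

  main:
    - include: class-declarations   # stateful handling, higher priority
    - include: keywords             # single-keyword fallback
    - include: literals

  comments:
    - match: //
      scope: punctuation.definition.comment.toy
      push:
        - meta_include_prototype: false
        - meta_scope: comment.line.double-slash.toy
        - match: $\n?
          pop: true

  literals:
    - match: \b\d+\b
      scope: constant.numeric.integer.toy

  keywords:
    - match: \b(?:class|extends|return)\b
      scope: keyword.other.toy

  class-declarations:
    - match: \bclass\b
      scope: storage.type.class.toy
      push: class-name

  class-name:
    - match: '{{ident}}'
      scope: entity.name.class.toy
      set: class-extends
    - include: else-pop

  class-extends:
    - match: \bextends\b
      scope: storage.modifier.extends.toy
      set: class-super
    - include: else-pop

  class-super:
    - match: '{{ident}}'
      scope: entity.other.inherited-class.toy
      pop: true
    - include: else-pop

  # The escape hatch: leave a stateful context as soon as anything
  # unexpected (any non-whitespace character) shows up.
  else-pop:
    - match: (?=\S)
      pop: true
```

Because every stateful context ends with else-pop, a half-typed line such as "class Foo" followed by unrelated code falls back to main instead of getting stuck waiting for extends.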

Critically, write tests. For everything. Constantly. When you want to make a new construct work, put it in the test file first even if you haven’t written the scope assertions, then make it work from there. The test files are one of the best elements of defining syntax for Sublime. Make sure to write tests that are invalid or partially-valid as well. Remember that it’s an editor: people are typing (present tense) code into it, code which isn’t perfectly parseable on a character-by-character basis, and you need to make that experience seamless.
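As a rough illustration of what such a test file can look like, here is a sketch for a hypothetical Toy syntax. The package path, the // comment token, and every scope name are assumptions; the assertions only pass against a definition that assigns exactly these scopes. Each caret line asserts the scope of the characters directly above it on the previous non-assertion line:

```text
// SYNTAX TEST "Packages/User/Toy.sublime-syntax"

class Foo extends Bar
// ^^ storage.type.class.toy
//    ^^^ entity.name.class.toy
//        ^^^^^^^ storage.modifier.extends.toy
//                ^^^ entity.other.inherited-class.toy

class Foo
// ^^ storage.type.class.toy
//    ^^^ entity.name.class.toy
return 42
// ^^^ keyword.other.toy
//     ^^ constant.numeric.integer.toy
```

The second case is the partially-valid kind: the declaration is never finished, and the assertions on the following line check that highlighting recovers instead of staying stuck inside the class construct.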

…and don’t forget the - match: (?=\S) escape hatch! Ever.

Edit: Just as an aside… A lot of the core syntaxes were accreted more than designed. Some of this is a function of the fact that these languages have a lot of corner cases that were caught progressively over time, and some of it is the fact that some of them are only mostly-rewritten derivations of older tmLanguage definitions. Also, it didn’t help that many of these core syntaxes were rewritten in parallel with efforts to nail down conventional scopes and even the core sublime-syntax functionality itself, so some of them have ideas and structures which no longer make any sense because the ecosystem has evolved and matured. It’s definitely possible to write a very well-structured, clear, systematic language mode; it’s just that most (all?) of the core modes are not, for broadly historical reasons.

#3

this was helpful, thank you for your input

#4

in addition to @djspiewak’s excellent advice here, it’s also worth reading

#5

practical examples illustrating potential caveats, good read

#6

Since the OP mentions tricks, is there any trick to pop the current and the next context on the stack at once?

Say from a match in context A I push: [C, B]. I want, from a match in B, to pop both B and C at once (or alternatively pop B and set D instead of C). Is any of this possible at all?
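For readers on a newer build: as far as I know, Sublime Text 4 accepts an integer for pop and lets pop be combined with push, which covers both cases. The sketch below assumes that behaviour (it is not available in ST3), and the keywords and scope are made up for illustration; main plays the role of context A:

```yaml
%YAML 1.2
---
# Hypothetical toy syntax; keywords and context names are illustrative only.
# Assumes Sublime Text 4, where pop may be an integer and may be
# combined with push. ST3 does not support this.
name: Pop Demo
scope: source.popdemo

contexts:
  main:                     # plays the role of context A
    - match: \bbegin\b
      push: [C, B]          # B ends up on top of the stack, C beneath it

  B:
    - match: \bdone\b
      pop: 2                # pops B and C together, back to main
    - match: \bswitch\b
      pop: 2                # pops B and C first ...
      push: D               # ... then pushes D, so D sits where C was

  C:
    - match: (?=\S)
      pop: true

  D:
    - match: (?=\S)
      pop: true
```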

Oh, and a tip question: is there any toy syntax out there that defines blocks (and thus delimits meta scopes) with indentation? Studying the Haskell or Python syntaxes seems too challenging for me at my current level of expertise.

#7

I only wrote one syntax definition, for Clojure. Not sure if the advice makes any sense, but… Try to model the syntax structure the same way a compiler would parse it. The more closely you can match the actual parse tree, the more reliable your syntax will be. I did this for Clojure; the resulting syntax definition is very short in comparison to others, and 100% correct. This was probably enabled by the Lisp syntax, I dunno.
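To make the "mirror the parse tree" idea concrete, here is a small sketch for a hypothetical Lisp-like language. It is not the Clojure definition mentioned above, and the scope names are assumptions. Each nested list pushes one context, so the context stack follows the nesting of the forms:

```yaml
%YAML 1.2
---
# Hypothetical toy Lisp; not the Clojure definition mentioned in the post.
name: Toy Lisp
scope: source.toylisp

contexts:
  main:
    - include: forms

  forms:
    # An opening paren starts a nested list: one stack level per list.
    - match: \(
      scope: punctuation.section.parens.begin.toylisp
      push: list
    # Only reached at top level, where a closing paren has no partner.
    - match: \)
      scope: invalid.illegal.stray-bracket-end.toylisp

  list:
    - meta_scope: meta.parens.toylisp
    # Listed before the include so it wins over the stray-paren rule.
    - match: \)
      scope: punctuation.section.parens.end.toylisp
      pop: true
    - include: forms
```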

[1] Reference

#8

This is generally good advice, but it doesn’t necessarily result in short syntax definitions. Depending on the language and the flexibility it offers for writing different kinds of code structures, trying to act like a parser/compiler can be a very extensive task.

This is actually where most built-in syntax definitions are heading. They try to identify each little part of the language in as much detail and as accurately as possible, even adding a lot of invisible meta scopes so that plugins can make use of certain pieces of code.

The drawbacks of this approach are:

  1. The number of contexts being used increases, which starts hitting ST’s sanity limit of 25,000 allowed contexts as more and more combined syntaxes such as PHP include each other to highlight embedded code well.
  2. The growing complexity and the level of detail at which languages are parsed increase the risk of tricky edge cases or even bugs.
  3. Some entities of a language simply can’t be identified uniquely by a static code parser like ST’s lexer, which results in more or less guesswork.

You should always keep in mind: sublime-syntax is primarily for code highlighting, not for code analysis or linting.

P.S.:
When writing the CNC Sinumerik package, I tried to follow a strictly structured top-down-like approach. If you need some inspiration …

#9

The downside to this kind of approach is that parsers used in compilers are generally not amenable to error recovery. Additionally, if you structure your parser in the same way a compiler parser is structured, you will not be able to take advantage of set (as it is strictly more powerful than anything in a CFG), which is one of the most powerful tools sublime has for enabling robust handling of stateful constructs.

#10

I don’t think this is true. A PDA is typically defined analogously to set.

#11

It’s defined analogously to push. Assuming push and pop of the form that are supported by sublime-syntax, there is no way to emulate set within a PDA.

#12

Generally, a PDA is defined so that it pops on every transition, which is similar to always using set.
