Syntax Fun


Just a heads up that the next dev build will be taking a little longer than normal to come out. I’ve been mostly doing weekly releases (ignoring bug fix versions, anyway) recently, but the next one likely won’t be out until next week, or perhaps towards the end of this week.

I was doing some misc work that will make accepting community contributions for the default syntax definitions more practical (e.g., adding support for unit testing syntax defs), but got a little carried away, so the next build will have support for an alternative syntax definition format, .sublime-syntax, in addition to still supporting .tmLanguage files.

At its core, it still uses the same lexing engine that Sublime Text already has, so it’s fundamentally built around lexing tokens using regexes, one line at a time. The main difference is having direct control over the lexer stack, so it’s feasible to recognize compound constructs without having to make the regexes do all the work. This also enables handling syntax spread over multiple lines. For example, it’s now able to recognize this C code as defining a symbol named “point”, which isn’t possible with the current system:

typedef struct
{
    int x;
    int y;
} point;
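
To make the "direct control over the lexer stack" idea a bit more concrete, here is a rough Python sketch. It is not how Sublime Text implements it and it is not the .sublime-syntax format: the context names, the push/set/pop actions and the scope strings are all made up for illustration. It only shows how line-at-a-time regex lexing plus an explicit stack can carry state from "typedef struct" on one line to the name several lines later.

```python
import re

# Hypothetical rule table: context name -> list of (regex, scope, action).
# action is None, ("push", ctx), ("set", ctx) or ("pop",).
CONTEXTS = {
    "main": [
        (re.compile(r"\btypedef\s+struct\b"), "storage.type", ("push", "struct_open")),
    ],
    "struct_open": [
        (re.compile(r"\{"), "punctuation.begin", ("set", "struct_body")),
    ],
    "struct_body": [
        (re.compile(r"\}"), "punctuation.end", ("set", "struct_name")),
    ],
    "struct_name": [
        (re.compile(r"[A-Za-z_]\w*"), "entity.name.type", None),
        (re.compile(r";"), "punctuation.terminator", ("pop",)),
    ],
}


def lex(lines, contexts=CONTEXTS):
    """Lex one line at a time; the context stack survives across lines."""
    stack = ["main"]
    tokens = []
    for line in lines:
        pos = 0
        while pos <= len(line):
            # Pick the rule of the current context that matches earliest.
            best = None
            for pattern, scope, action in contexts[stack[-1]]:
                m = pattern.search(line, pos)
                if m and (best is None or m.start() < best[0].start()):
                    best = (m, scope, action)
            if best is None:
                break  # nothing more on this line, keep the stack as-is
            m, scope, action = best
            tokens.append((m.group(), scope))
            if action:
                if action[0] == "push":
                    stack.append(action[1])
                elif action[0] == "set":
                    stack[-1] = action[1]
                elif action[0] == "pop" and len(stack) > 1:
                    stack.pop()
            # Always make progress, even on an empty match.
            pos = m.end() if m.end() > pos else pos + 1
    return tokens


source = ["typedef struct", "{", "    int x;", "    int y;", "} point;"]
symbols = [text for text, scope in lex(source) if scope == "entity.name.type"]
print(symbols)  # prints ['point']
```

No single regex ever sees more than one line here; the stack is what remembers that a struct was opened.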

It also has a cleaner (IMO) way of including languages within one another, and a more friendly surface syntax to work with (currently JSON, but I’m planning on changing that to YAML before release).

NB: for those with exceptional memories, this isn’t related to the .sublime-syntax format that the very first public versions of Sublime Text used. It does share some concepts, but it’s really just reusing the file extension.

I’ll post full details alongside the next build.



Great !









Very interesting…and unexpected. I am excited to see if this makes coding up languages easier and better. I would really like to finally fix a number of languages: C/C++, Yaml, etc. Current system is too weak to do these languages well.



Are you intending to allow YAML for other configuration files as well? That would be nice.



wow, excellent to hear about it!! The effort is much appreciated, even if it takes time or is delayed by other inspiration :slight_smile: Thanks



Excellent - I’m really looking forward to this!

While we’re talking about syntaxes - what is planned for amending the built in ones? At least for front end development, they have fallen really out of date. I teach web development to beginners and it’s always a point of confusion when the syntax highlighting doesn’t work or is reporting false negatives.

CSS syntax doesn’t support CSS3 or media queries - the language has really evolved, this one is great:
Also, most developers now use Sass, SCSS, Stylus or LESS - they should come by default.

JavaScript is really changing and we need support for ES6 / ES2015 — … 6%20Syntax

CoffeeScript has gained enough popularity to include by default. Most users search for “coffeescript” in package control and install an old version of coffeescript, the most up to date one is … ffeeScript

So yeah - excited about these updates and glad it’s getting some attention! :smiley:



I’m super excited!



Jon, this is awesome!

Out of interest, are there any speed improvements with the new system?

I do, I do!



This is great news. I echo some of the points raised above, especially about some syntaxes being out of date. I’ve especially noticed this for web development (echoing wesbos’ comments), and for Java and C++. For some languages, e.g. JavaScript, the current system is too weak to be able to correctly and precisely identify all language construct possibilities, leading to limitations in both syntax colouring and symbol indexing.

  1. Can we expect improvements here to ripple through to symbol indexing in the case where .sublime-syntax are created to reflect modern syntax grammars?
  2. Are there any expected performance hits or benefits with this new system?
  3. Will there be a straightforward way to port tmLanguage files over to the new YAML/JSON formats to use as a starting point for enhancement?
  4. Are any APIs planned to allow extensions access to Sublime’s syntax database? This could hugely simplify the creation of plugins that do code refactoring etc., making features unique to IDEs like phpStorm a real possibility.

It’s good to see this aspect of Sublime being (potentially) brought up to date; hopefully we are moving toward the place where colouring, indexing and also completion suggestions for common languages/syntaxes are both up to date and contextually accurate (neither being so at present).

ps. Any other big changes/features in the pipeline? :wink:



At this stage, no. YAML is a great format if you’re familiar with it, but the syntax can be quite unintuitive at times (e.g., exactly when a string needs to be quoted is not always obvious). I’m not ruling it out, it’s just something I’m approaching with caution.

Bringing the syntax definitions up-to-date is the overarching plan. I just didn’t want to inflict any more plist xml editing on anyone :slight_smile:

I don’t know yet. By default, no, but there are some approaches I want to reevaluate which may yield some improvements.

The single biggest performance hurdle is the way regexes are written: some of the in-the-wild tmLanguage files have regexes written in such a way that Oniguruma is forced to spend much more time evaluating them than it should. The single best thing that could be done is to write a syntax profiler tool, which would report how much time is spent in each regex. I’ll look at doing this when things are a bit further along.
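
As a rough idea of what such a profiler would report, here is a minimal Python sketch. It is an assumption-laden stand-in: it uses Python's `re` module rather than Oniguruma, and `profile_patterns`, `report` and the pattern names are invented for the example. It simply times every pattern against every line and sorts by total time.

```python
import re
import time
from collections import defaultdict


def profile_patterns(patterns, lines):
    """Return {name: seconds} spent searching every line with each pattern.

    `patterns` maps a label to a regex source string (labels are hypothetical).
    """
    compiled = {name: re.compile(src) for name, src in patterns.items()}
    totals = defaultdict(float)
    for line in lines:
        for name, rx in compiled.items():
            start = time.perf_counter()
            rx.search(line)
            totals[name] += time.perf_counter() - start
    return dict(totals)


def report(totals):
    """List (name, seconds) pairs, slowest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Two patterns that accept the same text; the nested quantifier backtracks
# heavily when the match fails, which is the kind of thing a profiler exposes.
patterns = {"nested": r"(a+)+$", "flat": r"a+$"}
lines = ["a" * 18 + "b"]
for name, seconds in report(profile_patterns(patterns, lines)):
    print(f"{name}: {seconds:.6f}s")
```

On an input like this the nested form should dominate the report by a wide margin, although the exact timings will vary by machine and regex engine.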

Yep. The sublime-syntax files still do fundamentally the same thing, assigning scopes to the text, so the changes flow through naturally to symbol extraction.

At this stage I expect it to be broadly similar.

I’ll look into it. The format is semantically a superset of the tmLanguage one, so in principle a tool could do an automatic conversion. Things may start to get a bit hairy when file includes are involved though.

Beyond the current APIs for this, there’s nothing planned. Gut feeling so far is that when you try to properly recognize a language’s grammar in the syntax files, it creates more problems than it solves, as suddenly error tokens are appearing everywhere as you’re typing with the file in a temporarily invalid state.

For example, the in-progress C++ syntax knows that for statements are of the form:

"for" "(" <expr> ";" <expr> ";" <expr> ")"

So the natural thing to do is to mark up code such as “for (int i;)” as invalid on the close bracket, due to the missing semicolons, but in reality this is just annoying. Currently it takes the more conservative approach of only showing an error if you have too many semicolons in your for statement, rather than not enough.
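
The conservative rule can be sketched in a few lines of Python. This is only an illustration of the policy, not the actual syntax definition: `for_header_errors` is a made-up helper that takes the text between the parentheses, and it ignores string and character literals for brevity.

```python
def for_header_errors(header):
    """Return offsets of semicolons to flag in a C-style for header.

    Only surplus semicolons (the third one onward) are reported; having too
    few is tolerated, since that is the normal state while typing.
    """
    errors = []
    depth = 0   # nesting inside (), [] or {} within the header
    count = 0   # top-level semicolons seen so far
    for i, ch in enumerate(header):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == ";" and depth == 0:
            count += 1
            if count > 2:
                errors.append(i)
    return errors


print(for_header_errors("int i"))                     # prints []
print(for_header_errors("int i = 0; i < n; ++i"))     # prints []
print(for_header_errors("int i = 0; i < n; ++i; x"))  # prints [21]
```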

Yes :slight_smile:



Jon, this is mega awesome! I’m so excited to see such a huge change in Sublime, just for the future implications alone.

I’m pumped to see what other things you’re bringing to Sublime too :smiley:



Like this kind of teasing! :smiley:



Guys, considering that Jon is in Australia, you do realize that by the time this thread was started it was already 1st of April, yeah?

How crazy would it be to see Jon tomorrow shouting: HA HA HA!! Gotcha yaaaa!!! :smiling_imp:



That would be such a mean joke to pull. In a way you would be admitting: hey you know that issue you really want fixed? HA, I’m not doing it!

But you never know…



I was thinking more about the situation of being able to easily build tools that are “language aware”, like extracting code to a new function, or jumping to the next function, beautifying etc… This can already be done via plugins, but it’s left to the plugin writer to analyse the text from scratch; AFAIK the only thing that’s available to the plugin writer is knowledge of scope at a given cursor position. That means a lot of text analysis, probably using regexes, that’s repeated in various ways across plugins and varies depending on syntax specifics. What I have in mind is being able to hook into Sublime’s db to, for example, jump to the next logical position for a given scope - eg. next function, insert after, select previous etc. for given scopes, at the current level and up. Something akin to jQuery’s DOM selectors and manipulation tools like .after() insert().

Anyway, thanks for the detailed reply. And I hope that others are incorrect about “Yes” being an April 1st thing :wink:



New version still having sidebar issues.

I rely heavily on the sidebar; it would be very difficult for me to navigate my source trees for my various projects using the goto anything method, and I generally have 3 or 4 different but related projects open at the same time. The sidebar, however, has had an issue for a number of versions wherein it will no longer display the project tree. Sometimes this happens at launch. Right now, if I shrink down the project tree then re-expand it, the expansion has zero entries. I can get the entries back by quitting and relaunching. If I have sub dirs in my tree (and I almost always do) they expand to an empty list.

I can resolve this entire issue temporarily by deleting my sublime project and sublime workspace files and recreating them, which is a major pita because eventually I’m just going to have to do it again… and again…



View.find_by_selector should let you do these things. It’ll return all regions matching the given selector, so you’ll need to filter them yourself. I’m open to adding more functions along these lines if needed.
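
For the "jump to the next function" case, the filtering could look something like the sketch below. `next_region_after` and `FakeRegion` are names invented here; `FakeRegion` only mimics the `begin()`/`end()` methods of `sublime.Region` so the helper can be tried outside Sublime Text, and the `entity.name.function` selector in the comment is an assumption that depends on the syntax definition in use.

```python
class FakeRegion:
    """Stand-in for sublime.Region, only for running this outside the editor."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def begin(self):
        return min(self.a, self.b)

    def end(self):
        return max(self.a, self.b)


def next_region_after(regions, point):
    """Return the region with the smallest begin() greater than `point`, or None."""
    later = [r for r in regions if r.begin() > point]
    return min(later, key=lambda r: r.begin()) if later else None


# Inside a sublime_plugin.TextCommand this might be used roughly as:
#     regions = self.view.find_by_selector("entity.name.function")
#     target = next_region_after(regions, self.view.sel()[0].begin())
regions = [FakeRegion(0, 5), FakeRegion(10, 15), FakeRegion(20, 25)]
target = next_region_after(regions, 7)
print(target.begin(), target.end())  # prints 10 15
```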



One thing that I have found confusing in the API is the weird behavior of:

self.view.extract_scope(pt)

This call will give you a region, that sometimes doesn’t really make sense. Then if you get the scope of your current point and then call find_by_selector, you get more realistic regions:

scope_name = self.view.scope_name(pt)
self.view.find_by_selector(scope_name)

I would expect that extract_scope would give the same region found in find_by_selector, but that isn’t really the case. So to get a better region, I often have to use find_by_selector and loop through the results until I find a region that intersects my cursor.
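
The workaround loop looks roughly like this. It is a minimal sketch: `region_at` and `FakeRegion` are names made up for the example (`FakeRegion` just imitates `begin()`/`end()` on `sublime.Region` so it runs outside the editor), and treating the region end as inclusive is my own choice, not something the API dictates.

```python
class FakeRegion:
    """Stand-in for sublime.Region, only for running this outside the editor."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def begin(self):
        return min(self.a, self.b)

    def end(self):
        return max(self.a, self.b)


def region_at(regions, point):
    """Return the first region that contains `point` (end inclusive), or None."""
    for region in regions:
        if region.begin() <= point <= region.end():
            return region
    return None


# Inside a plugin, with `pt` being the cursor position, roughly:
#     scope_name = self.view.scope_name(pt).strip()
#     region = region_at(self.view.find_by_selector(scope_name), pt)
regions = [FakeRegion(0, 5), FakeRegion(10, 15)]
found = region_at(regions, 12)
print(found.begin(), found.end())  # prints 10 15
```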

Another oddity is the new EOF change. In general it makes sense, but when I request the scope at EOF, it will give the scope of the region before it, yet when you call find_by_selector, EOF doesn’t appear in any of the regions. These inconsistencies cause plugin writers to add what I feel is unnecessary code to get what they want. I guess if the cursor is at EOF, a plugin could just ignore the scope or skip looking at the scope altogether, but I find the behavior confusing.