Sublime Forum

Syntax highlight capture scopes should apply to all captures of a given group

#1

I’m very new to Sublime Text 3, and I’m starting to play around with syntax highlighting. One thing I’ve found really counterintuitive (I don’t know whether it is a feature or a bug, so please tell me if you know) is that in a match pattern with quantifiers on groups and scopes assigned to those groups in the “captures:” section, the scope is only assigned to the last capture made by the group.

As an example, say I have a syntax with the following pattern:
- match: "(blah,\\s*)++"

which then assigns constant.language to capture 1.

If I apply it to the text “blah, blah, blah,”, only the last “blah” will be highlighted, whereas I wish all three of them were.
What do you think? Should that be included as a feature?

#2

Pretty sure it’s not a bug; it’s probably done for performance reasons. If you need to achieve that, you can normally drop the repetition: as long as the context doesn’t change, the rule will be tried again and will find and scope all instances naturally.

It’s the same reason why I used a lookahead in the Markdown syntax for thematic breaks, so I could scope the non-whitespace characters.

#3

If the Sublime regex engine is implemented like I think it is, then this is a fundamental limitation. Each regexp has a fixed number of positions it needs to save; capturing multiple instances of the same group would require dynamically keeping a list for each capture group for each regexp. This would be a significant change to some fairly tricky internals and could adversely affect the performance of syntax highlighting.

It should be possible to accomplish the same task without that feature. In your example, simply remove the ++ quantifier and the engine will match each of the three occurrences separately. I generally find that I don’t use many capture groups when I write a syntax because they’re less powerful and flexible than using contexts.

Also, some recommendations:

  • Use single quotes instead of double. That way, you don’t need to escape the backslash: '(blah,\s*)++'.
  • Avoid possessive quantifiers like ++. The fast regexp engine doesn’t support them (I think), which would cause Sublime to fall back on the original, slower engine. You should be able to accomplish the same goal without them.
  • The whitespace probably shouldn’t be scoped at all, while the comma should probably be punctuation.separator.comma.

I realize what you posted is just an example, but I thought I’d mention these things just in case.
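
As a rough sketch of the suggestions in this post (no quantifier, single quotes, the comma scoped as a separator, whitespace left unscoped), a minimal syntax might look like the following. The name, file extension, and the `blah-example` scope suffix are made up for illustration.

```yaml
%YAML 1.2
---
# Hypothetical example syntax; name, extension and scope suffix are placeholders.
name: Blah Example
file_extensions: [blah]
scope: source.blah-example

contexts:
  main:
    # No quantifier: the rule is tried again at each position, so every
    # "blah" on the line is matched and scoped separately.
    - match: 'blah'
      scope: constant.language.blah-example
    # The comma gets its own punctuation scope; whitespace is left unscoped.
    - match: ','
      scope: punctuation.separator.comma.blah-example
```

Applied to “blah, blah, blah,”, this should scope all three words, because each one is a separate match rather than a repeated capture inside a single match.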

#4

I know I could match each occurrence separately. However, in my current use case I have reasons to want to match a whole line in one match: my syntax is for an indent-based language, so I’m using the backreference trick from the “heredocs example” section of the syntax documentation to highlight wrong indents as illegal. But surely I could find a workaround by pushing an anonymous context and popping it just before the end of the line, or something similar.
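
One way that workaround might look, as an untested sketch: a lookahead decides whether the line qualifies, then an anonymous context scopes each occurrence and pops at the end of the line. The lookahead condition and the scope names are assumptions for illustration, and the indentation check from the heredoc trick is left out.

```yaml
contexts:
  main:
    # Zero-width lookahead: enter the line context only if the line
    # starts with "blah," (hypothetical condition for this example).
    - match: '^(?=\s*blah,)'
      push:
        # Inside the anonymous context every occurrence is a separate match.
        - match: 'blah'
          scope: constant.language.blah-example
        - match: ','
          scope: punctuation.separator.comma.blah-example
        # Leave the anonymous context at the end of the line.
        - match: '$'
          pop: true
```

The line is still recognized as a unit by the pushing rule, while the scoping itself happens one occurrence at a time inside the pushed context.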

As for your recommendations:

-Thanks for the single quote tip! I didn’t know that, and it’s indeed really helpful for readability. Can any regex string be enclosed in single quotes instead of double quotes?

-Wow, it’s pretty counterintuitive that possessive quantifiers would harm performance, when they are meant to improve it by avoiding pointless backtracking (they are pretty natural for me to use at this point, but I can re-teach myself). Do atomic groups also cause a fallback to the slower engine? And where could I find documentation on this issue? The syntax doc page only says that the regexes use the Ruby syntax and links to a specification of the Oniguruma engine. I guess that’s the slower one, since it explicitly mentions possessive quantifiers. Do you know which one is the faster one?

-You are totally right on the last point, but it was only a dirty example to get the point across.

#5

Yep. In a lot of cases, you don’t need quotes at all, but when you do, single quotes are preferable.
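
For reference, here is how the same kind of pattern can be written in each quoting style. A .sublime-syntax file is YAML, so this is a sketch of the general YAML quoting rules rather than anything specific to Sublime; the patterns themselves are only examples.

```yaml
# Unquoted (plain scalar): backslashes are literal. This works as long as the
# pattern does not start with a YAML indicator character (for example [ { * & ! | >
# or a quote) and contains no ": " or " #" sequence.
- match: \bblah\b
# Single quotes: backslashes are literal; a literal ' is written as ''.
- match: '(blah,\s*)'
# Double quotes: a backslash starts an escape sequence, so it must be doubled.
- match: "(blah,\\s*)"
```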

Regarding Sublime’s regexp engine, see my comment here.
