Sublime Forum

Syntax highlighting question: popping scopes

#1

I am creating a syntax definition for the Stata language, and I’m having trouble getting the scopes right for a certain kind of embedded string. Stata can enclose strings in double quotes; it can also enclose strings in so-called compound double quotes, which are opened with `" (backtick-double-quote) and closed with "' (double-quote-apostrophe). Compound-double-quoted strings can include double-quoted strings. I want them colored differently, and I want double-quoted strings that are within compound-double-quoted strings to take the color of double-quoted strings.

I can scope them just fine with this:

double-quote:
  - match: (?<!`)"(?!')
    scope: punctuation.definition.string.begin.double.stata
    push:
      - meta_scope: string.quoted.double.stata
      - match: (?<!`)"(?!')
        scope: punctuation.definition.string.end.double.stata
        pop: true
compound-quote:
  - match: '`"'
    scope: punctuation.definition.string.begin.double.compound.stata
    push:
      - meta_scope: string.quoted.double.compound.stata
      - match: '"'''
        scope: punctuation.definition.string.end.double.compound.stata
        pop: true
      - include: compound-quote
      - include: double-quote

And everything works with the syntax highlighting.

But there are cases where a compound-double-quoted string can contain an unmatched double-quote character, like this:

`" This string " has an unmatched double-quotation mark"'

But this doesn’t get highlighted correctly, because the compound closing delimiter is not recognized.

So, I’d like to handle this one of two ways:

  1. Ideally, the double-quote within a compound quote will only trigger the double-quote scope if there is a matching double-quote to end the double-quote scope.

  2. Less ideally, the "' that pops the compound-double-quote scope should also pop any added double-quote scope.

Any thoughts or advice?

Thanks!!!

#2

Maybe something like this?

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
file_extensions:
  - sata
scope: source.sata
contexts:

  main:
    - include: comment
    - include: string

  comment:
    - match: //.*
      scope: comment.line.double-slash.sata

  string:
    - match: '`"'
      scope: punctuation.definition.string.begin.sata
      push:
        - meta_scope: string.quoted.double.sata
        - match: '"'''
          scope: punctuation.definition.string.end.sata
          pop: true
        - include: string-interpolated

  string-interpolated:
    # match interpolated string, if closing double-quote exists.
    - match: '"(?=.*"(?!''))'
      scope: punctuation.definition.string.begin.sata
      push:
        - meta_scope: string.interpolated.sata
        - match: '"'
          scope: punctuation.definition.string.end.sata
          pop: true


Btw: You should avoid back-refs under all circumstances, as they slow down highlighting considerably.
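
To see what the lookahead in string-interpolated does in isolation, here is a minimal Python sketch. It uses Python's re module rather than Sublime's regex engine, so treat it as an approximation; the function name and the two sample lines are made up for illustration.

```python
import re

# Same pattern as in string-interpolated: a double quote that is followed,
# later on the line, by another double quote that is not part of a closing "'.
NESTED_OPEN = re.compile('"(?=.*"(?!\'))')

def opens_nested_string(line, pos):
    """Return True if the double quote at index pos would open a nested string."""
    return NESTED_OPEN.match(line, pos) is not None

# Hypothetical sample lines; starting the search at index 2 skips the `"
# that opens the compound string.
unmatched = '`" This string " has an unmatched double-quotation mark"\''
balanced = '`" This string "has a nested" double-quoted string"\''

print(opens_nested_string(unmatched, unmatched.index('"', 2)))  # False
print(opens_nested_string(balanced, balanced.index('"', 2)))    # True
```

The stray quote is left alone because the only later double quote on that line belongs to the closing "' delimiter.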

#3

Ah - thank you! I needed some extra code to get everything I needed, but that strategy really helped. In case anyone cares, here is my current effort.

compound-quote:
  - match: '`"'
    scope: punctuation.definition.string.begin.double.compound.stata
    push:
      - meta_scope: string.quoted.double.compound.stata
      - match: '"'''
        scope: punctuation.definition.string.end.double.compound.stata
        pop: true
      - include: compound-quote
      - include: nested-double-quote
      - include: single-quote

double-quote:
  - match: (?<!`)"(?!.*?"')(?=.*"(?!'))
    scope: punctuation.definition.string.begin.double.stata
    push:
      - meta_scope: string.quoted.double.stata
      - match: (?<!`)"(?!')
        scope: punctuation.definition.string.end.double.stata
        pop: true
      - include: single-quote

nested-double-quote:
  - match: (?<!`)"(?=.*[^`]"[^'](?=.*"'))
    scope: punctuation.definition.string.begin.double.stata
    push:
      - meta_scope: string.quoted.double.stata.nested
      - match: (?<!`)"(?!')
        scope: punctuation.definition.string.end.double.stata
        pop: true
      - include: single-quote

This succeeds for all my test cases.

One follow-up: is there a way to avoid the back-refs, given that I need to match compound-quoted delimiters separately from double-quoted delimiters?

#4

I don’t see why you need them at all. Sublime will match rules in the given order. As long as you look for `" before ", you should be fine.
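
A rough way to picture this is a scanner that tries an ordered rule list at every position. The Python sketch below is a simplified model, not Sublime's actual engine, and the rule names and the delimiters function are hypothetical. It shows that listing the two-character delimiters first is enough to keep the bare double-quote rule from claiming them, with no lookbehind involved.

```python
import re

# Rules are tried in order at each position, so the two-character compound
# delimiters are listed before the bare double quote.
RULES = [
    ("compound_open", re.compile('`"')),
    ("compound_close", re.compile('"\'')),
    ("double_quote", re.compile('"')),
]

def delimiters(line):
    """Return (rule name, index) for each delimiter, scanning left to right."""
    found = []
    pos = 0
    while pos < len(line):
        for name, pattern in RULES:
            m = pattern.match(line, pos)
            if m:
                found.append((name, pos))
                pos = m.end()  # consume the delimiter so it is not matched twice
                break
        else:
            pos += 1  # no rule matched here; move on one character
    return found

print(delimiters('`"a "b" c"\''))
# [('compound_open', 0), ('double_quote', 4), ('double_quote', 6), ('compound_close', 9)]
```

This only covers telling the delimiters apart; deciding whether a bare double quote has a partner still needs something like the lookahead discussed above.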

#5

It is said that all back-refs can be replaced by stacking contexts correctly.

Indeed, I only needed to add two lines to my earlier example to make it pass the test cases you provided.

In a second step, I merged string and string-interpolated.

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
file_extensions:
  - sata
scope: source.sata
contexts:

  main:
    - include: comment
    - include: string

  comment:
    - match: //.*
      scope: comment.line.double-slash.sata

  string:
    - match: '`"'
      scope: punctuation.definition.string.begin.sata
      push:
        - meta_scope: string.quoted.double.stata
        - match: '"'''
          scope: punctuation.definition.string.end.sata
          pop: true
        - include: string
    # match interpolated string, if closing double-quote exists.
    - match: '"(?=.*"(?!''))'
      scope: punctuation.definition.string.begin.sata
      push:
        - meta_scope: string.interpolated.stata
        - match: '"'
          scope: punctuation.definition.string.end.sata
          pop: true

#6

Modifying '"(?=.*"(?!''))' to '"(?=[^"]*"(?!''))' makes even the following expression work well.

9 `"compound `"with " <-- another stray embedded compound `" followed by another compound "' quoted "' quoted"'

#7

Deathaxe: this is beautiful, but unless I’m doing something wrong your code doesn’t work right for line 8 of my example. It is starting the string.interpolated scope at the stray " character because there is another " later in the line. But because that later " comes outside the compound string, it should not count.

#8

That fixed my line 8, but it doesn’t seem to work right on your line 9.

#9

Try this one '"(?=[^`"]*"(?!''))', please. Markdown was eating the back tick.
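
For anyone following along, here is a small Python sketch of why restricting the lookahead matters. Python's re module stands in for Sublime's engine here, and the sample lines are hypothetical reconstructions of the kind of case described above, not the actual test file.

```python
import re

GREEDY = re.compile('"(?=.*"(?!\'))')        # original lookahead
BOUNDED = re.compile('"(?=[^`"]*"(?!\'))')   # lookahead from this post

# Hypothetical line: a stray " inside a compound string, followed by an
# unrelated double-quoted string later on the same line.
line = '`"stray " inside"\' and "a real string"'
stray = line.index('"', 2)  # index 2 skips the opening `"

print(GREEDY.match(line, stray) is not None)   # True: sees the later, unrelated "
print(BOUNDED.match(line, stray) is not None)  # False: stops at the closing "'

# A properly nested string is still recognised by the bounded version.
nested = '`"outer "inner" text"\''
print(BOUNDED.match(nested, nested.index('"', 2)) is not None)  # True
```

The bounded version only accepts the very next double quote as a partner, so a quote beyond the end of the compound string can no longer be mistaken for the closing one.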

#10

Brilliant!

#11

By the way, back references are things like \1, which refer to the contents of capture groups that have already matched. What you’re discussing are lookbehinds, which refer to text preceding the current match position. Using the correct terminology can help avoid confusion in the future ;)
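
A quick illustration of the difference, written in Python rather than sublime-syntax purely so it can be run on its own; the pattern names and sample strings are made up.

```python
import re

# Back reference: \1 must repeat exactly what capture group 1 matched,
# so the closing quote has to be the same character as the opening one.
same_quote = re.compile(r'''(["'])(.*?)\1''')

# Lookbehind: (?<!`) only checks the character before the match position;
# nothing is captured and nothing is referred back to.
bare_double_quote = re.compile(r'(?<!`)"')

print(same_quote.search("""say 'a "b" c' now""").group(2))  # a "b" c
print(bare_double_quote.search('`"x" y').start())           # 3
```

The patterns in this thread are all of the second kind (plus lookaheads), so the earlier advice about avoiding back references does not apply to them directly.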

#12

Ah, thanks for the tip.

#13

OK, so it gets worse. There is one more scope definition that I’m now trying to integrate, and my efforts break what’s here.

The new one is single-quote, which in Stata is opened with a backtick and closed with a single quote, as in `this is single quoted'.

Here’s what I’m using:

single-quote:
  # matches ` unless followed by "
  - match: '`(?!")'
    scope: punctuation.definition.string.begin.single.stata
    push:
      - meta_scope: string.quoted.single.stata
      - match: (?<!")'
        scope: punctuation.definition.string.end.single.stata
        pop: true
      - include: single-quote

This has its own lookbehind, and once I include it within the other string declarations, it breaks them too. Ugh!
