Sublime Forum

Dev Build 3085

#21

Sure. This is for JSON. You will notice that escape characters such as \t are scoped as invalid rather than as normal escapes, because the regex engine does not seem to recognize the (?x: some syntax ) form. If you change those to use (?x) instead, they start to scope properly.

[code]%YAML 1.2
---
# http://www.sublimetext.com/docs/3/syntax.html
name: JSON
file_extensions:
  - json
  - sublime-settings
  - sublime-menu
  - sublime-keymap
  - sublime-mousemap
  - sublime-theme
  - sublime-build
  - sublime-project
  - sublime-completions
  - sublime-commands
scope: source.json
contexts:
  main:
    - include: value
  array:
    - match: '\['
      captures:
        0: punctuation.definition.array.begin.json
      push:
        - meta_scope: meta.structure.array.json
        - match: '\]'
          captures:
            0: punctuation.definition.array.end.json
          pop: true
        - include: value
        - match: ","
          scope: punctuation.separator.array.json
        - match: '[^\s\]]'
          scope: invalid.illegal.expected-array-separator.json
  comments:
    - match: /\*\*
      captures:
        0: punctuation.definition.comment.json
      push:
        - meta_scope: comment.block.documentation.json
        - match: \*/
          pop: true
    - match: /\*
      captures:
        0: punctuation.definition.comment.json
      push:
        - meta_scope: comment.block.json
        - match: \*/
          pop: true
    - match: (//).*$\n?
      scope: comment.line.double-slash.js
      captures:
        1: punctuation.definition.comment.json
  constant:
    - match: \b(?:true|false|null)\b
      scope: constant.language.json
  keyString:
    - match: '"'
      captures:
        0: punctuation.definition.string.begin.json
      push:
        - meta_scope: string.quoted.double.key.json
        - match: '"'
          captures:
            0: punctuation.definition.string.end.json
          pop: true
        - match: |
            (?x: # turn on extended mode
            \\ # a literal backslash
            (?: # ...followed by...
            ["\\/bfnrt] # one of these characters
            | # ...or...
            u # a u
            [0-9a-fA-F]{4} # and four hex digits
            )
            )
          scope: constant.character.escape.json
        - match: \\.
          scope: invalid.illegal.unrecognized-string-escape.json
  number:
    - match: |
        (?x: # turn on extended mode
        -? # an optional minus
        (?:
        0 # a zero
        | # ...or...
        [1-9] # a 1-9 character
        \d* # followed by zero or more digits
        )
        (?:
        (?:
        \. # a period
        \d+ # followed by one or more digits
        )?
        (?:
        [eE] # an e character
        [+-]? # followed by an option +/-
        \d+ # followed by one or more digits
        )? # make exponent optional
        )? # make decimal portion optional
        )
      comment: handles integer and decimal numbers
      scope: constant.numeric.json
  object:
    - match: '\{'
      comment: a JSON object
      captures:
        0: punctuation.definition.dictionary.begin.json
      push:
        - meta_scope: meta.structure.dictionary.json
        - match: '\}'
          captures:
            0: punctuation.definition.dictionary.end.json
          pop: true
        - include: keyString
        - include: comments
        - match: ":"
          captures:
            0: punctuation.separator.dictionary.key-value.json
          push:
            - meta_scope: meta.structure.dictionary.value.json
            - match: '(,)|(?=\})'
              captures:
                1: punctuation.separator.dictionary.pair.json
              pop: true
            - include: value
            - match: '[^\s,]'
              scope: invalid.illegal.expected-dictionary-separator.json
        - match: '[^\s\}]'
          scope: invalid.illegal.expected-dictionary-separator.json
  string:
    - match: '"'
      captures:
        0: punctuation.definition.string.begin.json
      push:
        - meta_scope: string.quoted.double.json
        - match: '"'
          captures:
            0: punctuation.definition.string.end.json
          pop: true
        - match: |
            (?x: # turn on extended mode
            \\ # a literal backslash
            (?: # ...followed by...
            ["\\/bfnrt] # one of these characters
            | # ...or...
            u # a u
            [0-9a-fA-F]{4} # and four hex digits
            )
            )
          scope: constant.character.escape.json
        - match: \\.
          scope: invalid.illegal.unrecognized-string-escape.json
  value:
    - include: constant
    - include: number
    - include: string
    - include: array
    - include: object
    - include: comments
[/code]

#22

That’s a YAML surprise rather than an issue with the regex parser. Multi-line strings written with the pipe indicator keep a newline character after each line, including the last one. Because your regex uses the subexpression version of the extended form, (?x: ... ), that final newline falls outside the group, so the engine expects to match a literal newline at the end. You can verify this by converting it to a normal double-quoted string, adding newlines and escapes as required: the regex will then work as expected.
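A rough way to see the effect outside Sublime Text is sketched below. It uses Python's `re` module as a stand-in for the editor's regex engine, which is an assumption: the engines differ in many details, but both treat whitespace outside a `(?x: )` group as literal. `BODY` is a shortened, hypothetical version of the escape rule, and the three pattern names are made up for illustration.

```python
import re

# Shortened, hypothetical stand-in for the JSON escape rule.
BODY = r"""
  \\      # a literal backslash
  [nt]    # followed by n or t
"""

# What "match: |" hands over: the final newline sits outside the (?x: ) group.
scoped_pattern = "(?x:" + BODY + ")\n"
# Same text with a global (?x): the trailing newline is ignored as whitespace.
global_pattern = "(?x)" + BODY + "\n"
# The scoped form without the trailing newline.
stripped_pattern = "(?x:" + BODY + ")"


def finds_escape(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


sample = r"tab: \t"  # a backslash followed by "t", with no newline after it

print(finds_escape(scoped_pattern, sample))
print(finds_escape(global_pattern, sample))
print(finds_escape(stripped_pattern, sample))
```

In this sketch the scoped pattern only matches when the escape is immediately followed by a newline, which is why a rule placed after it would claim `\t` first.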


#23

I’ll see what I can do about this issue, given it’s caused by convert_syntax.py.


#24

[code]- match: |-
    (?x:                # turn on extended mode
      \\                # a literal backslash
      (?:               # ...followed by...
        ["\\/bfnrt]     # one of these characters
        |               # ...or...
        u               # a u
        [0-9a-fA-F]{4}  # and four hex digits
      )
    )
[/code]

The trailing hyphen tells YAML not to append the trailing newline.
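The difference between the two indicators can be checked with PyYAML (assumed to be installed; the one-line pattern and the variable names are placeholders for illustration, not taken from the real syntax file):

```python
import yaml  # PyYAML

# "|" keeps the final line break of the block ("clip" chomping).
clipped = yaml.safe_load("match: |\n  (?x: a )\n")["match"]

# "|-" drops the final line break ("strip" chomping).
stripped = yaml.safe_load("match: |-\n  (?x: a )\n")["match"]

print(repr(clipped))   # '(?x: a )\n'
print(repr(stripped))  # '(?x: a )'
```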


#25

Thanks for the pointer, I wasn’t aware of that. I’ll update convert_syntax.py, and fix the overly eager stripping of leading whitespace at the same time.


#26

Is anyone else seeing the “Unable to fetch update url contents” error on OS X 10.10.3? I’ve been unable to update automatically since build 3084.

Thanks, Akos


#27

The minus says not to include the final newline; however if you also wish to have it ignore the ones inside, use “>-” instead of “|-”. I think it doesn’t matter if you already have (?x) on, but it’s worth noting since >- will introduce no whitespace at all.


#28

[quote="huot25"]Have any of you experienced any performance differences between using a regex with a bunch of keywords vs doing each one written out?

\b(?i:AND|OR|NOT|TRUE|FALSE|SELECT|INPUT)\b

vs.

\b(?i:AND)\b
\b(?i:OR)\b

[/quote]

Performance with Oniguruma will be broadly similar between the two; however, I’d expect the single-regex case to be faster.

I’d suggest not worrying about this case too much: sregex is currently falling back to Oniguruma for case-insensitive matching, however support will be added for it shortly. sregex is much more efficient than onig for these alternative heavy regexes: you can get a feel for this by removing the case insensitive flag and seeing the speed difference. For sregex, there is no speed difference between one large regex vs multiple smaller ones.
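For reference, the two forms being compared can be sketched as follows. Python's `re` module is used purely as a stand-in, so this shows only that both forms accept the same words and says nothing about speed in Oniguruma or sregex. The names `KEYWORDS`, `combined` and `separate` are made up for illustration.

```python
import re

KEYWORDS = ["AND", "OR", "NOT", "TRUE", "FALSE", "SELECT", "INPUT"]

# One combined pattern: \b(?i:AND|OR|NOT|...)\b
combined = r"\b(?i:" + "|".join(map(re.escape, KEYWORDS)) + r")\b"

# One pattern per keyword: \b(?i:AND)\b, \b(?i:OR)\b, ...
separate = [r"\b(?i:" + re.escape(word) + r")\b" for word in KEYWORDS]


def matches_combined(text: str) -> bool:
    """True if the single alternation finds a keyword in text."""
    return re.search(combined, text) is not None


def matches_separate(text: str) -> bool:
    """True if any of the per-keyword patterns finds a keyword in text."""
    return any(re.search(pattern, text) for pattern in separate)


for sample in ("select x", "Input y", "selection"):
    print(sample, matches_combined(sample), matches_separate(sample))
```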


#29

Yup, that is it. Thanks.

[quote]The minus says not to include the final newline; however if you also wish to have it ignore the ones inside, use “>-” instead of “|-”. I think it doesn’t matter if you already have (?x) on, but it’s worth noting since >- will introduce no whitespace at all.
[/quote]

Also, good to know. I still don’t use YAML enough to remember these differences about multi-line strings. This is a good reminder.


#30

This is not exactly correct: > folds the lines, replacing single line breaks with a space, whereas >- additionally removes the trailing newline. There are a few more special cases, however. (yaml.org/spec/1.2/spec.html#id2796251)

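[code]import yaml

# The block content is indented because PyYAML expects block scalar
# content to be indented by at least one space, even at the top level.
s = """
  This is
  a folded
  string ;)
  With no

  trailing newline
  but
      leading spaces
  sometimes

"""

# ">" folds single line breaks into spaces and keeps one trailing newline
d = yaml.safe_load(">" + s)

print(d, "tail")
print()

# ">-" folds the same way but strips the trailing newline
d = yaml.safe_load(">-" + s)

print(d, "tail")
[/code]

[code]This is a folded string ;) With no
trailing newline but
    leading spaces
sometimes
 tail

This is a folded string ;) With no
trailing newline but
    leading spaces
sometimes tail[/code]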

Anyway, since the converter tool preserves the original string, it needs to respect the newlines and include them in the converted representation, so > is not really an option.


#31

Thanks, I described its effect incorrectly. It’s the form I’ve come to prefer since it has never surprised me, but I always use the (?x) flag for multi-line regexes anyway, since I end up using spaces for readability, so I hadn’t given much thought to the fact that a newline becomes a space in this case.


#32

[quote="jps"]
I’d suggest not worrying about this case too much: sregex is currently falling back to Oniguruma for case-insensitive matching, however support will be added for it shortly. sregex is much more efficient than onig for these alternative heavy regexes: you can get a feel for this by removing the case insensitive flag and seeing the speed difference. For sregex, there is no speed difference between one large regex vs multiple smaller ones.[/quote]

Thanks Jon! It’s good to know I am not crazy! I spent a few hours over the weekend trying to find the best-performing option, and I was not seeing any difference between onig and sregex, because it was falling back to onig since my regex was using case-insensitive matching. Looking forward to having case-insensitive matching in one of the future releases. I removed that flag and the files now load instantly, versus a 4-second load time before.


#33

Hi,

I find the new sublime-syntax format really nice.
I always wanted to change the syntax highlighting for Scala, but I never wanted to dive into the tmLanguage format.
Now I can do it!

I started something on GitHub at https://github.com/gwenzek/scalaSublimeSyntax if you want to have a look.

I was wondering if there is a way to match all Unicode letters? Something like match: [a-zA-Z], but that would also catch éèà…


#34

Gwenzek, you can use the “core” Unicode character property classes with \p{NAME_OF_PROP}.

So for example, matching any letters, regardless of case or alphabet, would be \p{L}.
And all Latin letters, regardless of case (and including characters with diacritics) would be \p{Latin}.

See geocities.jp/kosako3/oniguruma/doc/RE.txt , section 3 for more info, or check out the Unicode homepage to figure out exactly what’s included in a given character property class.
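To get a feel for which characters \p{L} covers, here is a small sketch in Python. Python's built-in `re` module does not support \p{...} classes, so this uses `unicodedata` instead, on the assumption that Oniguruma's \p{L} corresponds to the Unicode general categories beginning with "L". The helper names are hypothetical, and script properties such as \p{Latin} are not covered.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if ch has a Unicode general category starting with 'L' (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def leading_letters(text: str) -> str:
    """Return the longest prefix of text made up only of letters."""
    end = 0
    while end < len(text) and is_letter(text[end]):
        end += 1
    return text[:end]


for ch in "aZéèàж1_":
    print(repr(ch), unicodedata.category(ch), is_letter(ch))
```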


#35

Thanks for the link, just what I needed!
