Sublime Forum

Is sublime regex engine exposed to be used in a python plugin?

#1

I want to make a plugin that parses a custom file format and performs some transformations on it. Since I’ve already written a sublime-syntax for this file format, it’d be pretty straightforward to create Python functions that fulfill the role of the sublime-syntax contexts.

The problem is that the engine Python uses is a backtracking one and, on top of that, doesn’t offer possessive quantifiers, atomic groups, and some other useful tools that simplify expressions in backtracking engines.

So I was wondering, as the title says, if the sregex engine can be accessed and used somehow from a plugin, or if there are Sublime plugins that somehow mitigate the shortcomings of Python’s engine.


#2

Sublime’s sregex engine is custom-designed for parsing. It deliberately does not support possessive quantifiers, atomic groups, backreferences, lookbehind, or subexpressions. It is not accessible from plugins.

Sublime also uses the Oniguruma library, which does support a full range of advanced features. I believe that Python bindings exist, though to my knowledge there is no package or dependency that exposes them. It shouldn’t be hard to write one, if you need it.

Alternatively, if you’re parsing files that are open in views, and you already have a robust .sublime-syntax for your custom format, then why not simply use the scoping information that Sublime already provides?
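For what it’s worth, here is a minimal sketch of what reading that scoping information could look like. It only relies on two `sublime.View` methods, `find_by_selector` and `substr`; the helper name `text_by_selector` and the selector used in the note below are made up for illustration, so adapt them to whatever scopes your syntax actually assigns.

```python
def text_by_selector(view, selector):
    """Return the text of every region in `view` whose scope matches `selector`.

    `view` is expected to behave like a sublime.View: find_by_selector()
    yields regions, and substr() turns a region into a string.
    """
    return [view.substr(region) for region in view.find_by_selector(selector)]
```

Inside a plugin you would typically call something like `text_by_selector(self.view, "string.quoted")` from the `run` method of a `sublime_plugin.TextCommand`, where `"string.quoted"` is just a placeholder selector.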


#3

“Alternatively, if you’re parsing files that are open in views, and you already have a robust .sublime-syntax for your custom format, then why not simply use the scoping information that Sublime already provides?”

Inexperience, mostly. I’m still learning both the API and basic Python (I’m having problems with variable scope right now, and with the fact that local methods don’t seem to work like they do in C#).

Also, the parsing I want to do for the transformation is stricter than the one provided by the sublime-syntax, which is lenient on purpose to provide a better experience when editing files. I guess I could sacrifice that, give the sublime-syntax a stricter ruleset, and learn my way through accessing scope regions from the Python API.

Thanks for the info. A shame that the Sublime API does not expose sregex.


#4

Is the .sublime-syntax online to look at? There are various techniques for adding detailed meta-scopes in a fault-tolerant manner. It’s probably worth giving that a shot.


#5

The Python module regex is available as a dependency, and if I recall correctly, that engine is rather more advanced than re, so it may be useful if you can’t get the .sublime-syntax format to work for you.
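If adding a dependency is not an option, a well-known workaround with plain re is to emulate an atomic group with a lookahead plus a backreference. The sketch below assumes the plugin host’s Python predates the native atomic groups and possessive quantifiers that re gained in Python 3.11; the pattern and the helper name `matches` are only illustrative.

```python
import re

# Plain greedy group: the engine may backtrack into \d+ and give one digit
# back so that the trailing \d can still match.
backtracking = re.compile(r'(\d+)\d')

# Emulated atomic group: the lookahead captures the longest run of digits,
# and the backreference \1 then consumes exactly that text. Once the
# lookahead has succeeded the engine does not retry it with a shorter
# capture, so no digit is left over for the trailing \d.
atomic = re.compile(r'(?=(\d+))\1\d')


def matches(pattern, text):
    """Return True if `pattern` matches at the start of `text`."""
    return pattern.match(text) is not None
```

With these definitions, `matches(backtracking, '1234')` succeeds by backtracking, while `matches(atomic, '1234')` fails, which is the behaviour an atomic group `(?>\d+)\d` would give.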


#6

No, but since it’s really small, I guess I can put it here (does the forum not allow spoiler tags?)

[code]%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
name: easy_completions
file_extensions:
  - easy_completions
scope: meta_completions
contexts:
  prototype:
    - match: '##.*$'
      push:
        - meta_include_prototype: false
        - meta_scope: comment.block
        - match: '##.*$'
          pop: true
    - match: '\s*(#.*)'
      captures:
        1: comment.line
  main:
    - match: ''
      push: [trigger, scope]
  scope:
    - match: '^- (scopes):\s*('')'
      captures:
        1: keyword
        2: string
      push: [trigger, string_scopes]
    - include: ill
  trigger:
    - meta_scope: meta_completion
    - match: '^- (trigger):\s*('')'
      captures:
        1: storage.type
        2: string
      push: [hint_or_content, string_completion]
    - include: ill
  hint_or_content:
    - meta_scope: meta_hint_or_content
    - match: '^(?=\tc)'
      push: content
    - match: '^(?=\th)'
      push: hint
    - match: '^'
      pop: true
  content:
    - meta_scope: meta_content
    - match: '^\t(content):\s*('')'
      captures:
        1: storage.type
        2: string
      push: string_content
    - match: '^'
      pop: true
  hint:
    - meta_scope: meta_hint
    - match: '^\t(hint):\s*('')'
      captures:
        1: storage.type
        2: string
      push: string_hint
    - match: '^'
      pop: true
  string:
    - match: '\t'
      scope: illegal
    - match: ''''
      pop: true
  string_scopes:
    - meta_scope: string.scopes
    - meta_include_prototype: false
    - include: string
  string_completion:
    - meta_scope: string.completion
    - meta_include_prototype: false
    - include: string
  string_hint:
    - meta_scope: string.hint
    - meta_include_prototype: false
    - include: content_patterns
    - match: ''''
      pop: true
  string_content:
    - meta_scope: string.content
    - meta_include_prototype: false
    - include: content_patterns
    - match: ''''
      pop: true
  content_inside_placeholder:
    - meta_scope: string.content
    - meta_include_prototype: false
    - include: content_patterns
    - match: '}'
      scope: constant.other
      pop: true
  content_patterns:
    - match: '\\\$'
      scope: constant.character.escape
    - match: '\${\d+:'
      scope: constant.other
      push: content_inside_placeholder
    - match: '\$(?x:
        (PARAM)?\d+
        | SELECTION
        | TM_(
            CURRENT_(LINE|WORD)
          | FILE(NAME|PATH)
          | FULLNAME
          | LINE_(INDEX|NUMBER)
          | SELECTED_TEXT
          | SOFT_TABS
          | TAB_SIZE
          )
        )'
      scope: constant.other
    - match: '\$'
      scope: invalid.illegal
  ill: # ill-formed (illegal) tokens
    - match: '\S+'
      scope: invalid.illegal

[/code]

I’d love to learn the techniques you mention. Meanwhile, I managed to solve my task with a handcrafted Python command, which has been a good learning exercise. My goal here was to have a build system for writing completion files in a YAML-ish format. Maybe that’d make for a good plugin to publish later on.
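In case it helps anyone with a similar goal, this is roughly the shape of the last step, turning already-parsed entries into the JSON of a .sublime-completions file. It is only a sketch and not the actual command: the function name `build_completions` and the layout of the `entries` dictionaries are assumptions made for this example, and the hint is attached to the trigger after a tab character.

```python
import json


def build_completions(scope, entries):
    """Serialize parsed entries as the text of a .sublime-completions file.

    `entries` is assumed to be a list of dicts with the keys "trigger" and
    "content", plus an optional "hint" (hypothetical layout for this sketch).
    """
    completions = []
    for entry in entries:
        trigger = entry["trigger"]
        hint = entry.get("hint")
        if hint:
            # The hint goes after a tab in the trigger string.
            trigger = trigger + "\t" + hint
        completions.append({"trigger": trigger, "contents": entry["content"]})
    document = {"scope": scope, "completions": completions}
    return json.dumps(document, indent=4, ensure_ascii=False)
```

The returned string can then be written to a file with the .sublime-completions extension, for example from the build command itself.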


#7

Do you have an example of a file that uses this syntax?


#8

Sure

[code]#sfsdfs
- scopes: 'some scopes. This string is allowed to be multiline, but in hindsight it probably shouldn't' #sdfs - comment
- trigger: 'This string is allowed to be multiline, but in hindsight it probably shouldn't'
	hint: 'This string is allowed to be multiline, but in hindsight it probably shouldn't'
	content: ' '#
- trigger: 'rtyr'
	hint: 'sdfs'
	content: 'blç~$1º1ah'#
- trigger: 'if'
	hint: 'if = { limit = { … } … }'
	content: 'if = {
	limit = {
		$1
	}
	$2
}'
[/code]

#9

That format looks a lot like YAML. Why not simply use YAML? That way, you could use the many existing tools for working with YAML. You could still use a custom .sublime-syntax to provide more specific highlighting than the default YAML syntax (example).


#10

Well, of course that advice is the most sensible one to give, and I appreciate it! I don’t really need the full YAML syntax, though, and above all I want to acquire the know-how for doing more complex stuff later on. (Also, I looked at the YAML highlighting files to get a sense of how full-fledged syntax highlighting works, and man, is that hard to follow, for me at least.)
