Sublime Forum

Can 'embed' be used to define indent-based syntaxes?

#1

Say I want to define a toy version of Python syntax, e.g. one that only allows defining nested functions (including arguments).

So, e.g. I could have

def asd(a, b, c):
    def erte(e, t, y, u):
        def as():
            # etc.

I want each nested function to add a new meta.function scope, so that a plugin can, e.g., get the nesting level just by querying the scope at a point. I also want more syntax-related things, like allowing some constructs only in some contexts but not others (e.g. allowing control flow directly inside functions, but not inside classes defined inside functions).
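As a rough sketch of the plugin side of this idea: the helper below counts how many `meta.function` scopes appear in a scope string. It assumes the syntax stacks one `meta.function` scope per enclosing function; `nesting_depth` and the example string are hypothetical, and in a real plugin the string would come from `view.scope_name(point)`.

```python
def nesting_depth(scope_name: str, prefix: str = "meta.function") -> int:
    """Count the scopes in a space-separated scope string that equal
    `prefix` or start with `prefix` followed by a dot."""
    return sum(
        1
        for scope in scope_name.split()
        if scope == prefix or scope.startswith(prefix + ".")
    )


# Hypothetical scope string, shaped like the return value of
# view.scope_name(point) inside a function nested one level deep.
example = (
    "source.python meta.function.python "
    "meta.function.python variable.function.python "
)
print(nesting_depth(example))
```

Note that a prefix match also counts sub-scopes such as `meta.function.parameters`, so the prefix may need narrowing if the syntax uses them.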

Is it feasible? I’m trying to use \1 inside embed’s escape, but it doesn’t seem to work when trying to parse such complex structures.

If someone could provide a minimal example of how to do it, or at least some directions, I’d be grateful.

#2

No, it can’t be done with push, set and with_prototype, so it can’t be done with embed either.

#3

I guess what you are looking for is something like:

%YAML 1.2
---
# http://www.sublimetext.com/docs/3/syntax.html
name: Toy Python
file_extensions:
  - py
scope: source.python


contexts:
  main:
    - include: function

  function:
    - match: ^([\t ]*)(def)\s+(\w+)
      captures:
        2: keyword.declaration.function.python
        3: variable.function.python
      push:
        - meta_scope: meta.function.python
        # pop at the first non-empty line that is not indented deeper than `def`
        - match: ^(?!(\s*$|\1[\t ]+\S))
          pop: true
        - include: function

Note: It doesn’t work with mixed tab/space indentation.
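To see what the pop pattern does, and why mixed tabs and spaces break it, here is a small stand-alone sketch. It uses Python's `re` module as a stand-in for Sublime's regex engine, and the backreference `\1` is substituted by hand with the indentation captured on the `def` line; `ends_block` is a hypothetical helper, not part of any Sublime API.

```python
import re


def ends_block(def_indent: str, line: str) -> bool:
    # Stand-in for the pop pattern ^(?!(\s*$|\1[\t ]+\S)), with \1 replaced
    # by the indentation that was captured on the def line.
    stays_inside = re.compile(r"\s*$|" + re.escape(def_indent) + r"[\t ]+\S")
    return stays_inside.match(line) is None


# def indented by four spaces
print(ends_block("    ", "        return 1"))  # deeper line
print(ends_block("    ", ""))                  # blank line
print(ends_block("    ", "    x = 1"))         # same indentation as def

# def indented by a tab, body indented by spaces: the literal prefix
# no longer matches, so the block ends on its first body line.
print(ends_block("\t", "        return 1"))
```

The last call is the mixed tab/space case: the body is visually deeper than the `def`, but it does not start with the captured tab, so the function context is popped too early.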

#4

Yeah, something like that, but one that can also handle a variable number of parameters for the functions. My natural answer would be to push two contexts after the function name, one for the parameters and one for the function body. The problem is that the \1 backreference doesn’t seem to work when pushing more than one context. I’ve tried instead embedding a context for the function body and handling the arguments inside there, but that doesn’t seem to work either.

Edit: is there any advantage in using [\t ] over \s?

Edit 2: It doesn’t appear to work. In the following snippet, the first line gets colorised, but the second doesn’t:

def func:
    def func:

#5

Something like this?

%YAML 1.2
---
# http://www.sublimetext.com/docs/3/syntax.html
name: Toy Python
scope: source.python
contexts:
  else-pop:
    - match: (?=\S)
      pop: true

  pop-eol:
    - match: $
      pop: true

  main:
    - include: statements

  statements:
    - match: (?=\s*\S)
      push: statement

  statement:
    - include: pop-eol
    - match: ^([\t ]*)(?=def)\b
      set:
        - function-body
        - function-def

  function-body:
    - meta_content_scope: meta.function.python
    - include: indented-block

  indented-block:
    # Ignore empty line
    - match: ^\s*$
    # pop at the first non-empty line that is not indented deeper than `def`
    - match: ^(?!\1[\t ]+\S) #
      pop: true
    - include: statements

  function-def:
    - match: def\b
      scope: keyword.declaration.function.python
      set:
        - expect-colon
        - function-parameters
        - function-name

  expect-colon:
    - match: ':'
      scope: punctuation.separator.python
      pop: true
    - include: pop-eol
    - include: else-pop

  function-parameters:
    - match: \(
      scope: punctuation.section.group.begin.python
      set:
        - match: \)
          scope: punctuation.section.group.end.python
          pop: true
        - match: \w+
          scope: variable.parameter.python
        - match: ','
          scope: punctuation.separator.python
    - include: pop-eol
    - include: else-pop

  function-name:
    - match: \w+
      scope: entity.name.function.python
      pop: true
    - include: pop-eol
    - include: else-pop

This is written to confine the backref to the smallest possible context. Example:

def foo(x, y):
    def bar(z):
        two deep
    one deep

zero deep

def baz(
    a,
    b,
    c,
):
    one deep
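
For readers who want to check the expected nesting without loading the syntax, the first example can be approximated in plain Python. This is only a sketch of the stack idea, not of Sublime's matching: `block_depths` is a hypothetical helper, the stack holds the indentation of each enclosing `def`, and the `def` line itself is counted at the depth of its parent, which is an assumption of this sketch.

```python
import re

DEF_LINE = re.compile(r"([\t ]*)def\b")


def block_depths(source: str) -> list[tuple[int, str]]:
    """Return (depth, stripped line) for every non-blank line."""
    stack: list[str] = []  # indentation of each enclosing def
    result = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines never close a block
        # Leave every block whose def is not indented less than this line.
        while stack and not re.match(re.escape(stack[-1]) + r"[\t ]+\S", line):
            stack.pop()
        result.append((len(stack), line.strip()))
        found = DEF_LINE.match(line)
        if found:
            stack.append(found.group(1))
    return result


example = (
    "def foo(x, y):\n"
    "    def bar(z):\n"
    "        two deep\n"
    "    one deep\n"
    "\n"
    "zero deep\n"
)
for depth, text in block_depths(example):
    print(depth, text)
```

The sketch does not cover the second example: a closing `):` at column 0 would end the block here, whereas in the syntax the parameter contexts sit on top of the stack until the parenthesis is closed.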

#6

Thank you very much! I actually managed to make “embed” do the dedenting work for me, but it feels like a fragile solution; yours looks way more elegant and robust, so I shall study it. BTW, what is the reason behind the [\t ] pattern? Isn’t it equivalent to \s when regexes are single-line?

#7

\s also matches form feeds and some other characters.
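
A quick way to see the difference, using Python's `re` module as a stand-in (this assumes the two classes treat these ASCII characters the same way in Sublime's engines; `classify` is a hypothetical helper):

```python
import re

ANY_WHITESPACE = re.compile(r"\s")
TAB_OR_SPACE = re.compile(r"[\t ]")


def classify(ch: str) -> tuple[bool, bool]:
    """Return (matched by \\s, matched by [\\t ]) for a single character."""
    return (
        ANY_WHITESPACE.fullmatch(ch) is not None,
        TAB_OR_SPACE.fullmatch(ch) is not None,
    )


for name, ch in [("space", " "), ("tab", "\t"), ("form feed", "\f"),
                 ("vertical tab", "\v"), ("newline", "\n")]:
    print(name, classify(ch))
```

Because `\s` can also match the newline, it can run past the end of the line, which is usually not wanted when capturing indentation.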

#8

OK, after testing, it works wonders and is pretty extensible. I’m actually surprised this is not used for the actual Python syntax (or at least that the meta.function scopes are not applied).

#9

The downside is that the backreferences trigger the slower Oniguruma engine. (This is why I made the context with the backrefs as small as possible, so that most of the work is done in other contexts that use the faster engine.) I haven’t done any tests, so it’s possible that the performance hit is small.

#10

Of note, using an embed/escape with a backref may be faster than a pop, since the escape always uses Oniguruma, is matched separately from the rest of the match patterns, and is matched only once per line. In the case of a backref, it would be a literal string match, apart from any anchor you may need to set. In those situations, I would imagine Oniguruma would perform reasonably well.

#11

So, the patterns inside an embedded context can use sregex while the escape pattern uses Oniguruma? If so, I’d suggest that the ‘regex compatibility’ build system specify that. In a syntax I’m working on, it reports ‘2 patterns not compatible’. These are two variables only used inside ‘escape’, so they shouldn’t be reported; maybe a reminder in the build output that ‘escape’ doesn’t use the new engine would be better.

BTW, is that build system available for us to tinker with, or does it use an external binary?

#12

Embed is a newish feature that was added after the Python syntax rewrite/update. I don’t exactly consider it worthwhile to add this as a feature, but assuming it doesn’t degrade performance significantly, you are of course free to propose such changes upstream in a PR.

What is your use case for accurate meta block scoping, though? I can only imagine “extend selection to scope” currently, or maybe coloring indentation in different colors.

As for the build system: no, it’s internal.

#13

My use case is not Python itself, but a custom DSL that looks Pythonish. In said DSL, it’d be a lexical error to try to define, say, a class inside a function block. I’d want to catch these kinds of errors at the lexical level, instead of requiring a linter for them.

BTW, off-topic, do you guys have resources on how to test syntax performance? I cannot make sense of the sections in the official docs about syntax tests (my fault).

#14

No need for resources: just open a file that is highlighted using the syntax you want to test and run the “syntax tests - performance” build system variant.

#15

[quote=“FichteFoll, post:12, topic:41733”]
What is your use case for accurate meta block scoping, though? I can only imagine “extend selection to scope” currently, or maybe coloring indentation in different colors.[/quote]

This sort of scoping could be used to implement simple code intelligence features, such as auto-completing the names of variables in scope or finding the definition of a variable. I have a bare-bones implementation of this for JavaScript; if Python provided meta scopes, the same code would work with minimal tweaking.

More sophisticated features would require more specialized tools (e.g. LSP), but I do think that there’s a sweet spot where a simple scoping-based mechanism could provide some useful features to a variety of languages.

#16

@ngppgn Can you post your embed-based implementation? I’ve been trying to get such an implementation working, but I couldn’t figure out how to handle constructs like:

if True:
    (
print("This is, sadly, correctly indented."))

#17

My approach would make that illegal, since I can’t instruct the engine to not escape embeds when in certain contexts (like inside parentheses). The closest legal construct would be

if True:
    (
    print("This is, sadly, correctly indented."))

That’s what I meant when I said your solution is more robust: it allows greater flexibility. (I myself don’t mind the grammar for my language being stricter. In fact, with line wrap I could even force a whole function signature to be expressed on a single line.)
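
A rough way to see why the first variant breaks, sketched in plain Python rather than in the syntax engine. It assumes the `if` block were embedded the same way as `def` is below, with the escape `^(?!\1[\t ]+\S)` and `\1` standing for the block's indentation; `escape_fires` is a hypothetical helper that performs the check as a literal prefix comparison.

```python
def escape_fires(block_indent: str, line: str) -> bool:
    """Mimic the escape ^(?!\\1[\\t ]+\\S) as a literal prefix check."""
    if not line.startswith(block_indent):
        return True
    rest = line[len(block_indent):]
    stripped = rest.lstrip("\t ")
    extra_indent = len(rest) - len(stripped)
    starts_with_text = stripped[:1].strip() != ""
    # The block continues only if the line is indented deeper than the
    # block header and then contains a non-whitespace character.
    return not (extra_indent > 0 and starts_with_text)


# The escape is checked per line and knows nothing about open parentheses.
print(escape_fires("", "    ("))
print(escape_fires("", 'print("This is, sadly, correctly indented."))'))
print(escape_fires("", '    print("This is, sadly, correctly indented."))'))
```

As written, the pattern has no exemption for blank lines, so this sketch also reports an empty line as ending the block.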
Here you go in any case:

[code]
%YAML 1.2
---
file_extensions: [myl]
scope: source.myLang
contexts:
  # Helper contexts referenced below but missing from the paste;
  # copied from post #5.
  else-pop:
    - match: (?=\S)
      pop: true

  pop-eol:
    - match: $
      pop: true

  main:
    - include: statements

  statements:
    - match: ^(?=\s*\S)
      push: statement

  statement:
    - match: ^([\t ]*)(def)\b
      captures:
        1: meta.indent
        2: storage.type
      embed: def-then-function
      embed_scope: meta.function
      # leave the embed at the first line not indented deeper than `def`
      escape: ^(?!\1[\t ]+\S)

  def-then-function:
    - match: ''
      push:
        - indented-block
        - expect-colon
        - function-parameters
        - function-name

  indented-block:
    # Ignore empty line
    - match: ^\s*$
    # pop at the first non-empty line that is not indented deeper than `def`
    - match: ^(?!\1[\t ]+\S) #
      pop: true
    - include: statements

  function-def:
    - match: def\b
      scope: keyword.declaration.function.python
      set:
        - expect-colon
        - function-parameters
        - function-name

  expect-colon:
    - match: ':'
      scope: punctuation.separator.python
      pop: true
    - include: pop-eol
    - include: else-pop

  function-parameters:
    - match: \(
      scope: punctuation.section.group.begin.python
      set:
        - meta_content_scope: meta.function.parameters
        - match: \)
          scope: punctuation.section.group.end.python
          pop: true
        - match: \w+
          scope: variable.parameter.python
        - match: ','
          scope: punctuation.separator.python
    - include: pop-eol
    - include: else-pop

  function-name:
    - match: \w+
      scope: entity.name.function.python
      pop: true
    - include: pop-eol
    - include: else-pop
[/code]
