Sublime Forum

**ngppgn** · October 19, 2017, 11:46am

It would be awesome for syntax modularity and maintainability if we could have an `import_variables:` key that added the variables of the referenced syntax to the ones in the syntax importing them (in case of a name conflict, the importing syntax should override the imported one).

This would let, e.g., syntaxes for several lexically related grammars share the same pool of tokens much more painlessly, especially when it comes to maintenance.
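The proposed override rule is essentially a dictionary merge in which the importing syntax’s own entries win on a name clash. A minimal Python sketch, assuming both sets of variables have already been loaded into plain dicts (`merge_variables` and the sample variables are hypothetical):

```python
def merge_variables(imported: dict[str, str], local: dict[str, str]) -> dict[str, str]:
    """Combine imported variables with the importing syntax's own.

    In a dict merge the later mapping wins, so on a name conflict the
    importing (local) syntax overrides the imported one.
    """
    return {**imported, **local}


# Hypothetical variable sets, for illustration only.
shared = {"ident": r"[A-Za-z_]\w*", "keyword": r"\b(?:if|else)\b"}
mine = {"keyword": r"\b(?:if|else|elif)\b"}

merged = merge_variables(shared, mine)
print(merged["keyword"])  # the local definition wins
print(merged["ident"])    # inherited from the shared pool
```

Neither input is modified, so the shared pool can be reused by every syntax that imports it.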

**ThomSmith** · October 19, 2017, 1:49pm

You can implement this using YAML Macros:

```yaml
%YAML 1.2
%TAG ! tag:yaml-macros:YAMLMacros.lib.extend:
---
name: Example
scope: source.example

variables: !extend
  _base: myVariables.yaml

contexts:
  ...
```

**ngppgn** · October 19, 2017, 3:25pm

Interesting. Will look into it, thanks. (Still, built-in functionality for this would be neat.)

**wbond** · October 19, 2017, 3:28pm

In my experience with syntaxes, copying/maintaining a few variables tends to be one of the things I spend the least amount of time on.

I think that some more controls for context stack manipulation would be a better use of time, since it would allow us to handle things like complex heredocs properly.

**ngppgn** · October 19, 2017, 3:44pm

Yes, by all means. More expressivity always trumps minor maintainability/QoL improvements. (Though in my current case, some variables are an OR of hundreds of literals that need to be updated in about twenty files… Probably the YAML Macros plugin will take care of that for me now.)
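For a variable that is an OR of hundreds of literals, one option is to keep the literals in a single list and generate the regex from it, so each of the twenty files receives the same text. A Python sketch (`build_alternation` and the literal list are hypothetical); `re.escape` guards against regex metacharacters in the literals, and sorting longest first is a common safeguard for ordered alternations:

```python
import re


def build_alternation(literals: list[str]) -> str:
    """Return a word-bounded, non-capturing group matching any literal."""
    # Deduplicate, then sort longest first (ties alphabetically) so a longer
    # literal is tried before a shorter one that shares its prefix.
    unique = sorted(set(literals), key=lambda s: (-len(s), s))
    body = "|".join(re.escape(s) for s in unique)
    return rf"\b(?:{body})\b"


literals = ["for", "forEach", "while", "for"]  # hypothetical token list
pattern = build_alternation(literals)
print(pattern)  # \b(?:forEach|while|for)\b
```

The resulting string could then be pasted (or macro-inserted) as the variable’s value.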

**deathaxe** · October 19, 2017, 7:04pm

In such a case, I’d try to put the list of literals into a single match in a common file and include that context in every syntax that needs it.

The common file (MySyntax Include.sublime-syntax):

```yaml
%YAML 1.2
---
name: Common
hidden: true
scope: library.mysyntax

contexts:
  variables:
    - match: |-
        (?x)
        (?:
          literal1 | literal2 | literal3
        )
      scope: variable.language.mysyntax
```

The file using the include:

```yaml
%YAML 1.2
---
name: mysyntax
scope: source.mysyntax

contexts:
  main:
    - include: MySyntax Include.sublime-syntax#variables
```

**rwols** · October 19, 2017, 7:33pm

I’m still keeping my fingers crossed for an inheritance model of sorts.

**ThomSmith** · October 19, 2017, 8:35pm

While we’re waiting, is there anything YAML Macros could do to make life easier? I’m open to improving it if possible.

**ThomSmith** · October 19, 2017, 8:41pm

> I think that some more controls for context stack manipulation would be a better use of time, since it would allow us to handle things like complex heredocs properly.

Do you have something particular in mind? As it stands, the old backreference behavior is the only way to handle features like tag matching and heredocs. Extending the syntax engine to handle these more directly sounds like a tough challenge, but also one that could greatly benefit complex languages.

**wbond** · October 19, 2017, 8:48pm

Yes, heredocs can be “nested”, but are FIFO in order, unlike pretty much all other constructs in most languages. They also have this funky ability to allow code to continue on until the end of the line. Here is an example (in Ruby): https://github.com/sublimehq/Packages/issues/710. If I recall correctly, Bash allows similar constructs with multiple heredocs per line.

In order to handle this, we could use some sort of temporary stack that could be pushed to or unshifted, and that could then be “applied” to the main stack. I haven’t spent any time yet on how feasible the implementation is. It is a fairly specialized requirement, from what I can tell, so it hasn’t been a high priority.
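The FIFO ordering is what a plain push/pop stack cannot express: terminators are collected left to right on one line, and the bodies that follow close in that same order. A Python sketch of just that bookkeeping, modelling the temporary structure as a `collections.deque` (the simplified `<<~NAME` regex and `split_heredocs` are assumptions for illustration, not how Sublime’s engine works):

```python
import re
from collections import deque

# Simplified Ruby-style heredoc opener: <<~NAME, <<-NAME or <<NAME.
HEREDOC_START = re.compile(r"<<[~-]?([A-Z_]+)")


def split_heredocs(lines: list[str]) -> list[tuple[str, list[str]]]:
    """Return (terminator, body_lines) pairs in the order the bodies appear."""
    pending: deque[str] = deque()
    result: list[tuple[str, list[str]]] = []
    body: list[str] = []
    for line in lines:
        if pending:
            # Inside a heredoc body: the FIRST heredoc opened is closed first.
            if line.strip() == pending[0]:
                result.append((pending.popleft(), body))
                body = []
            else:
                body.append(line)
        else:
            # Ordinary code: queue every heredoc opened on this line, in order.
            pending.extend(HEREDOC_START.findall(line))
    return result


source = [
    "call(<<~A, <<~B)",
    "first body",
    "A",
    "second body",
    "B",
]
print(split_heredocs(source))
# [('A', ['first body']), ('B', ['second body'])]
```

Code that continues on the opener line after the heredoc markers is ignored here; a real implementation would still have to scope it before the bodies start.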

**ThomSmith** · October 19, 2017, 9:55pm

(I apologize in advance to the extent that I’m telling you things you already know.)

One way to do this would be to let each context take a parameter. You would have a stack of [context, parameter] pairs; or, equivalently, a stack of contexts and a stack of parameters. Parameters could be (and usually would be) empty. Then, the matching rules of the context could depend upon the parameter. (This might be exactly what you’re suggesting.)

This is basically how the current system works, where captures are a sort of hidden parameter that the child context can access. But it’s not properly implemented, probably because this is legacy tmLanguage behavior that I suspect was actually a bug to begin with.

This approach is incompatible with the Sublime strategy of (I assume) precompiling all of a context’s regexps into a DFA. Changing the parameter would require recompiling the context. I don’t know how long it takes to compile a single context; maybe it’s not that bad, and maybe the vast majority of the work can be cached anyway. This depends on the internals of the custom regexp system. Dynamic recompilation might be fast enough.

The reason this isn’t a problem today is that backreferences automatically trigger the legacy regexp engine. A new, proper parameter implementation could do the same. The old engine is relatively slow, but there’s no compilation time. It may be faster than dynamic recompilation (not to mention easier on memory).
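Whether recompilation is fast enough depends on Sublime’s internal regex engine, which isn’t public, so there are no numbers here. Purely as an illustration of the caching idea, this Python sketch uses the standard `re` module as a stand-in: a context’s patterns are compiled once per distinct parameter value and reused afterwards. The `{{$tag_name}}` placeholder follows the syntax proposed further down in this post; `compile_context` and the pattern tuple are hypothetical.

```python
import re
from functools import lru_cache

# Hypothetical context whose first pattern contains a {{$tag_name}} placeholder.
CONTEXT_PATTERNS = (r"</({{$tag_name}})>", r"</(\w+)>")


@lru_cache(maxsize=256)
def compile_context(patterns: tuple[str, ...], value: str) -> tuple[re.Pattern, ...]:
    """Compile a context's patterns with the parameter value substituted.

    Results are cached per (patterns, value) pair, so each distinct
    parameter value pays the compilation cost only once.
    """
    escaped = re.escape(value)  # treat the captured text literally
    return tuple(re.compile(p.replace("{{$tag_name}}", escaped)) for p in patterns)


first = compile_context(CONTEXT_PATTERNS, "div")
again = compile_context(CONTEXT_PATTERNS, "div")
print(first is again)                       # True: served from the cache
print(bool(first[0].match("</div>")))       # True
print(bool(first[0].match("</span>")))      # False
print(compile_context.cache_info().misses)  # 1
```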

Any such implementation could be tremendously useful for quite a few languages. Heredocs are a great example, but you could also use it for tag matching in HTML/XML/JSX and possibly even to handle whitespace in Python. Wouldn’t that be something!

Here’s an example I’ve come up with. It features with_parameters, which tells the engine to set a parameter on the pushed context. For XML tag matching:

```yaml
contexts:
  main:
    - match: <(\w+)>
      scope: open-tag.xml
      push:
        - with_parameters:
            tag_name: \1
        - meta_scope: meta.tag.xml
        - include: tag-body

  tag-body:
    - match: </({{$tag_name}})>
      scope: close-tag.xml
      pop: true
    - match: </(\w+)>
      scope: invalid.illegal.unmatched
    - include: main
```
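To make the intended behaviour concrete, here is a Python simulation of the tag-matching idea: each push records the captured tag name as that entry’s parameter, and the closing-tag pattern is built from the parameter on top of the stack. It uses the standard `re` module and a plain list as the stack; `check_tags` and its labels are made up, and this sketches the semantics of the proposed `with_parameters`, not an implementation of it:

```python
import re

TAG = re.compile(r"<(/?)(\w+)>")


def check_tags(text: str) -> list[str]:
    """Label each tag as open, close, or invalid using a parameter stack."""
    stack: list[str] = []  # each entry is the tag_name parameter of a pushed context
    labels: list[str] = []
    for m in TAG.finditer(text):
        is_close, name = m.group(1) == "/", m.group(2)
        if not is_close:
            stack.append(name)  # push, with tag_name set to the capture
            labels.append(f"open:{name}")
        elif stack and re.fullmatch("</" + re.escape(stack[-1]) + ">", m.group(0)):
            stack.pop()  # matched </{{$tag_name}}>: pop the context
            labels.append(f"close:{name}")
        else:
            labels.append(f"invalid:{name}")  # any other </\w+>
    return labels


print(check_tags("<a><b></c></b></a>"))
# ['open:a', 'open:b', 'invalid:c', 'close:b', 'close:a']
```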

For Python indentation:

```yaml
contexts:
  main:
    - match: ''
      push:
        - with_parameters:
            indent: ''
        - include: block

  block:
    - match: ^\s*$ # do nothing, empty line

    - match: ^({{$indent}}\s+) # increase indent
      push:
        - with_parameters:
            indent: \1
        - meta_scope: meta.block.python
        - include: block

    - match: ^{{$indent}}(?=\S) # do nothing, same indent

    - match: ^ # decrease indent
      pop: true
```
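The indentation example can be simulated the same way: each pushed block carries its `indent` parameter, a deeper indent pushes, the same indent stays, and a shallower one pops (repeatedly, since after each `^` pop the parent context re-examines the same line). A Python sketch; `indent_events` and its event labels are made up for illustration:

```python
def indent_events(lines: list[str]) -> list[str]:
    """Report push/stay/pop events for each non-blank line."""
    stack = [""]  # indent parameters; the outermost block has indent ''
    events: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # ^\s*$: empty line, do nothing
        indent = line[: len(line) - len(line.lstrip())]
        # ^ (decrease indent): pop while the block's indent is not a prefix.
        while not indent.startswith(stack[-1]):
            stack.pop()
            events.append("pop")
        if indent == stack[-1]:
            events.append("stay")  # ^{{$indent}}(?=\S): same indent
        else:
            stack.append(indent)  # ^({{$indent}}\s+): deeper indent, push
            events.append("push")
    return events


code = ["if x:", "    y()", "", "    if z:", "        w()", "done()"]
print(indent_events(code))
# ['stay', 'push', 'stay', 'push', 'pop', 'pop', 'stay']
```

Because the bottom entry is the empty string, which every line starts with, the pop loop always terminates.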

The best implementation would probably be a separate parameter stack. Every time a context is pushed onto the context stack, a parameter record (or null) would be pushed onto the parameter stack. If a context refers to a parameter using {{$param}}, that context would either use the old engine or be dynamically compiled, looking up the value starting from the top of the stack. (If no value is found, perhaps an unmatchable value should be substituted.) This way, the parameter records are nice and immutable. In the usual case where a context neither pushes nor uses any parameters, everything would work as it does now, and the only overhead would be pushing a null value onto a stack. Where the new behavior is used, the parameter-record overhead should hopefully be small, and context matching should be at least as fast as the current backreference system (or faster, if dynamic recompilation works out).
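The separate parameter stack described above fits in a few lines of Python. Assumptions, all for illustration: records are read-only mappings (`types.MappingProxyType`), a context that sets no parameters pushes `None`, lookups walk down from the top, and an unresolved name becomes `(?!)`, an empty negative lookahead that can never match. `ParameterStack` and its methods are hypothetical names:

```python
import re
from types import MappingProxyType
from typing import Mapping, Optional

NEVER_MATCHES = r"(?!)"  # empty negative lookahead: can never match
PLACEHOLDER = re.compile(r"\{\{\$(\w+)\}\}")  # {{$name}}


class ParameterStack:
    """Runs parallel to the context stack: one record (or None) per push."""

    def __init__(self) -> None:
        self._records: list[Optional[Mapping[str, str]]] = []

    def push(self, params: Optional[dict[str, str]] = None) -> None:
        # Contexts that set no parameters push None (the cheap common case);
        # others push a read-only copy so records stay immutable.
        self._records.append(MappingProxyType(dict(params)) if params else None)

    def pop(self) -> None:
        self._records.pop()

    def lookup(self, name: str) -> Optional[str]:
        # Walk from the top of the stack down; the nearest record wins.
        for record in reversed(self._records):
            if record is not None and name in record:
                return record[name]
        return None

    def expand(self, pattern: str) -> str:
        # Substitute each {{$name}}: escaped value if found, else NEVER_MATCHES.
        def replace(m: re.Match) -> str:
            value = self.lookup(m.group(1))
            return NEVER_MATCHES if value is None else re.escape(value)

        return PLACEHOLDER.sub(replace, pattern)


params = ParameterStack()
params.push({"tag_name": "div"})  # context pushed with with_parameters
params.push()                     # ordinary context: pushes None
print(params.expand(r"</({{$tag_name}})>"))  # </(div)>
print(params.expand(r"</({{$missing}})>"))   # </((?!))>
```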

I haven’t done the math, but offhand, I think this would augment Sublime’s parser to handle a variety of context-sensitive languages. It would be orthogonal to other means of improving the parser, such as adding nondeterminism.

**Carpenter** · October 20, 2017, 1:00pm

How do you use them with regular expressions? Variables avoid repetition and let you handle a long list of strings.

Example:

```yaml
- match: '\b(string)\s*=\s*({{VARIABLE_XY}})'
  captures:
    1: scope.txt
    2: scope2.txt
```

I believe that you can’t replace this variable with a context.
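For reference, `{{VARIABLE_XY}}` in a sublime-syntax `match` is a plain textual substitution performed before the regex is compiled, which is why a long list of strings can live in one variable and still be captured as a group. A Python sketch of that expansion (the variable contents are invented, `expand` is a hypothetical helper, and Python’s `re` stands in for Sublime’s regex engine):

```python
import re

# Hypothetical variables block from a .sublime-syntax file.
variables = {"VARIABLE_XY": r"(?:literal_1|literal_2|literal_3)"}


def expand(pattern: str, variables: dict[str, str]) -> str:
    """Replace each {{NAME}} with the raw text of the matching variable."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], pattern)


pattern = expand(r"\b(string)\s*=\s*({{VARIABLE_XY}})", variables)
m = re.search(pattern, "string = literal_2")
print(m.group(1), m.group(2))  # string literal_2
```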

**Carpenter** · October 20, 2017, 1:08pm

Thank you for your plugin. Could you be more specific? I’m not fluent in Python. I tried to do what you suggest using your plugin, but ST says “variables are missing”.
What do I need to do?

**ThomSmith** · October 20, 2017, 2:39pm

If you post your code, I can take a look at it.

**ngppgn** · October 20, 2017, 8:49pm

Now that you mention possible enhancements to YAML Macros…

One major improvement: let users of YAML Macros write the macros in a normal text file with normal syntax rather than in Python. That would help people who are not fluent in Python use it, and increase the overall readability of the macros’ content. (Of course, I understand this would be a huge change in the behaviour of the plugin as well.)

A medium-sized improvement would be allowing us to define macros that take more than one parameter.

And a micro-improvement: allow us to customise the first character of a macro call, rather than it be hardcoded to ‘!’.

**deathaxe** · October 21, 2017, 1:03pm

The idea behind my suggestion is to create common re-usable rules.

Here is an extended version of my prior example:

```yaml
%YAML 1.2
---
name: Common
hidden: true
scope: library.mysyntax

contexts:
  variables:
    - match: |-
        (?x)
        (?:
          literal1 | literal2 | literal3
        )
      scope: variable.language.mysyntax
```

The main file:

```yaml
%YAML 1.2
---
name: mysyntax
scope: source.mysyntax

contexts:
  main:
    # the mapping key or the L-value in an assignment
    - match: \bstring\b
      scope: meta.mapping.key variable.other.
      push: mapping-maybe-operator

  mapping-maybe-operator:
    - meta_content_scope: meta.mapping
    - match: =
      scope: punctuation.separator.mapping.key-value keyword.operator.assignment
      set: mapping-value
    # no assignment, pop off
    - match: (?=\S)
      pop: true

  mapping-value:
    # the value contains one of the imported variables
    - meta_content_scope: meta.mapping.value
    - include: MySyntax Include.sublime-syntax#variables
    # pop off if something other than the variable is found
    - match: (?=\S)
      pop: true
```

This looks a bit more complicated at first glance, but it allows proper meta-scope and invalid.illegal handling. The latter is not included here.

In general, I’d suggest avoiding capture-based patterns like…

```yaml
- match: '\b(string)\s*=\s*({{VARIABLE_XY}})'
  captures:
    1: scope.txt
    2: scope2.txt
```

They can slow down the whole lexer by 10 to 15%, and the expression is only highlighted once it is complete: while you are typing, `string` won’t be highlighted until `=` and one of the VARIABLE_XY literals have been matched. This makes for a poor writing experience.
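To illustrate the writing-experience point, here is a Python sketch that walks the three contexts of the main file above as a tiny state machine and compares it with the single capture-based regex on incomplete input. It is a simplification under stated assumptions: scopes are shortened, meta scopes are not modelled, whitespace is skipped, and Python’s `re` stands in for Sublime’s engine (`RULES` and `context_scopes` are made-up names):

```python
import re

# Each context maps to (regex, scope, next context); scopes are shortened.
RULES = {
    "main": [(re.compile(r"\bstring\b"), "meta.mapping.key", "mapping-maybe-operator")],
    "mapping-maybe-operator": [(re.compile(r"="), "keyword.operator.assignment", "mapping-value")],
    "mapping-value": [
        # stands in for: include MySyntax Include.sublime-syntax#variables
        (re.compile(r"(?:literal1|literal2|literal3)\b"), "variable.language.mysyntax", "mapping-value"),
    ],
}
# The single-regex alternative, which only matches a complete expression.
CAPTURE_STYLE = re.compile(r"\b(string)\s*=\s*(literal1|literal2|literal3)")


def context_scopes(text: str) -> list[tuple[str, str]]:
    """Scope tokens one at a time, like the context-based syntax."""
    state, pos, out = "main", 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is not scoped in this sketch
            continue
        for regex, scope, next_state in RULES[state]:
            m = regex.match(text, pos)
            if m:
                out.append((m.group(), scope))
                state, pos = next_state, m.end()
                break
        else:
            if state == "main":
                pos += 1  # nothing matched: skip one character
            else:
                state = "main"  # the (?=\S) rule: pop without consuming
    return out


# While the user is still typing, only the context version scopes anything.
print(context_scopes("string ="))
# [('string', 'meta.mapping.key'), ('=', 'keyword.operator.assignment')]
print(CAPTURE_STYLE.search("string ="))  # None
```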

**ThomSmith** · October 21, 2017, 1:43pm

Syntactically, YAML Macros are implemented using YAML tags. This means that a YAML Macros file is a syntactically valid YAML file, and the macro system simply transforms it into another YAML file. This makes the implementation very simple, and it explains some of the design quirks of the system.

> One major improvement: let the user of yaml macros write the macros in a normal text file with normal syntax rather than in Python.

One way this could be done is by writing a Python macro that grabs text out of another file. This could be implemented in a library bundled with the package so that it could be used without having to write any more Python.

Could you give an example of a use case for this?

> A medium-sized improvement would be allowing us to define macros that take more than one parameter.

Because of the way the syntax works, you can only pass a single argument. However, if you pass an array, the macro system will pass its elements as separate arguments rather than as a single array argument, so in effect you can pass multiple arguments this way. If you have an example of what you’re looking to do, we can check whether the current system will do the trick.
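The array-as-arguments convention can be pictured as a dispatcher that spreads a list node into positional arguments. This is a hypothetical illustration of the convention only, not YAML Macros’ actual source; `call_macro` and the `join` macro are made-up names:

```python
from typing import Any, Callable


def call_macro(macro: Callable[..., Any], node: Any) -> Any:
    """Spread a list argument into positional args; pass anything else as one arg."""
    if isinstance(node, list):
        return macro(*node)
    return macro(node)


def join(separator: str, *parts: str) -> str:
    # Hypothetical multi-parameter macro.
    return separator.join(parts)


print(call_macro(join, ["|", "a", "b", "c"]))  # a|b|c
print(call_macro(str.upper, "abc"))            # ABC
```

The catch of this convention is that a macro which genuinely wants a single list argument has to receive it wrapped in another list.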

> And a micro-improvement: allow us to customise the first character of a macro call, rather than it be hardcoded to ‘!’.

The ! is part of the YAML tag syntax. As it is, you can use YAML’s %TAG directive to choose another prefix, but that prefix must begin with !. This makes it possible to import multiple macro libraries into one file. This is unlikely to change, because it’s mandated by the YAML spec itself.
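As a rough picture of how `%TAG` handles let one file use several macro libraries: the primary handle `!` and any named handle such as `!m!` each expand to their own prefix, and the rest of the tag is appended. A simplified Python sketch (it ignores the `!!` secondary handle and verbatim tags; `resolve_tag` and the `MyMacros` library are hypothetical):

```python
def resolve_tag(shorthand: str, handles: dict[str, str]) -> str:
    """Expand a tag shorthand like '!extend' or '!m!include' via %TAG handles."""
    # Try the longest handle first so '!m!' wins over the primary handle '!'.
    for handle in sorted(handles, key=len, reverse=True):
        if shorthand.startswith(handle):
            return handles[handle] + shorthand[len(handle):]
    raise ValueError(f"no %TAG handle for {shorthand!r}")


handles = {
    "!": "tag:yaml-macros:YAMLMacros.lib.extend:",  # from the example above
    "!m!": "tag:yaml-macros:MyMacros:",             # hypothetical second library
}
print(resolve_tag("!extend", handles))
# tag:yaml-macros:YAMLMacros.lib.extend:extend
print(resolve_tag("!m!include", handles))
# tag:yaml-macros:MyMacros:include
```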

**Carpenter** · October 22, 2017, 11:33am

Is your plugin compatible with ApplySyntax?

This is a fictitious example, but should cover all my needs:

In User, Example.sublime-syntax:

```yaml
%YAML 1.2
%TAG ! tag:yaml-macros:YAMLMacros.lib.extend:
---
name: Example
scope: source.example

variables: !extend
  _base: myVariables.yaml

contexts:

  main:
    - match: '\b(blabla)\s*=\s*({{VARIABLE_1}})'
      captures:
        1: scope1
        2: scope2
    - match: '\b(blablabla)\s*=\s*({{VARIABLE_2}})'
      captures:
        1: scope1
        2: scope2
```

myVariables.yaml:

```yaml
  VARIABLE_1: |-
    (?x:
        literal_string_1
      | literal_string_2
    )

  VARIABLE_2: |-
    (?x:
        literal_string_3
      | literal_string_4
    )
```

Maybe the syntax is wrong?

Variables_macros.py (empty)

If it doesn’t take too long, could you help me please?

**ThomSmith** · October 22, 2017, 9:51pm

This should be totally orthogonal to ApplySyntax; they should work together just fine.

In your example, you should have a file named Example.sublime-syntax.yaml-macros. When you build this file, it should create a file in the same directory named Example.sublime-syntax. That file will not contain the %TAG directive or any macros; rather, it will be the result of applying all of the macros in your original Example.sublime-syntax.yaml-macros file. It is this compiled output file that Sublime will use for syntax highlighting.
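The naming rule described here (drop the `.yaml-macros` extension, keep the directory) is easy to check mechanically, which helps make sure you are editing the source file rather than the generated one. A small Python sketch using `pathlib`; the function name is made up, and the YAML Macros build system is what actually performs the compilation:

```python
from pathlib import Path


def compiled_output_path(source: Path) -> Path:
    """Map Example.sublime-syntax.yaml-macros to Example.sublime-syntax, same directory."""
    if source.suffix != ".yaml-macros":
        raise ValueError(f"not a YAML Macros source file: {source}")
    return source.with_suffix("")  # strip only the last extension


print(compiled_output_path(Path("User/Example.sublime-syntax.yaml-macros")))
```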

Are you creating an Example.sublime-syntax.yaml-macros file and using the YAML Macros build system to compile it?

**Carpenter** · October 26, 2017, 11:03am

Your suggestion is very interesting, because I use a lot of regexes and am getting a huge performance loss.
Could you please add invalid.illegal handling to your example? Just to be sure.

Importing variables from one syntax to another