Sublime Forum

ST3 Syntaxes: Nested Blocks with Balanced Delimiters

#1

I’m trying to create a custom AsciiDoc syntax for ST3 and am having trouble handling nested blocks with variable-length balanced delimiters:

[TIP]
===============
A TIP admonition block.

[NOTE]
=======
A NOTE admonition block nested inside the TIP block.
=======
===============

Since syntax RegExes only match against a single line, and since syntax definitions don’t offer variables for storing a matched string, I can’t find a way to determine whether a newly encountered block delimiter marks the end of the current block or the beginning of a new nested block.
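To make the problem concrete, here is a minimal sketch of the naive approach. The syntax name, scope names and context names are hypothetical, and this is only an illustration of the failure, not a working AsciiDoc syntax. Because the pop pattern cannot remember the opening delimiter, it has to accept any run of four or more `=`:

```yaml
%YAML 1.2
---
# Hypothetical naive sketch: the pop pattern cannot remember the opening
# delimiter, so it has to accept any run of four or more "=".
name: AsciiDoc Naive Blocks Demo
scope: text.asciidoc-naive-demo
contexts:
  main:
    - match: '^={4,}\s*$'
      scope: punctuation.definition.block.begin.asciidoc-naive-demo
      push: delimited-block

  delimited-block:
    - meta_scope: meta.block.delimited.asciidoc-naive-demo
    # Problem: this also matches the NOTE block's opening "=======", so the
    # TIP block is popped there and the nested block is never entered.
    - match: '^={4,}\s*$'
      scope: punctuation.definition.block.end.asciidoc-naive-demo
      pop: true
```

On the TIP/NOTE example above, the TIP block would end at the NOTE’s opening delimiter, and the NOTE’s closing delimiter would then open a bogus block that is only closed by the TIP’s closing delimiter.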

The only apparent way to achieve this would be to first try to match a delimiter of the same length as the opening one (in which case the context can be popped); failing that, any line of four or more = characters would be the start of a new nested block.

I couldn’t find any syntax that uses balanced delimiters this way (the ST AsciiDoc syntaxes are mostly broken, and the Markdown syntax doesn’t support nested balanced blocks), so I wasn’t able to draw inspiration from others’ syntaxes. Maybe that’s just because I don’t know where to look, i.e. which languages/syntaxes support a similar feature.

Nested blocks of this type are quite common in AsciiDoc, and handling them correctly matters. For example, section headings inside these blocks are not counted in the document structure or in the TOC, so mistaking a nested block’s opening delimiter for the closing delimiter of the outer block can have undesired consequences, not only for the document’s syntax coloring but also (and above all) for plugins designed to handle the document structure.

Can anyone help me with this?

  • Is there a package which supports this type of syntax construct, which I could study?
  • Is it indeed possible to achieve this in ST3 syntaxes?

I was hoping that the Oniguruma engine might expose some advanced RegEx feature for attaching functions to a capturing group, or for storing it in some internal variable or stack: whatever hack might help reference the earlier match of the opening delimiter.

It would be great if ST3 syntaxes natively supported some kind of temporary storage variables associated with each context, which could then be referenced from within the RegEx patterns.

#2

You can. Take a look at how heredocs are implemented in Packages/Ruby/Ruby.sublime-syntax, or in Packages/ShellScript/Bash.sublime-syntax, or how raw strings are implemented in C++.

#3

Thanks a lot!

This really helps. All I needed was a syntax that accomplishes this, so I can study it and learn how to do it.

#4

RegExes Can Backreference Groups Captured by the Match That Pushed the Context!

@rwols, having looked at the syntax of the native Ruby package as you suggested, it looks like ST3 supports backreferences in RegExes that point to the groups captured by the match that pushed the context.

Although I couldn’t find this documented, the official docs mention a similar usage for embed and escape:

  • embed. Accepts the name of a single context to push into. While similar to push, it pops out of any number of nested contexts as soon as the escape pattern is found. This makes it an ideal tool for embedding one syntax within another.
    • escape. This key is required if embed is used, and is a regex used to exit from the embedded context. Any backreferences in this pattern will refer to capture groups in the match regex.

Presumably this also applies to other pushed contexts, meaning it’s possible to backreference the groups captured by the match that pushed the context.

I haven’t had a chance to test this yet, but judging from how heredocs work in the Ruby syntax it seems a reasonable conclusion. Here are selected snippets from Ruby.sublime-syntax, for heredoc-plain only (side comments added):

  heredoc-plain:
    - match: (<<[-~])(')(\w+)(')
      captures:
        1: punctuation.definition.heredoc.ruby
        2: meta.tag.heredoc.ruby punctuation.definition.tag.begin.ruby
        3: meta.tag.heredoc.ruby entity.name.tag.ruby
        4: meta.tag.heredoc.ruby punctuation.definition.tag.end.ruby
      push: [heredoc-plain-indented-literal, trailing-heredoc-start]

  heredoc-plain-indented-literal:
    - meta_scope: meta.string.heredoc.ruby
    - meta_content_scope: string.unquoted.heredoc.ruby
    - include: indented-heredoc-end

  indented-heredoc-end:
    - match: ^\s*(\3)$ # HEREDOC delimiter   <-- note the `\3`
      scope: meta.tag.heredoc.ruby
      captures:
        1: entity.name.tag.ruby
      pop: true

It’s a pity that this isn’t adequately covered in the official ST3 Syntax Documentation, since it’s a powerful feature that can be used to handle complex syntaxes.

The fact that RegEx backreferences are mentioned only for embed and escape might lead the reader to think they aren’t available elsewhere (which, as the above shows, is not the case).

So, back to my original balanced-delimiters problem: it seems it should indeed be possible to implement nested blocks by listing the pattern that backreferences the opening delimiter before the generic ={4,} nested-opening pattern, so that the latter has lower priority when both match the same line. This should make it possible to distinguish between the two.
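A minimal sketch of that idea, untested and assuming the backreference behavior seen in the Ruby syntax above, plus ST’s rule that when several patterns match at the same position the first one listed wins. The syntax name, scope names and the `block-start`/`delimited-block` context names are hypothetical, and it deliberately ignores the `[TIP]`/`[NOTE]` attribute lines and everything else a real AsciiDoc syntax needs:

```yaml
%YAML 1.2
---
# Hypothetical minimal sketch (not a complete AsciiDoc syntax): nested
# delimited blocks made of lines of four or more "=".
name: AsciiDoc Nested Blocks Demo
scope: text.asciidoc-nested-demo
contexts:
  main:
    - include: block-start

  block-start:
    # Capture the exact opening delimiter so the pushed context can refer to it as \1.
    - match: '^(={4,})\s*$'
      captures:
        1: punctuation.definition.block.begin.asciidoc-nested-demo
      push: delimited-block

  delimited-block:
    - meta_scope: meta.block.delimited.asciidoc-nested-demo
    # Listed first: a line made of exactly the opening delimiter closes this block.
    # \1 refers to group 1 of the match that pushed this context; the \s*$ anchor
    # keeps a longer run of "=" from matching only its first characters.
    - match: '^\1\s*$'
      scope: punctuation.definition.block.end.asciidoc-nested-demo
      pop: true
    # Any other run of four or more "=" opens a nested block, which captures
    # its own delimiter in turn.
    - include: block-start
```

With the TIP/NOTE example from the first post, the 15-character line pushes `delimited-block` with `\1` set to 15 `=`. The 7-character line cannot match `^\1\s*$` there, so `block-start` pushes a nested block whose own `\1` is 7 `=`. The next 7-character line matches both patterns at the same position, the pop pattern wins because it is listed first, and the final 15-character line then pops the outer block.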

Again, thanks a lot @rwols; without your help I wouldn’t have found a real-world example of how this can be done (as I said in my original post, I didn’t know where to look for the answer).

I hope this thread might prove useful to others who are facing similar problems in their syntax definitions (maybe now a more appropriate thread title would be “backreferences in RegEx”, but “balanced delimiters” might also be a good way to draw attention to this problem).

#5

There’s a PHP Heredocs section in the official docs which explicitly mentions this, but I guess it would be useful to have a note near the push or pop reference too.
