Sublime Forum

Syntax definition: Clojure’s comment next form

#1

I’m trying to write a syntax definition for Clojure’s #_ reader macro. It comments out the next form. So this:

abc #_def ghi

Would be equivalent to something like

abc /*def*/ ghi

in C-like languages.

What I’m having trouble with is: how do I tell Sublime to consume only one form? And that form can be anything: numbers, strings, any amount of nested parens, etc. E.g.

abc #_(... [...] ...) def #_"ghi" ...d

The problem is that Sublime definitions assume “consume any number of these”, and I see no way to say “consume only one of these”. Since there’s no closing tag, I can’t find an easy way to pop the context.

One particularly tricky situation is this:

#_[abc][def]

This should skip [abc] but read [def]. So I can’t really pop on whitespace.

To make things even more complicated, one can use multiple #_ in a row. That means “ignore the next N forms”.

#_#_abc def ghi

Roughly equivalent to

;; /*/*abc*/ def*/ ghi

Some tests I’m using:

; reader comment on symbol

#_abc def

; reader comment on brackets

#_[abc] def

; reader comment on nested brackets

#_[a [b c]] def

; touching forms

#_[abc][def]
#_[abc]def
#_abc[def]

; nested form

[#_[abc]]

; double reader comment

#_#_abc def ghi

And my current implementation, which almost works except for the “touching forms” and “double reader comment” cases:

%YAML 1.2
---
# http://www.sublimetext.com/docs/syntax.html
name: Experimental
scope: source.experimental
variables:
  ws: '(?:\s|(,))*'
contexts:
  main:
    - include: form
    
  immediately-pop:
    - match: ''
      pop: 1

  form:
    - include: reader-comment
    - include: symbol
    - include: brackets

  symbol:
    - match: '[a-z]+'
      scope: source.symbol

  brackets:
    - match: \[
      scope: punctuation.section.brackets.begin
      push:
      - meta_scope: meta.brackets
      - match: \]
        scope: punctuation.section.brackets.end
        pop:   true
      - include: main

  reader-comment:
    - match: '#_{{ws}}'
      scope: comment.block
      push: 
      - meta_scope: comment.block
      - include: form
      - include: immediately-pop

Any advice?


#2

I think I should clarify: what I need is for the entire form that immediately follows #_ to be in the comment scope, so that syntax highlighting will highlight it as a comment.


#3

Aaaaand I think I figured it out. The solution is to make a copy of every form context and give them one more pop than normal. Then they will pop the comment context, too. This works:

%YAML 1.2
---
# http://www.sublimetext.com/docs/syntax.html
name: Exprimental
scope: source.experimental
variables:
  ws: '(?:\s|(,))*'
contexts:
  main:
    - include: form

  form:
    - include: reader-comment
    - include: symbol
    - include: brackets

  symbol:
    - match: '[a-z]+'
      scope: source.symbol

  brackets:
    - match: \[
      scope: punctuation.section.brackets.begin
      push:
      - meta_scope: meta.brackets
      - match: \]
        scope: punctuation.section.brackets.end
        pop:   1
      - include: main

  reader-comment:
    - match: '#_{{ws}}'
      scope: comment.block
      push: 
      - meta_scope: comment.block
      - include: nested-form

  nested-form:
    - include: reader-comment
    - include: nested-symbol
    - include: nested-brackets

  nested-symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1

  nested-brackets:
    - match: \[
      scope: punctuation.section.brackets.begin
      push:
      - meta_scope: meta.brackets
      - match: \]
        scope: punctuation.section.brackets.end
        pop:   2
      - include: main

#4
%YAML 1.2
---
# http://www.sublimetext.com/docs/syntax.html
name: Experimental
scope: source.experimental
variables:
  ws: '(?:\s|(,))*'
contexts:
  main:
    - match: (?=\S)
      push: form

  form:
    - include: reader-comments
    - include: brackets
    - include: symbol

  brackets:
    - match: \[
      scope: punctuation.section.brackets.begin
      push: bracket-body

  bracket-body:
    - meta_scope: meta.brackets
    - match: \]
      scope: punctuation.section.brackets.end
      pop: 2
    - include: main

  reader-comments:
    - match: '#_'
      scope: punctuation.definition.comment
      push: reader-comment-body

  reader-comment-body:
    - meta_scope: comment.block
    - include: form
    - include: immediately-pop

  symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1

  immediately-pop:
    - match: ''
      pop: 1

#5

Oh wow. But how does it work? Why in

  - include: form
  - include: immediately-pop

does form only trigger once? Why doesn’t immediately-pop trigger before form?

And a second question: any ideas how to handle multiple #_ in a row, so that #_#_abc def ghi comments out both abc and def, but not ghi?


#6

Well, actually, brackets and symbol each consume once and then immediately pop the current context off the stack.

reader-comments pushes reader-comment-body onto the stack. Once a form has been consumed, reader-comment-body is popped and you are back in the previous context (another form).

In main, a form context is pushed onto the stack each time a non-whitespace character is found. Once a symbol or bracket is consumed, we return to main and start again.

Why doesn’t immediately-pop trigger before form?

It doesn’t trigger before form because form is listed first. An empty match always succeeds, so if no pattern of form matches at the current position, the empty match applies and pops the context off the stack.
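
To make the ordering concrete, here is a minimal sketch, as far as I understand Sublime’s matching rules (the leftmost match wins, and on a tie at the same position the first listed pattern wins). The name and scope are made up for illustration, and only symbols are handled:

```yaml
%YAML 1.2
---
name: Pattern Order Sketch   # hypothetical name, for illustration only
scope: source.sketch         # hypothetical scope
contexts:
  main:
    - match: '#_'
      push: reader-comment-body

  reader-comment-body:
    - meta_scope: comment.block
    # Listed first: if a symbol starts at the current position, it wins the tie.
    - include: symbol
    # Zero-width fallback: matches at the current position when nothing above does.
    - include: immediately-pop

  symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1

  immediately-pop:
    - match: ''
      pop: 1
```

With `#_abc` the symbol pattern matches right after `#_` and pops. With `#_ abc` the empty match succeeds at the space, which is further left than `abc`, so the context pops before any form is consumed.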

That’s actually the reason why stacking comments didn’t work as expected in my last post. I’ve fixed it:

%YAML 1.2
---
# http://www.sublimetext.com/docs/syntax.html
name: Experimental
scope: source.experimental
variables:
  ws: '[\s,]'
contexts:
  main:
    - match: (?=\S)
      push: form

  form:
    - include: reader-comments
    - include: bracket
    - include: symbol

  bracket:
    - match: \[
      scope: punctuation.section.brackets.begin
      push: bracket-body

  bracket-body:
    - meta_scope: meta.brackets
    - match: \]
      scope: punctuation.section.brackets.end
      pop: 2
    - include: reader-comments
    - include: main

  reader-comments:
    - match: '#_'
      scope: punctuation.definition.comment
      push: reader-comment-body

  reader-comment-body:
    - meta_scope: comment.block
    - include: form
    - include: else-pop

  symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1

  else-pop:
    - match: (?!{{ws}})
      pop: 1

Note: I’d recommend expressing the popping and non-popping behavior of contexts through singular and plural names.
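
A small sketch of how I read that naming convention, going by the snippet above (singular contexts pop after one match, plural ones don’t). The `symbols` context is hypothetical and only added for contrast:

```yaml
%YAML 1.2
---
name: Naming Sketch     # hypothetical name, for illustration only
scope: source.sketch    # hypothetical scope
contexts:
  main:
    - include: symbols

  # Plural: non-popping, safe to include where any number of symbols may follow.
  symbols:
    - match: '[a-z]+'
      scope: source.symbol

  # Singular: consumes exactly one symbol, then pops the context it was included into.
  symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1
```

That way the name alone tells you whether including a context can change the stack.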


#7

Almost! But the closing bracket doesn’t work now in a case like

[#_abc] []

#8

Including form instead of main in bracket-body seems to fix it though! Let me play with it, so far looks very promising. Thank you a lot!


#9

Damn, now it closes too much… Test case: [[#_abc] []]


#10

Another - include: reader-comments is required in front of - include: main in bracket-body, to recursively handle comments in brackets.

I’ve edited my last snippet.

Including it in main would probably work also.


#12

Thank you for your help! What you did here is amazing, I wouldn’t have gotten there in a million years.

One more struggle: can we do something about multiline?

#_
a
b

#_
   a
b

#_[
abc
]
def

#_#_
a
#_b
c
d

#13

Seems like removing else-pop does the trick…

Full solution so far:

%YAML 1.2
---
# http://www.sublimetext.com/docs/syntax.html
name: Experimental
scope: source.experimental
contexts:
  main:
    - include: reader-comments
    - match: (?=\S)
      push: form

  form:
    - include: reader-comments
    - include: bracket
    - include: symbol

  bracket:
    - match: \[
      scope: punctuation.section.brackets.begin
      push: 
      - meta_scope: meta.brackets
      - match: \]
        scope: punctuation.section.brackets.end
        pop: 2
      - include: main

  reader-comments:
    - match: '#_'
      push: 
      - meta_scope: comment.block
      - include: form

  symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1

And test cases:

; reader comment on symbol

#_abc def

; reader comment on brackets

#_[abc] def

; reader comment with nested brackets

#_[a [b c]] def
#_[] []
#_[[]] [[]]
[#_abc] []
[[#_abc] []]
[[#_abc]] [[]]
[ #_[ab[]c] ] [[]]

; touching forms

#_[abc][def]
#_[abc]def
#_abc[def]
[abc]#_[def]
[abc]#_def
abc#_[def]

; whitespace

#_ abc def

; multiline

#_
a
b

#_
   a
b

#_[
abc
]
def

#_#_
a
#_b
c
d

; double reader comment

#_#_abc def ghi
#_#_abc #_def ghi jkl
#_#_#_#_#_#_#_a b c d e f g h i j k
#_#_[abc][def][ghi]
#_#_[abc]#_[def][ghi][jkl]

#14

Without else-pop, a closing ] after #_ is scoped as a comment. Not sure if that’s desirable.

Instead, I’d recommend changing the else-pop pattern to

  else-pop:
    - match: (?=[^\s,])
      pop: 1
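
For reference, a sketch of how this else-pop might slot into the full solution from post #13. This is an untested combination of the two snippets, and placing else-pop after form in the comment body is an assumption:

```yaml
%YAML 1.2
---
name: Experimental
scope: source.experimental
contexts:
  main:
    - include: reader-comments
    - match: (?=\S)
      push: form

  form:
    - include: reader-comments
    - include: bracket
    - include: symbol

  bracket:
    - match: \[
      scope: punctuation.section.brackets.begin
      push:
        - meta_scope: meta.brackets
        - match: \]
          scope: punctuation.section.brackets.end
          pop: 2
        - include: main

  reader-comments:
    - match: '#_'
      push:
        - meta_scope: comment.block
        - include: form
        # Assumed placement: after form, so a form starting here still wins.
        - include: else-pop

  symbol:
    - match: '[a-z]+'
      scope: source.symbol
      pop: 1

  else-pop:
    # Pop at the next character that is neither whitespace nor a comma
    # and that no form pattern claimed (e.g. a stray closing bracket).
    - match: (?=[^\s,])
      pop: 1
```

Because the lookahead skips whitespace and newlines, the multiline cases should still reach the next form, while something like `[#_]` should leave the comment before the closing bracket.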

#15

I’m slowly starting to understand the way this all works. So #_ works great, probably because it can’t really be nested. Metadata seems to fall into the same category: you also skip it and get to the next form.

But I see now that stuff like quote ('), syntax quote (`) and deref (@) work differently: they “wrap” another form.

What I want to achieve is that something like

  '''x
; ^^^^ meta.quoted
;  ^^^ meta.quoted meta.quoted
;   ^^ meta.quoted meta.quoted meta.quoted
;    ^ source.symbol

Seems impossible, because depending on the number of quotes, the symbol will need to “pop” a different number of times. And we only detect the end of a form inside symbol, right?

Specifically, integration between metadata and quote is hard to get right:

;;;;; Quoted inside meta
  ^'x y z
; ^^^ meta.metadata
;  ^^ meta.quoted
;    ^^^^ - meta.metadata - meta.quoted

;;;;; Meta inside quoted
  '^x y z
; ^^^^^ meta.quoted
;  ^^ meta.metadata
;    ^^ - meta.metadata
;      ^^ - meta.quoted

Some examples where this might be practical:

`~x (syntax quote + unquote immediately)
'#'x (quoted var)
@@@x (multiple derefs, equal to (deref (deref (deref x))))

Any ideas?
