Sublime Forum

How to make a configuration-dependent syntax?

#1

Hi,

I’m writing a new plugin for Ledger accounting.

A syntax already exists here, but it assumes the English date format, which is not the one I use.

So I’ve written a new syntax file which looks like this:

%YAML 1.2
---

# See http://www.sublimetext.com/docs/3/syntax.html
file_extensions: [ledger, journal]
scope: source.ledger

variables:
  date: "\\d{2}[/-]\\d{2}[/-]\\d{4}"
  account: "\\w[\\w:\\s_-]+"
  payee: "[^*]+"

contexts:
  main:

    # Commentary
    - match: ^\s*(;.*)$
      scope: comment.line

    # Non-cleared entry
    - match: ^({{date}})([=]?)({{date}})?\s+({{payee}})$
      scope: invalid.invalid

    # Cleared entry
    - match: ^({{date}})([=]?)({{date}})?\s+\*\s+({{payee}})$
      captures:
        1: string.other.date
        2: punctuation.separator
        3: string.other.edate
        4: markup.italic.desc.cleared

    # Account with amount
    - match: ^\s+({{account}})\s+([-$£¥€¢\d.,_\w]+.*)$
      captures:
        1: markup.bold.account
        2: variable.other.amount

    # Account without amount
    - match: ^\s+({{account}})\s*$
      scope: markup.bold.account

    # Account
    - match: ^(account) ({{account}})\s*$
      captures:
        1: keyword.other
        2: markup.bold.account

    - match: ^(payee) ({{payee}})\s*$
      captures:
        1: keyword.other
        2: markup.italic.desc

    # Periodic transaction
    - match: ^(~)\s(.*?)$
      captures:
        1: punctuation.section.period
        2: constant.other.expression

    # Automatic transaction
    - match: ^(=)\s(.*?)$
      captures:
        1: punctuation.section.automated
        2: string.regexp.expression

My problems:

  1. I don’t want to change the syntax each time a new person wants to use another date format.
  2. The file contains two matches that look almost the same. A star can be added to the entry line to mark the entry as cleared (it appears on your account register). What I want is for the entry to be highlighted when it is not cleared (I did not manage to do anything better than marking it as invalid; I should look into adding an additional color map layer). Yet, this behavior is always desired.

Conclusion: I’d like to know whether it is possible to create one or two variables in the configuration file that affect the syntax file in both cases.

Thanks for your help!


#2

Syntaxes aren’t configurable in the fashion that I think you’re asking about; they’re a static list of the rules that define how to interpret the text in the file.

For something like this, generally you as the syntax author (or the authors of the file format that you’re working with) would determine what constitutes the various different types of dates that you’re going to support, and then potentially augment that in the future as needed if the list needs to grow. To minimize pain on this front you’d probably want to support the most common formats initially.

The various constructs of sublime-syntax files let you reuse regular expressions through variables, reuse match rules through include directives, and use the context stack to minimize duplication and keep the rules easier to maintain.
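A minimal sketch of those three constructs side by side. The scope source.example and the context names comments and entry are made up for illustration and are not part of any existing Ledger syntax:

```yaml
%YAML 1.2
---
# Hypothetical scope name, for illustration only
scope: source.example

variables:
  # Defined once, reused below as {{date}}
  date: '\d{4}[/-]\d{2}[/-]\d{2}'

contexts:
  main:
    # Reuse the rules of another context
    - include: comments
    # Lookahead only: nothing is consumed before the push
    - match: '^(?={{date}})'
      push: entry

  comments:
    - match: '^\s*(;.*)$'
      captures:
        1: comment.line

  entry:
    - match: '{{date}}'
      scope: string.other.date
    # End of line: drop back to main
    - match: '$'
      pop: true
```

Any other context that needs comment handling can include comments as well, so the rule is written only once.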

As an example, I converted the tmLanguage syntax you linked to above using the built-in tools (open the tmLanguage file, then choose Tools > Developer > New Syntax from ... in the menu) and modified it a bit:

%YAML 1.2
---
# http://www.sublimetext.com/docs/3/syntax.html
name: Ledger
file_extensions:
  - ldg
  - ledger
  - ldgr
scope: source.ledger
variables:
  date: |-
    (?x:
      \d{4}[/-]\d{2}[/-]\d{2} |
      \d{2}[/-]\d{2}[/-]\d{4}
    )
  account: '[\w:\s_-]+'
  payee: '[^*\n]+'
contexts:
  main:
    - match: ^\s*(;.*)$
      captures:
        1: comment.line
    - match: '^(?={{date}})'
      push:
        - match: '({{date}})([=])?({{date}})?'
          captures:
            1: string.other.date
            2: punctuation.separator
            3: string.other.date
        - match: '\s+\*\s+({{payee}})'
          captures:
            1: invalid
        - match: '\s+({{payee}})'
          scope: markup.italic.desc
        - match: '$'
          pop: true
    - match: '^\s+({{account}})\s+([-$£¥€¢\d.,_]+.*)$'
      captures:
        1: markup.bold.account
        2: variable.other.amount
    - match: '^\s+({{account}})\s*$'
      scope: markup.bold.account
    - match: ^(!\w+)(.*)$
      captures:
        1: keyword.control
        2: constant.other
    - match: '^([YPNDCiobh])(.*)$'
      captures:
        1: keyword.other
        2: constant.other
    - match: ^(~)\s(.*?)$
      captures:
        1: punctuation.section.period
        2: constant.other.expression
    - match: ^(=)\s(.*?)$
      captures:
        1: punctuation.section.automated
        2: string.regexp.expression

This includes variables to match the same text as your syntax above (but based on the existing syntax definition). Here the definition of the date variable says that a date can match in two ways: the original year-first format and the year-last format you outlined above. Other formats can easily be added to the value as further alternatives.
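For instance, a sketch of the variable with one more alternative. The dotted DD.MM.YYYY format is only an assumption picked for illustration; neither syntax above includes it:

```yaml
variables:
  # Alternatives, in order: year first (existing syntax), year last
  # (the question), and an assumed dotted format such as 31.12.2020.
  date: |-
    (?x:
      \d{4}[/-]\d{2}[/-]\d{2} |
      \d{2}[/-]\d{2}[/-]\d{4} |
      \d{2}\.\d{2}\.\d{4}
    )
```

Every rule that uses {{date}} picks up the new format without any other change.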

Apart from using the account variable in the appropriate match rules, the other change to look at is this:

    - match: '^(?={{date}})'
      push:
        - match: '({{date}})([=])?({{date}})?'
          captures:
            1: string.other.date
            2: punctuation.separator
            3: string.other.date
        - match: '\s+\*\s+({{payee}})'
          captures:
            1: invalid
        - match: '\s+({{payee}})'
          scope: markup.italic.desc
        - match: '$'
          pop: true

This defines a match rule that asserts that something that looks like a date (in one of the two formats) sits at the start of a line, without actually consuming any text, because the date is inside a lookahead. If that’s the case, then we know that this line starts with a date.

When that’s seen, the push adds a new anonymous context onto the context stack. Only the match rules inside this context are then active: they are the only rules that can match any text until something uses a pop to remove the anonymous context from the stack, which makes the rules in the main context active again.

In our anonymous context we have rules for:

  • Matching a date or date range
  • Matching a payee that is preceded by an * character, which makes it invalid
  • Matching a payee without a * character in front, which makes it valid
  • Matching the end of the line, which returns us back to the main context rules

Hence we avoid duplicating two matches that are almost but not quite the same (which is arguably more readable, less error-prone and easier to extend) and instead focus on the fact that the payee is the only part that differs.

Using this syntax, we get a result something like the following; the items that are preceded by an * are marked as invalid, but the others are not. The dates match as dates regardless of which of the two formats they use, and the accounts also match up across all of the examples.

[image: screenshot of a ledger file highlighted with this syntax]
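One caveat: the question asked for the opposite, with the entries that lack a * being the ones that stand out. A minimal sketch of the same anonymous context with the roles swapped. It reuses the markup.italic.desc.cleared and invalid scope names from the question, and I haven't run this variant, so treat it as a starting point:

```yaml
contexts:
  main:
    - match: '^(?={{date}})'
      push:
        - match: '({{date}})([=])?({{date}})?'
          captures:
            1: string.other.date
            2: punctuation.separator
            3: string.other.date
        # Cleared entry: a * precedes the payee
        - match: '\s+\*\s+({{payee}})'
          captures:
            1: markup.italic.desc.cleared
        # Uncleared entry: no *, so the payee is flagged
        - match: '\s+({{payee}})'
          captures:
            1: invalid
        - match: '$'
          pop: true
```

Because the payee variable excludes the * character, the last payee rule cannot match a cleared line at the position right after the date, so the two cases stay distinct.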
