Sublime Forum

[Solved] Syntax definition based on line number

#1

I’m using the FullProf software, which takes a .pcr file as input. It consists of a predetermined list of values and codewords that correspond to the parameters used by the software.
The syntax depends on the line number (line 1 has a specific syntax, lines 2 to 6 have another syntax, lines 7 to 15 have yet another one, etc.).

I’d like to have a text editor for such files with syntax highlighting.
Is it possible to make a new Syntax Definition for Sublime Text in which the rules depend on the line number?

#2

Yes, it is.

#3

Thanks. Would you care to give me hints as to how I could implement this? I didn’t find that in the documentation.

#4

No, it’s not actually possible; however, you could use a few clever workarounds that make it feel as if it depends on line numbers.

If you could explain further what line 1 should look like, lines 2-6 should look like and the rest should look like, I can provide a simple toy syntax. Do you have an example?

Also, if the number of different syntaxes is variable, the problem seems impossible. But if it’s fixed, it could be possible.

#5

Generally, what you can do is build a stateful chain of matches, each of which consumes to the end of the line. For example:

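main:
  # Matching the newline (rather than a zero-width $) consumes the line
  # ending, so each context handles exactly one line.
  - match: \n
    set: line-2

line-2:
  - match: \n
    set: line-3

line-3:
  - match: \n
    set: line-4

# etc.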

This really isn’t the greatest thing ever, but it will work.

#6

Yep, that’s what I had in mind. Each line match must always succeed so as not to break the chain of lines, with the actual format matching being an “optional” match in each case.
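A minimal sketch of such a chain as a complete file, assuming a hypothetical three-line layout (free text, then a line of values, then a line of codewords); the scope names are made up for illustration. The number match is optional in every line context, while the "\n" match always succeeds, so a malformed line cannot stall the chain:

```yaml
%YAML 1.2
---
# Hypothetical layout: line 1 free text, line 2 values, line 3 codewords.
name: PCR line chain (sketch)
scope: source.pcr-line-chain

variables:
  # Optionally signed integer or real with an optional exponent, not glued
  # to a preceding letter, digit or point.
  number: '(?<![\w.])[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?'

contexts:
  main:
    # Line 1: any text is a title (optional, so an empty line also works).
    - match: '.+'
      scope: string.unquoted.title.pcr
    # Always succeeds: consume the line ending and move on to line 2.
    - match: '\n'
      set: line-2

  line-2:
    - match: '{{number}}'
      scope: constant.numeric.value.pcr
    - match: '\n'
      set: line-3

  line-3:
    # Same kind of numbers, different scope, purely because of position.
    - match: '{{number}}'
      scope: constant.other.codeword.pcr
    - match: '\n'
      set: rest

  rest:
    - match: '{{number}}'
      scope: constant.numeric.pcr
```

To try it, save it under Packages/User with a .sublime-syntax extension and pick it from View > Syntax.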

#7

Yes, unfortunately, the syntax depends on variables in the file.

Minimal sample syntax:

  • Line 1: plain text
  • Line 2: a list of 19 integers
  • Line 3: a list of 17 integers
  • Line 4: a list of n doubles, where n is the 5th value on line 2
  • Lines 5-16: data, repeated x times, where x is the 3rd value on line 2

Is such a thing still possible?

#8

No, this is not possible. Lines 1, 2, and 3 are, but you cannot write a syntax definition whose rules depend on values inside the file. (You could create a hack if there is a limited number of possibilities.) This would also be a semantic rather than a syntactic error, and if you really want to check something like that, you can write a linter.
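As a sketch of that kind of hack, assuming (hypothetically) that the 5th integer on line 2 can only be 1 or 2: each possible value gets its own branch, and the branch decides which context parses line 4. Lines 2 and 3 are not otherwise checked, and the scope names are illustrative:

```yaml
%YAML 1.2
---
# Hypothetical: the 5th integer on line 2 is assumed to be 1 or 2 only.
name: PCR value branching (sketch)
scope: source.pcr-branching

contexts:
  main:
    # Line 1: free text.
    - match: '.*\n'
      scope: string.unquoted.title.pcr
      set: line-2

  line-2:
    # Choose a branch from the 5th integer. With a list, "set" pushes the
    # contexts in order, so line-3 is on top and handles the next line first.
    - match: '^\s*(?:[-+]?\d+\s+){4}1(?!\S).*\n'
      set: [line-4-one-double, line-3]
    - match: '^\s*(?:[-+]?\d+\s+){4}2(?!\S).*\n'
      set: [line-4-two-doubles, line-3]
    # Any other value: stop tracking line state.
    - match: '.*\n'
      set: rest

  line-3:
    - match: '.*\n'
      pop: true

  line-4-one-double:
    # Scoped only if the line really holds one number; either way, move on.
    - match: '^\s*(\S+)\s*\n'
      captures:
        1: constant.numeric.pcr
      set: rest
    - match: '.*\n'
      set: rest

  line-4-two-doubles:
    - match: '^\s*(\S+)\s+(\S+)\s*\n'
      captures:
        1: constant.numeric.pcr
        2: constant.numeric.pcr
      set: rest
    - match: '.*\n'
      set: rest

  rest:
    # No line-specific rules after line 4 in this sketch.
    - match: '[-+]?\d+(?:\.\d*)?(?:[Ee][-+]?\d+)?'
      scope: constant.numeric.pcr
```

Every additional possible value means another branch and another line-4 context, which is why this only scales to a handful of cases.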

#9

The important question is: what do you want to highlight? If you want to highlight integers in one color and doubles in another that is easy and doesn’t require line states.
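For the easy case, a sketch that colors integers and reals differently with no line state at all. It assumes reals always contain a decimal point (as in 0.14336E-04 or 90.000000); the scope names are illustrative, and how they end up colored depends on the color scheme:

```yaml
%YAML 1.2
---
name: PCR number types (sketch)
scope: source.pcr-number-types

contexts:
  main:
    # Reals: must contain a decimal point, optional exponent (0.14336E-04).
    - match: '(?<![\w.])[-+]?(?:\d+\.\d*|\.\d+)(?:[Ee][-+]?\d+)?(?![\w.])'
      scope: constant.numeric.float.pcr
    # Integers: digits not glued to a letter, a point or an exponent.
    - match: '(?<![\w.])[-+]?\d+(?![\w.])'
      scope: constant.numeric.integer.pcr
```

When both patterns could start at the same position, Sublime Text uses the one listed first, which is why the real pattern comes before the integer one.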

If you want to check syntax, then highlighting is not really the place to do it. As r-stien says, write a linter. You can make it part of your build system so it checks the syntax there, although FullProf may already do that.

You really need to state what you’re trying to achieve.

#10

OK, I’ll be more specific.

First, it really is a matter of syntax highlighting and not syntax checking; FullProf indeed already does the checking.

There are mainly three types of data in the file:

  • text strings
  • numerical values (ints or doubles)
  • codewords (in the form of doubles)

The first type is obviously easy to distinguish from the rest; my problem is with the other two. I would like to highlight values in one color (whether it’s an int or a double doesn’t matter) and codewords in another. The problem is that whether a double is a value or a codeword depends on its position (which is what I meant by ‘the syntax depends on the line number’).

Examples:
line 14: 0.00000 0.0 0.00000 0.0 0.00000 0.0 0.000000 0.00 0
Here, the 0.00000 are values and the 0.0 are codewords

line 29: 4.1571165 4.1571165 4.1571165 90.000000 90.000000 90.000000
line 30: 21.00000 21.00000 21.00000 0.00000 0.00000 0.00000
On line 29, these are all values, on line 30, they are codewords.

However, depending on values present before in the file, those lines could have another syntax, which would imply that those doubles should be highlighted in a different color.

#11

If you have no way to tell a codeword from a value other than by its position, it’s going to be difficult, if not impossible, I think. As noted, there is no direct method, because syntax highlighting is based on regular expressions. There are contexts, but a context is only entered as the result of a PRIOR regular expression match, so if you don’t have some way to anchor or uniquely identify the numbers, it’s not going to work.

That being said, your codewords look like they are slightly different from your values? If you can guarantee they are always of the form <whitespace><digit>.<digit><whitespace> then maybe that’s something unique about them that could be parsed out of a line. EDIT: Just noticed your second example where your codewords are just numbers again and not formatted.
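A sketch of that heuristic, assuming a codeword is exactly one digit, a point and one digit (0.0, 1.0) standing alone between whitespace; every other number is scoped as a value. As the edit above says, codewords written like 21.00000 will still be scoped as values. Scope names are illustrative:

```yaml
%YAML 1.2
---
name: PCR codeword heuristic (sketch)
scope: source.pcr-codeword-heuristic

contexts:
  main:
    # One digit, a point, one digit, bounded by whitespace or line ends.
    - match: '(?<!\S)\d\.\d(?!\S)'
      scope: constant.other.codeword.pcr
    # Any other whitespace-delimited number (integer or real, optional exponent).
    - match: '(?<!\S)[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?(?!\S)'
      scope: constant.numeric.value.pcr
```

On the line 14 example, 0.0 is caught by the first pattern, while 0.00000 fails it (a digit follows the first "0.0") and falls through to the value pattern.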

I wonder if there’s an API method that can set a scope on a particular region of text. In that case, you could scan line by line and then apply some sort of effect. I’ll have to check the API. It’s definitely not a normal syntax highlighting thing, though.

#12

OK, just for my own masochistic delight I googled this thing and tracked down some PCR files and the manual. From what I can tell they are essentially data files that pass parameters to some scientific software written in Fortran. Each line contains a number of fields and beyond that they don’t have much of a structure.

BUT lines that begin with a ! are comments, which actually makes syntax highlighting this pretty straightforward.

You’ll need to use the comment lines to tag each following line or set of lines that share a common set of columns (fields).

So if your syntax definition matches the comment line

!  Scale        Shape1      Bov      Str1      Str2      Str3   Strain-Model

you know the meaning of these following lines

 0.14336E-04   0.00000   0.00000   0.00000   0.00000   0.00000       0
    11.00000     0.000     0.000     0.000     0.000     0.000

which would allow you to match and format them accordingly, until you encountered this line

!       U         V          W           X          Y        GauSiz   LorSiz Size-Model

which would then trigger a different set of format matching for those kinds of columns.

The syntax highlighting system is actually designed exactly for this sort of thing, so if you familiarize yourself with it you will get the idea. But trying to do it without the comment lines isn’t going to work.

Note that you don’t need to use those exact comments, or match the entire comment line. Anything that uniquely identifies the lines that follow will work.
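A sketch of that approach, assuming the Scale card is the one to key on: the card pushes a context that scopes every following number until the next line starting with "!", which pops back to the main context without consuming that line. Scope names are illustrative:

```yaml
%YAML 1.2
---
name: PCR comment card (sketch)
scope: source.pcr-comment-card

variables:
  number: '(?<![\w.])[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?'

contexts:
  main:
    # The Scale card; \s+ tolerates different column spacing.
    - match: '^!\s*Scale\s+Shape1\s+Bov\s+Str1\s+Str2\s+Str3\s+Strain-Model.*\n'
      scope: comment.line.pcr
      push: scale-block
    # Any other "!" card is just a comment.
    - match: '!.*\n'
      scope: comment.line.pcr
    - match: '{{number}}'
      scope: constant.numeric.pcr

  scale-block:
    - meta_scope: meta.block.scale.pcr
    # Leave (without consuming) as soon as the next card starts.
    - match: '^(?=!)'
      pop: true
    - match: '{{number}}'
      scope: constant.numeric.scale.pcr
```

Each further card would get its own header match and block context in the same way.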

#13

Oh, that’s a blast from the past. I haven’t seen carded data like this since Army CoE work a few decades back. Still, that doesn’t seem to match @thjungers’ example, where there are subsets of numbers that should be marked via syntax. If we do have a comment line, we can interpret the data after it without too much trouble, but if the context changes midstream without a pattern reference, it might be indistinguishable from the rest of the data.

#14

That it’s inspired by punch cards is not surprising. The comments are indeed critical, but given that you can add them anywhere there’s no excuse not to use them to delineate or markup the data. Even if you’re not parsing the data it makes sense to have them.

#15

Yes, there are comments every now and then to tell what the following parameters are for, but there is no clear way to tell which is a value and which is a codeword, from the comment itself. Or should I make regexes for each possible comment?

EDIT: Actually, I realized that, except for line 14, which I gave as an example above, single lines of numbers after a comment always (I think) contain values, while if there are two lines after a comment, the first one contains values and the second one codewords. I could make regexes for those two cases, and another for the line 14 exception.

Of course, this assumes that the comments will remain as they are and won’t be changed by the user (which wouldn’t affect the software that reads the file, but would make my code highlight the file wrongly). But I guess I will go with that, as there seems to be no other easy way.
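A sketch of that rule, assuming every card starts with "!" at the beginning of the line: each card pushes two one-line contexts, values on top of codewords. If the next card arrives after only one data line, both contexts pop without consuming it, so no lookahead across lines is needed. The line 14 exception would still need its own card-specific context, and the scope names are illustrative:

```yaml
%YAML 1.2
---
name: PCR values and codewords (sketch)
scope: source.pcr-value-codeword

variables:
  number: '(?<![\w.])[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?'

contexts:
  main:
    # Any card: pushed in order, so value-line is on top and runs first.
    - match: '^!.*\n'
      scope: comment.line.pcr
      push: [codeword-line, value-line]
    - match: '{{number}}'
      scope: constant.numeric.pcr

  value-line:
    # Another card straight away means there is no data line to scope.
    - match: '^(?=!)'
      pop: true
    - match: '{{number}}'
      scope: constant.numeric.value.pcr
    - match: '\n'
      pop: true

  codeword-line:
    # Only one data line after the card: leave without consuming the card.
    - match: '^(?=!)'
      pop: true
    - match: '{{number}}'
      scope: constant.other.codeword.pcr
    - match: '\n'
      pop: true
```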

Thanks for the time you spent thinking about this!

#16

You provided a small sample. Would it be possible to see an entire file on pastebin or something, or even just the first 10-15 cards (lines)?

If the comments are pretty uniform, then yeah you could make an expression for one line, then that would lead into the context for the line following, and if the lines toggle, you could perhaps alternate back and forth.

If the comments are not uniform then we don’t have much to hang our hat on. Even after seeing the FullProf doc, I have to admit I’m still fuzzy as to how you’re using the control card file in practice.

#17

I edited my last answer.
If you want your curiosity satisfied, here’s a sample file: rutana.pcr

#18

Actually very helpful. So it looks like there are two sorts of comments. Lines starting with ! are full-line comments and can contain pretty much anything, and there are inline comments with # (see line 34). These can be styled pretty easily. It may also be that a ! anywhere on a line starts a comment that runs to the end of the line (see line 17).

I am not sure what the <-- might mean, but going back to the document @nutjob2 found might provide more insight if one were to dig into it. Still, that’s a symbol that’s pretty easy to write an expression for.
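A sketch for these three markers, assuming "!" and "#" both start a comment that runs to the end of the line and that "<--" appears outside comments. Since the meaning of "<--" is unknown, the scope chosen for it is only a guess:

```yaml
%YAML 1.2
---
name: PCR markers (sketch)
scope: source.pcr-markers

contexts:
  main:
    # "!" at the start of a line or after data: comment to end of line.
    - match: '(!).*$'
      scope: comment.line.exclamation.pcr
      captures:
        1: punctuation.definition.comment.pcr
    # "#" inline comment to end of line.
    - match: '(#).*$'
      scope: comment.line.number-sign.pcr
      captures:
        1: punctuation.definition.comment.pcr
    # The "<--" marker (scope name is a placeholder).
    - match: '<--'
      scope: keyword.operator.arrow.pcr
```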

You might be able to create a few custom contexts because the lines:

!  Scale        Shape1      Bov      Str1      Str2      Str3   Strain-Model
!       U         V          W           X          Y        GauSiz   LorSiz Size-Model
!     a          b         c        alpha      beta       gamma      #Cell Info
!  Pref1    Pref2      Asy1     Asy2     Asy3     Asy4      S_L      D_L

are repeated. What you’d want to do is create a match for the comment card, then that would chain into a context for the following line. Create a match for that line (six floats for example) which would chain into a context for the following line, where you might be able to pick out a number in a position and scope that in such a way as it would highlight it, then pop that context off and go back to looking for comment cards.

The only caveat is that since you are looking for very specific comments, it’s not going to be really robust. You’d have to type your comment card line exactly the same way every time (easy enough if you made a snippet for it perhaps, and you could fashion the expression to ignore whitespace). If you ever needed a DIFFERENT card from what you created, you’d have to go re-edit your syntax and add something for the new data set.

#19

Okay, so I tried to see if I could do something for this and came up with the following. It is very basic, only provides context for one card, and I have NO idea what you think proper scopes would be. Also it kind of bugs me that the ! on the card that I’m looking for isn’t marked dark grey, but those are fine tuning points.

However, this is going to be really tedious to set up for each card you want, and it’s going to take work to fine-tune it into something robust. If you save the definition below as PCR.sublime-syntax, open your rutana.pcr file, and use a color scheme that styles the right scopes (I think I’m using Monokai Soda or something like that), you’ll see:

  1. Most comments will be styled as comment
  2. The COMM and Name: lines are scoped
  3. My example for the card starts at line 39. You should see that the words in the comment card are scoped as keywords (I don’t know if this is desirable; I was just looking for something that would indicate I found the card.)
  4. The next card will match 7 numbers and then move to the second card.
  5. The second card looks for 6 numbers. The second number is scoped as a keyword, so it ought to change color.

If this is what you’re looking for maybe this is a push in the right direction. I cannot say that my method is any better or worse than another method, it’s just the way I got it to work. Hope that helps.

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
file_extensions:
  - pcr
scope: source.fullprof

variables:
  # Optionally signed integer or real with an optional exponent
  # (0, 11.00000, 0.14336E-04). The whitespace between numbers is matched
  # in the patterns below, so single-space separators work and the
  # whitespace is not included in the numeric scope.
  number: '[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?'

contexts:
  main:
    - include: commands
    - include: special-cards
    - include: comments

  commands:
    - match: '^(COMM)\s(.*)$'
      captures:
        1: keyword.other.pcr
        2: entity.name.control-file.pcr
    - match: '^(Name)(:)\s*(\w+)'
      captures:
        1: keyword.other.pcr
        2: punctuation.separator.pcr
        3: entity.name.phase-name.pcr

  special-cards:
    - include: scale-card-start

  scale-card-start:
    # Comment card announcing the two lines of scale/shape parameters.
    - match: ^(!)\s*(Scale)\s*(Shape1)\s*(Bov)\s*(Str1)\s*(Str2)\s*(Str3)\s*(Strain-Model)\s*$
      captures:
        1: punctuation.definition.comment.pcr
        2: keyword.other.pcr
        3: keyword.other.pcr
        4: keyword.other.pcr
        5: keyword.other.pcr
        6: keyword.other.pcr
        7: keyword.other.pcr
        8: keyword.other.pcr
      push: scale-card-1

  scale-card-1:
    # First line after the card: seven numbers.
    - meta_scope: meta.block.scale-card-1.pcr
    - match: |-
        (?x)
          ^\s*({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
      captures:
        1: constant.numeric.pcr
        2: constant.numeric.pcr
        3: constant.numeric.pcr
        4: constant.numeric.pcr
        5: constant.numeric.pcr
        6: constant.numeric.pcr
        7: constant.numeric.pcr
      set: scale-card-2
    # Any other line: leave the card instead of staying in this context
    # for the rest of the file.
    - match: '(?=\S)'
      pop: true

  scale-card-2:
    # Second line: six numbers, the second one scoped as a keyword.
    - meta_scope: meta.group.scale-card-2.pcr
    - match: |-
        (?x)
          ^\s*({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
          \s+({{number}})
      captures:
        1: constant.numeric.pcr
        2: keyword.other.pcr
        3: constant.numeric.pcr
        4: constant.numeric.pcr
        5: constant.numeric.pcr
        6: constant.numeric.pcr
      pop: true
    - match: '(?=\S)'
      pop: true

  comments:
    # Generic in-line comment
    - match: '#'
      scope: punctuation.definition.comment.pcr
      push:
        - meta_scope: comment.line.pcr
        - match: \n
          pop: true
    # Generic card comment
    - match: '!'
      scope: punctuation.definition.comment.pcr
      push:
        - meta_scope: comment.line.pcr
        - match: \n
          pop: true
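The same pattern extends to other cards. As a sketch, here is the cell card from the sample, assuming (from the line 29/30 example earlier in the thread) that its first data line holds six values and its second six codewords. It is written as a standalone file so it can be tried on its own, and the scope names are illustrative:

```yaml
%YAML 1.2
---
# Hypothetical: first line after the cell card = six values,
# second line = six codewords.
name: PCR cell card (sketch)
scope: source.pcr-cell-card

variables:
  number: '[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?'

contexts:
  main:
    # The cell card; '\s+' tolerates other column spacing and '.*' allows
    # the trailing "#Cell Info" text.
    - match: '^(!)\s*a\s+b\s+c\s+alpha\s+beta\s+gamma\b.*\n'
      scope: comment.line.pcr
      captures:
        1: punctuation.definition.comment.pcr
      push: cell-values
    - match: '^!.*\n'
      scope: comment.line.pcr

  cell-values:
    - match: |-
        (?x)
          ^\s*({{number}})\s+({{number}})\s+({{number}})
           \s+({{number}})\s+({{number}})\s+({{number}})
      captures:
        1: constant.numeric.value.pcr
        2: constant.numeric.value.pcr
        3: constant.numeric.value.pcr
        4: constant.numeric.value.pcr
        5: constant.numeric.value.pcr
        6: constant.numeric.value.pcr
      set: cell-codewords
    # Unexpected line: leave the card without consuming anything.
    - match: '(?=\S)'
      pop: true

  cell-codewords:
    - match: |-
        (?x)
          ^\s*({{number}})\s+({{number}})\s+({{number}})
           \s+({{number}})\s+({{number}})\s+({{number}})
      captures:
        1: constant.other.codeword.pcr
        2: constant.other.codeword.pcr
        3: constant.other.codeword.pcr
        4: constant.other.codeword.pcr
        5: constant.other.codeword.pcr
        6: constant.other.codeword.pcr
      pop: true
    - match: '(?=\S)'
      pop: true
```

Every new card needs another header match and another pair of contexts like these, which is the maintenance cost mentioned above.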

#20

Also, a little snippet to indicate how it looked for me: