Sublime Forum

[Solved] Syntax definition based on line number

#21

Sorry about the spam, but I created a slightly improved (in my view) scoping with this syntax file: https://pastebin.com/KntDsrN8

I didn’t like that the card itself didn’t look exactly like a comment. I also cleaned up the scoping around the number values. It’s kind of an odd combination of scope, meta_scope, and meta_content_scope, but each line should indicate (if you look at the full scope) whether it is the comment card, data card 1, or data card 2.

If you got really fancy, you could create separate patterns for floats, integers, etc. I was a bit sloppy and assumed the user would type in something legitimate, so I just made it search for digits, e, and -.

EDIT: Also, technically I named the base scope source.fullprof but then used pcr everywhere else. Honestly, you’d want to pick one or the other and use it consistently. It’s a control file for FullProf, but the file type is PCR, so… do what you will with it.

EDIT2: ACTUALLY, since you have comments, you could create your own sub-language in them if you wanted, which would really help with pattern matching: for each card type you could define a keyword. I’ve looked through the PCR document, though, and maybe this is already the case. I haven’t been able to tell exactly how the program knows which card comes next; maybe it already looks at the comments, in which case you may not want to change their content. But if you can, your own keyword language will help you scope the numbers properly.


#22

Thanks! Your code really helped me make a first syntax definition.

I ended up expecting a line of values after a comment and a line of codewords after that, if it’s not a comment. I’ll write exceptions to those rules for the few blocks of lines that behave differently.
Here’s my code: https://pastebin.com/ghSpZtjk

It highlights the PCR files I tested quite well, but I have two small issues:

  1. I’d like to have codewords in a different color when they are equal to zero. At line 58, I made a match for {{null-number}} (defined at line 9), but it doesn’t match those values. I tested the regex at regex101 and it seems OK.
  2. I made an exception to the generic-block (called atomic-content-block) that should match the comment lines starting with !Atom and the lines after them, but generic-block matches instead. If I comment out lines 15 and 16 (that match generic-block), then the atomic-content-block lines are highlighted correctly.
    EDIT: I tried changing the regex at line 15 to (?i)^(?=\!(?!\s*Atom)) or (?i)^(?=\!)(?!\!\s*Atom), so that generic-block wouldn’t match ! Atom, but that doesn’t seem to work either.

Any ideas on what could be the source of those problems?

Thanks.


#23

I’m not entirely sure I have the whole file, but I tried what you put on pastebin. The file ends with ..., though, which makes me think there might be more.

Anyway, looking at that Rutile/Anatase file you provided earlier, it does show a great deal of scoping, so that’s good. It might be useful to either write a little command that prints the scope at the cursor and bind it to a keystroke, or use a plugin called ScopeAlways that you can toggle on to show the scope at the cursor in the status bar. (I prefer my own because the status bar text is tiny and my eyes aren’t fantastic. I’ll include it at the end of this post for reference; it’s quite basic and easy.)

So, I’m not entirely sure, but I think the issue you’re having is that your blocks have no closing rule. It looks like your end-block context tries to look ahead for a comment character. I don’t think that’s working, so you never pop a context off the stack. For example, it looks like you’re trying to trap the !Atom line and push a context on it. If you move the cursor a little past Titanium into empty space, you’ll see the scope stack still reads:

vhdl-mode: source.pcr meta.generic-block.pcr meta.generic-line-values.pcr 

(Ignore the vhdl-mode part, that’s simply because the scope sniffing utility is in my VHDL language package and I make sure any console output that comes out of my package is tagged with the name.)

Once you are in a context, Sublime only checks that context’s rules, not everything else. (Usually: there is a special prototype context that gets included in every context, but use it with caution because it takes precedence over everything. It’s good for comments, though.)

Also, everything works on a first-come, first-served basis: if your generic block rule matches first, Sublime never looks for your atomic block.
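To make the ordering point concrete, here is a small Python model of how a context picks among its rules. It follows the selection behaviour described in the sublime-syntax documentation (the leftmost match wins, and when two rules match at the same position, the one listed first wins), but Python’s `re` stands in for Sublime’s regex engine, and the two patterns and the sample line are simplified, hypothetical stand-ins for the generic-block and atomic-content-block rules, not the ones from the pastebin.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (rule name, regex)


def pick_rule(rules: List[Rule], text: str, pos: int = 0) -> Optional[Tuple[str, int, int]]:
    """Return (name, start, end) for the rule a context would select.

    The leftmost match wins; on a tie, the rule listed first wins.
    """
    best = None
    for name, pattern in rules:
        m = re.compile(pattern, re.MULTILINE).search(text, pos)
        if m is None:
            continue
        # Strict "<" keeps the earlier rule when two matches start at the same place.
        if best is None or m.start() < best[1]:
            best = (name, m.start(), m.end())
    return best


line = "!Atom   Typ       X        Y        Z\n"

# Hypothetical stand-ins for the two rules, in two different orders.
generic_first = [("generic-block", r"^(?=!)"), ("atom-block", r"^!Atom")]
atom_first = [("atom-block", r"^!Atom"), ("generic-block", r"^(?=!)")]

print(pick_rule(generic_first, line))  # ('generic-block', 0, 0)
print(pick_rule(atom_first, line))     # ('atom-block', 0, 5)
```

Swapping the order is what lets the !Atom rule win, but only while the context that owns these rules is the one on top of the stack.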

However this is a good place to start.

Some thoughts (I may start to ramble here; I keep scrolling up and down looking at different things, and this little editor window is tiny):

  1. Could you take out every comment line and would the program still know what to do?
  2. How does it know what to do? Is the order of cards fixed?
  3. I know COMM is a keyword for the top card. Is Name also this way, or is that just ignored by the program?
  4. What is the <--Space group symbol text? Is it a command or is it ignored because the program recognized a P command and it ignores everything after the pattern?

The reason I ask: if the cards are in a fixed order, you could just create chained contexts card1, card2, card3 and end when you reach the end of a group. That might be one strategy.

The two-consecutive-card method is working, but either it’s popping something off the stack and immediately pushing it back on, or something similar. At the end of the codewords, though, it’s not switching back properly.

I’m not entirely sure why your null-number variable isn’t matching before the regular number. I think the regex for your null is a little off, though not in a way that matters for how you’re using it (you can try it out at http://regexr.com/ and see that the exponent part isn’t working, and that it tries to match the 0 in 0.23 when it should ignore it). Still, it matches 0.000, so that seems a little strange. EDIT: I think the exponent issue is that I don’t always put a + in front of a positive exponent. If you change [+-] to [+-]? it seems to match exponents correctly.
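Both points can be checked quickly in Python. The sketch below uses the character-class style of the `zeroes` variable that appears later in this thread rather than the exact null-number regex from the pastebin (which isn’t reproduced here). The lookahead in `ZERO_TOKEN` is one possible fix and is my assumption, and Python’s `re` stands in for Sublime’s regex engine.

```python
import re

# Character-class style zero pattern, like the `zeroes` variable later in the thread.
ZERO_NAIVE = re.compile(r"-?[0eE+\-.]+")
# Hypothetical fix: refuse to stop in the middle of a numeric token.
ZERO_TOKEN = re.compile(r"-?[0eE+\-.]+(?![0-9eE+\-.])")

# Exponent with a mandatory sign ([+-]) vs an optional one ([+-]?).
EXP_SIGN_REQUIRED = re.compile(r"-?[0-9]+\.[0-9]+[eE][+-][0-9]+")
EXP_SIGN_OPTIONAL = re.compile(r"-?[0-9]+\.[0-9]+[eE][+-]?[0-9]+")

for token in ["0.000", "0.23", "-0.00"]:
    naive = ZERO_NAIVE.search(token)
    fixed = ZERO_TOKEN.search(token)
    print(token, naive.group() if naive else None, fixed.group() if fixed else None)
# 0.000 0.000 0.000
# 0.23 0. None
# -0.00 -0.00 -0.00

for token in ["1.5E+03", "1.5E03"]:
    print(token,
          bool(EXP_SIGN_REQUIRED.fullmatch(token)),
          bool(EXP_SIGN_OPTIONAL.fullmatch(token)))
# 1.5E+03 True True
# 1.5E03 False True
```

The naive zero pattern grabs the "0." at the start of 0.23; the lookahead version only matches when the whole token is made of zeroes.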

It might make sense to start slightly fresh and see if you can style the Atom lines by themselves. I’m also wondering whether the cards use the Fortran convention of using the first character of every line to indicate something, though the Atom lines kind of break that, so I’m not sure. I also wonder about the P card, and how the FullProf program determines the dataset for each card. I think it might be the first two lines: Npr Nph Nba makes me wonder if they list the number of cards of various types, but that doesn’t exactly match the data that follows, so I don’t know for sure.

I’m a little scattered today, so I don’t know that this helped all that much. Maybe I can take a closer look at it over lunch, or if I get a longer lull this afternoon. There are some odd things in there too, like the inline comment not being styled properly.

And just in case it’s helpful, here’s my little scope-sniffing command. Or, like I said, you can try the ScopeAlways plugin if you find that easier. I just didn’t like the popup showing the scope (though that’s also an option if you like it).

import sublime_plugin


class vhdlModeScopeSnifferCommand(sublime_plugin.TextCommand):
    """
    My own scope sniffing command that prints to
    console instead of a popup window.
    """
    def run(self, edit):
        """ST3 Run Method"""
        region = self.view.sel()[0]
        sniff_point = region.begin()
        print('vhdl-mode: {}'.format(self.view.scope_name(sniff_point)))

#24

Aha! I think I’m onto something, but it’ll take some time to elaborate. In true FORTRAN card fashion, the key is column 1.


#25

Okay, fiddled a little at lunch. The following file seems to style rutana.pcr fairly well, including generic cards.

The basic premise:

  1. COMM starts a command file name card.
  2. Name starts… I’m not entirely sure. But it looks important.
  3. The comment !Atom will precede the atomic cards. There might be another way to detect this, but I don’t know all the card rules.
  4. Column 1 will always be whitespace if it’s a generic card.

That’s how we get started. Generally you want to order the rules from the specific down to the generic, because matching is priority-based: the rule listed first wins. Ending the cards was actually the tricky part, as was changing comments to pop not on a newline but on a lookahead for the newline. Since the cards have no terminator character, the newline is CRUCIAL and must not be consumed.
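The difference between consuming the newline and only looking ahead for it shows up with Python’s `re`, which stands in for Sublime’s engine here. The sample line and the three patterns are simplified, hypothetical versions of the comment and card rules: a comment rule ending in `(?=\n)` stops just before the newline, so the card’s own `\n` rule can still match it and switch to the next card.

```python
import re

text = "  1.0  2.0  # inline comment\n  0.00  21.00\n"

comment_consuming = re.compile(r"#[^\n]*\n")      # eats the newline
comment_lookahead = re.compile(r"#[^\n]*(?=\n)")  # stops just before it
card_newline = re.compile(r"\n")                  # the card's own end-of-line rule

m_cons = comment_consuming.search(text)
m_look = comment_lookahead.search(text)

# Where would the card context resume scanning after each comment?
print(card_newline.match(text, m_cons.end()) is not None)  # False: newline already gone
print(card_newline.match(text, m_look.end()) is not None)  # True: the card can still `set`
```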

So, ending rules.

  1. Any line after a values card, up to the next ! line, is assumed to be a codewords card. So this does impose a rule on you: groups of lines must be separated by ! lines.
  2. The codewords context looks ahead (without consuming) for ! and then pops.
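The ending rules can be summarised as a small line-labelling function. This is a simplified Python model of the generic cards in this post’s syntax file, not a parser: it ignores COMM, Name, blank lines and the !Atom block, and the sample lines are made up.

```python
from typing import List


def label_generic_lines(lines: List[str]) -> List[str]:
    """Label lines roughly as the generic card contexts would scope them."""
    labels = []
    state = "main"
    for line in lines:
        if line.startswith("!"):
            # Lookahead for "!" pops the codewords card; main then sees a comment.
            labels.append("comment")
            state = "main"
        elif state == "main" and line[:1].isspace():
            # Whitespace in column 1 pushes a values card...
            labels.append("values")
            state = "codewords"
        elif state == "codewords":
            # ...whose newline sets a codewords card that lasts until the next "!".
            labels.append("codewords")
        else:
            labels.append("unscoped")
    return labels


sample = [
    "!  first comment",
    "  1.0  2.0  3.0",
    "  0.00  21.00  0.00",
    "!  second comment",
    "  4.0  5.0",
    "  0.00  31.00",
]
print(label_generic_lines(sample))
# ['comment', 'values', 'codewords', 'comment', 'values', 'codewords']
```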

The Atom cards

  1. Atom cards are handled a little differently because they’re specific and don’t always have comments between them. So if column 1 is non-whitespace, it’s assumed to be an element, and we look for an element card.
  2. In the Atom context, if column 1 is whitespace, it’s assumed to be a codewords card and styled accordingly.
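The column-1 rule for the Atom block can be sketched the same way. The element pattern is the `element` variable from the syntax file in this post; the function and sample lines are hypothetical, and Python’s `re` stands in for Sublime’s engine.

```python
import re

# Same pattern as the `element` variable in the syntax file.
ELEMENT = re.compile(r"[A-Z][a-zA-Z]?[+-]?[0-9]*")


def classify_atom_line(line: str) -> str:
    """Classify a line inside the !Atom block by what is in column 1."""
    if line.startswith("!"):
        return "end of atom block"  # lookahead for "!" pops atomic-cards
    if line[:1].isspace():
        return "codewords"          # whitespace in column 1
    m = ELEMENT.match(line)
    return f"element {m.group()}" if m else "unscoped"


for line in ["Ti     Ti    0.00000  0.00000  0.00000",
             "                  0.00     0.00     0.00",
             "O      O     0.30000  0.30000  0.00000",
             "!  next comment"]:
    print(classify_atom_line(line))
# element Ti
# codewords
# element O
# end of atom block
```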

Miscellaneous:

  1. I changed your number format back to mine, because it seemed to work better.
  2. I got zeroes working.
  3. I don’t know how robust this is for any other PCR file.
  4. I have no idea what to do with P cards and I cards. You’ll see those are left white (unstyled).
  5. On line 17 I changed the ! to a # because it looks more like an inline comment.
  6. The <-- Space group symbol text actually looks like an inline comment, but instead of marking it as such, I’m relying on the (speculative) idea that once the FORTRAN program figures out what’s in column 1, it parses the line for the content it wants and ignores the rest.

Try this on for size:

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
file_extensions:
  - pcr
scope: source.pcr

variables:
  number: '-?[0-9eE\+\-\.]+'
  zeroes: '-?[0eE\+\-\.]+(?![0-9eE\+\-\.])'  # lookahead: don't stop inside a number, e.g. the "0." of 0.23
  element: '[A-Z][a-zA-Z]?[+-]?[0-9]*'

contexts:
  main:
    - include: commands
    #- include: special-cards
    - include: comments

  commands:
    - match: '^(COMM)\s(.*)$'
      captures:
        1: keyword.other.pcr
        2: entity.name.control-file.pcr
    - match: '^(Name)(:)\s*(\w+)'
      captures:
        1: keyword.other.pcr
        2: punctuation.separator.pcr
        3: entity.name.phase-name.pcr
    - match: '^(!)(Atom)(.*)\n'
      scope: comment.line.pcr
      captures:
        1: punctuation.definition.comment.pcr
        2: keyword.other.pcr
      push: atomic-cards
    - match: '^\s'
      push: generic-value-card

  atomic-cards:
    - meta_content_scope: meta.block.atomic-cards.pcr
    - match: (?=!)
      pop: true
    - match: '^({{element}})'
      captures:
        1: entity.name.element.pcr
      push: atomic-element-card
    - match: '^\s'
      push: atomic-codewords-card

  atomic-element-card:
    - meta_scope: meta.group.element-card.pcr
    - match: \n
      pop: true
    - match: '{{number}}'
      scope: constant.numeric.real.pcr
    - match: '{{element}}'
      scope: entity.name.element.pcr
    - include: inline-comment

  atomic-codewords-card:
    - meta_scope: meta.group.codewords-card.pcr
    - match: \n
      pop: true
    - match: '{{zeroes}}'
      scope: comment.null-command.pcr
    - match: '{{number}}'
      scope: keyword.other.pcr
    - include: inline-comment

  generic-value-card:
    - meta_scope: meta.group.values-card.pcr
    - match: \n
      set: generic-codewords-card
    - match: '{{number}}'
      scope: constant.numeric.real.pcr
    - include: inline-comment

  generic-codewords-card:
    - meta_scope: meta.group.codewords-card.pcr
    - match: (?=!)
      pop: true
    - match: '{{zeroes}}'
      scope: comment.null-command.pcr
    - match: '{{number}}'
      scope: keyword.other.pcr
    - include: inline-comment

  comments:
    - include: full-line-comment
    - include: inline-comment

  inline-comment:
    - match: '#'
      scope: punctuation.definition.comment.pcr
      push:
        - meta_scope: comment.inline.pcr
        - match: (?=\n)
          pop: true

  full-line-comment:
    - match: '!'
      scope: punctuation.definition.comment.pcr
      push:
        - meta_scope: comment.line.pcr
        - match: (?=\n)
          pop: true



#26

It is my whole file. However, I haven’t written the syntax for every component of the pcr file yet; I stopped to try to fix the two issues I mentioned.

I actually use Ctrl+Shift+Alt+P, which shows the scope at the cursor position in a popup. I used it a lot when I was trying to make the regexes match my lines (which I had a LOT of trouble doing!).

When I add new lines after the two expected lines, that text does not have the scope meta.generic-block.pcr, so the context does pop off the stack (at least when there are two lines after the comment). As I said, if I comment out the part matching generic-block, then the Atom comments and the lines expected after them are colored correctly, and the scope pops off the stack at the first comment line (which is then uncolored, since generic-block is commented out).

This is why I defined the match for !Atom before !. Isn’t that enough?

About taking out the comment lines: yes, the program would still know what to do. However, once the pcr file is read, the program rewrites it and puts the comments back.

About the card order: yes, it is fixed. However, it depends on previous values (e.g. if the 4th value is set to an integer n, the program expects n*2 doubles on the following line). So the cards can’t be matched with fixed regexes, which was my original question.
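A count-dependent layout like this is something a regex can’t check on its own, because the expected length of one line comes from a value on another, and a sublime-syntax file has no way to carry a count from one line to the next. A minimal Python sketch, assuming (as in the example above) that the 4th whitespace-separated value on one line is the integer n and that the next line must hold 2*n floats; the function name and sample lines are hypothetical.

```python
from typing import List


def check_dependent_line(header: str, next_line: str, count_index: int = 3) -> bool:
    """Return True if next_line holds exactly 2*n floats, where n is the
    integer at position count_index (0-based, so 3 = 4th value) on header."""
    n = int(header.split()[count_index])
    try:
        values: List[float] = [float(v) for v in next_line.split()]
    except ValueError:
        return False  # something on the line isn't a number
    return len(values) == 2 * n


header = "   1   0   0   2   0"
print(check_dependent_line(header, "  1.0  2.0  3.0  4.0"))  # True  (n = 2, 4 values)
print(check_dependent_line(header, "  1.0  2.0"))            # False (only 2 values)
```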

About Name: this line actually corresponds to the name of the phase. In this file, the phase is named “Name: phasename”, but in other files it’s just “phasename”. So the program doesn’t care about it.

About <--Space group symbol: it’s actually not a P command, just a bunch of characters (in [1-9A-Za-z/\s]) that correspond to one of the 230 space groups. The rest of the line is considered a comment (I know, why not keep using the ! ?).
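If the space group line is just symbol characters followed by free text, one regex can split it. A sketch in Python, assuming the character set from the post ([1-9A-Za-z/] plus spaces) and a `<--` marker starting the trailing comment; the pattern and the sample line are illustrative only.

```python
import re

# Symbol characters from the post, plus spaces; everything from "<--" on is a comment.
SPACE_GROUP_LINE = re.compile(r"^([1-9A-Za-z/ ]+?)\s*(<--.*)?$")

m = SPACE_GROUP_LINE.match("P 42/m n m              <--Space group symbol")
print(repr(m.group(1)))  # 'P 42/m n m'
print(repr(m.group(2)))  # '<--Space group symbol'
```

The lazy `+?` keeps the symbol group from swallowing the padding spaces before the comment.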

About the exponent: I didn’t add the ? because the program always outputs exponents with a + after the E.


#27

Oh, didn’t see you had replied.
Your idea of using (?=\n) to pop seems interesting. I’m not at my computer right now; I’ll try it as soon as I can.


#28

Yeah, those questions were all just part of the mental muddle as the problem sorted itself out in my head. All the information is useful, though. Take a look at my updated suggested syntax.

And yes, showing the scope in a popup (that’s what the Sublime API calls the little box) is possible. I just like my own little checker in the console because I can read it better.

You did define !Atom first; however, you were already in the generic context. I don’t believe it EVER popped off, so it never actually looked for !Atom. In my revised syntax file, anything that could be part of a line (like inline comments) ends with a non-consuming lookahead for the newline. That way the newline is still there to be consumed usefully, for example to switch from a generic value card to a generic codewords card.

It’s important to realize (and this is very hard to see; I don’t know of a way to expose it) that there are actually two stacks in play. There is the scope stack, which is what you see when you look at the scope at a point. But there’s also a context stack, and keeping that stack clean is tricky. Like I said before, I think you’d gotten into the generic card context and never left it. Once you are in a context, its rules are the ONLY ones that apply, so until you pop it, you won’t see your main context.
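Here is a deliberately tiny Python model of the context stack. It ignores the scope stack and is not how Sublime’s engine is implemented; the contexts, patterns, and sample lines are simplified, hypothetical versions of the ones discussed in this thread. Main can push either an atom block or a generic block, but the generic block has no pop rule, so once it is on top, main’s !Atom rule is never consulted again.

```python
import re
from typing import Dict, List, Optional, Tuple

# (regex, action, target context); only the top context's rules are checked.
Rule = Tuple[str, str, Optional[str]]

CONTEXTS: Dict[str, List[Rule]] = {
    "main": [
        (r"!Atom", "push", "atomic-block"),
        (r"!", "push", "generic-block"),
    ],
    # Like the original generic-block: nothing in here ever pops.
    "generic-block": [
        (r"!", "stay", None),
        (r"\s", "stay", None),
    ],
    "atomic-block": [
        (r"!", "pop", None),
    ],
}


def run(lines: List[str]) -> List[str]:
    """Return the top of the context stack after each line."""
    stack = ["main"]
    tops = []
    for line in lines:
        for pattern, action, target in CONTEXTS[stack[-1]]:
            if re.match(pattern, line):  # re.match anchors at column 1, like ^
                if action == "push":
                    stack.append(target)
                elif action == "pop":
                    stack.pop()
                break
        tops.append(stack[-1])
    return tops


lines = ["!  comment", "  1.0  2.0", "  0.00  21.00",
         "!Atom  Typ  X  Y  Z", "Ti  Ti  0.0  0.0  0.0"]
print(run(lines))
# ['generic-block', 'generic-block', 'generic-block', 'generic-block', 'generic-block']
```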

Anyway, hope that helps a bit more. It was an interesting problem to fuss with.


#29

OK, I got everything to work as intended :grinning:

Regarding the zeroes not getting highlighted: it seems Sublime Text syntaxes don’t like dashes in variable names. I renamed null-number to zeroes and it worked like a charm.
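One way to picture the symptom, as a hypothesis rather than documented Sublime behaviour: if variable references were only recognised when the name consists of word characters, {{null-number}} would never be substituted, and the regex would end up looking for the literal text {{null-number}}, which never occurs in a PCR file. The toy `expand` function below is a hypothetical model of that substitution step, not Sublime’s code.

```python
import re
from typing import Dict

# ASSUMPTION: only [A-Za-z0-9_] names are recognised inside {{...}}.
VAR_REF = re.compile(r"\{\{(\w+)\}\}")


def expand(pattern: str, variables: Dict[str, str]) -> str:
    """Replace {{name}} with its value, leaving unrecognised references alone."""
    return VAR_REF.sub(lambda m: variables[m.group(1)], pattern)


variables = {"null-number": r"-?0+\.0+", "zeroes": r"-?0+\.0+"}
print(expand("{{zeroes}}", variables))       # -?0+\.0+
print(expand("{{null-number}}", variables))  # {{null-number}}
```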

I kind of like the idea of watching the following lines to determine whether a line holds values or codewords, since I’m not sure there will always be a space before the values, so I tried to make it work that way.

You’re absolutely right; that’s why I couldn’t match the rest. To solve it, I got rid of the generic-block context: instead of a generic-block encapsulating all the generic lines, I push generic-comment directly from main, which then sets generic-values, then generic-codewords, then pops. That way no context gets stuck on the stack, and everything works as intended.
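The revised flow can be traced with a small Python model of the context stack. It is simplified and hypothetical (one transition per line end, no Atom or exception blocks, made-up sample lines), but it shows why the design stays clean: each step uses set instead of push, and the last step pops, so the stack is back to just main after every group.

```python
from typing import List, Tuple

# What each context's end-of-line rule does in the revised design.
SET_NEXT = {"generic-comment": "generic-values", "generic-values": "generic-codewords"}


def trace(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return the context used for each line and the final context stack."""
    stack = ["main"]
    used = []
    for line in lines:
        if stack[-1] == "main" and line.startswith("!"):
            stack.append("generic-comment")  # push from main
        used.append(stack[-1])
        top = stack[-1]
        if top in SET_NEXT:
            stack[-1] = SET_NEXT[top]        # set: replace the top context
        elif top == "generic-codewords":
            stack.pop()                      # pop: back to main
    return used, stack


used, stack = trace(["!  comment", "  1.0  2.0", "  0.00  21.00",
                     "!  next comment", "  3.0", "  0.00"])
print(used[:3])  # ['generic-comment', 'generic-values', 'generic-codewords']
print(stack)     # ['main']
```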

The other “exception” lines will be easy to implement, now that it’s working.

Here’s the revision of my previous code: https://pastebin.com/a22hpqHK

Thanks for the time you spent on this problem.


#30

Sure thing! Glad you got something that’s working for you. Best of luck.
