Sublime Forum

Syntax coloring special comments for IBM Assembler


I’m working on a Sublime syntax definition file for IBM Assembler (yes, really). There is an interesting problem that I ran into.

Ordinary comments are easy to describe: asterisk * in column 1 to the end of the line (internal macro comments start with .* in columns 1 and 2). However, you can add “remarks” after an instruction statement, which are also treated as comments by the assembler.

Example “Hello world” Assembler program (“remarks” are after the instruction statement on each line):

         USING    *,R15           Map the module in R15
         BALR     R15,0           Put my address in R15 and continue
         STM      R14,R12,12(R13) Put caller's regs in his save area
         DROP     R15
         USING    HELLOASM,R12
         LR       R12,R15         Put my base address in R12
         ST       R13,SAVE+4      Store caller's SA in my back pointer
         LA       R1,SAVE         Put my save area into caller's...
         ST       R1,8(R13)        ...forward pointer
         LA       R13,SAVE        Now put my save area in R13
         WTO      'Hello world!'  Print 'Hello world!' to console
         L        13,SAVE+4       Put caller's SA address back in R13
         LM       R14,R12,12(R13) Restore caller's regs
         XR       R15,R15         Zero out R15 (good return code)
         BR       R14             Branch to return address
SAVE     DC       18F'0'          My save area, 18 fullwords (72 bytes)
R0       EQU      0
R1       EQU      1
R2       EQU      2
R3       EQU      3
R4       EQU      4
R5       EQU      5
R6       EQU      6
R7       EQU      7
R8       EQU      8
R9       EQU      9
R10      EQU      10
R11      EQU      11
R12      EQU      12
R13      EQU      13
R14      EQU      14
R15      EQU      15
         END      HELLOASM

Writing a regular expression to handle this is tricky because there is no explicit delimiter. The remarks simply start at the end of the instruction statement, and there doesn’t seem to be any restriction on the starting column (e.g., “must start in column X”), which would have made this much easier to handle. The only column restriction I can see is that remarks must end by column 71, because a non-blank character in column 72 means the statement continues on the next line.

I’ve thought about writing a regex for a general instruction statement and using capture groups to segment out portions of the statement, but that would also be tricky because some things that look like instructions aren’t really machine instructions. They could be macro calls or assembler directives (e.g., EJECT, EQU, USING, etc.) which I would want to color differently. That is, unless you can implement further matching logic within capture groups, but I don’t see that capability.

For instance,

- match: '^([\w@#$]+)?\s+(\w+)(?:\s+(\S+))?(?:\s+(.*))?$'
    1: optional statement label
    2: instruction mnemonic (these would be different colors, depending on what they are - possible in this context?)
    3: optional instruction parameters
    4: remarks (comments)
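
For reference, here is a rough sketch of how that single-regex idea might look as an actual rule in a .sublime-syntax context, using the captures key and `\s+` between fields so that the column-aligned sample lines match. The scope names (including the .hlasm suffix) are illustrative assumptions, and the rule has not been tested against real source.

```yaml
# Fragment of a context's rule list; scope names are hypothetical.
- match: '^([\w@#$]+)?\s+(\w+)(?:\s+(\S+))?(?:\s+(.*))?$'
  captures:
    1: entity.name.label.hlasm        # optional statement label
    2: keyword.other.mnemonic.hlasm   # instruction mnemonic
    3: variable.parameter.hlasm       # optional instruction parameters
    4: comment.line.remark.hlasm      # remarks (comments)
```

Each capture group gets one fixed scope, so every mnemonic would be colored the same way regardless of whether it is a machine instruction, a macro call or a directive. The `\S+` operand group also stops at the first blank, so a quoted operand such as 'Hello world!' would be split and its tail treated as a remark.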

Any other ideas on this, or is this complex enough that I might have to write a more advanced script to analyze the file contents?



People still use IBM Assembler?



I guess it’s time to learn push / pop / set, and maybe also include, if you want to make it context-aware.

You don’t have to write a powerful/complex regex to parse a single line. And when a statement spans multiple lines, you simply can’t match it with a single regex in an ST syntax definition, because each regex is matched against one line at a time.



I’m aware of the context facility; I was just posting here to see if anyone had additional ideas.



@jonbradley Those of us who work on mainframes absolutely do!



This should be no problem with push/set, just as jfcherng said. For example, something like this (not tested; the scope names are only placeholders):

  - match: '^\S+'
    scope: entity.name.label            # optional statement label
  - match: \S+
    scope: keyword.other                # instruction mnemonic
    push:
      - match: \S+
        scope: variable.parameter       # optional instruction parameters
        set:
          - meta_content_scope: comment.line   # remarks, up to end of line
          - match: $\n?
            pop: true
      - match: $\n?
        pop: true
If you use named contexts and push/set/include them, you can add more specific rules for the different parts of a line.
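
To illustrate the named-context approach, here is a minimal sketch of a complete definition. Everything specific in it is an assumption made for illustration: the .hlasm scope suffix, the context names (main, operands, remarks), the scope names and the deliberately short directive list. It has not been tested, and it ignores column-72 continuation entirely.

```yaml
%YAML 1.2
---
# Sketch only. The scope suffix (.hlasm), the context names and the short
# directive list are assumptions for illustration, not a complete definition.
name: IBM Assembler (sketch)
scope: source.hlasm

contexts:
  main:
    # Full-line comments: * in column 1, or .* in columns 1 and 2
    - match: '^(?:\*|\.\*).*$'
      scope: comment.line.hlasm
    # Optional statement label, which must start in column 1
    - match: '^[\w@#$]+'
      scope: entity.name.label.hlasm
    # A few assembler directives, scoped differently from other operations
    - match: '\b(?:USING|DROP|EQU|DC|DS|EJECT|END)\b'
      scope: keyword.control.directive.hlasm
      push: operands
    # Anything else in the operation field: machine instruction or macro call
    - match: '\b\w+\b'
      scope: keyword.other.mnemonic.hlasm
      push: operands

  operands:
    # No operands and no remarks: the statement ends here
    - match: '$\n?'
      pop: true
    # Operand field: a run of non-blanks, where quoted strings may contain blanks
    - match: "(?:'[^']*'|\\S)+"
      scope: variable.parameter.hlasm
      set: remarks

  remarks:
    # Everything after the operand field is a remark
    - meta_content_scope: comment.line.remark.hlasm
    - match: '$\n?'
      pop: true
```

Because the directive rule is listed before the generic mnemonic rule, directives win when both could match at the same position, which is how the different colors come about without any logic inside capture groups. One known weak spot: a lone apostrophe in the operand field (for example a length attribute like L'SAVE) can pair up with an apostrophe later in the remarks, so the quoted-string handling would need refining.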