I’m working on a Sublime syntax definition file for IBM Assembler (yes, really). There is an interesting problem that I ran into.
Ordinary comments are easy to describe: asterisk * in column 1 to the end of the line (internal macro comments start with .* in columns 1 and 2). However, you can add “remarks” after an instruction statement, which are also treated as comments by the assembler.
Example “Hello world” Assembler program (“remarks” are after the instruction statement on each line):
HELLOASM CSECT
         USING *,R15              Map the module in R15
         BALR  R15,0              Put my address in R15 and continue
         STM   R14,R12,12(R13)    Put caller's regs in his save area
         DROP  R15
         USING HELLOASM,R12
         LR    R12,R15            Put my base address in R12
         ST    R13,SAVE+4         Store caller's SA in my back pointer
         LA    R1,SAVE            Put my save area into caller's...
         ST    R1,8(R13)          ...forward pointer
         LA    R13,SAVE           Now put my save area in R13
         WTO   'Hello world!'     Print 'Hello world!' to console
         L     13,SAVE+4          Put caller's SA address back in R13
         LM    R14,R12,12(R13)    Restore caller's regs
         XR    R15,R15            Zero out R15 (good return code)
         BR    R14                Branch to return address
SAVE     DC    18F'0'             My save area, 18 fullwords (72 bytes)
R0       EQU   0
R1       EQU   1
R2       EQU   2
R3       EQU   3
R4       EQU   4
R5       EQU   5
R6       EQU   6
R7       EQU   7
R8       EQU   8
R9       EQU   9
R10      EQU   10
R11      EQU   11
R12      EQU   12
R13      EQU   13
R14      EQU   14
R15      EQU   15
         END   HELLOASM
Writing a regular expression to handle this is tricky because there is no explicit delimiter. The remarks simply start after the end of the instruction statement, and there doesn’t seem to be any restriction on the starting column (e.g., must start in column X), which would make them much easier to handle. The only column restriction I can see is that remarks must end by column 71, because a non-blank character in column 72 means the statement continues on the next line.
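To make the "no explicit delimiter" problem concrete, here is a minimal Python sketch of the field-splitting logic a regex would have to reproduce. It rests on several assumptions that are mine, not established above: the source is fixed-format, the operand field ends at the first blank outside a quoted string, operands are never omitted when remarks are present, and continuation lines and attribute references such as L'SYMBOL are ignored. The helper name `split_statement` is hypothetical.

```python
def split_statement(line):
    """Split one statement into (label, opcode, operands, remarks).

    Sketch only: assumes fixed-format source, no continuation lines,
    and that operands are present whenever remarks are present.
    """
    # Columns 1-71 hold the statement; column 72 is the continuation flag.
    text = line[:71].rstrip()

    # Whole-line comments: '*' in column 1, or '.*' in columns 1 and 2.
    if text.startswith("*") or text.startswith(".*"):
        return ("", "", "", text)

    # A label exists only when column 1 is non-blank.
    label = ""
    pos = 0
    if text and text[0] != " ":
        end = text.find(" ")
        if end == -1:
            return (text, "", "", "")
        label = text[:end]
        pos = end

    rest = text[pos:].lstrip()
    opcode, _, rest = rest.partition(" ")
    rest = rest.lstrip()

    # The operand field ends at the first blank that is not inside quotes.
    # A doubled apostrophe ('') toggles the flag twice, so it stays quoted.
    in_quote = False
    i = 0
    while i < len(rest):
        ch = rest[i]
        if ch == "'":
            in_quote = not in_quote
        elif ch == " " and not in_quote:
            break
        i += 1

    operands = rest[:i]
    remarks = rest[i:].strip()
    return (label, opcode, operands, remarks)
```

The quoted-string scan is the part a single pattern like `(\S+)` cannot express: an operand such as 'Hello world!' contains a blank, so the remarks would start too early.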
I’ve thought about writing a regex for a general instruction statement and using capture groups to segment out portions of the statement, but that would also be tricky because some things that look like instructions aren’t really machine instructions. They could be macro calls or assembler directives (e.g., EJECT, EQU, USING, etc.) which I would want to color differently. That is, unless you can implement further matching logic within capture groups, but I don’t see that capability.
For instance,
- match: '^([\w@#$]+)?\s(\w+)\s(\S+)?\s(.*)$'
  captures:
    1: optional statement label
    2: instruction mnemonic (these would be different colors, depending on what they are - possible in this context?)
    3: optional instruction parameters
    4: remarks (comments)
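The "different colors depending on what they are" idea for capture group 2 amounts to a table lookup on the mnemonic. A rough Python sketch follows; the function name, the category labels, and the tiny tables are all hypothetical, and real lists would be far longer. Anything not found in either table is treated as a macro call, which is a simplifying assumption.

```python
# Illustrative tables only; a real definition would need complete lists.
DIRECTIVES = {"CSECT", "DC", "DROP", "EJECT", "END", "EQU", "USING"}
MACHINE_INSTRUCTIONS = {"BALR", "BR", "L", "LA", "LM", "LR", "ST", "STM", "XR"}


def classify_mnemonic(opcode):
    """Return a coloring category for the operation field of a statement."""
    name = opcode.upper()
    if name in DIRECTIVES:
        return "directive"
    if name in MACHINE_INSTRUCTIONS:
        return "machine"
    # Assumption: everything else is a macro call (e.g. WTO).
    return "macro"
```

In a syntax definition, the same split would probably mean one match rule per category, each with its own alternation of mnemonics, rather than one generic `(\w+)` group.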
Any other ideas on this, or is this complex enough that I might have to write a more advanced script to analyze the file contents?