I’m working on a Sublime syntax definition file for IBM Assembler (yes, really). There is an interesting problem that I ran into.
Ordinary comments are easy to describe: an asterisk (*) in column 1 makes the rest of the line a comment (internal macro comments start with .* in columns 1 and 2). However, you can add “remarks” after an instruction statement, which are also treated as comments by the assembler.
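The full-line case is simple enough that I can write it down directly. Here is a rough Python sketch of that rule; the function name and return values are my own invention for illustration, not anything from Sublime or the assembler:

```python
from typing import Optional


def classify_comment(line: str) -> Optional[str]:
    """Classify a full-line comment by what appears in columns 1 and 2.

    Returns "macro-comment" for ".*", "comment" for "*", and None
    for anything else (an ordinary statement, possibly with remarks).
    """
    if line.startswith(".*"):
        return "macro-comment"
    if line.startswith("*"):
        return "comment"
    return None
```

Anything that comes back as None is where the remarks problem starts.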
Example “Hello world” Assembler program (“remarks” are after the instruction statement on each line):
HELLOASM CSECT
         USING *,R15               Map the module in R15
         BALR  R15,0               Put my address in R15 and continue
         STM   R14,R12,12(R13)     Put caller's regs in his save area
         DROP  R15
         USING HELLOASM,R12
         LR    R12,R15             Put my base address in R12
         ST    R13,SAVE+4          Store caller's SA in my back pointer
         LA    R1,SAVE             Put my save area into caller's...
         ST    R1,8(R13)           ...forward pointer
         LA    R13,SAVE            Now put my save area in R13
         WTO   'Hello world!'      Print 'Hello world!' to console
         L     13,SAVE+4           Put caller's SA address back in R13
         LM    R14,R12,12(R13)     Restore caller's regs
         XR    R15,R15             Zero out R15 (good return code)
         BR    R14                 Branch to return address
SAVE     DC    18F'0'              My save area, 18 fullwords (72 bytes)
R0       EQU   0
R1       EQU   1
R2       EQU   2
R3       EQU   3
R4       EQU   4
R5       EQU   5
R6       EQU   6
R7       EQU   7
R8       EQU   8
R9       EQU   9
R10      EQU   10
R11      EQU   11
R12      EQU   12
R13      EQU   13
R14      EQU   14
R15      EQU   15
         END   HELLOASM
Writing a regular expression to handle this is tricky because there is no explicit delimiter. The remarks simply start at the end of the instruction statement, and there doesn’t seem to be any restriction on the starting column (e.g., a rule that remarks must start in column X), which would have made this much easier to handle. The only column restriction I can see is that remarks must end by column 71, because a non-blank character in column 72 means the statement continues on the next line.
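To make the column rule concrete, here is a minimal Python sketch of how I understand it. The helper name is hypothetical, and it only models the two facts above (statement text through column 71, continuation flag in column 72) and ignores anything past column 72:

```python
from typing import Tuple


def split_record(line: str) -> Tuple[str, bool]:
    """Split one source line into (statement text, continued?).

    Columns are 1-based in assembler terms, so column 72 is index 71.
    """
    padded = line.rstrip("\n").ljust(72)   # short lines count as blank-padded
    statement = padded[:71].rstrip()       # columns 1-71, trailing blanks dropped
    continued = padded[71] != " "          # non-blank in column 72 = continuation
    return statement, continued
```

So the right-hand edge of the remarks is easy to find; it is only the left-hand edge that has no fixed position.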
I’ve thought about writing a regex for a general instruction statement and using capture groups to segment out portions of the statement, but that would also be tricky because some things that look like instructions aren’t really machine instructions. They could be macro calls or assembler directives (e.g., USING), which I would want to color differently. That is, unless you can implement further matching logic within capture groups, but I don’t see that capability.
- match: '^([\w@#$]+)?\s(\w+)\s(\S+)?\s(.*)$'
  captures:
    1: optional statement label
    2: instruction mnemonic (these would be different colors, depending on what they are - possible in this context?)
    3: optional instruction parameters
    4: remarks (comments)
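To experiment with the idea outside Sublime, I tried a variation of that pattern in plain Python. This is only a sketch under my own assumptions: the group names are made up, fields may be separated by any run of blanks, a quoted string in the operand field may contain blanks, and the first token after the mnemonic is always taken to be the operand field (so it cannot tell an operand from a remark on an instruction that takes no operands).

```python
import re

# Hypothetical field splitter; the quoting rule is an assumption, not a spec.
STATEMENT = re.compile(
    r"""^(?P<label>[\w@#$]+)?                  # optional label starting in column 1
        \s+(?P<mnemonic>\w+)                   # instruction, macro call, or directive
        (?:\s+(?P<operands>(?:'[^']*'|\S)+))?  # operands; quoted strings may hold blanks
        (?:\s+(?P<remarks>.*\S))?              # whatever is left is treated as remarks
        \s*$""",
    re.VERBOSE,
)


def split_statement(line):
    """Return a dict of the four fields (None where absent), or None if no match."""
    m = STATEMENT.match(line)
    return m.groupdict() if m else None
```

For example, split_statement("         WTO   'Hello world!'     Print 'Hello world!' to console") keeps the quoted text together as the operand and puts the rest in remarks. It still does nothing to tell machine instructions, macro calls, and directives apart; that would need a lookup on the mnemonic afterwards.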
Any other ideas on this, or is this complex enough that I might have to write a more advanced script to analyze the file contents?