Sublime Forum

Syntax highlighting for column-based languages possible?

#1

I am trying to work out whether it is possible to create custom syntax highlighting, using the new .sublime-syntax definitions, for the IBM High-Level Assembler (HLASM), a programming language still widely used on mainframes today. The language stems from a time when code lines were stored on punched cards 80 characters wide, and it therefore has some properties that look rather obscure from today’s point of view.

Problem:
When there is any non-blank character in column 72, the contents of the next line must be matched as if both lines were concatenated into one. This can be repeated: a continuation line may itself have a line-continuation character in column 72, so that it continues on the next line, and so on.

Additional Problems:

  • There may be characters after the line-continuation character (usually a so-called sequence number of 8 digits) that should be highlighted differently, as they are discarded by the assembler.

  • Any characters in a continuation line before column 16 should also be highlighted differently, as they are also discarded by the assembler.

  • Characters in the continuation-column should be highlighted differently.
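To make the column rules above concrete, here is a minimal Python sketch that models them outside of any editor. It only restates the rules as described in this post (statement text in columns 1-71, continuation indicator in column 72, sequence number in columns 73-80, continuation lines starting at column 16); the names `Record`, `split_record` and `join_continuations` are made up for illustration and have nothing to do with Sublime Text or the assembler itself.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """One 80-column source line split into its fixed areas."""
    statement: str     # columns 1-71
    continuation: str  # column 72 (a blank means "no continuation")
    sequence: str      # columns 73-80 (discarded by the assembler)


def split_record(line: str) -> Record:
    """Split a physical line by column; short lines are padded with blanks."""
    padded = line.rstrip("\n").ljust(80)
    return Record(padded[:71], padded[71], padded[72:80])


def join_continuations(lines: List[str], continue_column: int = 16) -> List[str]:
    """Concatenate continued lines into logical statements."""
    statements: List[str] = []
    current = ""
    continued = False
    for line in lines:
        record = split_record(line)
        if continued:
            # Columns before the continue column are discarded on continuation lines.
            current += record.statement[continue_column - 1:]
        else:
            current = record.statement
        continued = record.continuation != " "
        if not continued:
            statements.append(current.rstrip())
    if continued:
        # The input ended in the middle of a continued statement; keep what we have.
        statements.append(current.rstrip())
    return statements
```

A highlighter cannot pre-join lines like this, of course, which is exactly the difficulty: the same logical statement has to be recognised while it is still spread over several physical lines.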

Each line consists of the following entries separated by one or more blanks (the entries themselves may not contain any blanks, except in strings):

[name] operation operand(s) [remark]

The name and remark entries are optional. The operand(s) are sometimes optional too.

From my point of view it should be possible to get rather basic highlighting done, consisting of the four contexts name, operation, operand, and remark. Each of them would consume one character at a time unless that character is in column 72 or is a blank. A blank would move on to the next context. A line continuation would eat up all characters until the next line, then all characters up to column 16, and then continue the current context.
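The four-context idea can be sketched as a small character-by-character state machine in Python. This is only an illustration of the splitting logic on an already joined statement, not a syntax definition; `Statement` and `split_statement` are hypothetical names, and the quote handling is deliberately naive (a single quote toggles "inside a string", which would be wrong for attribute references such as `L'X`).

```python
from typing import NamedTuple


class Statement(NamedTuple):
    name: str
    operation: str
    operands: str
    remark: str


def split_statement(text: str) -> Statement:
    """Split one logical statement into its four blank-separated entries."""
    fields = ["", "", "", ""]  # name, operation, operands, remark
    index = 0                  # which entry is currently being filled
    in_string = False          # inside a quoted string in the operands
    pos = 0
    while pos < len(text):
        char = text[pos]
        if index == 3:
            # Everything after the operands belongs to the remark.
            fields[3] = text[pos:].rstrip()
            break
        if char == " " and not in_string:
            # A run of blanks ends the current entry and moves to the next one.
            while pos < len(text) and text[pos] == " ":
                pos += 1
            index += 1
            continue
        if char == "'" and index == 2:
            in_string = not in_string
        fields[index] += char
        pos += 1
    return Statement(*fields)
```

A line starting with a blank yields an empty name, since the first run of blanks immediately moves on to the operation entry.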

But if I wanted a more sophisticated solution, where the operands are highlighted differently based on their content, I see problems in how to achieve that: it would need a regular expression that consumes more than one character but must not consume any character past column 71, not to mention that the match may continue on the next line at column 16.

I am still waiting for an ST3 beta that supports .sublime-syntax to give my basic idea a try.

Thanks in advance, greetings from Stuttgart, Germany,
Jens


#2

Your best bet would probably be to make use of (many) look-behinds that check whether a certain column has been reached before attempting to match the following text. Then you would push appropriate contexts onto the context stack that will "match everything until the end of the line as " and “continue matching on the next line” where it left off. If you make each segment of the line its own context (which you then change with set: next-context, or set: base-context for the last segment), you will be able to split the line at any position. The only remaining concern is that highlighting could go insane while you are in the middle of editing a line and a context condition is not met.
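As a rough illustration of the look-behind idea, here is a sketch using Python's `re` module rather than Sublime Text's own regex engine, so treat it as a way to try the patterns out and not as a syntax definition. The pattern names and the `classify` helper are made up for this example; the column numbers come from the first post. Python only accepts fixed-width look-behinds, which `^.{71}` and `^.{72}` happen to be.

```python
import re
from typing import Dict, Tuple

# Columns are 1-based in this thread, so "column 72" has 71 characters before it.
CONTINUATION = re.compile(r"(?<=^.{71})\S")  # non-blank character in column 72
SEQUENCE = re.compile(r"(?<=^.{72}).*")      # everything from column 73 onwards
PREFIX = re.compile(r"^.{15}")               # columns 1-15 of a continuation line


def classify(line: str, is_continuation: bool) -> Dict[str, Tuple[int, int]]:
    """Return the (start, end) span of each special area found in the line."""
    spans: Dict[str, Tuple[int, int]] = {}
    if is_continuation:
        match = PREFIX.match(line)
        if match:
            spans["prefix"] = match.span()
    match = CONTINUATION.search(line)
    if match:
        spans["continuation"] = match.span()
    match = SEQUENCE.search(line)
    if match and match.group():
        spans["sequence"] = match.span()
    return spans
```

Whether a line is a continuation line still has to be carried over from the previous line, which is the part the context stack would take care of in a syntax definition.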


#3

That’s what I’m doing for a Cobol syntax description, with statements like (sorry, tmLanguage format)

[code]
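<dict>
<key>fileTypes</key>
<array>
	<string>cob</string>
	<string>cpy</string>
	<string>cbl</string>
</array>
<key>name</key>
<string>Cobol</string>
<key>patterns</key>
<array>
	<dict>
		<key>comment</key>
		<string>Line comments (fast version, no prefix/line/suffix distinction)</string>
		<key>begin</key>
		<string>^.{6}\*</string>
		<key>end</key>
		<string>(?=^.{6}[^*])</string>
		<key>name</key>
		<string>comment.line.cobol</string>
	</dict>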
	<dict>
		<key>comment</key>
		<string>Multiline COMPUTE clause ('=' not on first line)</string>
		<key>begin</key>
		<string>(?&lt;=^.{6})[^*/]\s*((?i:COMPUTE))\s+[^\n=]*$</string>
		<key>end</key>
		<string>(^.{6})[^*/]\s+(=)\s</string>
		<key>name</key>
		<string>meta.scope.compute.cobol</string>
		<key>beginCaptures</key>
		<dict>
			<key>1</key>
			<dict>
				<key>name</key>
				<string>keyword.control.reserved.cobol</string>
			</dict>
		</dict>
		<key>endCaptures</key>
		<dict>
			<key>1</key>
			<dict>
				<key>name</key>
				<string>comment.block.prefix.cobol</string>
			</dict>
			<key>2</key>
			<dict>
				<key>name</key>
				<string>keyword.control.reserved.cobol</string>
			</dict>
		</dict>
		<key>patterns</key>
		<array>
			<dict>
				<key>match</key>
				<string>(?&lt;=^.{6})[*/].*</string>
				<key>name</key>
				<string>comment.line.cobol</string>
			</dict>
			<dict>
				<key>match</key>
				<string>(?&lt;=^.{72}).*</string>
				<key>name</key>
				<string>comment.block.suffix.cobol</string>
			</dict>
			<dict>
				<key>match</key>
				<string>^.{,6}</string>
				<key>name</key>
				<string>comment.block.prefix.cobol</string>
			</dict>
		</array>
	</dict>
	<dict>

[/code]

If your files are padded to 80 columns, you can also handle the suffix area with statements like

<dict>
	<key>comment</key>
	<string>Prefix area</string>
	<key>match</key>
	<string>^.{,6}</string>
	<key>name</key>
	<string>comment.block.prefix.cobol</string>
</dict>

As long as you’re putting the prefix/suffix area first in your syntax file, you no longer have to consider the first 6 columns, and you can use statements like

<dict>
	<key>comment</key>
	<string>Special registers. Source: Enterprise COBOL for z/OS &gt; Language Reference &gt; Part 1: COBOL language structure &gt; Character-strings</string>
	<key>match</key>
	<string>(?&lt;![-_A-Za-z0-9])(?i:ADDRESS-OF|DEBUG-ITEM|JNIENVPTR|LINAGE-COUNTER|RETURN-CODE|SHIFT-(IN|OUT)|SORT-(CONTROL|CORE-SIZE|FILE-SIZE|MESSAGE|MODE-SIZE|RETURN)|TALLY|XML-(CODE|EVENT|(N?)(NAMESPACE(-PREFIX)?|TEXT))|WHEN-COMPILED)(?=[^-_A-Za-z0-9])</string>
	<key>name</key>
	<string>variable.language.special-register.cobol</string>
</dict>

The only slightly annoying thing is that you can’t handle the first suffix (not prefix) area in a multi-line statement (remaining suffix areas are correctly handled, as shown in the COMPUTE clause in the first snippet).

Oh, and handling multi-line strings is also interesting :smile:

In YAML, it gives something like this (using the Sublime Text converter):

[code]
%YAML 1.2
---
# http://www.sublimetext.com/docs/3/syntax.html
name: Cobol
file_extensions:
  - cob
  - cpy
  - cbl
scope: source.cobol
contexts:
  main:
    - match: '^.{6}\*.*'
      comment: Line comments (fast version, no prefix/line/suffix distinction)
      scope: comment.line.cobol
    - match: '(?<=^.{6})[^*/]\s*((?i:COMPUTE))\s+[^\n=]*$'
      comment: "Multiline COMPUTE clause ('=' not on first line)"
      captures:
        1: keyword.control.reserved.cobol
      push:
        - meta_scope: meta.scope.compute.cobol
        - match: '(^.{6})[^*/]\s+(=)\s'
          captures:
            1: comment.block.prefix.cobol
            2: keyword.control.reserved.cobol
          pop: true
        - match: "(?<=^.{6})[*/].*"
          scope: comment.line.cobol
        - match: "(?<=^.{72}).*"
          scope: comment.block.suffix.cobol
        - match: "^.{,6}"
          scope: comment.block.prefix.cobol
    - match: "(?<=^.{72}).*"
      comment: Suffix area
      scope: comment.block.suffix.cobol
    - match: "(?<![-_A-Za-z0-9])(?i:ADDRESS-OF|DEBUG-ITEM|JNIENVPTR|LINAGE-COUNTER|RETURN-CODE|SHIFT-(IN|OUT)|SORT-(CONTROL|CORE-SIZE|FILE-SIZE|MESSAGE|MODE-SIZE|RETURN)|TALLY|XML-(CODE|EVENT|(N?)(NAMESPACE(-PREFIX)?|TEXT))|WHEN-COMPILED)(?=[^-_A-Za-z0-9])"
      comment: "Special registers. Source: Enterprise COBOL for z/OS > Language Reference > Part 1: COBOL language structure > Character-strings"
      scope: variable.language.special-register.cobol
[/code]

Hope this helps,

Martin
