Sublime Forum

Unusual Syntax: Help needed with Scopes Naming

#1

I’m working on a syntax definition and am struggling to work out the right scope namings for the language at hand.

I’ve read the official ST3 docs and the TextMate docs, and looked at many packages’ syntax definitions as examples. My understanding is that, for the sake of consistency across syntaxes and so that color schemes work with most syntaxes, every effort should be made to stick to the common scope names mentioned in the documentation.

The language I’m currently working on has some unusual syntax constructs, and I just can’t see them fitting any of the common scopes. What should I do then? Should I just go ahead and use their natural semantics as scopes?

It’s a language for creating text adventures, so it tries to hide programming complexity as much as possible, using words instead of the more classical “programmer symbols”, and almost no punctuation (except for string and comment delimiters).

For example, the language defines “verbs”. Verbs resemble functions, but their structure is too flexible to fall into that category; besides, verbs can be defined at a global level, inside classes, or inside instances. Furthermore, there are also “syntax” statements, which define how the player input relates to one (or more) verbs; parameters are defined in syntax statements, not in verb declarations (even though it’s inside the verb block that the parameters are actually used).

Can I just use the verb and syntax scopes for these? And other semantic-based scopes for other elements of the syntax that don’t quite fit the common languages paradigm?

Unfortunately I’ve found both the ST and Textmate documentation on scopes a bit vague regarding edge cases and unusual syntaxes. When it comes to deciding scope names it seems like a “no man’s land”: we’re told it’s all both arbitrary and convention-based; on the one hand there seems to be freedom of choice when it comes to semantics, but on the other hand sticking to common practice is advised.

Mainstream languages do tend to fall in line with each other when it comes to conventions (each one borrowing from the successes of other languages); but for unconventional languages the situation is not so clear.

Some more guidelines on how to handle uncommon syntaxes would help.

Thanks.

1 Like

#2

Why not keyword for verbs and variable.parameter for parameters?

0 Likes

#3

Hi @kingkeith . Thanks for the prompt reply.

I need to capture the whole verb block with some semantic scope that will allow plugins, scripts and settings to be aware of it (eg: for determining when to use an autocompletion list or not); also, the verb block can be quite lengthy, having checks and conditions, and other code inside it.

If my understanding is correct, with meta scopes there is quite some freedom in using custom semantics, if so I’d like to capture the verb block as verb.

My confusion boils down to a failure to see the bigger picture where scopes fit in, and the whole purpose of them. I know that ST uses some of the common scopes to implement its editing functionality, while other scopes are intended for the package developer’s own use in scripts, plugins and settings. I’m never sure which scopes have a special meaning in ST.

On the one hand, I see some language constructs as resembling interfaces and functions, but because the overlap is not 100% my question is: In case of ambiguity, would it be better to choose an arbitrary scope name based on semantics, or is it better to use one of the common scopes even if their correspondence is weak?

I thought of just scoping parameters as parameter, even though they are not inside a function scope; would that be fine?

This syntax I’m working on is really ambiguous at times, because it aims to look like natural language. For example, to define a class and create an instance, the syntax would be:

EVERY person IsA actor
  -- [... some attributes...]
END EVERY.

THE teacher IsA person At kitchen
  DESCRIPTION "A serious looking teacher."
  -- [... some attributes, verbs, etc...]
END THE Teacher.

Should I then scope the keywords EVERY and THE as storage.type? After all, even if disguised as natural English, they are equivalent to class and new in other languages; pretty much like And and Or are the word equivalents of & and |.

0 Likes

#4

I’ve just found this scopes list, from an Atom package:

It’s very interesting and provides many useful scope examples, which might be a source of inspiration when you are unable to determine a scope name. I thought I’d share it here.

0 Likes

#5

Whenever possible, you should use the standard scope names. Even when you have an unusual language, it is usually better to use the standard scope names as best you can, even if they don’t mean quite the same things as they would in a more typical language.

EVERY person IsA actor
  -- [... some attributes...]
END EVERY.

THE teacher IsA person At kitchen
  DESCRIPTION "A serious looking teacher."
  -- [... some attributes, verbs, etc...]
END THE Teacher.

Should I then scope the keywords EVERY and THE as storage.type?

The storage.type scope is definitely appropriate. I think that EVERY would even make sense as storage.type.class. Then, person would be entity.name.class, IsA would be storage.modifier.extends, and actor would be entity.other.inherited-class. END EVERY would be keyword.control. In the THE block, teacher would probably be entity.name and At would be storage.modifier. The periods would be punctuation.terminator.

0 Likes

#6

Thanks a lot @ThomSmith, this was very useful; especially the tips on IsA and At, which I had no clue as to their scope.

I’m still trying to understand the full coverage of the keyword.control scope, it seems a very broad category and I’m often tempted to use it for whatever keyword I’m not quite sure where it should belong to (after all, everything about a programming language boils down to controlling in some way or another the code, so it’s easy to see it as a fix-it-all category).

I’m confident that as I start to assign scopes to the key syntax elements, the others will fall in place as a consequence.

0 Likes

#7

Ok, I’ve managed to implement the class definition block (the head and tail sections, the full body still needs to be implemented) and used the scopes suggested above.

Quoted Identifiers

I’m now facing another conundrum with unusual syntax: in the Alan language, an identifier containing spaces is enclosed within single quotes (this is because identifiers are also exposed to the player during the game, as object names, etc.). So, a class definition could also be:

EVERY 'evil troll' IsA actor
  -- [... some attributes...]
END EVERY.

(Of course, quoted identifiers make more sense in instances, since classes aren’t actually going to be exposed to the player, but for simplicity’s sake I’m reusing the previous example. In any case, any identifier in the source can be in the quoted variant.)

Right now, I’m capturing the quoted identifier as:

EVERY 'evil troll' IsA actor
      ^ punctuation.definition.string.begin.alan
       ^^^^^^^^^^ entity.name.class.alan
                 ^ punctuation.definition.string.end.alan

I was wondering if this is ok; ie, that the single quotes are scoped as string delimiters even though the actual content inside them is not scoped as a string but as a class name instead.

The point here is that I want to index quoted identifiers just like their unquoted counterparts. Semantically, they are not strings; but the presence of the quote delimiters make them resemble strings.

I tried to look for similar examples in other syntaxes but couldn’t find any. The only similar cases that came to mind were BNF/EBNF syntaxes, which (in some flavours) place non-terminal symbols in single quotes (unlike terminals, which are in double quotes). But there aren’t many BNF syntaxes for ST, and in syntaxes I’ve found for other editors (that use TextMate-like syntax definitions) I’ve noticed that they either use a common scope (like string, which doesn’t deliver good semantics and doesn’t help index symbols at all) or use some completely arbitrary scope name (eg: terminal, non-terminal).

Are there any reasons why I shouldn’t scope the single-quotes as punctuation.definition.string in quoted identifiers? (ie, do they have a special meaning for ST, which might result in some erratic IDE behaviour?)

Redundant Class Name

Another issue I’m facing with this syntax is that in many constructs it allows an optional repetition of the initial identifier when ending the construct block:

EVERY toy IsA object
  -- [... some attributes...]
END EVERY toy.

How should I scope the last toy in END EVERY toy.?

Obviously, it’s again the class name; but scoping it as entity.name.class would result in redundancy in the symbols index (when using Goto Symbol, the user would expect to find a class/instance definition, not its block-tail repetition). This optional identifier in the END line is useful when dealing with long code blocks, as a reminder of what the END refers to.

So, on the one hand I don’t want it indexed, but on the other hand I think it should be highlighted like its counterpart at the beginning of the block.

Should it be scoped as entity.other.inherited-class?

Thanks for all the kind support.

0 Likes

#8

you could scope it the same, and include a .tail suffix, for example, and then use a .tmPreferences file to exclude this scope from the symbol index.
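To make the .tmPreferences idea a bit more concrete, here is a rough sketch that generates such a file with Python's plistlib. The scope name entity.name.class.tail.alan and the "name" label are only assumptions for illustration; showInSymbolList and showInIndexedSymbolList are the symbol-list settings as I understand them. Whether this rule overrides the default one depends on ST picking the more specific selector, so test it before relying on it.

```python
import plistlib

# Hypothetical scope: the class-name scope plus the suggested ".tail" suffix.
TAIL_SCOPE = "entity.name.class.tail.alan"

prefs = {
    "name": "Symbol List: hide END-line identifiers",  # free-form label (assumed)
    "scope": TAIL_SCOPE,
    "settings": {
        # 0 = keep matching text out of the per-file symbol list (Goto Symbol)
        "showInSymbolList": 0,
        # 0 = keep it out of the project-wide symbol index as well
        "showInIndexedSymbolList": 0,
    },
}

# plistlib writes XML by default, which is the format .tmPreferences files use.
xml_text = plistlib.dumps(prefs).decode("utf-8")
print(xml_text)
```

The printed XML could then be saved in the package folder under any name ending in .tmPreferences.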

some plugins might get confused with this, but ST’s built-in functionality maybe won’t be affected - it’s hard to say for sure, but I’d recommend not using punctuation.definition.string just in case, in favor of something like punctuation.definition.identifier.

the following resource may help identify the most common/appropriate punctuation scope to use:
http://brianreilly.me/sublime-syntax-dashboard/#/scopes/Punctuation

2 Likes

#9

Thanks @kingkeith, you’ve been of great help.

you could scope it the same, and include a .tail suffix, for example

I like the solution, and this would allow me to exclude them from the Symbols index via settings.

How much freedom is there when it comes to adding further suffixes to scopes? I mean, the hard part is finding the right scope from the list of common scopes; but once it’s found, are we totally free to use custom semantics in the suffixes?

If I understood correctly, what really matters is the head part of the scope name, since most plugins, color schemes, etc., will usually be looking at the first two or three segments only.

something like punctuation.definition.identifier

I really like this, and I’m starting to see how problems should be handled when it comes to scoping.

Once I’ve really worked my way through syntax scopes, I’d like to create a repo with commented examples on how to use scopes in custom syntaxes. The documentation on this is still vague, and finding threads that discuss the issue is not always easy. I think that a project with commented examples and some documents would really be helpful, especially if it provided small syntaxes as examples, since looking at existing syntaxes like C, Python, etc., can be quite difficult because of their size.

0 Likes

#10

correct - though it probably makes sense to also try to keep custom suffixes as common as possible too where possible :wink:
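The reason custom suffixes are mostly harmless is that selectors match scope names by their leading dot-separated segments. Below is a toy model of that rule, written only to illustrate the idea; it is not ST's actual selector engine (which also handles operators and the whole scope stack), and the function name is made up.

```python
def selector_matches(selector: str, scope: str) -> bool:
    """Return True if `selector` is a dot-segment prefix of `scope`.

    Simplified sketch: real selectors also support operators such as
    ',' and '-' and are matched against a stack of scopes.
    """
    sel_parts = selector.split(".")
    scope_parts = scope.split(".")
    # Compare whole segments, so "entity.name" does not match "entity.names".
    return scope_parts[: len(sel_parts)] == sel_parts


# A color scheme rule for class names still applies to the suffixed scope...
print(selector_matches("entity.name.class", "entity.name.class.tail.alan"))
# ...while a rule targeting the suffix leaves plain class names alone.
print(selector_matches("entity.name.class.tail", "entity.name.class.alan"))
```

So a generic rule keeps working on the suffixed scope, and only rules that name the suffix explicitly can tell the two apart.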

0 Likes

#11

I was wondering if this is ok; ie, that the single quotes are scoped as string delimiters even though the actual content inside them is not scoped as a string but as a class name instead.

Yes.

EVERY 'evil troll' IsA actor
      ^^^^^^^^^^^^ meta.string
      ^ string.quoted.single punctuation.definition.string.begin.alan
       ^^^^^^^^^^ entity.name.class.alan - string
                 ^ string.quoted.single punctuation.definition.string.end.alan

I tried to look for similar examples in other syntaxes but couldn’t find any

The only example I can think of off the top of my head is Oracle, which lets you double-quote identifiers. To implement optionally-quoted identifiers without a ton of duplicate code, that syntax uses YAML Macros.

0 Likes

#12

Thanks for the link. But I notice that in the Oracle syntax example it does however seem to also scope the whole captured regex as string:

- match: (")([^"]+)(")
  captures:
    '3': punctuation.definition.string.end.sql
    '2': variable.other.table.sql
    '1': punctuation.definition.string.begin.sql
  pop: true
  scope: string.quoted.double.sql

Here it’s using both captures: and scope: — I thought it would have to be either one or the other. Unless scope: is equivalent to 0 capturing group. In any case, it would seem to make more sense to write the above as:

- match: (")([^"]+)(")
  captures:
    '3': punctuation.definition.string.end.sql
    '2': variable.other.table.sql
    '1': punctuation.definition.string.begin.sql
    '0': string.quoted.double.sql
  pop: true

So, what is actually happening here? It seems that the actual contents within quotes are being double scoped as both string and variable.other.table.

Right now, I’m experiencing lots of problems in implementing the example of this thread because I’m losing scope with the various push, set and include, and can’t seem to isolate the problem.

For sure, whitespace left unconsumed by the regexes is getting in my way. But from past experience, I know that sometimes the scope: syntax is also a cause of problems, and often just changing it to captures: and 1: solves the issue.

I’ve spent all morning trying to fix these problems and create some reusable contexts to avoid redundancy, but it seems that with every include the problem complicates further. Especially when using multiple push or set:

push: [class_body, class_identifier]

… I end up either losing the meta scope prematurely, or not closing it. But the code seems right: for every push there is a pop (at least, there should be with the addition of the forced pops mentioned above).

Is there a way to probe the syntax highlighter stack via ST console?

0 Likes

#13

Thanks for the link. But I notice that in the Oracle syntax example it does however seem to also scope the whole captured regex as string:

Yeah, that looks like a bug. The whole syntax isn’t in great shape; its primary purpose is to handle the subset of PL/SQL that I actually use at work. I should fix that.

Here it’s using both captures: and scope: — I thought it would have to be either one or the other. Unless scope: is equivalent to 0 capturing group. In any case, it would seem to make more sense to write the above as:

Yes, scope: is the same as a 0 capture. They should be completely identical and interchangeable. In a sense, scope: is just syntactic sugar.
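The "capture 0" idea follows the usual regex convention that group 0 is the entire match. A quick way to see why the quoted contents end up double scoped is to look at the group spans in Python's re module, which uses the same numbering. This is only an analogy (ST uses its own regex engine), and the sample text is made up.

```python
import re

# Same pattern as in the quoted Oracle rule.
pattern = re.compile(r'(")([^"]+)(")')
match = pattern.search('SELECT * FROM "my table"')

whole_span = match.span(0)     # what scope: (or capture 0) would cover
content_span = match.span(2)   # what capture 2 would cover

print("group 0:", repr(match.group(0)), whole_span)
print("group 2:", repr(match.group(2)), content_span)

# Group 2 lies entirely inside group 0, so any scope given to group 0
# is also applied to the text of group 2.
nested = whole_span[0] <= content_span[0] and content_span[1] <= whole_span[1]
print("contents nested in whole match:", nested)
```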

So, what is actually happening here? It seems that the actual contents within quotes are being double scoped as both string and variable.other.table.

You are correct. (The Oracle syntax is not correct.)

Is there a way to probe the syntax highlighter stack via ST console?

Alas, no.
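In the absence of a real probe, it can help to trace the stack by hand. Here is a small toy model (plain Python, not an ST API; the class and its names are invented for illustration) of what push: [class_body, class_identifier] does, relying on the documented rule that the rightmost context in the list ends up on top of the stack.

```python
class ContextStack:
    """Toy model of a sublime-syntax context stack (not an ST API)."""

    def __init__(self, root="main"):
        self.stack = [root]
        self.trace = []  # (action, snapshot of the stack after the action)

    @property
    def top(self):
        return self.stack[-1]

    def push(self, *contexts):
        # Contexts are pushed left to right, so the rightmost ends up on top.
        self.stack.extend(contexts)
        self.trace.append(("push", list(self.stack)))

    def pop(self):
        if len(self.stack) == 1:
            raise RuntimeError("attempted to pop the root context")
        popped = self.stack.pop()
        self.trace.append(("pop " + popped, list(self.stack)))
        return popped


stack = ContextStack()
stack.push("class_body", "class_identifier")
first_active = stack.top    # context that sees the next characters
stack.pop()                 # class_identifier has matched the name and pops
second_active = stack.top
stack.pop()                 # class_body pops at END EVERY
final_active = stack.top

for action, snapshot in stack.trace:
    print(f"{action:<22} -> {snapshot}")
```

If class_identifier never pops (for example because leftover whitespace prevents its pattern from matching), class_body never becomes active, which would look like a meta scope that is never closed.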

0 Likes

#14

Scoping Verbs & Syntaxes

I’m again struggling with some Alan language syntax constructs which I can’t work out how to scope; namely, the VERB and SYNTAX constructs. These are used to create verbs that the player can type in the adventure game, and which do something in the game world, and to define their syntax.

The VERB BNF is:

verb =  ['META'] 'VERB' id {',' id}
            verb_body
        'END' 'VERB' [id] '.'

The SYNTAX BNF:

syntaxes = 'SYNTAX' {syntax}

syntax = id '=' {element} syntax_end

element = id
        | '(' id ')' [indicator]

Are these functions?

To me, verbs look like functions, but they don’t follow the usual syntax of functions or procedures in other languages. For one thing, verbs can be defined globally (only if they are intransitive), or inside classes and instances; they can even be added to existing instances via the ADD TO EVERY construct. An example of SYNTAX and VERB, where the verb is added to every instance at initialization time:

SYNTAX paint = paint (obj)
    WHERE obj ISA OBJECT
        ELSE "That's not something you can paint."

ADD TO EVERY OBJECT
    VERB paint
        DOES "You paint" SAY THE obj. "."
        -- (MAKE obj painted.)
        -- (DECREASE amount_left OF red_paint BY 2.)
    END VERB.
END ADD TO.

And here an example how to implement an instance-specific variant of an already existing verb (overriding/overloading it):

THE street ISA OBJECT AT town
    -- [... some definitions ...]
    VERB cross
       DOES "There's too much traffic."
    END VERB.
END THE street.

Maybe Function calls or Forward declarations?

But what perplexes me most is the SYNTAX construct, which defines its parameters and how the verb is constructed, allowing different phrasings of the same VERB — ie, syntaxes point to verbs, so they look like actual function calls, or maybe forward declarations?

Why not interfaces?

Syntaxes are strictly tied to how the player’s input is linked to a specific verb (a verb’s identifier is usually also available to the player as a command, but not always: the presence of an _ in the verb ID will hide it from the parsable player commands). From this angle, they resemble interfaces.

The problem is that parameters are defined in SYNTAX constructs, while in VERB blocks you can only handle them, but not define them. For example:

SYNTAX
    take = 'take' (obj)*.
    talk_about = 'talk' 'to' (act) 'about' (subj)!.

… defines the parameters (their number and names) which will be passed on to the VERB by the command parser. So if the player types take bottle, then the verb take will be invoked, and bottle will be passed to it as parameter obj. It looks like both a function call and an interface to me.
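To show how the verb name and its parameters could be recovered from such lines (for instance by a completion plugin), here is a rough Python sketch. It assumes the simple one-line form used in the example above and ignores WHERE/ELSE restrictions; the regexes and the function name are my own and are not part of any Alan tooling.

```python
import re

# "verb_id = elements[.]" on a single line (simplified, assumed form).
SYNTAX_LINE = re.compile(r"^\s*(?P<verb>\w+)\s*=\s*(?P<body>.+?)\.?\s*$")
# A parameter is an identifier wrapped in parentheses, e.g. (obj).
PARAM = re.compile(r"\((\w+)\)")


def parse_syntax_line(line):
    """Return (verb_id, [parameter names]) or None if the line does not fit."""
    found = SYNTAX_LINE.match(line)
    if found is None:
        return None
    return found.group("verb"), PARAM.findall(found.group("body"))


for sample in ("    take = 'take' (obj)*.",
               "    talk_about = 'talk' 'to' (act) 'about' (subj)!."):
    print(parse_syntax_line(sample))
```

The same split (identifier before the =, parenthesised identifiers after it) is roughly what the syntax definition would need to capture in order to scope the verb name and its parameters separately.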

Moreover, syntax statements can also have restrictions attached to them:

SYNTAX
    take = 'take' (obj)
        WHERE obj IsA object
            ELSE "You can't take that."

So it looks like they are both part of the same function/procedure construct, whose definition is scattered in different places. I find it a bit puzzling when it comes to decide how to scope these.

Could they be function pointers?

Also, different syntaxes can point to the same verb, which is intended to allow the player to impart commands in different word ordering. Example to allow both give the pen to Bob and give Bob the pen:

SYNTAX
    give = 'give' (obj) 'to' (recip)
        WHERE obj ISA OBJECT
            -- [...more checks/restrictions...]
    give = give (recip) (obj).

… in this last example, the checks and restrictions can only be written for the first give syntax definition, and they will also apply to all give variants/synonyms. Which reminds me of function pointers!

The point here is that I’d definitely like verbs to be indexed, and possibly also to use the Goto Definition functionality with them, allowing me to jump to the block that defines a verb. I haven’t actually worked out how ST handles Goto Definition, but I had the impression that this is a special built-in functionality that only applies to functions (ie: elements scoped as functions), is that correct?

When The Duck Test Isn’t Enough…

Again, the question of how to scope these syntax constructs boils down to being able to make use of them somehow (autocompletion, indexing, plugins, color schemes, etc.). Finding parallels with other languages is not easy for me, as they seem to overlap different constructs, and I have no idea how strict the criteria are when it comes to fitting common scopes.

I try to follow the Duck test here, assuming that:

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

… except that to me these constructs seem to walk like a function, swim like a procedure call, and quack like an interface!

… I’m really lost in semantic translation here!!! :weary: :weary: :weary:

0 Likes

#15

Just posting some ST3-specific official references here:

0 Likes

#16

Thanks!

I’ve been reading through these, but I don’t seem to be able to relate this language to the constructs of the more classical ones, which is actually the intention of the language, ie, to hide the complexity of programming in favor of a more English-like syntax, akin to prose.

Also, having looked at most of the scopes in the documentation (ST and elsewhere), I’ve come to the conclusion that most of these scopes are based on the classical programming paradigms, and when facing unusual syntaxes it’s not easy to accommodate them to the more standard scopes (for example, *BNF-like syntaxes, or DSLs).

0 Likes

#17

I haven’t read everything but there are two main features tied to scopes beyond syntax highlighting:

Jump to symbol and symbol list (ctrl+r)
For being able to jump to a symbol (Goto definition) you need to have tagged it as entity.name.xxx.
By default the symbol list contains every entity.name.class and entity.name.function from a file. If you use things like entity.name.verb I’m not sure they will appear in the symbol list which might get confusing for users.

For me verb and syntax are both kinds of functions, so maybe I’d use the same scope for them. If you think they are different and you don’t have something else that also needs an entity.name, you can assign entity.name.class to the less frequent of the two. This is mostly a visual consideration, because color scheme authors often use flashier colors for classes, since there are fewer of them than functions and methods.

But your language also seems to have objects (street, town, …) maybe those should be marked as classes.

0 Likes

#18

Thanks for the clarification on how ST natively handles Goto Symbol and the symbol list; this was really important to me as it is one of the main goals.

I agree with what you write, both verbs and syntaxes look like functions. Maybe the best thing is to scope them as function followed by either verb or syntax, eg: entity.name.function.verb.alan, and so on.

This way, the syntax will look good with most color schemes, behave as expected with ST functionality and plugins, and at the same time allow me to have scope selectors which can distinguish between the two. At least, my understanding is that any additional scope will not break the benefits of Goto Symbol Definition, etc.

But your language also seems to have objects (street, town, …) maybe those should be marked as classes.

Definitely, classes and instances are the foundation of this language, where instances are statically created at game initialization and can’t be added after the game has started. So I’ve implemented both entity.name.class and entity.name.instance (I’ve found some languages which use an instance scope). The term “object” is avoided when speaking of the language because it is mainly used to refer to objects in the sense of “things” in game play, so to avoid confusion they are always called instances.

At the end of the day, the scope names are for internal use, so it’s not a problem if they don’t match the syntax naming conventions found in the manual, as long as these scopes serve a purpose in the editor (selective auto completion, color schemes, etc.). The problem is that I often can’t grasp what some constructs are. For example, “synonyms” are constants of sorts, which define how commands typed by the player should be aliased to instances, verbs or syntaxes, so one side of their definition might be a pointer or variable (I’ve looked into many syntax definitions, but so far I haven’t found any special scoping for pointers; in many syntaxes they are just variables preceded by an operator keyword).

Ideally, I would like that in the end these various scopes have a meaningful relation between them, even if they are not a 100% match to the current language (ie: that functions don’t end up scoped as constants, or strings as classes, etc.). But this is not so easy in this case since some constructs can be seen as being different things at once.

0 Likes