Sublime Forum

Highlighting and definitions for prose (first plugin advice?)

#1

I’m thinking about using Sublime Text for writing fiction, and as a programmer, I really want to be able to treat character and place names (and other custom terminology for my world) the same way I treat function names – goto definition, find references, and colorize them in text, maybe even pop up information from the definition file for that thing.

I’ve never written a Sublime plugin before. It seems like this could be a reasonably straightforward process, but a lot of things look like they rely on your text being “code-like”, in that there’s a reliable syntax I can define. I would prefer to have a project config file, dynamically load definitions of characters, places, and lore from it, and then colorize them dynamically in whatever text you’re looking at.

What kinds of plugin facilities should I be looking at for this use case, do you think? (i.e., how do I colorize text without using a static syntax? How do I mark regions as having a definition and also include a file and location for that definition?)

I may have missed the right kind of search terms, but I didn’t really find anything that sounded similar to this…


#2

Generally speaking, syntax highlighting tokens in files is the purview of syntax definitions in combination with a color scheme; the syntax definition lexes the file to determine its structure and applies scopes that describe what each part is, and the color scheme has rules that apply colors based on scopes.

The view.add_regions() API allows you to mark up regions of text, but this applies background colors, boxes, underlines (in various styles) or overstrike to the text; application of foreground colors is not one of the things it does.

In the spirit of disclosure, it’s technically possible to make this API color the foreground color of text, but it requires a specially crafted color scheme in order to do it, which makes it a technique not suitable for general consumption as it forces a color scheme on someone. That may be less problematic if you’re constructing the plugin just for yourself.

In that case, you’d be looking at having to do your own parsing of the file manually (maybe as simple as just a view.find_all()) to find all of the things that look like characters or places to mark them up in this way.
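As a rough sketch of that manual parsing, assuming the known names have already been loaded from somewhere (a project config file, say), a single regex alternation can match multi-word names like “Bob Jones” directly. The helper names here are hypothetical, and the Sublime-specific part is shown only in comments because it cannot run outside the editor:

```python
import re


def build_term_pattern(terms):
    """Compile one regex that matches any known term as a whole word/phrase.

    Longer terms are tried first, so "Bob Jones" wins over a shorter "Bob"
    when both are defined.
    """
    ordered = sorted(set(terms), key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


def find_term_spans(text, terms):
    """Return (start, end, term) for every occurrence of a known term in text."""
    if not terms:
        return []
    pattern = build_term_pattern(terms)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


# Inside a plugin, the spans could be underlined roughly like this
# (not runnable outside Sublime Text; "lore_terms" is an arbitrary key and
# the scope only picks the color):
#
#   text = view.substr(sublime.Region(0, view.size()))
#   regions = [sublime.Region(a, b) for a, b, _ in find_term_spans(text, terms)]
#   view.add_regions("lore_terms", regions, "markup.underline", "",
#                    sublime.DRAW_NO_FILL | sublime.DRAW_NO_OUTLINE |
#                    sublime.DRAW_SOLID_UNDERLINE)

spans = find_term_spans("Bob Jones met Alice at the Old Mill. Bob waved.",
                        ["Bob Jones", "Bob", "Old Mill", "Alice"])
print(spans)
# [(0, 9, 'Bob Jones'), (14, 19, 'Alice'), (27, 35, 'Old Mill'), (37, 40, 'Bob')]
```

Because \b anchors both ends, a term that starts or ends with punctuation will not match as expected; that is a limitation of this sketch. The same pattern string could probably also be handed to view.find_all(), though it is worth checking that Sublime’s regex engine treats it identically.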

One thing you definitely can’t do is apply “definitions” to a region of text via any API; that is strictly the bailiwick of a tmPreferences metadata file telling Sublime what scopes it should add to the symbol index, and scopes can only be applied by syntax definitions.

So for something like that, you would need to not only parse the current file, but also every file that might conceivably be a part of the project and keep track of the symbols and their definitions manually, and keep that up to date as files are modified (potentially problematic if something outside of Sublime modifies the file).

This latter part (parsing files to determine what bits of them are interesting) isn’t Sublime related per se; that would just be standard Python code for reading and analyzing the file. You can leverage the view.find() and view.find_all() API methods if you want to examine text in a file that happens to be open, or view.substr() to pull text out of the buffer to work with it, etc.
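For the cross-file bookkeeping, a minimal sketch might look like the following. It assumes a hypothetical definitions format of one “- name : description” entry per line (the construct used in the example syntax later in this thread), and the LoreIndex class is illustrative only. It re-reads a file only when its modification time changes, which also catches edits made outside Sublime:

```python
import os
import re

# Assumed definition-line format: "- name : description"
DEF_LINE = re.compile(r"^\s*-\s*(?P<name>[^:]+?)\s*:\s*(?P<desc>.*)$")


class LoreIndex:
    """Map term -> (path, 1-based line, description) across definition files."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._mtimes = {}   # path -> last seen modification time
        self._by_file = {}  # path -> {name: (path, line, description)}

    def refresh(self):
        """Re-parse only files whose modification time has changed."""
        for path in self.paths:
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                # File vanished or is unreadable: forget its definitions.
                self._mtimes.pop(path, None)
                self._by_file.pop(path, None)
                continue
            if self._mtimes.get(path) != mtime:
                self._by_file[path] = self._parse(path)
                self._mtimes[path] = mtime

    @staticmethod
    def _parse(path):
        found = {}
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                m = DEF_LINE.match(line)
                if m:
                    found[m.group("name")] = (path, lineno, m.group("desc"))
        return found

    def lookup(self, name):
        for defs in self._by_file.values():
            if name in defs:
                return defs[name]
        return None

    def names(self):
        return {name for defs in self._by_file.values() for name in defs}
```

In a plugin, refresh() could be called from an EventListener hook such as on_activated() or on_post_save(), so the index stays reasonably current without re-parsing every file on every keystroke.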


#3

EDIT: I found an example of enabling “Goto Definition” in Sublime Text. It definitely doesn’t seem to support tokens with spaces – or I’m doing the regex wrong, which is decidedly possible. If it’s possible, what should the regex be to identify multi-word tokens in a paragraph of words?

With this example, even single-word known-tokens aren’t being highlighted – is there a way to highlight only the known tokens? In the definition file I’ve made, I can get it to highlight via regex syntax, but I’d like words to just be words, and to highlight any found tokens…


#4

Your question is hard to follow because you edited the original and removed most of it, so your inline edit makes little sense without checking into the history of the post. It’s generally better to either tack stuff onto the end of a post if you’re editing, or just add another post in the thread so that the history is easier to track.

Something to keep in mind is that Sublime isn’t a compiler, so the only information it has about the construction of your files is what it can infer from the text in the file.

So for example, with a very simple syntax:

Syntax highlighting of the character names and their general descriptions happens inside the special “characters:” section of the file, because the rules in the syntax say that while in that section, a construct like “- name : description” outlines a character. Hence, the syntax can recognize that rule and apply a scope, and the color scheme can color it.

Within this file, the appropriate metadata adds the characters as local symbols, so if you open the symbol list while editing that file, you can jump directly to where they were originally defined:

[Screenshot: the local symbol list showing the character names]

If desired, you can adjust the metadata so that the symbols appear not only in the local symbol list (i.e. only in the buffer that has the file open) but also in the global symbol list (the example you linked to was doing both of these things):

[Screenshot: the character names in the global symbol list]

Here the symbols are visible in the global symbol list, which shows the names of the characters even from inside of the syntax definition file; choosing the item takes you to the same place as it does when inside of the file locally.

Spinning back to the first image, note that even though Bob Jones is syntax highlighted in the characters section, it’s not highlighted in the story at the bottom.

That’s because the syntax has no rule that allows it to recognize Bob Jones as being special at that location, so it’s just regular text like everything else. Since it’s regular text, it’s not syntax highlighted, and you can’t hover over it to attempt to get at the definition either.

This is an important distinction to make and highlights the inherent statelessness of the syntax definition system. Outside of the place where the name of the character is defined, there’s no way to determine that any other instance of the text Bob Jones means the same thing; there’s no state that tracks what happened the last time text was seen.
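A plugin, on the other hand, can keep exactly that state. As a sketch, assuming the “characters:” section layout described above plus a hypothetical rule that the first line not of the form “- name : description” ends the section, one pass can remember the names and a second pass can find them anywhere else in the text (scan_story() is an illustrative name, not an existing API):

```python
import re

# Assumed entry format inside the section: "- name : description"
DEF_LINE = re.compile(r"^\s*-\s*(?P<name>[^:]+?)\s*:\s*(?P<desc>.*)$")


def scan_story(text):
    """Remember names defined in the "characters:" section, then report
    where they occur anywhere else in the text.

    Returns (definitions, uses): definitions maps name -> 0-based line number,
    uses is a list of (0-based line number, name) pairs.
    """
    lines = text.splitlines()
    definitions = {}
    section_rows = set()
    in_section = False
    for row, line in enumerate(lines):
        if line.strip() == "characters:":
            in_section = True
            section_rows.add(row)
            continue
        if in_section:
            m = DEF_LINE.match(line)
            if m:
                definitions[m.group("name")] = row
                section_rows.add(row)
                continue
            in_section = False  # assumed rule: any other line ends the section

    uses = []
    for row, line in enumerate(lines):
        if row in section_rows:
            continue
        for name in definitions:
            if re.search(r"\b" + re.escape(name) + r"\b", line):
                uses.append((row, name))
    return definitions, uses


story = ("characters:\n"
         "- Bob Jones : a retired sailor\n"
         "- Alice : a baker\n"
         "\n"
         "Bob Jones waved at Alice.\n")
print(scan_story(story))
# ({'Bob Jones': 1, 'Alice': 2}, [(4, 'Bob Jones'), (4, 'Alice')])
```

Searching line by line keeps the sketch simple, but it means a name split across a line break will not be found.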

Keeping that kind of state is the job of something else, like the compiler if you were writing code or your brain if you’re writing text, because there’s not enough information to know what the text means without very specific context.

To achieve that, you’d need to have markup on things not only at the point where they’re defined, but also at the point where they’re used as well, to provide the same sort of hint. For example:

[Screenshot: character names in the story marked up with back ticks and highlighted]

Now the syntax definition knows that any text that’s inside of back ticks is supposed to represent a character name, and it can provide the appropriate highlighting to it.

This combination of items technically provides all that you need to be able to look up the definition of things and jump to it. However there are still inherent issues with this, such as symbol lookups only working in source code files and not text, and symbols not being allowed to have spaces in them (which is something you already saw).

What that means is that you can only look up symbols with either manual intervention or some plugin work. For example, the built-in functionality works if you select the entire symbol first and then press the key:

[Screenshot: Goto Definition on a fully selected symbol]
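The “select the whole symbol first” step is something a plugin could do for you. A minimal sketch, using a hypothetical term_at() helper that works on a single line of text so it stays cheap, picks the longest known name around the caret; the Sublime side is shown in comments only:

```python
import re


def term_at(line_text, col, terms):
    """Return (start, end) of the longest known term covering column `col`.

    Columns are offsets into line_text; a caret just after the last character
    of a name still counts as being on it. Returns None if nothing matches.
    """
    best = None
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, line_text):
            if m.start() <= col <= m.end():
                if best is None or m.end() - m.start() > best[1] - best[0]:
                    best = (m.start(), m.end())
    return best


# In a TextCommand (not runnable outside Sublime Text):
#
#   pt = self.view.sel()[0].b
#   line = self.view.line(pt)
#   hit = term_at(self.view.substr(line), pt - line.begin(), terms)
#   if hit:
#       self.view.sel().clear()
#       self.view.sel().add(sublime.Region(line.begin() + hit[0],
#                                          line.begin() + hit[1]))
#       self.view.window().run_command("goto_definition")

line = "Later, Bob Jones met Bob."
print(term_at(line, 8, ["Bob", "Bob Jones"]))   # (7, 16)
print(term_at(line, 22, ["Bob", "Bob Jones"]))  # (21, 24)
```

This only automates the selection; whether goto_definition then finds anything still depends on the symbol metadata, as in the example above.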

To achieve the same results as the internal hover popups and key bindings, you’d need to provide your own code for hover popups and goto definition that isn’t bound by those same rules (for reference, the code for the internal mechanism is stored in Default/symbol.py and can be viewed via View Package File from the command palette).

The examples here use the same syntax definition for everything; in theory there’s nothing stopping you from having multiple syntax definitions work together in this regard, so long as you’re the one that’s writing the code that looks up symbols.
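On the lookup side, a custom hover popup mostly comes down to building some minihtml for view.show_popup() and a link that window.open_file() can follow using the sublime.ENCODED_POSITION flag. The two helpers below are hypothetical and only do the string-building; the event wiring is sketched in comments:

```python
import html


def encoded_position(path, row, col=1):
    """Build the "path:row:col" form that window.open_file() accepts together
    with sublime.ENCODED_POSITION (row and col are 1-based)."""
    return "{}:{}:{}".format(path, row, col)


def definition_popup(name, path, row, description):
    """Return minihtml for a hover popup; clicking the link passes its href
    to the popup's on_navigate callback."""
    return '<b>{}</b><br>{}<br><a href="{}">Go to definition</a>'.format(
        html.escape(name),
        html.escape(description),
        html.escape(encoded_position(path, row), quote=True),
    )


# Wiring (not runnable outside Sublime Text; find_name_at() and lookup()
# are hypothetical stand-ins for your own parsing code):
#
#   class LoreHover(sublime_plugin.ViewEventListener):
#       def on_hover(self, point, hover_zone):
#           if hover_zone != sublime.HOVER_TEXT:
#               return
#           name = find_name_at(self.view, point)
#           entry = lookup(name)  # -> (path, row, description) or None
#           if entry:
#               window = self.view.window()
#               self.view.show_popup(
#                   definition_popup(name, *entry),
#                   sublime.HIDE_ON_MOUSE_MOVE_AWAY, point,
#                   on_navigate=lambda href: window.open_file(
#                       href, sublime.ENCODED_POSITION))

print(encoded_position("/lore/characters.txt", 2))  # /lore/characters.txt:2:1
```

If paths can contain characters such as &, it is worth checking whether minihtml decodes entities in the href before calling on_navigate.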


#5

Thanks for the detailed writeup, and sorry for the bad edit – I had figured out how to get a lot of what I’d initially asked for and was trying NOT to ask for too much that I didn’t need any more; but it was apparently very late last night and I didn’t do that well.

The key point seems to be that, unless I want to be marking up all of my text, I will need to do pretty much everything via scripting; and that symbols with spaces aren’t really supported. I was able to get symbols with spaces to show up in separate files, but the “Goto Definition” command only worked with symbols that didn’t have spaces.

Shifting to the scripting aspect, I couldn’t find anything in the API documentation that would let me, for instance, query all the “found” symbols in a given view. Is there a way to do that? (I was hoping I could just get all found symbols and then underline them or something.)

I just woke up, though, so I’ll have a proper look at the Default/symbol.py file you mentioned. That seems like a really great starting point for implementing more of what I want. Cheers!


#6

No worries. 🙂 Probably a good starting point would be these (they’re not in the API documentation, but the plugin above uses them):

```python
def function():
    pass

function()
```

```python
>>> view.indexed_symbols()
[(Region(4, 12), 'function')]
>>> view.indexed_references()
[(Region(26, 34), 'function')]
```

Those work with a particular view, but there are also methods for looking up symbols across the window as a whole. In that case, however, you can only look up information for symbols you already know, since the list could conceivably be immense:
