Sublime Forum

Problem with clear_scopes and BracketHighlighter

#1

Problem in Bash

Consider the following bash code.

echo " 123 `  echo hello  ` 456  " # this echos: 123 hello 456
  • Expected highlight result: [screenshot]
  • Possible (current) highlight result: [screenshot]

Without clear_scopes

Things inside `...` would be colored yellow because they are under the scope of string.quoted.double.

With clear_scopes

If I apply clear_scopes: true on `...` with the following code,

interpolation_backtick:
  - match: '`'
    scope: punctuation.definition.string.begin.shell
    push:
      - meta_scope: source.shell
      - clear_scopes: true
      - match: '`'
        scope: punctuation.definition.string.end.shell
        pop: true
      - ... # some other codes

the highlight result is the expected one, which looks good to me. But this makes the string.quoted.double scope discontinuous, so BracketHighlighter can no longer match the double quotes in this case.

Is it possible to make both `...` and "..." work with BracketHighlighter while they are nested within each other, and still get the expected highlighting?

Similar problem in PHP

<a href="http://<?= $host ?>">

If clear_scopes: true is applied to the PHP short tag, the string.quoted.double scope would be discontinuous too. If it is not applied, things inside the PHP short tag may be colored yellow because they are in the string scope. Correct?

2 Likes

#2

I’m not familiar with how BracketHighlighter works, but it sounds like the concept of clear_scopes is not going to play well with it.

It is unfortunate since I think we are going to increase the usage of clear_scopes in the default syntaxes to make embedding of code in strings nicer.

1 Like

#3

Matching “quotes” with simple text matching seems to be problematic when the begin and end symbols are the same, for example double quotes, single quotes and backticks (in bash). So BracketHighlighter introduced scope-based matching.

Here are some comments in BracketHighlighter’s settings.

    // Rules that define the finding and matching of brackets
    // that are contained in a common scope.
    // Useful for bracket pairs that are the same but
    // share a common scope.  Brackets are found by
    // Finding the extent of the scope and using regex
    // to look at the beginning and end to identify bracket.
    // Use only if they cannot be targeted with traditional bracket
    // rules.
    "scope_brackets": [
        {
            "name": "double_quote",
            "open": "(\")",
            "close": "(\")",
            "style": "double_quote",
            "scopes": ["string", "string.quoted"],
            "language_filter": "blacklist",
            "language_list": ["Plain text", "Hex"],
            "sub_bracket_search": "true",
            "enabled": true
        },
        ...
    ],

I know and understand that not everyone uses BracketHighlighter, but quite a lot of people do, because it is a top-25 plugin.

0 Likes

#4

Well, with the default syntaxes we should be using string.quoted.double.begin and string.quoted.double.end, and variations like that for single quotes and backticks. Perhaps BH could be updated to scan for a matching end in certain whitelisted syntaxes?

1 Like

#5

@facelessuser Would you please take a look at this?

0 Likes

#6

@jfcherng, Can you explain better what clear scopes does?

I will explain roughly how things like scopes get matched in BracketHighlighter (BH).

  1. It is difficult to know whether a " is the start of a string or the end, especially since BH searches a sliding window of the whole view. Things like { are easy because { is clearly different from }, so pure regex can be used, maybe with some excluded scopes. Quotes, which have no clear difference between the opening symbol and the closing symbol, are not searched with pure regex; they use scope matching.

  2. Scope matching is performed by looking at the scope under the cursor and determining if we have a bracket rule associated with that scope. If the scope is one of interest, we get its full extent and then apply a regex to the start and end of that scope to see if the respective opening and closing symbols are found. Now, using the context of the scope, we know that the two symbols we found are the opening and closing of the string.
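
The two steps above can be sketched roughly in Python. This is not BH's actual code: the function names, the per-character scope lists, and the sample scope names are all made up for illustration, and in a real plugin the scopes would come from the Sublime Text API instead of a hand-written list. The sketch finds the contiguous extent of a "string" scope around the cursor and checks its first and last characters against the open/close regexes.

```python
import re

def has_scope(scope_name, selector):
    """True if any scope in the space-separated scope_name equals selector or starts with 'selector.'."""
    return any(s == selector or s.startswith(selector + ".")
               for s in scope_name.split())

def scope_extent(scopes, pos, selector):
    """Half-open (start, end) of the contiguous run around pos that matches selector, or None."""
    if not has_scope(scopes[pos], selector):
        return None
    start, end = pos, pos + 1
    while start > 0 and has_scope(scopes[start - 1], selector):
        start -= 1
    while end < len(scopes) and has_scope(scopes[end], selector):
        end += 1
    return start, end

def match_scope_bracket(text, scopes, pos, selector, open_re, close_re):
    """Return (open_index, close_index) of the quotes bounding the scope, or None."""
    extent = scope_extent(scopes, pos, selector)
    if extent is None:
        return None
    start, end = extent
    region = text[start:end]
    opening = re.match(open_re, region)                      # must sit at the very start
    closing = re.search("(?:" + close_re + r")\Z", region)   # must sit at the very end
    if not opening or not closing or closing.start() < opening.end():
        return None
    return start + opening.start(), start + closing.start()

# Hypothetical per-character scopes for: echo "a `b` c"
TEXT = 'echo "a `b` c"'
STR = "source.shell string.quoted.double.shell"
# Assumption: without clear_scopes the whole "..." is one string scope.
PLAIN = ["source.shell"] * 5 + [STR] * 9
# Assumption: with clear_scopes: true the `b` part loses the string scope.
CLEARED = ["source.shell"] * 5 + [STR] * 3 + ["source.shell"] * 3 + [STR] * 3

print(match_scope_bracket(TEXT, PLAIN, 6, "string", r'(")', r'(")'))
print(match_scope_bracket(TEXT, CLEARED, 6, "string", r'(")', r'(")'))
```

With the cursor on the "a", the first call finds both double quotes, while the second finds nothing, because the scope fragment under the cursor ends at the space before the backtick rather than at a quote.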

If forced to use something like regex exclusively, this is too difficult for BH, because when we find a " it can be hard to tell whether it is a start or an end. You can add further logic by looking at whether a space, a comma or something similar precedes or follows the quote to give some usage context for deciding whether it is an opening or a closing, but this becomes less reliable. The most reliable method then is to start parsing the file from the very beginning to ensure we know when we get the start and end bracket, but we also have to add additional logic to BH to count every other unescaped quote as alternately opening and closing. Also, you will suffer a large performance hit as you have to start at the beginning of the file instead of using a sliding window.

I am assuming that this clear_scopes feature is breaking the string scope. If that is the case, this clearly breaks the algorithm for finding opening and closing symbols for things like quotes. I would have to come up with a way to add additional logic to crawl outward and grab adjacent scopes, and then use some algorithm to determine if they are linked. This sounds like more work than I am probably willing to put in, as it will also impact performance.

It might be possible to come up with a way to handle this and still keep performance reasonable, but it would be a major change to the current algorithm. I assume I would need to search for symbols with regex at first and then apply some more advanced scope context logic to determine if the quote appears at the start or end of the scope fragment, etc. This is just me thinking out loud, and I am not sure when I would even have time to implement something like this, assuming I understand the cause of the issue listed in this thread.

I don’t know. I’d like to understand this problem more before I give an official statement on how feasible mitigation from my side would be. And even if mitigation is possible, I have to find time to experiment with it, evaluate performance, etc.

0 Likes

#7

clear_scopes is documented at http://www.sublimetext.com/docs/3/syntax.html. Basically, it allows removing scopes from the stack without popping to a parent context. When dealing with embedded source code, we are starting to remove the string scope because it makes the embedded code hard to read: basic identifiers are highlighted as strings even though they aren’t.
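
As a rough illustration of that behaviour, here is a small Python model of a scope stack. It is only a sketch of the documented semantics (clear_scopes takes an integer or true and is applied before meta_scope), not how Sublime Text implements it. The function name push_context and the scope name meta.interpolation.shell are invented for this example.

```python
def push_context(stack, meta_scope=None, clear_scopes=0):
    """Return the scope stack seen inside a pushed context.

    clear_scopes may be True (drop every scope) or an integer (drop that many
    scopes from the top). It is applied before meta_scope is added.
    The caller's stack is left untouched, so "popping" is just going back to it.
    """
    if clear_scopes is True:
        kept = []
    else:
        kept = stack[:max(0, len(stack) - clear_scopes)]
    if meta_scope:
        kept = kept + meta_scope.split()
    return kept

# Hypothetical stack while inside a double-quoted bash string.
in_string = ["source.shell", "string.quoted.double.shell"]

# The backtick context from post #1: clear everything, then add source.shell.
print(push_context(in_string, meta_scope="source.shell", clear_scopes=True))

# Dropping only the topmost scope and adding an invented meta scope instead.
print(push_context(in_string, meta_scope="meta.interpolation.shell", clear_scopes=1))

# After the pushed context pops, the original stack is in effect again.
print(in_string)
```

The point of the model is that the string scope is gone only while the pushed context is active, which is exactly why the string scope ends up in separate fragments on either side of the embedded code.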

1 Like

#8

Why would we want to remove the string scope? What if we have legitimate reasons to target strings (cases outside of BH)? This seems like a weird move. How will we be able to target or avoid strings?

0 Likes

#9

Because of code like:

foo = "This is my #{num_foo + num_bar} try!"

num_foo and num_bar will have the string scope even though they aren’t strings. They are just plain variables, and don’t get a specific scope according to the Ruby syntax. By adding clear_scopes: 2 in the context pushed by #{ we now get “correct” scopes for everything inside of #{ and }.

0 Likes

#10

Maybe some sort of meta.string scope could be added in such cases so that it would still be possible to tell from the scopes that those variable names were contained in a string, without it affecting styling?

1 Like

#11

I think that sounds like a good idea.

0 Likes

#12

num_foo and num_bar will have the string scope even though they aren’t strings.

It is, but it also isn’t. It is still contained within the string scope, but I understand what you are saying. I guess what I want to know is in what kind of cases this would be applied. Are we removing the string scope in HTML attributes because there is JavaScript code in them? To be honest, they are 100% strings, but yes, they will be passed on to be interpreted as JavaScript at some point.

I would argue this is still a bad direction. While I understand what you are trying to solve, at the very least, the full string should still be scoped with a meta.string or something. Certain plugins may not want to evaluate content in comments. But they also might have reasons to avoid content in strings.

If syntax files scoped the start quote and end quote with something like quote.start and quote.end (and other similar syntaxes followed something comparable), BH could probably work around this quite easily. But my big concern is outside of BH.


While I was typing this up, it appears others have also mentioned meta.string. This would be useful.

1 Like

#13

For example, just this morning: https://github.com/sublimehq/Packages/issues/643

0 Likes

#14

Right, so code that is trying to build functionality on understanding the code should probably utilize meta scopes, whereas we recommend color schemes not color them. I will put this high on the priority list to add meta.string on the default syntaxes, especially when using clear_scopes for better embedded code handling.

4 Likes

#15

Yes, I understand the problem. I guess I’ve never found it to be that big a deal. Ultimately I am not against changing how things are scoped, but I still would want the entire string wrapped in some kind of string scope container. meta is completely fine. I’m not against change, I just don’t want useful things to be discarded either.

0 Likes

#16

Is meta.string a special case in ST? Before clear_scopes was invented, I tried adding an extra meta_content_scope, meta.reset.color, to make the color go back to the default (usually white for a dark theme), but doing this eventually messed up my color scheme.

Consider the following bash code.

echo " 0   `   echo    $(    echo hello    )   ` 1 " # this echos: 0   hello 1

I would like to reset the color (or rather the scope, if we use clear_scopes) at the positions of those arrows.

After that, the scope of hello becomes quite a mess and so does my color scheme. And still, hello may be colored as a string depending on the score from score_selector(scope, selector), whose algorithm I find counter-intuitive.


References:

As FichteFoll said, the introduction of clear_scopes resolves this problem, but now it is said here that adding an additional scope like meta.string resolves it without using clear_scopes.

This post explains why I think the algorithm of score_selector(scope, selector) is weird.

People can find information about score_selector(scope, selector) if they do not know it.

0 Likes

#17

No, meta.string isn’t special in any way. But meta scopes are not intended for applying color, but for understanding the structure of code. By removing string. and adding meta.string, color schemes won’t color the code as a string, but a plugin like BH can still determine where a string starts and ends.

0 Likes

#18

I am afraid that by doing so, either of the following situations, both of which are wrong, would happen.

echo       "           '         '         "
#                            ^ cursor is here
#          ^                               ^      BH matches


echo       '           "         "         '
#                            ^ cursor is here
#          ^                               ^      BH matches

0 Likes

#19

The scopes will be stacked, so BH would need to count the number of meta.string scopes on the stack; when that number changes, it has either entered or exited a string.
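
A minimal Python sketch of that counting idea, under stated assumptions: the per-character scope strings are written by hand, and meta.string.shell is a hypothetical scope name, since no default syntax is confirmed in this thread to emit it. The function names are invented too. The idea is to take the meta.string depth under the cursor and walk outward in both directions for as long as the depth does not drop below it.

```python
def string_depth(scope_name):
    """Count how many meta.string scopes are on the (space-separated) scope stack."""
    return sum(1 for s in scope_name.split()
               if s == "meta.string" or s.startswith("meta.string."))

def string_extent(scopes, pos):
    """Inclusive (start, end) of the string enclosing pos at the cursor's depth, or None."""
    depth = string_depth(scopes[pos])
    if depth == 0:
        return None
    start = end = pos
    # A drop below the cursor's depth means we have left the string.
    while start > 0 and string_depth(scopes[start - 1]) >= depth:
        start -= 1
    while end + 1 < len(scopes) and string_depth(scopes[end + 1]) >= depth:
        end += 1
    return start, end

# Hypothetical scopes for: echo "a `echo 'b'` c"
# Assumption: the backtick section drops string.* but keeps meta.string,
# and the inner 'b' stacks a second meta.string on top.
TEXT = "echo \"a `echo 'b'` c\""
DEPTHS = [0] * 5 + [1] * 9 + [2] * 3 + [1] * 4
SCOPES = ["source.shell" + " meta.string.shell" * d for d in DEPTHS]

print(string_extent(SCOPES, 15))  # cursor on b, inside the single quotes
print(string_extent(SCOPES, 6))   # cursor on a, inside the double quotes
print(string_extent(SCOPES, 10))  # cursor inside the backticks
```

One known limitation of this sketch: two strings at the same depth that touch each other with nothing in between would be merged into one extent, so a real implementation would still need the quote punctuation scopes or a regex check on the boundaries.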

1 Like

#20

I am neither the author of BH nor an ST expert. If you think this issue has been theoretically solved, I will wait for further changes in either BH or ST then. Thank you all for discussing this. :slightly_smiling:

0 Likes