Sublime Forum

How does extract_scope work?

#1

I’ve been trying for days to figure out how extract_scope works behind the curtains, but I can proudly say I’ve failed miserably. I’ve prepared a puzzle in case any smart guy from the forums wants to enlighten us:

import operator
from itertools import groupby

table = [
    [0, '#',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python punctuation.definition.comment.python "],
    [1, ' ',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [2, 'I',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [3, "'",    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [4, 'm',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [5, ' ',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [6, 'a',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [7, ' ',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [8, 'c',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [9, 'o',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [10, 'm',   (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [11, 'm',   (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [12, 'e',   (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [13, 'n',   (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [14, 't',   (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [15, '\n',  (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python "],
    [16, '\n',  (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
    [17, 'd',   (17, 24),       'def foo',                                                       "source.python meta.function.python storage.type.function.python "],
    [18, 'e',   (17, 24),       'def foo',                                                       "source.python meta.function.python storage.type.function.python "],
    [19, 'f',   (17, 24),       'def foo',                                                       "source.python meta.function.python storage.type.function.python "],
    [20, ' ',   (17, 24),       'def foo',                                                       "source.python meta.function.python "],
    [21, 'f',   (17, 24),       'def foo',                                                       "source.python meta.function.python entity.name.function.python meta.generic-name.python "],
    [22, 'o',   (17, 24),       'def foo',                                                       "source.python meta.function.python entity.name.function.python meta.generic-name.python "],
    [23, 'o',   (17, 24),       'def foo',                                                       "source.python meta.function.python entity.name.function.python meta.generic-name.python "],
    [24, '(',   (24, 26),       '()',                                                            "source.python meta.function.parameters.python punctuation.section.parameters.begin.python "],
    [25, ')',   (24, 26),       '()',                                                            "source.python meta.function.parameters.python punctuation.section.parameters.end.python "],
    [26, ':',   (25, 27),       '):',                                                            "source.python meta.function.python punctuation.section.function.begin.python "],
    [27, '\n',  (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
    [28, ' ',   (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
    [29, ' ',   (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
    [30, ' ',   (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
    [31, ' ',   (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
    [32, 'p',   (32, 38),       'print(',                                                        "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python "],
    [33, 'r',   (32, 38),       'print(',                                                        "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python "],
    [34, 'i',   (32, 38),       'print(',                                                        "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python "],
    [35, 'n',   (32, 38),       'print(',                                                        "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python "],
    [36, 't',   (32, 38),       'print(',                                                        "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python "],
    [37, '(',   (37, 38),       '(',                                                             "source.python meta.function-call.python punctuation.section.arguments.begin.python "],
    [38, "'",   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python punctuation.definition.string.begin.python "],
    [39, '#',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [40, ' ',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [41, 'N',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [42, 'o',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [43, ' ',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [44, 'c',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [45, 'o',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [46, 'm',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [47, 'm',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [48, 'e',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [49, 'n',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [50, 't',   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python "],
    [51, "'",   (38, 52),       "'# No comment'",                                                "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python punctuation.definition.string.end.python "],
    [52, ')',   (51, 53),       "')",                                                            "source.python meta.function-call.python punctuation.section.arguments.end.python "],
    [53, '\n',  (0, 54),        "# I'm a comment\n\ndef foo():\n    print('# No comment')\n",    "source.python "],
]

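def extract_scope(scopes, i):
    """Attempt to mimic view.extract_scope(i) from per-position scope names."""
    length = len(scopes)
    base = scopes[i]

    # Extend b to the right while one scope string is a prefix of the other.
    b = i
    for other in scopes[i + 1:]:
        shorter = min([base, other], key=len)
        longer = max([base, other], key=len)
        if not longer.startswith(shorter):
            break
        b += 1

    # Extend a to the left in the same way.
    # reversed(scopes[:i]) is empty when i == 0, so nothing wraps around.
    a = i
    for other in reversed(scopes[:i]):
        shorter = min([base, other], key=len)
        longer = max([base, other], key=len)
        if not longer.startswith(shorter):
            break
        a -= 1

    return (max(0, a), min(b, length))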


def extract_regions(data):
    lst = [list(map(operator.itemgetter(1), g)) for k, g in groupby(enumerate(data), lambda v:v[0] - v[1])]
    return [(v[0], v[-1]) for v in lst]


def test():
    from collections import defaultdict
    import json
    import operator
    from itertools import groupby

    scopes = [v[4] for v in table]
    scopes_positions = defaultdict(list)

    for i, v in enumerate(table):
        scopes_positions[v[4]].append(i)

    regions = {}
    lst = []
    for k, v in scopes_positions.items():
        for r in extract_regions(v):
            regions[r] = k
            lst.append(r)

    lst = sorted(lst)
    for v in lst:
        print(v, regions[v])


if __name__ == '__main__':
    scopes = [v[4] for v in table]

    for i, v in enumerate(table):
        r1 = v[2]
        r2 = extract_scope(scopes, i)
        print("{:<5}{:<10}{:<10}{}".format(
            i, f"({r1[0]},{r1[1]})", f"({r2[0]},{r2[1]})", "OK" if r1 == r2 else "FAILED"
        ))

The goal of this puzzle is to tweak the extract_scope function so that it behaves like Sublime’s original one.

Thanks in advance :smiley:

P.S. If you wonder how I got the original table data, I used this command:

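import sublime_plugin


class TestScopeCommand(sublime_plugin.TextCommand):

    def run(self, edit, block=False):
        print('-' * 80)

        view = self.view
        for i in range(view.size()):
            a = i                                         # text position
            b = repr(view.substr(i))                      # character at that position
            c = str(view.extract_scope(i))                # region reported by extract_scope
            d = repr(view.substr(view.extract_scope(i)))  # text covered by that region
            e = view.scope_name(i)                        # full scope name at that position
            print("{:<5}{:<5}{:<10}{:<65}{}".format(a, b, c, d, e))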

run against this file:

# I'm a comment

def foo():
    print('# No comment')

0 Likes

Open Source
#2
0 Likes

#3

Thanks for the link, although whether the existing extract_scope gives reliable results or not doesn’t really matter much. This function is used behind the curtains by the amazing toggle_comment command. I say amazing because that command works perfectly well 99.9999% of the time, so I’d like to mimic it on a Scintilla widget if possible. So the first step is trying to get a similar extract_scope algorithm… :slight_smile: The problem is I’m too dumb to reverse engineer its logic by myself hehe :smiley:

0 Likes

#4

For the purposes of determining the extent of a comment, presumably you can just check the scope_name (if Scintilla even has such a notion, being far removed from TextMate grammars) of each point to the left and right of the caret position and keep going until it no longer matches the comment selector. Note that this could select multiple line comments together if there is no gap between them.
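A minimal sketch of that idea, assuming the per-position scope names are already available as a list of strings (as in the table from post #1). The helper names `expand_while` and `is_comment` are made up for illustration, and the crude "level starts with comment" test stands in for a real selector match; none of this is Sublime's API or its actual algorithm:

```python
def expand_while(scopes, pos, matches):
    """Return the half-open range (start, end) around pos where matches(scope) holds.

    Returns None when the scope at pos itself does not match.
    """
    if not matches(scopes[pos]):
        return None
    start = pos
    while start > 0 and matches(scopes[start - 1]):
        start -= 1
    end = pos + 1
    while end < len(scopes) and matches(scopes[end]):
        end += 1
    return (start, end)


def is_comment(scope):
    # Crude stand-in for a selector match: any scope level starting with "comment".
    return any(level.startswith("comment") for level in scope.split())


# Scope names for the text "# x\ny": '#', ' ', 'x', '\n' are in the comment, 'y' is not.
scopes = [
    "source.python comment.line.number-sign.python punctuation.definition.comment.python ",
    "source.python comment.line.number-sign.python ",
    "source.python comment.line.number-sign.python ",
    "source.python comment.line.number-sign.python ",
    "source.python ",
]

print(expand_while(scopes, 2, is_comment))  # (0, 4)
print(expand_while(scopes, 4, is_comment))  # None
```

Applied to the table from post #1, the same helper would return (0, 16) for any position inside the first comment, which agrees with what extract_scope reports there. Two line comments on consecutive lines would be merged into one range, as noted above.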
Perhaps for other cases, if the points on either side of the given point/caret position don’t match the scope at that point, remove a level of specificity and try again, i.e. source.python meta.function.parens could become source.python meta.function, etc.
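The "remove a level of specificity" fallback could be sketched roughly like this. Again this is only an assumption about how such a heuristic might look, not Sublime's implementation; `expand_prefix` and `extract_by_relaxing` are hypothetical names, and scopes are compared as whole space-separated levels:

```python
def expand_prefix(scopes, pos, prefix):
    """Half-open range around pos whose scopes all start with the given levels."""
    def shares(scope):
        return scope.split()[:len(prefix)] == prefix

    start, end = pos, pos + 1
    while start > 0 and shares(scopes[start - 1]):
        start -= 1
    while end < len(scopes) and shares(scopes[end]):
        end += 1
    return (start, end)


def extract_by_relaxing(scopes, pos):
    """Drop trailing scope levels until at least one neighbour joins the range."""
    levels = scopes[pos].split()
    while True:
        region = expand_prefix(scopes, pos, levels)
        if region[1] - region[0] > 1 or len(levels) <= 1:
            return region
        levels = levels[:-1]  # remove one level of specificity and retry


# Scope names for the eight characters of "def foo(" (table positions 17..24).
keyword = "source.python meta.function.python storage.type.function.python "
space = "source.python meta.function.python "
name = "source.python meta.function.python entity.name.function.python meta.generic-name.python "
paren = "source.python meta.function.parameters.python punctuation.section.parameters.begin.python "
scopes = [keyword] * 3 + [space] + [name] * 3 + [paren]

print(extract_by_relaxing(scopes, 0))  # (0, 3)
print(extract_by_relaxing(scopes, 3))  # (0, 7)
```

Offset by 17, the space at position 20 yields (17, 24), the same range the table reports. Position 17 yields (17, 20), though, where the table reports (17, 24), so this heuristic on its own would not reproduce extract_scope.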

0 Likes

#5

Sure, although that’s not the point of my question :slight_smile: I’d know how to use selectors to extract comments from the text, but in this particular case I’m really interested in understanding how Sublime’s extract_scope works behind the curtains, that’s all. It feels to me that the algorithm must be something really trivial once you understand the logic behind it.

Suppose you’ve already computed the scope for every text position. Given that information, what would the concept of a region be?

Let me give you an example. Take a look at the table in the code I posted at the beginning of the thread and consider this case:

[17, 'd',   (17, 24),       'def foo',                                                       "source.python meta.function.python storage.type.function.python "],

Why does position 17 give you (17, 24) and not, let’s say, (17, 27)? It seems it’s not using os.path.commonprefix at all but some other criteria. I think the key here is (once again) to understand what the concept of a region is.
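As a side note on os.path.commonprefix: it compares strings character by character, so on scope names it can stop in the middle of a level. The small sketch below contrasts it with a level-wise comparison; `common_levels` is a hypothetical helper written for this illustration, and neither variant is claimed to be what Sublime does:

```python
import os.path

keyword = "source.python meta.function.python storage.type.function.python "
paren = "source.python meta.function.parameters.python punctuation.section.parameters.begin.python "

# Character-wise: the prefix ends inside "meta.function.p...".
print(repr(os.path.commonprefix([keyword, paren])))  # 'source.python meta.function.p'


def common_levels(a, b):
    """Longest shared run of whole, space-separated scope levels."""
    shared = []
    for x, y in zip(a.split(), b.split()):
        if x != y:
            break
        shared.append(x)
    return shared


# Level-wise: only the first level is really shared.
print(common_levels(keyword, paren))  # ['source.python']
```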

0 Likes

#6

I’m not sure what a method from the Python path module has to do with anything… Do you know how scope selectors work?
http://www.sublimetext.com/docs/3/selectors.html

1 Like

#7

Not very much. This is the first time I’ve seen that part of the Sublime docs, and it’s helpful; I’m going to read those docs carefully to make sure I fully understand all the important concepts. To be honest, my only experience with scopes and selectors comes from watching a really educational video from OdatNurd, which gives a very nice introduction to scopes (as a casual user, not as a coder though).

Anyway, let’s assume for the sake of this thread that I’ve already understood the concepts behind selectors… what now? Do you know how the extract_scope(pt_or_region) algorithm works?

0 Likes

#8

OK, after reading the Sublime docs on both scopes and selectors, I must say the whole subject has become more or less clear. Speaking of score_selector, I’ve been messing around a little bit:

import sublime
import sublime_plugin


class TestScopeCommand(sublime_plugin.TextCommand):
    def run(self, edit, scope_must_match = False):
        lst = [
            "source.python comment.line.number-sign.python punctuation.definition.comment.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python comment.line.number-sign.python ",
            "source.python ",
            "source.python meta.function.python storage.type.function.python ",
            "source.python meta.function.python storage.type.function.python ",
            "source.python meta.function.python storage.type.function.python ",
            "source.python meta.function.python ",
            "source.python meta.function.python entity.name.function.python meta.generic-name.python ",
            "source.python meta.function.python entity.name.function.python meta.generic-name.python ",
            "source.python meta.function.python entity.name.function.python meta.generic-name.python ",
            "source.python meta.function.parameters.python punctuation.section.parameters.begin.python ",
            "source.python meta.function.parameters.python punctuation.section.parameters.end.python ",
            "source.python meta.function.python punctuation.section.function.begin.python ",
            "source.python ",
            "source.python ",
            "source.python ",
            "source.python ",
            "source.python ",
            "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python ",
            "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python ",
            "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python ",
            "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python ",
            "source.python meta.function-call.python meta.qualified-name.python support.function.builtin.python ",
            "source.python meta.function-call.python punctuation.section.arguments.begin.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python punctuation.definition.string.begin.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python ",
            "source.python meta.function-call.python meta.function-call.arguments.python meta.string.python string.quoted.single.python punctuation.definition.string.end.python ",
            "source.python meta.function-call.python punctuation.section.arguments.end.python ",
            "source.python ",
        ]

        lines = []
        st = set(lst)
        scores = []
        for scope in st:
            for selector in st:
                score = sublime.score_selector(scope, selector)
                lines.append('sublime.score_selector("{}", "{}") = {}'.format(
                    scope, selector, score
                ))
                scores.append(score)

        print(sorted(list(set(scores))))

        # 0       =                     0
        # 16      =                 10000
        # 208     =              11010000
        # 272     =             100010000
        # 2256    =          100011010000
        # 2320    =          100100010000
        # 2768    =          101011010000
        # 2832    =          101100010000
        # 14544   =        11100011010000
        # 18128   =       100011011010000
        # 145616  =    100011100011010000
        # 1456336 = 101100011100011010000

        for i,l in enumerate(sorted(lines)):
            print("{:<5}{}".format(i,l))

Anyway, let’s consider score_selector for a moment… would you be able to explain to me how extract_scope works if we assume that function is used behind the curtains? I’m asking because, if that’s so, it definitely doesn’t make any sense to me. For a moment I thought maybe extract_scope would iterate to the left and right for as long as score_selector(scope_at_caret_pos, scope_at_next_pos) != 0, but that’s definitely not it. Consider, for instance, position 0:

[0, '#',    (0, 16),        "# I'm a comment\n",                                             "source.python comment.line.number-sign.python punctuation.definition.comment.python "],

If you run:

>>> sublime.score_selector("source.python comment.line.number-sign.python punctuation.definition.comment.python ", "source.python comment.line.number-sign.python ")
272

but if you run:

>>> sublime.score_selector("source.python comment.line.number-sign.python ", "source.python comment.line.number-sign.python punctuation.definition.comment.python ")
0

So, given the above results, the regions returned by extract_scope at positions 0…15 don’t make any sense at all (at least to me) :confused:
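The asymmetry between those two calls can be pictured with a very simplified matcher. This is only a sketch under my own assumptions: `simple_match` is a hypothetical helper, it handles plain space-separated selectors only (no `,`, `|`, `&`, `-` or parentheses), and it returns a boolean rather than the score that sublime.score_selector computes:

```python
def simple_match(scope, selector):
    """True if every selector level matches, in order, some level of the scope.

    A selector level matches a scope level when it equals it or is a
    dot-separated prefix of it ("comment" matches "comment.line.python").
    """
    scope_levels = scope.split()
    i = 0
    for wanted in selector.split():
        while i < len(scope_levels):
            level = scope_levels[i]
            i += 1
            if level == wanted or level.startswith(wanted + "."):
                break
        else:
            # Ran out of scope levels without finding this selector level.
            return False
    return True


punct = "source.python comment.line.number-sign.python punctuation.definition.comment.python "
comment = "source.python comment.line.number-sign.python "

print(simple_match(punct, comment))    # True
print(simple_match(comment, punct))    # False
print(simple_match(punct, "comment"))  # True
```

Under this reading a selector can only match scopes that are at least as specific as itself. Using the scope at position 0 as the selector therefore fails against the scope at position 1, while the reverse succeeds, which is the same pattern as the 272 and 0 results above.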

Ideas?

0 Likes

#9

Btw, if you use this:

import sublime_plugin


class TestScopeCommand(sublime_plugin.TextCommand):
    def run(self, edit, scope_must_match = False):
        view = self.view
        sel = view.sel()
        scope_name = None

        (row, col) = view.rowcol(sel[0].begin())
        point = view.text_point(row, col)
        sel.add(view.extract_scope(point))

You soon realize that when extracting scopes from single positions the behaviour is more or less intuitive (even if I can’t explain how it works), but as soon as you make a selection that covers different scope “regions” the behaviour feels really weird (buggy).

Here’s a little example:

[image: showcase]

0 Likes