Sublime Forum

Is view.classify buggy?

#1

Consider this text:

# I'm a comment


def foo():
    print('# No comment')

and this command:

import sublime_plugin


class FooCommand(sublime_plugin.TextCommand):

    def class_flags(self, flags):
        CLASS_WORD_START = 1
        CLASS_WORD_END = 2
        CLASS_PUNCTUATION_START = 4
        CLASS_PUNCTUATION_END = 8
        CLASS_SUB_WORD_START = 16
        CLASS_SUB_WORD_END = 32
        CLASS_LINE_START = 64
        CLASS_LINE_END = 128
        CLASS_EMPTY_LINE = 256
        CLASS_MIDDLE_WORD = 512
        CLASS_WORD_START_WITH_PUNCTUATION = 1024
        CLASS_WORD_END_WITH_PUNCTUATION = 2048
        CLASS_OPENING_PARENTHESIS = 4096
        CLASS_CLOSING_PARENTHESIS = 8192

        res = []
        if flags & CLASS_WORD_START: res.append("CLASS_WORD_START")
        if flags & CLASS_WORD_END: res.append("CLASS_WORD_END")
        if flags & CLASS_PUNCTUATION_START: res.append("CLASS_PUNCTUATION_START")
        if flags & CLASS_PUNCTUATION_END: res.append("CLASS_PUNCTUATION_END")
        if flags & CLASS_SUB_WORD_START: res.append("CLASS_SUB_WORD_START")
        if flags & CLASS_SUB_WORD_END: res.append("CLASS_SUB_WORD_END")
        if flags & CLASS_LINE_START: res.append("CLASS_LINE_START")
        if flags & CLASS_LINE_END: res.append("CLASS_LINE_END")
        if flags & CLASS_EMPTY_LINE: res.append("CLASS_EMPTY_LINE")
        if flags & CLASS_MIDDLE_WORD: res.append("CLASS_MIDDLE_WORD")
        if flags & CLASS_WORD_START_WITH_PUNCTUATION: res.append("CLASS_WORD_START_WITH_PUNCTUATION")
        if flags & CLASS_WORD_END_WITH_PUNCTUATION: res.append("CLASS_WORD_END_WITH_PUNCTUATION")
        if flags & CLASS_OPENING_PARENTHESIS: res.append("CLASS_OPENING_PARENTHESIS")
        if flags & CLASS_CLOSING_PARENTHESIS: res.append("CLASS_CLOSING_PARENTHESIS")
        return " | ".join(reversed(res))

    def run(self, edit, block=False):
        for i in range(self.view.size()):
            l = "{}, {}, {}".format(
                i, repr(self.view.substr(i)), self.view.classify(i)
            )
            print("{:<30}{}".format(l, self.class_flags(self.view.classify(i))))

Bind that command to a key and run it on the test view; you should get this output:

0, '#', 1092                  CLASS_WORD_START_WITH_PUNCTUATION | CLASS_LINE_START | CLASS_PUNCTUATION_START
1, ' ', 2056                  CLASS_WORD_END_WITH_PUNCTUATION | CLASS_PUNCTUATION_END
2, 'I', 49                    CLASS_SUB_WORD_END | CLASS_SUB_WORD_START | CLASS_WORD_START
3, "'", 6                     CLASS_PUNCTUATION_START | CLASS_WORD_END
4, 'm', 9                     CLASS_PUNCTUATION_END | CLASS_WORD_START
5, ' ', 2                     CLASS_WORD_END
6, 'a', 1                     CLASS_WORD_START
7, ' ', 2                     CLASS_WORD_END
8, 'c', 1                     CLASS_WORD_START
9, 'o', 512                   CLASS_MIDDLE_WORD
10, 'm', 512                  CLASS_MIDDLE_WORD
11, 'm', 512                  CLASS_MIDDLE_WORD
12, 'e', 512                  CLASS_MIDDLE_WORD
13, 'n', 512                  CLASS_MIDDLE_WORD
14, 't', 512                  CLASS_MIDDLE_WORD
15, '\n', 130                 CLASS_LINE_END | CLASS_WORD_END
16, '\n', 448                 CLASS_EMPTY_LINE | CLASS_LINE_END | CLASS_LINE_START
17, '\n', 448                 CLASS_EMPTY_LINE | CLASS_LINE_END | CLASS_LINE_START
18, 'd', 65                   CLASS_LINE_START | CLASS_WORD_START
19, 'e', 512                  CLASS_MIDDLE_WORD
20, 'f', 512                  CLASS_MIDDLE_WORD
21, ' ', 2                    CLASS_WORD_END
22, 'f', 1                    CLASS_WORD_START
23, 'o', 512                  CLASS_MIDDLE_WORD
24, 'o', 512                  CLASS_MIDDLE_WORD
25, '(', 4102                 CLASS_OPENING_PARENTHESIS | CLASS_PUNCTUATION_START | CLASS_WORD_END
26, ')', 0                    
27, ':', 8192                 CLASS_CLOSING_PARENTHESIS
28, '\n', 2184                CLASS_WORD_END_WITH_PUNCTUATION | CLASS_LINE_END | CLASS_PUNCTUATION_END
29, ' ', 64                   CLASS_LINE_START
30, ' ', 0                    
31, ' ', 0                    
32, ' ', 0                    
33, 'p', 1                    CLASS_WORD_START
34, 'r', 512                  CLASS_MIDDLE_WORD
35, 'i', 512                  CLASS_MIDDLE_WORD
36, 'n', 512                  CLASS_MIDDLE_WORD
37, 't', 512                  CLASS_MIDDLE_WORD
38, '(', 4102                 CLASS_OPENING_PARENTHESIS | CLASS_PUNCTUATION_START | CLASS_WORD_END
39, "'", 4096                 CLASS_OPENING_PARENTHESIS
40, '#', 0                    
41, ' ', 2056                 CLASS_WORD_END_WITH_PUNCTUATION | CLASS_PUNCTUATION_END
42, 'N', 49                   CLASS_SUB_WORD_END | CLASS_SUB_WORD_START | CLASS_WORD_START
43, 'o', 512                  CLASS_MIDDLE_WORD
44, ' ', 2                    CLASS_WORD_END
45, 'c', 1                    CLASS_WORD_START
46, 'o', 512                  CLASS_MIDDLE_WORD
47, 'm', 512                  CLASS_MIDDLE_WORD
48, 'm', 512                  CLASS_MIDDLE_WORD
49, 'e', 512                  CLASS_MIDDLE_WORD
50, 'n', 512                  CLASS_MIDDLE_WORD
51, 't', 512                  CLASS_MIDDLE_WORD
52, "'", 6                    CLASS_PUNCTUATION_START | CLASS_WORD_END
53, ')', 8192                 CLASS_CLOSING_PARENTHESIS
54, '\n', 10376               CLASS_CLOSING_PARENTHESIS | CLASS_WORD_END_WITH_PUNCTUATION | CLASS_LINE_END | CLASS_PUNCTUATION_END
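The numeric values in this output can also be decoded outside Sublime. Here is a minimal sketch, assuming the bit values listed in the command above are right; the five names from CLASS_MIDDLE_WORD onward come from limetext rather than from sublime.py, and describe is just a hypothetical helper name:

```python
# Bit values as listed in the post. The last five names are taken from
# limetext and are an assumption about what Sublime means by those bits.
FLAG_NAMES = {
    1: "CLASS_WORD_START",
    2: "CLASS_WORD_END",
    4: "CLASS_PUNCTUATION_START",
    8: "CLASS_PUNCTUATION_END",
    16: "CLASS_SUB_WORD_START",
    32: "CLASS_SUB_WORD_END",
    64: "CLASS_LINE_START",
    128: "CLASS_LINE_END",
    256: "CLASS_EMPTY_LINE",
    512: "CLASS_MIDDLE_WORD",
    1024: "CLASS_WORD_START_WITH_PUNCTUATION",
    2048: "CLASS_WORD_END_WITH_PUNCTUATION",
    4096: "CLASS_OPENING_PARENTHESIS",
    8192: "CLASS_CLOSING_PARENTHESIS",
}


def describe(value):
    """Return the names of the bits set in value, highest bit first."""
    names = [name for bit, name in sorted(FLAG_NAMES.items()) if value & bit]
    return " | ".join(reversed(names))


if __name__ == "__main__":
    for value in (1092, 4102, 0, 10376):
        print("{:<8}{}".format(value, describe(value)))
```

Being table-driven, it only needs one new dictionary entry if more undocumented bits turn up.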

A couple of questions:

  1. Why does classify return 0 at position 26, i.e. 26, ')', 0? Is that a Sublime bug or intended behaviour?

  2. Sublime only exposes these constants in sublime.py:

    CLASS_WORD_START = 1
    CLASS_WORD_END = 2
    CLASS_PUNCTUATION_START = 4
    CLASS_PUNCTUATION_END = 8
    CLASS_SUB_WORD_START = 16
    CLASS_SUB_WORD_END = 32
    CLASS_LINE_START = 64
    CLASS_LINE_END = 128
    CLASS_EMPTY_LINE = 256

but there are clearly some constants missing, since classify returns values well above what these nine flags can produce (at most 511). If we look at limetext, these are the missing ones:

CLASS_MIDDLE_WORD
CLASS_WORD_START_WITH_PUNCTUATION
CLASS_WORD_END_WITH_PUNCTUATION
CLASS_OPENING_PARENTHESIS
CLASS_CLOSING_PARENTHESIS

So the question is: why don’t these constants live in sublime.py?

Could anyone clarify? I was comparing the behaviour of limetext’s classify and Sublime’s, and both behave pretty similarly, but there are subtle differences in some corner cases :confused: . For example, limetext’s routine at position 26 gives:

26, ')', 12288                CLASS_CLOSING_PARENTHESIS | CLASS_OPENING_PARENTHESIS

while Sublime reports 0, as already explained… which one is correct?
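To see what sits on either side of the points where Sublime reported 0, the test text can be inspected as a plain string. This is only a sketch: neighbours is a hypothetical helper, and it assumes classify looks at the character before the point and the character at the point, which is how the limetext routine works but is not confirmed for Sublime.

```python
# The test text from the first post as a plain string (55 characters).
SAMPLE = "# I'm a comment\n\n\ndef foo():\n    print('# No comment')\n"


def neighbours(text, point):
    """Return (a, b): the character before point and the character at point.

    Either one is "" when point sits at the corresponding edge of the text.
    """
    a = text[point - 1] if 0 < point <= len(text) else ""
    b = text[point] if 0 <= point < len(text) else ""
    return a, b


# Points where Sublime's classify returned 0 in the output above.
ZERO_POINTS = [26, 30, 31, 32, 40]

if __name__ == "__main__":
    for point in ZERO_POINTS:
        print(point, neighbours(SAMPLE, point))
```

In this sample, every zero falls between two non-word characters: '(' and ')' at 26, two spaces at 30 to 32, and a quote followed by '#' at 40. That is an observation about this one text only; it doesn't say which behaviour is correct.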

#2

Here’s a little command that compares my own classify with Sublime’s classify:

import re
import textwrap

import sublime
import sublime_plugin
from sublime import Region

CLASS_WORD_START = 1
CLASS_WORD_END = 2
CLASS_PUNCTUATION_START = 4
CLASS_PUNCTUATION_END = 8
CLASS_SUB_WORD_START = 16
CLASS_SUB_WORD_END = 32
CLASS_LINE_START = 64
CLASS_LINE_END = 128
CLASS_EMPTY_LINE = 256
CLASS_MIDDLE_WORD = 512
CLASS_WORD_START_WITH_PUNCTUATION = 1024
CLASS_WORD_END_WITH_PUNCTUATION = 2048
CLASS_OPENING_PARENTHESIS = 4096
CLASS_CLOSING_PARENTHESIS = 8192


class PythonVsSublimeCommand(sublime_plugin.TextCommand):

    def class_flags(self, flags):
        res = []
        if flags & CLASS_WORD_START:
            res.append("CLASS_WORD_START")
        if flags & CLASS_WORD_END:
            res.append("CLASS_WORD_END")
        if flags & CLASS_PUNCTUATION_START:
            res.append("CLASS_PUNCTUATION_START")
        if flags & CLASS_PUNCTUATION_END:
            res.append("CLASS_PUNCTUATION_END")
        if flags & CLASS_SUB_WORD_START:
            res.append("CLASS_SUB_WORD_START")
        if flags & CLASS_SUB_WORD_END:
            res.append("CLASS_SUB_WORD_END")
        if flags & CLASS_LINE_START:
            res.append("CLASS_LINE_START")
        if flags & CLASS_LINE_END:
            res.append("CLASS_LINE_END")
        if flags & CLASS_EMPTY_LINE:
            res.append("CLASS_EMPTY_LINE")
        if flags & CLASS_MIDDLE_WORD:
            res.append("CLASS_MIDDLE_WORD")
        if flags & CLASS_WORD_START_WITH_PUNCTUATION:
            res.append("CLASS_WORD_START_WITH_PUNCTUATION")
        if flags & CLASS_WORD_END_WITH_PUNCTUATION:
            res.append("CLASS_WORD_END_WITH_PUNCTUATION")
        if flags & CLASS_OPENING_PARENTHESIS:
            res.append("CLASS_OPENING_PARENTHESIS")
        if flags & CLASS_CLOSING_PARENTHESIS:
            res.append("CLASS_CLOSING_PARENTHESIS")
        return " | ".join(reversed(res))

    def classify(self, point):
        # Classifies point, returning a bitwise OR of zero or more of the
        # flags defined above.
        #
        # Note: ws should really be taken from the word_separators setting
        view = self.view

        ws = r"[-[\]!\"#$%&'()*+,./:;<=>?@\\^`{|}~]"
        res = 0
        a, b = "", ""

        if point > 0:
            a = view.substr(Region(point - 1, point))

        if point < view.size():
            b = view.substr(Region(point, point + 1))

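        # Out of range: 3520 == CLASS_WORD_END_WITH_PUNCTUATION
        # | CLASS_WORD_START_WITH_PUNCTUATION | CLASS_EMPTY_LINE
        # | CLASS_LINE_END | CLASS_LINE_START
        if view.size() == 0 or point < 0 or point > view.size():
            return 3520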

        # If the characters before and after the point are the same
        # separator, return 0
        p = re.compile(ws)
        if a == b and p.match(a):
            return 0

        # SubWord start & end
        p = re.compile("[A-Z]")
        if p.match(b) and not p.match(a):
            res |= CLASS_SUB_WORD_START
            res |= CLASS_SUB_WORD_END

        if a == "_" and b != "_":
            res |= CLASS_SUB_WORD_START

        if b == "_" and a != "_":
            res |= CLASS_SUB_WORD_END

        # Punc start & end
        p = re.compile(ws)

        # Why ws != ""? See https://github.com/limetext/rubex/issues/2
        if ((p.match(b) and ws != "") or b == "") and not (p.match(a) and ws != ""):
            res |= CLASS_PUNCTUATION_START
        if ((p.match(a) and ws != "") or a == "") and not (p.match(b) and ws != ""):
            res |= CLASS_PUNCTUATION_END

        # Word start & end
        re1 = re.compile(r"\w")
        re2 = re.compile(r"\s")

        if re1.match(b) and ((p.match(a) and ws != "") or re2.match(a) or a == ""):
            res |= CLASS_WORD_START
        if re1.match(a) and ((p.match(b) and ws != "") or re2.match(b) or b == ""):
            res |= CLASS_WORD_END

        # Line start & end
        if a == "\n" or a == "":
            res |= CLASS_LINE_START
        if b == "\n" or b == "":
            res |= CLASS_LINE_END
            if ws == "":
                res |= CLASS_WORD_END

        # Empty line
        if (a == "\n" and b == "\n") or (a == "" and b == ""):
            res |= CLASS_EMPTY_LINE

        # Middle word
        p = re.compile(r"\w")
        if p.match(a) and p.match(b):
            res |= CLASS_MIDDLE_WORD

        # Word start & end with punc
        p = re.compile(r"\s")
        if (res & CLASS_PUNCTUATION_START != 0) and (p.match(a) or a == ""):
            res |= CLASS_WORD_START_WITH_PUNCTUATION
        if (res & CLASS_PUNCTUATION_END != 0) and (p.match(b) or b == ""):
            res |= CLASS_WORD_END_WITH_PUNCTUATION

        # Opening & closing parentheses
        # ("[" is escaped so newer Pythons don't warn about a nested set)
        p = re.compile(r"[\[({]")
        if p.match(a) or p.match(b):
            res |= CLASS_OPENING_PARENTHESIS

        # print(res)

        p = re.compile(r"[)\]}]")
        if p.match(a) or p.match(b):
            res |= CLASS_CLOSING_PARENTHESIS

        # TODO: isn't this a bug? what's the relation between
        # ',' and parentheses
        if a == ",":
            res |= CLASS_OPENING_PARENTHESIS
        if b == ",":
            res |= CLASS_CLOSING_PARENTHESIS

        return res

    def run(self, edit, block=False):
        self.view.sel().clear()

        for i in range(self.view.size()):
            c1 = self.classify(i)
            c2 = self.view.classify(i)
            if c1 != c2:
                print("Mismatch position {} - {} => {}/{} vs {}/{}".format(
                    i, self.view.substr(i),
                    c1, self.class_flags(c1),
                    c2, self.class_flags(c2),
                ))
                self.view.sel().add(Region(i, i + 1))

Just assign this command to a key binding and run it on any view, and you’ll see the two are not identical; there are a few differences… btw, my version has been transpiled from here.
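When two classify values disagree, it helps to see only the bits that differ instead of both full flag lists. Here is a minimal sketch, assuming the same bit values as above; names and split_mismatch are hypothetical helpers, not part of any Sublime API:

```python
# Bit values as listed earlier in the thread; the names above 256 come
# from limetext and are an assumption about Sublime's meaning.
FLAG_NAMES = {
    1: "CLASS_WORD_START",
    2: "CLASS_WORD_END",
    4: "CLASS_PUNCTUATION_START",
    8: "CLASS_PUNCTUATION_END",
    16: "CLASS_SUB_WORD_START",
    32: "CLASS_SUB_WORD_END",
    64: "CLASS_LINE_START",
    128: "CLASS_LINE_END",
    256: "CLASS_EMPTY_LINE",
    512: "CLASS_MIDDLE_WORD",
    1024: "CLASS_WORD_START_WITH_PUNCTUATION",
    2048: "CLASS_WORD_END_WITH_PUNCTUATION",
    4096: "CLASS_OPENING_PARENTHESIS",
    8192: "CLASS_CLOSING_PARENTHESIS",
}


def names(value):
    """List the names of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if value & bit]


def split_mismatch(mine, theirs):
    """Return (flags only in mine, flags only in theirs)."""
    only_mine = mine & ~theirs
    only_theirs = theirs & ~mine
    return names(only_mine), names(only_theirs)


if __name__ == "__main__":
    # Position 26 from the first post: limetext gives 12288, Sublime gives 0.
    print(split_mismatch(12288, 0))
```

Grouping mismatches by the returned pair should show quickly whether the differences come down to one or two rules, such as the parenthesis flags.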

So, calling on the smart ST hardcoders here :slight_smile: . Would you be able to tell me how to adjust my version so it matches Sublime’s completely?

@wbond Plz… some hints here would be a real timesaver… come on! :wink:

Ps. Once the routine is fully correct I’ll optimize it (i.e. regex compilation should be done only once, outside the routine) and compare it with Sublime’s… I’d like to know how much faster Sublime’s will be, although again, the re module shouldn’t be that bad.
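The "compile once" idea can be sketched with a single pattern. The function names are hypothetical and the timings will vary by machine. Note that the re module keeps its own cache of compiled patterns, so calling re.compile inside the routine mostly costs a cache lookup rather than a full recompile.

```python
import re
import timeit

WORD = r"\w"


def is_word_recompiled(ch):
    # Same style as the routine above: re.compile is called on every call.
    return bool(re.compile(WORD).match(ch))


# Compiled once, when the module is loaded.
WORD_RE = re.compile(WORD)


def is_word_precompiled(ch):
    return bool(WORD_RE.match(ch))


if __name__ == "__main__":
    for fn in (is_word_recompiled, is_word_precompiled):
        seconds = timeit.timeit(lambda: fn("a"), number=100000)
        print("{:<24}{:.4f}s".format(fn.__name__, seconds))
```

Both functions return the same results, so swapping one for the other inside classify should not change its output, only its speed.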

#3

I wanted to let you know I’ve seen your question, but I haven’t worked on this part of the codebase before. Right now, documenting all the edge cases of such a portion of ST functionality is pretty low priority, behind lots of other things.

#4

@wbond: I understand, thanks for letting me know. A couple of last questions, though:

  • Would it be possible to post the source of these particular functions here or in SublimeCoreIssues? I asked something similar here but haven’t received any answer so far… If it’s not possible, could you please confirm that so I won’t ask about it again?
  • So if you haven’t worked on that particular routine, I guess you can’t say whether those positions returning 0 are intended behaviour or just a bug, right?

Btw, I have already made clear in some other threads that I respect (and in many cases favour) closed-source software, but let me ask you: do you think posting the source code of these tiny routines would do any harm to the business at all? I read somewhere that the ST codebase was ~70k LOC (~Feb 2009), so assuming these routines are ~200 LOC (~0.3% of the code), would it make any difference to post their source? :slight_smile: . In fact, by doing so you’d get people confirming whether that’s a bug or intended behaviour and, in the best-case scenario, people creating extra documentation about it… :wink:

#5

I second @BPL’s comment. Obviously, if there’s a serious IP concern over a piece of code, then that’s one thing. But if it could be done, then showing a small handful of these core utility functions would be extremely helpful. It can be very difficult to understand the behavior of even simple functions like score_selector, but the approach that has always worked best for me is to imagine what the original C++ would have looked like.

I understand that it’s not practical for the Sublime devs to spend a lot of time going over these functions to document the corner cases. Fortunately, there are users who are not only willing to do that, but are already trying to do it the hard way.

I’m very skeptical of the notion of open sourcing Sublime, even from a purely technical perspective (leaving aside the impracticality from a business perspective). Just posting a big pile of code doesn’t mean that anyone is going to expend the tremendous effort to learn the code base and contribute in a significant fashion. I mention this because I think that posting the code of a few utility functions is not at all like that: it’s virtually guaranteed that some members of the community will turn those snippets into something useful.

Again, it may be that Sublime HQ doesn’t feel that they can release any internal code for IP reasons. I would like for this not to be the case, but if it is, then it is. But from a time/benefit perspective, I think that there are few better ways to help out the ecosystem.

#6

Thanks Thom! Your comment expresses perfectly, word for word, what I think myself. Unfortunately I haven’t been able to express those thoughts so effectively in related threads about ST source code et al, so yeah, your comment definitely helps give proper context to my request.

I understand that it’s not practical for the Sublime devs to spend a lot of time going over these functions to document the corner cases. Fortunately, there are users who are not only willing to do that, but are already trying to do it the hard way.

Exactly :confused: . Some functions are really easy and intuitive to fully understand, but others are extremely hard to reverse engineer, which is frustrating because once you’ve understood the whole logic (including edge cases) you realize the function was “trivial”. For instance, I’ve spent almost a full day trying to fully understand indented_region, and I’m pretty sure the body of that function is extremely simple… the type of function you could code in 10 minutes in Python if you knew the underlying algorithm.

Anyway, just to make it clear again, because I don’t want to be pushy (or annoying) in any way about asking for these little routines to be released to the masses: if the team considers that such an action could harm the business in any way, that’s it. Just let us know and, as I said before, I won’t ever ask for it again :slight_smile:

After all, my clear wish is to continue using Sublime Text for the rest of my days as a coder, so if some action or decision could harm (or not benefit) the team/community/product, it obviously should be kept off the table.

#7

Ok, after waiting a few weeks with nobody from Sublime HQ answering my requests about whether it’d be possible to see the source of little functions such as extract_scope or classify, I’ll take that silence as an implicit “No, we won’t ever share the source code of our sacred well-crafted routines” :slight_smile:

Just one note: there is nothing to be ashamed of if you don’t want to share your source code at all. It’s not taboo to state clearly that you favour closed source, and you won’t become any less cool. On the other hand, I don’t think it’s nice to ignore people’s questions; in my case a simple “No, it’s not possible” would have been more than good enough, and it’d have taken you just 5 seconds :wink:

#8

Sublime HQ is not currently planning on trying to extract various portions of code from its codebase for public consumption in place of documentation.

When this topic gets into my work queue, I’ll look at things and can produce some documentation. In the meantime just ensure there is an issue for it, and that way it won’t be lost in the forum activity, as your question from last week did.

#9

Yay! :slight_smile:

Thanks for the confirmation and for letting me know your position! That was all I wanted to know :wink: . Now you won’t see me insisting or asking about the codebase’s source code anymore.

On the other hand, yeah… each time I find places where adding or improving documentation could help plugin developers, I’ll open an issue, as I’ve already done a couple of times.
