Sublime Forum

Regex: match the numbers that are repeated most often

#1

Hello, I have 15 rows of 7 numbers each, ranging from 1 to 50. How can I match the 4 numbers that are repeated most often across all 15 rows?

I suppose I must first select all numbers with \d+
then separate all 2-digit numbers \b[1-9]\d\b from all 1-digit numbers \b[1-9]\b
or select all numbers from 1-10, then all numbers from 11-20, and so on up to 41-50

I don’t know exactly; there should be a mathematical formula. In Excel I can use filters for this, or sort from lowest to highest, etc., and I also have the option to highlight unique and duplicate values.

But how can I do this with regex in Sublime?

#2

Regular expressions don’t have the computational power to do something like that; they just find text that matches. If you want to use a regex for this, you’re probably going to have to brute-force it: manually search for every number from 1 to 50, keep track of how many results you get back for each, and then pick the four with the most hits.

You’re probably much better off writing a little custom Python code to examine the buffer and do it for you, unless it’s something you’re only going to do once (and even then I’d rather write a little script than do it by hand).
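
A minimal sketch of that brute-force idea in plain Python, assuming the rows are available as one string. The function names `count_hits` and `top_four` and the sample data are made up for illustration; nothing here is Sublime-specific.

```python
import re

def count_hits(text, low=1, high=50):
    """Search for every number from low to high and record its hit count."""
    hits = {}
    for n in range(low, high + 1):
        # \b on both sides keeps 1 from matching inside 11 or 21
        hits[n] = len(re.findall(r'\b{}\b'.format(n), text))
    return hits

def top_four(text):
    """Return the four numbers with the most hits (ties: smaller number first)."""
    hits = count_hits(text)
    return sorted(hits, key=lambda n: (-hits[n], n))[:4]

# Hypothetical sample: two rows of seven numbers
sample = "3 17 22 3 45 9 17\n3 8 17 30 41 22 50"
print(top_four(sample))
```

This runs one search per candidate number, which is exactly the manual procedure described above, just automated.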

#3

BTW, have a look at collections.Counter

>>> from collections import Counter
>>> c = Counter([1, 5, 8, 1, 20, 15, 8, 48, 7, 54, 6, 15, 4, 9, 19, 1])
>>> c.most_common(4)
[(1, 3), (8, 2), (15, 2), (4, 1)]
>>> for nb, times in c.most_common(4):
...     print('The number {} occurred {}'.format(nb, 'once' if times == 1 else 'twice' if times == 2 else str(times) + ' times'))
...
The number 1 occurred 3 times
The number 8 occurred twice
The number 15 occurred twice
The number 4 occurred once
>>>

So, all you have to do is select every number, add them to a list, pass that list to Counter, and ask the counter for the 4 most common numbers!

(The collections module is full of “little” classes like this that I had been re-coding a lot because I wasn’t aware they already existed, such as OrderedDict. So please don’t make the same mistake, and take a deeper look at this awesome module :smile: )
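
As a rough sketch of those steps outside Sublime, assuming the rows are already in a string (the helper name `most_frequent_numbers` and the sample rows are hypothetical):

```python
import re
from collections import Counter

def most_frequent_numbers(text, n=4):
    """Collect every run of digits in text and return the n most common."""
    # "select every number" and "add them to a list"
    numbers = [int(match) for match in re.findall(r'\d+', text)]
    # pass the list to Counter and ask for the n most common
    return Counter(numbers).most_common(n)

# Hypothetical sample: two rows of seven numbers
rows = "3 17 22 3 45 9 17\n3 8 17 30 41 22 50"
for number, times in most_frequent_numbers(rows):
    print(number, times)
```

Note that when several numbers share the same count, which of them makes the cut depends on the order `most_common` returns ties in; in Python 3.7 and later that is the order in which they were first encountered.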

#4

Hello, thanks. But if I have to select every number myself, I might as well count all the occurrences from the start.

Yes, it’s tedious work, so I’d better use Excel; it’s much faster.

Thanks a lot.

#5

When I say “select”, I mean from Python:

import sublime
import sublime_plugin
from collections import Counter

class OccurencesCounterCommand(sublime_plugin.TextCommand):

    def run(self, edit):
        # the "equivalent" of ctrl+f with a regex: find every run of digits
        regions = self.view.find_all(r'\d+')
        nbs = []
        for region in regions:
            # in your case, you don't even need to convert it to an int
            # (you can leave it as a string), it's just kind of cleaner
            nbs.append(int(self.view.substr(region)))

        counter = Counter(nbs)

        # most_common(4) returns (number, count) pairs; join() needs strings
        res = ', '.join(str(nb) for nb, count in counter.most_common(4))
        sublime.message_dialog('The 4 numbers that occur the most are: ' + res)


I haven’t tested it

You just need to save this as Packages/User/occurences_counter.py, open the console (View -> Show Console), and paste this in: view.run_command('occurences_counter'). (You can bind it to a shortcut, for example, but if you only want to use it once, it isn’t worth it.)

If you’re used to doing this in Excel and you don’t write Python, it might be faster to copy/paste the numbers into Excel (one by one or all at once, I don’t know) and do it there.
