Sublime Forum

Referring to regex capture groups in a plugin

#1

I’m trying to use find_all() and replace() in a plugin, but I can’t seem to find the correct expression to refer back to a capture group within the replacement expression. I’ve tried “\1”, “\1”, “$1”, “$1” and some others - no luck.

Am I a victim of the null intersection of BOOST and Python?

0 Likes

#2

The view.replace API is decoupled from the view.find_all API: replace knows nothing about capture groups, because it is designed to be used standalone, not only on the results of regex matches.

view.find_all doesn’t directly expose the capture groups that matched, or their positions, but you can pass a format string with $1 etc. to get the text that was captured. If you have multiple capture groups, you’ll probably want to parse the extractions afterwards.

e.g.

extractions = list()
regions = view.find_all(r'(\w+), (\w+)', 0, '$1\n$2', extractions)
capture_group_contents = [extraction.split('\n') for extraction in extractions]

ofc there are other possible solutions to get the capture group locations, including executing the regex searches multiple times with lookaheads and \K in the relevant places, or using Python’s more limited built-in re module.
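
A minimal sketch of the re-module route for getting capture group locations. It assumes the buffer text has already been pulled out as a plain string (for example with view.substr(sublime.Region(0, view.size()))), so that the offsets can be turned back into regions. The helper name capture_group_spans and the sample text are made up for illustration.

```python
import re


def capture_group_spans(text, pattern, group=1):
    """Return (start, end, captured_text) for one capture group of every match."""
    spans = []
    for match in re.finditer(pattern, text):
        # span(group) gives the offsets of that group within `text`
        start, end = match.span(group)
        spans.append((start, end, match.group(group)))
    return spans


# Hypothetical sample text, not taken from any real buffer
sample = "Doe, John and Roe, Jane"
second_group = capture_group_spans(sample, r"(\w+), (\w+)", group=2)
# second_group == [(5, 9, "John"), (19, 23, "Jane")]
```

Inside a plugin, each (start, end) pair could then be wrapped in sublime.Region(start, end), assuming the string covers the whole buffer from offset 0.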

btw the docs don’t mention that the format string should be in Perl format and not Boost Extended format

1 Like

#3

As is, find_all and replace work fine, as long as the replacement expression doesn’t involve any regex. I’m lucky: I have only a few dozen possible matches, so I guess I’ll just handle one per line.

I’m trying to automate the task of cleaning up a file. The first time, I did it by hand, but between my mistakes and what I learned, I realized that I will need to do this often enough to bother automating it. My first thought was simply a saved and editable macro, but ST3 macros can’t handle replace. I thought about switching editors, but read that ST3 makes up for this with a full-featured plugin language - a little harder to use the first time, but supposedly worth the effort to learn. But no - the plugin language can’t handle replace??? Pitiful.

0 Likes

#4

You can use Python’s re API, which is more accessible with regard to capture groups, or even the third-party regex module. I use both in the RegReplace plugin, which I created for automating common regular expression replacements.
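
A minimal sketch of the re approach, assuming you work on the buffer contents as a plain string. Note that re spells backreferences as \1 (or \g<1>) rather than $1. The function name regex_replace_text and the sample strings are hypothetical, and this is not how RegReplace itself is implemented.

```python
import re


def regex_replace_text(text, pattern, replacement):
    """Replace every match of `pattern`, expanding \\1-style backreferences."""
    return re.sub(pattern, replacement, text)


# Hypothetical input; wraps the word after each comma in angle brackets
result = regex_replace_text("alpha, beta, gamma", r", (\w+)", r"; <\1>")
# result == "alpha; <beta>; <gamma>"

# Inside a TextCommand, one option (an assumption, not tested here) would be:
#   whole = sublime.Region(0, self.view.size())
#   new_text = regex_replace_text(self.view.substr(whole), pattern, replacement)
#   self.view.replace(edit, whole, new_text)
```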

0 Likes

#5

works fine from what I can see, care to share an example?

import sublime
import sublime_plugin


class RegexReplaceCommand(sublime_plugin.TextCommand):
    def run(self, edit, regex, replacement):
        extractions = list()
        regions = self.view.find_all(regex, 0, replacement, extractions)
        for region, replace_with in reversed(list(zip(regions, extractions))):
            self.view.replace(edit, region, replace_with)

# Then, from the Sublime Text console:
window.run_command('new_file')
view.run_command('insert_snippet', { "name": "Packages/Text/Snippets/lorem.sublime-snippet" })
view.run_command('regex_replace', { 'regex': r', (\w+)', 'replacement': r', \U$1!' })

0 Likes

#6

Here’s an excerpt, using $1 for the first capture group.

import sublime
import sublime_plugin

class DictionaryCommand(sublime_plugin.TextCommand):
    def replaceAll(self, edit, roman, shwa):
        rlist = reversed(self.view.find_all(roman))
        for r in rlist:
            self.view.replace(edit, r, shwa)

    def run(self, edit):
        self.replaceAll(edit, "([^S] [PTK])( …1)", "$1X$2")

Even the above is beyond my pay grade in Python - your example is Greek to me.

0 Likes

#7

I found a solution, taking advantage of the limited scope of possible matches. Of the three elements I needed to match, there are only 4 possibilities for the first, 5 for the second, and 14 for the third. So I just iterated through them all with for, using a non-regex replace string.
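
A rough sketch of that brute-force idea, for anyone landing here later. The value lists below are hypothetical placeholders (the real ones are not given in this thread), and the "insert an X after the second element" rule is only a guess based on the earlier excerpt. It works on a plain string; in a plugin the same pairs could presumably be fed to view.find_all(search, sublime.LITERAL) and view.replace.

```python
from itertools import product


def literal_replacements(first, second, third):
    """Yield (search, replacement) pairs of plain strings, one per combination."""
    for f, s, t in product(first, second, third):
        # Assumed rule: insert an "X" right after the second element
        yield "{} {} {}".format(f, s, t), "{} {}X {}".format(f, s, t)


def apply_all(text, pairs):
    """Apply each literal (non-regex) replacement in turn."""
    for search, replacement in pairs:
        text = text.replace(search, replacement)
    return text


# Hypothetical placeholder values, far fewer than the real 4 x 5 x 14
pairs = list(literal_replacements(["a", "b"], ["P", "T"], ["1"]))
cleaned = apply_all("a P 1, b T 1, S P 1", pairs)
# cleaned == "a PX 1, b TX 1, S P 1"
```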

It’s not elegant, but it may be easier to read :)

Thanks for the help.

0 Likes

#8

So if I only have a single capture group, how would it work? I just want the text of each capture as the elements of a list.

0 Likes