Sublime Forum

[Solved] How to convert this script to be a part of SublimeText?

#1

Hi

For over a year now, I’ve been writing and improving a script that is part of SubtitleEdit.
Yesterday I realized it could also be part of SublimeText, to help me correct misspelled words when editing ebooks and other large pieces of text.

Can anyone help me get started? What would this code look like in ST, and where would I put it?
As a macro, I assume.

Please help me out. Thank you.

0 Likes

#2

Something like this should work:

 

import sublime, sublime_plugin

class ocr_fix( sublime_plugin.TextCommand ):
	def run( self, edit ):

		regexPairs = get_RegEx_Pairs()

		for queryPattern, replacementPattern in regexPairs:
			# find_all fills `replacements` with the formatted text for each match
			replacements = []
			resultRegions = self.view.find_all( queryPattern, 0, replacementPattern, replacements )

			# Replace from the last match to the first, so that a change in length
			# does not shift the regions that are still waiting to be replaced
			for index in reversed( range( len( resultRegions ) ) ):
				self.view.replace( edit, resultRegions[ index ], replacements[ index ] )

def get_RegEx_Pairs():
	# Each pair is ( RegEx pattern, replacement format string )

	regexPairs = []

	regexPairs.append( ( "advokati", "odvjetnici" ) )
	regexPairs.append( ( "Advokati", "Odvjetnici" ) )
	regexPairs.append( ( "advokatima", "odvjetnicima" ) )
	regexPairs.append( ( "Advokatima", "Odvjetnicima" ) )
	regexPairs.append( ( "amin", "amen" ) )
	regexPairs.append( ( "Amin", "Amen" ) )

	return regexPairs

 

The command ocr_fix can be assigned to a key-binding or a command palette entry.
 



 
Here’s the same script with all of the replacement pairs: @ Gist

You might need to check the RegEx patterns; I just did a quick replacement of \\(?![nt"]) with \\\\ to make the strings valid for Python.
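As an aside on the escaping: in Python source, a raw string literal (r"...") leaves backslashes alone, so a pattern can be pasted without doubling them. Here is a minimal sketch using Python's own re module, not Sublime's regex engine, whose syntax may differ in places. The sample text is made up for illustration.

```python
import re

# The same word-boundary pattern written two ways.
escaped = "\\bamin\\b"   # regular string: every backslash is doubled
raw = r"\bamin\b"        # raw string: backslashes are written once

# Both literals produce the identical pattern text.
same_pattern = (escaped == raw)

# Hypothetical sample text, only for illustration.
sample = "amin, kamin"
fixed = re.sub(raw, "amen", sample)

print(same_pattern, fixed)
```

Either spelling works in the plugin; raw strings are just easier to compare against the original list.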

2 Likes

#3

Oh, wow! Thank you very much :+1:

Where do I put that file? In Packages, or somewhere else?

Also, not everything in the original file is a regex.
I assume this converted all of it to regex?

For example, if I have čk -> čak,
I wouldn’t want ručka to turn into ručak.
So I wonder: will this pick up only standalone words, or will it apply the regular expression everywhere?

0 Likes

#4

 
You can save it @ /Packages/OCR Fix/

 

 
By default, the view.find_all function uses RegEx.  You can also add a flag so that it searches for literal strings instead of RegEx, but I don’t think that would respect word boundaries.

I’d say your best bet is to stick with RegEx & add \\b where necessary to avoid partial matches.
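To illustrate with the čk example, here is a small sketch in plain Python. It uses Python's re module rather than the editor's regex engine, so treat it as an approximation of what the plugin will do. The sample text is hypothetical.

```python
import re

text = "ručka čk"  # hypothetical sample: one real word, one OCR error

# Without boundaries the pattern also hits the "čk" inside "ručka".
without_boundaries = re.sub(r"čk", "čak", text)

# With \b on both sides only the standalone "čk" is replaced.
with_boundaries = re.sub(r"\bčk\b", "čak", text)

print(without_boundaries)
print(with_boundaries)
```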

0 Likes

#5

Please bear with me, because this is the first time I’ve done something like this.
I’ve put it in the OCR Fix folder in Packages.

So how exactly do I load it and run it?
I thought it would show up under Packages in the menu.

When you say “flag it”, what do you mean by that?
Also, could you explain a bit about respecting word boundaries, if you don’t mind?
I apologize for so many questions.

0 Likes

#6

 
•  save this code to:
/Packages/OCR Fix/Default.sublime-commands

[

	{
		"caption": "OCR Fix",
		"command": "ocr_fix",
	},
	
]

•  open the command palette with Ctrl + Shift + P
•  type OCR Fix and press Enter
 

OR

 
•  save this code to:
/Packages/OCR Fix/Default.sublime-keymap

[

	{
		"keys": ["ctrl+shift+alt+o"],
		"command": "ocr_fix",
	},

]

•  press Ctrl + Shift + Alt + O
 



 

 
From Sublime Text > API Reference > View:

[Region]

find_all(pattern, <flags>, <format>, <extractions>)

Returns all (non-overlapping) regions matching the regex pattern. The optional flags parameter may be sublime.LITERAL, sublime.IGNORECASE, or the two ORed together. If a format string is given, then all matches will be formatted with the formatted string and placed into the extractions list.
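To make the flags concrete outside the editor, here is a rough analogue using Python's re module. It is not the Sublime API: re.escape stands in for sublime.LITERAL, re.IGNORECASE for sublime.IGNORECASE, and find_all_spans is a hypothetical helper that returns (start, end) pairs in place of Region objects.

```python
import re

def find_all_spans(text, pattern, literal=False, ignore_case=False):
    """Return (start, end) for every non-overlapping match of pattern."""
    if literal:
        # Treat every character of the pattern as plain text.
        pattern = re.escape(pattern)
    flags = re.IGNORECASE if ignore_case else 0
    return [m.span() for m in re.finditer(pattern, text, flags)]

text = "a.b axb A.B"  # hypothetical sample text

# As a regex, "." matches any character, so "axb" is found too.
regex_hits = find_all_spans(text, "a.b")

# As a literal, only the exact text "a.b" is found.
literal_hits = find_all_spans(text, "a.b", literal=True)

# Literal and case-insensitive together also finds "A.B".
both_hits = find_all_spans(text, "a.b", literal=True, ignore_case=True)

print(regex_hits, literal_hits, both_hits)
```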

 



 

 
Some of the RegEx patterns in the list you posted already use the word boundary metacharacter:

 
For example:

\b([aA])bsorbira will match absorbira in:

Case 1: "abc absorbira xyz"

but not in

Case 2: "abcabsorbira xyz"   or   Case 3: "abcabsorbiraxyz"

but it would match

Case 4: "abc absorbiraxyz"

 
In order to prevent case 4, you could use:
\b([aA])bsorbira\b

 

( In these examples, I used plain RegEx. Make sure you use properly escaped backslashes in the actual code for Python compatibility.  EG: \\b )
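The four cases can be checked quickly in plain Python. This sketch uses Python's re module, which treats \b the same way for these ASCII examples. The variable names are only for illustration.

```python
import re

# Boundary at the start only, and boundaries on both sides.
leading_only = re.compile(r"\b([aA])bsorbira")
both_sides = re.compile(r"\b([aA])bsorbira\b")

cases = {
    1: "abc absorbira xyz",
    2: "abcabsorbira xyz",
    3: "abcabsorbiraxyz",
    4: "abc absorbiraxyz",
}

# True where the pattern finds a match in that case's text.
leading_matches = {n: bool(leading_only.search(s)) for n, s in cases.items()}
both_matches = {n: bool(both_sides.search(s)) for n, s in cases.items()}

print(leading_matches)
print(both_matches)
```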

3 Likes

#7

I’ve created Default.sublime-keymap
and copy-pasted the code you posted:

I’ve also taken a screenshot so you can tell whether this is in the right location.

What exactly should happen when I do this?
I expected it to show up in Preferences - Package Settings.
Am I mistaken? Anyhow, nothing happens, and I don’t know what’s wrong.
Any ideas? I’ve tried every uppercase/lowercase combination I could imagine,
because I’m not sure whether it is case sensitive. Finally, I renamed the file to ocr_fix.py, as in your example of the command, but still nothing happens. Is a restart of the program necessary for this?
I did restart, but the issue remains.

0 Likes

#8

 
Only plugins with a Main.sublime-menu file will show up there.

You should just be able to run the plugin with one of the two methods I described in my previous post.

0 Likes

#9

But what exactly is supposed to happen when I press those keys: ctrl+shift+alt+o?
Because I see no effect.

0 Likes

#10

It will automatically replace every instance of the misspelled words throughout the entire document.

0 Likes

#11

So you’re saying I press those 4 keys and it should automatically run and change the words? :confused:
That does not happen. :frowning:

Could you please test whether it works on your PC?

Ali sedamdeset sedma godina Morgantea postala je sedamdeset sedma godina Sendovanija, i mada je Lok uspeo da neko vreme prikriva svoja dela od Krađoučitelja, još jednom prilikom je doživeo čudesan neuspeh u pokušaju da bude obazriv. Kada je Krađoučitelj shvatio šta je dečak uradio, otišao u posetu kapi Kamora i obezbedio dozvolu za jednu malu smrt. Tek se uzgred setio da ode bezokom svešteniku, ne da bi bio milosrdan, već zato što je to bila poslednja prilika da ostvari kakvu-takvu dobit.

0 Likes

#12

Just fixed the class name, should work now.

I usually use the uppercase naming convention of PluginNameCommand & forgot that OCR Fix would require the lowercase convention of plugin_name. :sweat_smile:

 
I did notice that some of the RegEx patterns use capturing group replacements, & they’re showing up as literal symbols in the replacements ( $1, etc. ), so you’ll need to handle that with extractions or throw a manual RegEx replacement in the loop.

0 Likes

#13

Ah yes, I’ve found it, changed it, and now it works.

Could you please give me some examples of extractions and the like, so I can fix it using them?
I’m not much of a coder, more someone who works by example.

For example:
dečak -> dječak [instead of $1ječa$2]

I see that this will require a lot of fixing before it works the way I imagined it would.

0 Likes

#14

I just updated the code, the extractions list fixed it.  I believe it will work as you expect.
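For anyone testing rules outside the editor, here is how a capture-group replacement behaves in plain Python. Python's re.sub does not understand $1, so the sketch rewrites such references first. The pattern below is a hypothetical stand-in for the real rule, which is not shown in this thread, and to_python_template is a made-up helper name.

```python
import re

def to_python_template(replacement):
    """Rewrite $1-style group references as \\g<1>, which re.sub understands."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", replacement)

# Hypothetical pattern with two capturing groups (not the actual rule).
pattern = r"\b([dD])eča(k)"

# "$1ječa$2" keeps the original first letter and the trailing "k".
template = to_python_template("$1ječa$2")

fixed = re.sub(pattern, template, "Dečak i dečak")

print(template)
print(fixed)
```

Inside the plugin none of this is needed, since find_all already writes the formatted replacement text into the extractions list.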

3 Likes

#15

:smiley:

I can’t thank you enough for all your help and kindness.
This is, in general, what I wanted. Thank you so much.
I really appreciate it. :relaxed:

1 Like

#16

Is that a plugin that you are using to display the differences? If so, which plugin is that?

0 Likes

#17

 
That particular plugin is Compare Side-By-Side.  For Git files, I use GitGutter

There are also various Diff Plugins that offer different display styles.

2 Likes

#18

replacementPairs.append( ( "њ", "nj" ) )

Can anyone help me?
I’ve hit an obstacle I don’t understand.
Apparently, the regex pair above is not being applied, and that character remains.

0 Likes