Sublime Forum

Find and replace multiple strings

#1

In a column of several hundred thousand lines, I need to replace each of the following x-letter pairs with its Unicode counterpart:
cx - ĉ
gx - ĝ
hx - ĥ
jx - ĵ
sx - ŝ
ux - ŭ

I would be grateful for simple instructions on how to do that in Sublime Text.

#2

Probably the easiest approach is to perform six find-and-replace operations back to back, changing the search and replacement terms each time.
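If a script is an option, the same six passes could be sketched in Python. This is only a minimal sketch: the mapping comes from the list in the question, the function name `replace_x_pairs` is made up for illustration, and capitalised pairs such as `Cx` are not handled.

```python
# Mapping from the x-letter pairs in the question to their Unicode letters.
X_TO_UNICODE = {
    "cx": "ĉ",
    "gx": "ĝ",
    "hx": "ĥ",
    "jx": "ĵ",
    "sx": "ŝ",
    "ux": "ŭ",
}


def replace_x_pairs(text: str) -> str:
    """Apply the six replacements one after another, like six Replace All passes."""
    for pair, letter in X_TO_UNICODE.items():
        text = text.replace(pair, letter)
    return text


if __name__ == "__main__":
    # Sample entry taken from the vocabulary excerpt in this thread.
    print(replace_x_pairs("kasxi sin"))
```

To run it over a whole file, the text would need to be read and written with `encoding="utf-8"` so that the accented letters are stored correctly.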

However, you mention “in a column”, which suggests that the layout of the data matters for the replacement. If so, anyone trying to help would need to see a small sample of the data being manipulated.

#3

Thanks for responding.

In fact that’s what I’m doing now, using the standard Windows Find and Replace routine. It takes about five minutes to replace all six x-letter pairs with their Unicode counterparts, in a routine that might be described as: “find the first letter pair and replace every copy of it with this Unicode character, then find the next letter pair and replace every copy of that with its Unicode character”. In other words, I’m doing by hand the kind of monotonous logical routine computers were born for.

And in fact for years I’ve been using a program called Replace Text (formerly BK Replace Em) to do exactly that: it zipped through the replacements in less than two seconds. But I changed to Windows 11 and now Replace Text no longer replaces, so I’m looking for a replacement replacer. I thought that might be Sublime Text.

The data is a vocabulary file for the translator traduku.net.
It is a single column in which line 1 is English, line 2 is its Esperanto equivalent, line 3 is English again, and so on, like this:

go into debt
/surpreni sxuldon/prunti monon
go into decline
ekregresi
go into effect
/ekvalidigxi/ekefiki/promulgita
go into exile
/ekziligxi/ekzili sin/foriri al ekziligxo
go into hiding
kasxi sin
go into liquidation
/likvidigxi/malfondigxi

The letter pairs with ‘x’ are a convenient way of typing the accented Esperanto letters (the ‘ch’ and ‘sh’ sounds and so on). Before uploading to the server, I replace them with their Unicode counterparts, so that on-screen the letters appear in their standard form: ĉ - ĝ - ĥ - ĵ - ŝ - ŭ
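Given that layout, one caveat is that a blanket replacement also touches the English lines, where a pair such as ‘ux’ can occur naturally (as in ‘luxury’). A minimal Python sketch that converts only the even-numbered (Esperanto) lines might look like this. It assumes the strict English/Esperanto alternation holds for the whole file, and the names `ACCENTED`, `PAIR` and `convert_esperanto_lines` are hypothetical.

```python
import re

# First letter of each x-letter pair mapped to its accented Unicode letter.
ACCENTED = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

# Matches any of the six lowercase pairs: cx, gx, hx, jx, sx, ux.
PAIR = re.compile(r"([cghjsu])x")


def convert_esperanto_lines(text: str) -> str:
    """Replace x-letter pairs on lines 2, 4, 6, ... and leave the other lines alone."""
    lines = text.split("\n")
    # Indices 1, 3, 5, ... correspond to lines 2, 4, 6, ... (assumed to be Esperanto).
    for i in range(1, len(lines), 2):
        lines[i] = PAIR.sub(lambda m: ACCENTED[m.group(1)], lines[i])
    return "\n".join(lines)
```

If a stray blank line or a missing translation ever breaks the alternation, every line after it would be treated as the wrong language, so this approach is only as reliable as the file format.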

The format of the data file was established by the site’s programmer. Myself, I don’t code. To me, JavaScript would be a language of the island of Java.
