Sublime Forum

Run from command line/bash: open/save multiple files with an encoding

#1

I wonder if it is possible, either natively or with a plugin, to

run ST3 from the command line/bash, with some parameters, to

  • open multiple files with some encoding

  • save them with some encoding

This command should support regular expressions. Something like:

sublimetext.exe -open -encoding UTF8_BOM /mydir/*.xml
sublimetext.exe -save -encoding UTF8_NO_BOM /mydir/*.xml

I have seen an approach to save all files with a given encoding, like in:

The plugin/improvement should do something similar.

1 Like

#2

ST automatically detects the correct encoding when opening files. I don’t know of a way to force an encoding when opening a file, but if there are edge cases where it gets it wrong, you could get ST to reopen the file(s) with a different encoding.

It seems, from your edit, that you’ve already seen how to do this from within ST, and you’d like to extend it so it can be controlled from the command line. Take a look at this SO answer for details on how to execute ST commands from your shell:

I guess you mean glob patterns as opposed to regular expressions?

  • I think the way to do this is to let ST handle the globs:

      subl /mydir/*.xml
    
  • and then issue a command to re-open all open files with the specified encoding (which you’ll have to create; it can be 99% identical to the one on SO, just using the reopen command instead of save; a sketch of both commands follows this list)

  • and then issue a command to save all open files with the specified encoding (i.e. save_all_with_encoding from the SO answer you linked)
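
Here is a rough sketch of what those two commands might look like (untested; the command names are hypothetical, and the save variant is essentially the save_all_with_encoding command from the SO answer). Note that ST expects its own encoding names, e.g. "UTF-8" or "UTF-8 with BOM". Drop it into a .py file in your User package:

import sublime_plugin


class ReopenAllWithEncodingCommand(sublime_plugin.WindowCommand):
    # re-interpret every open file using the given encoding
    def run(self, encoding):
        for view in self.window.views():
            if view.file_name():
                self.window.focus_view(view)
                # "reopen" is the command behind File > Reopen with Encoding
                self.window.run_command("reopen", {"encoding": encoding})


class SaveAllWithEncodingCommand(sublime_plugin.WindowCommand):
    # save every open file using the given encoding
    def run(self, encoding):
        for view in self.window.views():
            if view.file_name():
                view.set_encoding(encoding)
                view.run_command("save")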

but really, if you just want to change encodings for multiple files from the command line, there are probably tools better suited for it than writing plugins for a text editor and making sure you don’t have any other open files before you start…

1 Like

#3

Thanks @kingkeith for the link and the answer.

Massive (bulk) processing is the key here. I am having a UTF-8 with BOM problem and need to save 200+ files again as plain UTF-8. Right now every one of them is detected as UTF-8 with BOM, but I know this is wrong, because I just saved them all by 1) opening them all at once and 2) saving them all with the py script in my edit.

I just wonder whether it is possible from shell/bash, with some modification to ST, so that I can use scripts to turn ST into an encoding converter. I can open 200+ small files, but if any of them is a heavy one, it is a disaster.

Just a suggestion, because with the trick above, it is possible already.

0 Likes

#4

Thanks again for replying. Yes, your insight is very good; I should try that.

That said, I actually find it more convenient to use ST3, because it runs on every dev environment. Look at this question:

On different platforms there are different tools, but ST3 beats them all by supporting GBK, Windows-1252 and more encodings…

Don’t take me for an ST3 fanatic, but for this specific task, ST3 is:

  • lightweight (install-and-run, no extra installation needed),
  • cross-platform,
  • better at encoding support than its rivals, even hexadecimal…
0 Likes

#5

You could write an ApplicationCommand that takes as parameters a glob, an input encoding and an output encoding, and invoke that from the command line with

subl --command 'my_convert_encodings_command {"glob": "/mydir/*.xml", "input": "UTF8_BOM", "output": "UTF8_NO_BOM"}'
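
A minimal sketch of what that command might look like (untested; it just reuses the reopen / set_encoding / save steps from above, and note that ST expects its own encoding names such as "UTF-8" or "UTF-8 with BOM" rather than UTF8_BOM / UTF8_NO_BOM):

import glob as globbing

import sublime
import sublime_plugin


class MyConvertEncodingsCommand(sublime_plugin.ApplicationCommand):
    def run(self, glob, input, output):
        window = sublime.active_window()
        for path in globbing.glob(glob):
            view = window.open_file(path)
            self._convert_when_loaded(view, input, output)

    def _convert_when_loaded(self, view, input, output):
        # open_file returns immediately, so poll until the file is loaded
        if view.is_loading():
            sublime.set_timeout(lambda: self._convert_when_loaded(view, input, output), 100)
            return
        window = view.window()
        window.focus_view(view)
        window.run_command("reopen", {"encoding": input})  # re-read the bytes as the input encoding
        view.set_encoding(output)                          # mark the buffer for the output encoding
        view.run_command("save")                           # write it back out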

There has been some discussion about invoking commands from the command line here.

0 Likes

#6

I would still prefer a standalone Python script over implementing this in ST. Hell, you could even use the ST-internal interpreter to run the script if you need to run it on Windows and don’t have Python installed (wish it were there by default). I would then do it in a way where you just need to load the plugin once and it does all the work, without having to go through command calling and the like. Just make sure it doesn’t run more than once (e.g. create a temporary file); see the sketch below.
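
A rough sketch of that “load the plugin once and it does all the work” idea (a hypothetical file in Packages/User; the marker file is the temporary-file guard mentioned above, so a plugin reload doesn’t convert everything twice):

import os
import tempfile


_MARKER = os.path.join(tempfile.gettempdir(), "encoding_conversion_done")


def plugin_loaded():
    # ST3 calls this once the plugin module has been loaded
    if os.path.exists(_MARKER):
        return  # already ran; do nothing on later reloads
    try:
        convert_all_files()
    finally:
        open(_MARKER, "w").close()


def convert_all_files():
    # placeholder for whatever conversion logic you choose
    # (e.g. the standalone script in the next post)
    pass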

0 Likes

#7

I may want to switch and try different encodings, since some files are unreadable. If it runs only once, I’d have to tweak the params and run it again… But the idea stands. Can you help with how to create an encoding conversion script with Python?

0 Likes

#8

Basically like this (untested):

#!/usr/bin/env python

import io
import sys
import os


def main():
    if not len(sys.argv) == 2:
        print("Expected path as argument")
        return 1
        
    path = sys.argv[1]
    
    for dirpath, dirnames, filenames in os.walk(path):
        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            # read file in source encoding
            with io.open(fpath, encoding="gbk") as f:
                text = f.read()
            
            # write file (replace) in utf-8
            with io.open(fpath, 'w', encoding="utf-8") as f:
                f.write(text)

    
if __name__ == '__main__':
    sys.exit(main())

Run the script with the base folder as argument; it will then read every file in that folder recursively in GBK encoding and replace it with the same content in UTF-8.
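
For example, assuming you saved it as convert_to_utf8.py (the name is just for illustration):

python convert_to_utf8.py /mydir
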
The list of encodings recognized by Python can be found here: https://docs.python.org/3.5/library/codecs.html#standard-encodings
Also note that the file contents are stored in memory, so don’t try this with files that are multiple GB in size. For those you’d need to write the new file to a different location in smaller chunks and then replace the original by renaming.
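
If you ever need that, here is a hedged sketch of the chunked variant (untested, Python 3; the function name is just illustrative):

import io
import os
import tempfile


def convert_large_file(fpath, src_enc="gbk", dst_enc="utf-8", chunk_chars=1 << 20):
    # create the temporary file next to the original so the final rename
    # stays on the same filesystem
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(fpath) or ".")
    os.close(fd)
    with io.open(fpath, encoding=src_enc) as src, \
         io.open(tmp_path, "w", encoding=dst_enc) as dst:
        while True:
            chunk = src.read(chunk_chars)  # read about a million characters at a time
            if not chunk:
                break
            dst.write(chunk)
    # replace the original through renaming (os.replace needs Python 3.3+)
    os.replace(tmp_path, fpath)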

You can always tweak this script to have more fine-grained control, if you need.

4 Likes