Basically like this (untested):
#!/usr/bin/env python
import io
import sys
import os
def main():
if not len(sys.argv) == 2:
print("Expected path as argument")
return 1
path = sys.argv[1]
for dirpath, dirnames, filenames in os.walk(path):
for fname in filenames:
fpath = os.path.join(dirpath, fname)
# read file in source encoding
with io.open(fpath, encoding="gbk") as f:
text = f.read()
# write file (replace) in utf-8
with io.open(fpath, 'w', encoding="utf-8") as f:
f.write(text)
if __name__ == '__main__':
sys.exit(main())
Run the script with the base folder as argument, it will then read all files in that folder recursively in gbk
encoding and replace them with the same content in utf-8
.
The list of encodings recognized by Python can be found here: https://docs.python.org/3.5/library/codecs.html#standard-encodings
Also note that the file contents are stored in memory, so don’t try this with files that are multiple GB large. For that you’d need to write the new file to a different location with smaller chunks and then replace the original through renaming.
You can always tweak this script to have more fine-grained control, if you need.