Sublime Forum

[Solved][python3] Cyrillic write(utf_8_text) fail

#1

Hello. I’m using Sublime Text 3 for Python coding, and I have some problems with Cyrillic encoding.

At first I couldn’t even build (run) any file with Cyrillic in it. I found a workaround by setting up the build config as follows:

[cmd: ['python3', '-u', '-c', "import sys; import codecs; sys.stdout = codecs.getwriter( 'utf-8' )( sys.stdout.detach() ); exec( compile( open( r'/.../ducksearch.py', 'rb' ).read(), r'/.../ducksearch.py', 'exec'), globals(), locals() )"]]
[dir: /.../crowler]
[path: /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin]

So now that part works for me: .py files with Cyrillic strings in them run fine. But when I try to write Cyrillic text to a file, it fails again with this message:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 197: ordinal not in range(128)
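
Note that '\u2019' is the typographic apostrophe (’), not a Cyrillic letter, so presumably any non-ASCII character would trigger the same failure. The message can be reproduced by encoding to ASCII by hand. A small sketch; the helper name `ascii_encode_error` is made up for illustration:

```python
def ascii_encode_error(text):
    """Return the error message from encoding text as ASCII, or None if it fits."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # U+2019 is the right single quotation mark, outside the ASCII range
    print(ascii_encode_error("it\u2019s"))
    # Plain ASCII text encodes without an error
    print(ascii_encode_error("its"))
```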

Meanwhile, the same script runs fine both from python3 on the command line and in IPython. So the problem seems to be in the Sublime build system or in my config for it. Could you please tell me what I should do to make it work?

Here’s my code:
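
To compare the terminal with the Sublime build, it may help to print the encodings Python picked up from its environment in each place. This is only a diagnostic sketch: `report_encodings` is a hypothetical helper name, and the values it reports depend on the Python version and platform.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python derived from the current environment."""
    return {
        # Locale encoding, used by open() in text mode when encoding= is omitted
        "open_default": locale.getpreferredencoding(False),
        # Encoding used by print(); may be None if stdout was replaced
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

If the build reports an ASCII-like value for `open_default` while the terminal reports UTF-8, that would match the error above.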

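import html

# html_entities holds the raw input (see the sample below)
utf_8_text = html.unescape(html_entities)
print(utf_8_text)

# No encoding is passed, so open() falls back to the locale's encoding
fi = open('./tmp/tmp.html', 'w')
try:
    fi.write(utf_8_text)
finally:
    fi.close()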

Here’s a sample of the input text (the original input is HTML entities, not actual Cyrillic characters):

<div class="book-description"> &#x41A;&#x443;&#x43B;&#x44C;&#x442;&#x443;&#x440;&#x430;, &#x43F;&#x43E; &#x43C;&#x43D;&#x435;&#x43D;&#x438;&#x44E; &#x415;&#x440;&#x43E;&#x444;&#x435;&#x435;&#x432;&#x430;, &#x435;&#x441;&#x442;&#x44C; &#x434;&#x438;&#x441;&#x442;&#x430;&#x43D;&#x446;&#x438;&#x44F; &#x43C;&#x435;&#x436;&#x434;&#x443; &#x447;&#x435;&#x43B;&#x43E;&#x432;&#x435;&#x43A;&#x43E;&#x43C;, &#x442;&#x430;&#x43A;&#x438;&#x43C; &#x43A;&#x430;&#x43A; &#x43E;&#x43D; &#x435;&#x441;&#x442;&#x44C;, &#x438; &#x442;&#x435;&#x43C; &#x43E;&#x431;&#x440;&#x430;&#x437;&#x43E;&#x43C;, &#x432; &#x43A;&#x43E;&#x442;&#x43E;&#x440;&#x43E;&#x43C; &#x43E;&#x43D; &#x441;&#x435;&#x431;&#x44F; &#x432;&#x438;&#x434;&#x438;&#x442;. &#x41D;&#x435;&#x430;&#x434;&#x435;&#x43A;&#x432;&#x430;&#x442;&#x43D;&#x43E;&#x441;&#x442;&#x44C; - &#x43F;&#x440;&#x438;&#x447;&#x438;&#x43D;&#x430; &#x441;&#x43C;&#x435;&#x445;&#x430; &#x438; &#x441;&#x43B;&#x435;&#x437;, &#x438;&#x440;&#x43E;&#x43D;&#x438;&#x438; &#x438;&#x442;&#x440;&#x430;&#x433;&#x435;&#x434;&#x438;&#x438;, &#x43E;&#x43F;&#x440;&#x435;&#x434;&#x435;&#x43B;&#x44F;&#x44E;&#x449;&#x430;&#x44F; &#x445;&#x43E;&#x434; &#x438;&#x441;&#x442;&#x43E;&#x440;&#x438;&#x438;, &#x447;&#x435;&#x43B;&#x43E;&#x432;&#x435;&#x447;&#x435;&#x441;&#x43A;&#x43E;&#x435; &#x441;&#x443;&#x449;&#x435;&#x441;&#x442;&#x432;&#x43E;&#x432;&#x430;&#x43D;&#x438;&#x435;. &#x412; &#x43D;&#x43E;&#x432;&#x43E;&#x439; &#x43A;&#x43D;&#x438;&#x433;&#x435; &#x415;&#x440;&#x43E;&#x444;&#x435;&#x435;&#x432;&#x430; &#x43C;&#x438;&#x440; &#x447;&#x435;&#x43B;&#x43E;&#x432;&#x435;&#x43A;&#x430;, &#x43A;&#x443;&#x43B;&#x44C;&#x442;&#x443;&#x440;&#x430;, &#x43B;&#x438;&#x442;&#x435;&#x440;&#x430;&#x442;&#x443;&#x440;&#x430;</div>

#2

I’ve found the solution. The problem was that no Russian locale was enabled in the environment Sublime Text runs Python in. My Python build config now looks as follows (note that the awful arguments to the interpreter call are gone). Output now works both in the console and when writing to a file.

{
    "cmd": ["python3", "-u", "$file"],
    "env": {"LANG": "ru_RU.UTF-8"},
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python"
}

Also, I just want to note that this part of the build config, the env object, has almost no documentation, which is quite sad.
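
An alternative that should not depend on the build environment at all is to pass the encoding to `open()` explicitly instead of relying on LANG. A minimal sketch, not tested inside Sublime itself; `write_text_utf8` and the temp-file name are made up for illustration:

```python
import html
import os
import tempfile


def write_text_utf8(path, text):
    """Write text as UTF-8 no matter what the locale says."""
    # The with block closes the file even if write() raises
    with open(path, "w", encoding="utf-8") as fi:
        fi.write(text)


if __name__ == "__main__":
    # First word of the sample input from the first post
    sample = html.unescape("&#x41A;&#x443;&#x43B;&#x44C;&#x442;&#x443;&#x440;&#x430;")
    out_path = os.path.join(tempfile.gettempdir(), "tmp_utf8_demo.html")
    write_text_utf8(out_path, sample)
    # Read the file back with the same encoding to check the round trip
    with open(out_path, encoding="utf-8") as fi:
        print(fi.read() == sample)
```

With this, the file write no longer uses the locale default. The `print()` call still goes through stdout's encoding, so the env setting (or something equivalent) may still be needed for console output.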
