Sublime Forum

Cyrillic encoding fails

#1

I ran into a problem with Cyrillic text encoding. Perhaps it has already been solved in the latest versions of PHP, but I’ll report it anyway. I needed to shorten some text to a certain number of characters, so I used the substr function:
$postPromptCopy = substr($postPrompt, 0, MAX_PROMPT_URL_SIZE);
When this cut splits a word, the file shows СоÐ-дайђе instead of Создайте. All the text that displayed correctly before this message is garbled as well.
Other editors handle this better: Разбитое слов�.
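The symptom follows from how substr() works: it counts bytes, not characters, and in UTF-8 each Cyrillic letter takes two bytes, so the limit can fall in the middle of a letter and leave a dangling byte. A minimal sketch, assuming the text is UTF-8 and PHP's mbstring extension is enabled; the sample word is illustrative only:

```php
<?php
// Sample UTF-8 text (illustrative): 8 Cyrillic letters, 2 bytes each.
$word = 'Создайте';

// substr() counts bytes: 5 bytes = "Со" plus the first byte of "з".
$byteCut = substr($word, 0, 5);

// mb_substr() counts characters, so a multibyte letter is never split.
$charCut = mb_substr($word, 0, 5, 'UTF-8');

var_dump(strlen($word));                        // int(16)
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump($charCut);                             // string(10) "Созда"
```

Under the same assumptions, the original line would become $postPromptCopy = mb_substr($postPrompt, 0, MAX_PROMPT_URL_SIZE, 'UTF-8'); which limits the text by characters rather than bytes.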

P.S. My workaround was to cut the text at the nearest space instead. That way the encoding is not broken.
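Cutting at a space avoids the problem because in UTF-8 the space is a single ASCII byte that never occurs inside a multibyte letter, so the cut cannot land mid-character. A minimal sketch of the same idea done in a character-aware way, assuming mbstring is enabled; truncateAtSpace is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: shorten $text to at most $max characters, cutting at
// the last space so neither a word nor a multibyte letter is split.
function truncateAtSpace(string $text, int $max): string
{
    if (mb_strlen($text, 'UTF-8') <= $max) {
        return $text; // already short enough
    }
    $cut = mb_substr($text, 0, $max, 'UTF-8');
    // If the next character is a space, the cut already falls between words.
    if (mb_substr($text, $max, 1, 'UTF-8') === ' ') {
        return $cut;
    }
    $space = mb_strrpos($cut, ' ', 0, 'UTF-8');
    // No space found: fall back to a plain character cut.
    return $space === false ? $cut : mb_substr($cut, 0, $space, 'UTF-8');
}

echo truncateAtSpace('Создайте новый файл', 10), PHP_EOL; // Создайте
```

Falling back to a plain character cut when there is no space keeps the result within the limit even for a single very long word.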


#2

Can you provide more details about how the script is executed and how ST is involved in displaying the results? Are the results written to the build output panel, or to a file that is then opened in ST to display its content?

Which platform are you working on: Mac/Linux/Win?

File output

If it is a file that is displayed with the wrong encoding, ST may not have detected its encoding correctly.

You can try Main Menu > File > Reopen with Encoding > Cyrillic (…).

Build System and Console Programs

If, for instance, a PHP script is executed with the default build command provided by ST, the content of the output panel can be malformed due to encoding mismatches. On Windows, console programs tend to encode their output with old “OEM” code pages, but the Python backend responsible for drawing text to ST’s output panel expects something like “Windows-125x” or UTF-8.

On my machine, for instance, a PHP file encoded as Windows-1251 (Cyrillic) outputs garbage in the console, because Windows-1252 (Western) is expected, as specified by my OS-wide language settings.


To fix it, override the build configuration and add "encoding": "cp1251". If your system language matches the language of the output, you can also try "encoding": "oem".

Packages/PHP/PHP.sublime-build

{
    "cmd": ["php", "$file"],
    "file_regex": "^(?:php:)?[\t ](...*?):([0-9]*):?([0-9]*)",
    "selector": "embedding.php | source.php",
    "encoding": "cp1251", // added to force cyrillic 
    // "encoding": "oem", // may work as well

    "variants": [
        {
            "name": "Syntax Check",
            "cmd": ["php", "-l", "$file"]
        }
    ]
}



#3

Windows, with an XAMPP server on localhost. There doesn’t seem to be a problem with echo; the problem occurs when logging to php_error_log via error_log:
$postPromptCopy = substr($postPrompt, 0, MAX_PROMPT_URL_SIZE);
error_log($postPromptCopy);
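If MAX_PROMPT_URL_SIZE is really a byte budget rather than a character count (the name hints at a URL length limit, but the thread does not say), mb_strcut() is another option: like substr() it measures bytes, but it never ends the result in the middle of a multibyte character, so the logged line stays valid UTF-8. A minimal sketch, assuming mbstring is enabled; the constant value 5 and the sample text are purely illustrative:

```php
<?php
// Illustrative byte budget; the real MAX_PROMPT_URL_SIZE value is not given.
const MAX_PROMPT_URL_SIZE = 5;

$postPrompt = 'Создайте';

// mb_strcut() treats the length as bytes (like substr()) but backs off to
// the last complete character instead of leaving a dangling byte.
$postPromptCopy = mb_strcut($postPrompt, 0, MAX_PROMPT_URL_SIZE, 'UTF-8');

// The logged line is valid UTF-8, so it does not corrupt the log file.
error_log($postPromptCopy);
```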


#4

So error logs are written to a file which is then opened in ST?


#5

Yes. When VS Code or Notepad opens that file, only the last (truncated) word is broken. In ST, the encoding of the whole file is broken.


#6

With "show_encoding": true set in Preferences, ST displays the file’s encoding in the status bar.

Check whether it is the right one, or use the context menu entry Reopen with Encoding to re-open the file with a given encoding.

You can also check and/or change the "fallback_encoding" setting to specify the ANSI encoding ST should use if it is unable to detect a file’s encoding.
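The thread does not say why ST garbles the whole file while VS Code and Notepad only garble one word. One plausible explanation (an assumption) is that a single invalid UTF-8 sequence makes ST reject UTF-8 for the file and open all of it with the fallback encoding, whereas the other editors decode the rest as UTF-8 and only replace the bad bytes. A minimal sketch to check this, assuming mbstring is enabled: it reports the line numbers of a log file that are not valid UTF-8. The function name invalidUtf8Lines and the command-line usage are hypothetical:

```php
<?php
// Return the numbers of all lines in $path that are not valid UTF-8.
function invalidUtf8Lines(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bad = [];
    $lineNo = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNo++;
        if (!mb_check_encoding($line, 'UTF-8')) {
            $bad[] = $lineNo;
        }
    }
    fclose($handle);
    return $bad;
}

// Hypothetical CLI usage: php check_log.php path/to/php_error_log
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (invalidUtf8Lines($argv[1]) as $n) {
        echo "Line $n is not valid UTF-8", PHP_EOL;
    }
}
```

If the only reported lines are the truncated messages, making the truncation multibyte-safe should leave the log as plain UTF-8.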
