Sublime Forum

Encoding problem with Windows Ansi

#1

I might be doing it wrong, but I am not able to correctly open files that are encoded in Windows ANSI in ST3.

The files that I open are CSV files exported from Excel. ST3 guesses the encoding as Western (Windows 1252), but the characters with umlauts are displayed incorrectly.

If I open the same file in Notepad++ it tells me the file is encoded in ANSI and I can successfully convert it to UTF-8. That converted file works well in ST3.

What is the right way to do that in Sublime Text (open an Excel-generated CSV, convert the encoding to UTF-8, and save it)?

1 Like

#2

There’s no such encoding as “Windows ANSI”: ANSI refers to any of the old Windows legacy encodings/character sets from the pre-Unicode, post-DOS era. Microsoft applications such as Excel seem to prefer Unicode only in the form of UTF-16, and use UTF-8 rarely and only with a BOM (Byte Order Mark) at the beginning. Without a BOM, an otherwise valid UTF-8 byte stream can be interpreted as “ANSI”, i.e. typically CP1252 (a.k.a. “Western (Windows 1252)”).

I would say that Sublime’s behaviour is correct in that it displays the particular charset instead of the vague blanket “ANSI” that doesn’t mean much.

But, back to the issue! What do the broken accented characters look like? Something like Ã¡ or so? In that case, your file is UTF-8 and you should try opening it as that.
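The effect described above (UTF-8 bytes read as CP1252) can be reproduced in a few lines of Python. This is only a minimal sketch; the character á is an assumed example, not taken from the file in question.

```python
# UTF-8 bytes decoded with the wrong legacy code page produce mojibake.
text = "\u00e1"                          # á
utf8_bytes = text.encode("utf-8")        # b'\xc3\xa1'
misread = utf8_bytes.decode("cp1252")    # two characters: Ã followed by ¡
print(misread)

# The damage is reversible as long as no byte was lost along the way.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)
```

If the broken characters in a file follow this two-character pattern, the file is most likely UTF-8 without a BOM.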

0 Likes

#3

ST3 thinks it’s CP1252 and opens it as CP1252. Still, the umlauts are wrong. See the picture, where I highlighted the first wrong characters:

  • ß => ί
  • ü => ό
  • ä => δ

Even reopening as Western (Windows 1252) doesn’t help. But in Notepad++ and Emacs I am able to open the file with correct umlauts and save it as UTF-8. I would like to do the same directly in ST3, but I can’t figure out a way to open it in Sublime Text without broken umlauts. Any idea?
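As a workaround outside the editor, the conversion that Notepad++ and Emacs perform can be sketched in Python. This assumes the file really is CP1252; the helper name `convert_cp1252_to_utf8` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def convert_cp1252_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a CP1252 text file as UTF-8 (hypothetical helper)."""
    raw = src.read_bytes()
    # Strict decoding: raises UnicodeDecodeError on the few byte values
    # that CP1252 leaves undefined, instead of silently guessing.
    text = raw.decode("cp1252")
    # Writing bytes keeps the original line endings untouched.
    dst.write_bytes(text.encode("utf-8"))
```

It could be called as `convert_cp1252_to_utf8(Path("export.csv"), Path("export-utf8.csv"))`; the resulting file should then open in ST3 as UTF-8.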

1 Like

#4

It might also be an ST3 or Excel bug. If I export an Excel sheet with the following content as CSV, then Sublime Text also guesses CP1252 and shows the umlauts correctly.

66;ÄÜÖß ;Manche Menschen bilden sich ein (verbilden sich), daß sie immer recht haben.
66;Hoe {{c1::komt}} het dat ze zoveel {{c1::bezwaren}} tegen ons {{c1::voorstel}} hebben?;Wie kommt es, daß sie so viel Bedenken gegen unseren Vorschlag haben?
1 Like

#5

That is really weird, as those are Greek characters, which means the bytes are interpreted as CP1253 instead of CP1252.

0xFC is indeed ό in 1253 and ü in 1252. 0xE4 is δ in the Greek code page and ä in the western one.
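The byte values above can be checked with Python’s built-in codecs. A small sketch, with 0xDF added for the ß case reported earlier in the thread:

```python
# The same three bytes, decoded with the Western and the Greek code page.
raw = bytes([0xDF, 0xFC, 0xE4])

as_western = raw.decode("cp1252")  # ß ü ä
as_greek = raw.decode("cp1253")    # ί ό δ

for byte, west, greek in zip(raw, as_western, as_greek):
    print(f"0x{byte:02X}: cp1252={west} cp1253={greek}")
```

The bytes in the file are the same in both cases; only the code page used to interpret them differs.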

Not sure what ST3 is doing there. Software usually autodetects the code page from the frequency of certain bytes, as there’s nothing else in the file that indicates what it was supposed to be, but I’ve never seen that break in ST before.

0 Likes

#6

Then that might be an ST3 bug. Maybe @wbond can have a look at it.

I use ST Build 3157 (latest dev build) on Windows 10 Home 64-bit, and the failure is reproducible. I have a certain CP1252-encoded file that I can open correctly in Emacs/Notepad++ but that ST3 fails to open correctly. ST3 guesses CP1252 (as seen in the screenshot above) but fails to use the correct encoding for the umlauts.

The strange thing is that if I export something different from Excel, then CP1252 works.

@ralesk I tried to set the encoding to CP1253, but it doesn’t change anything.

1 Like

#7

I’m having the same issue with Sublime Text 4. Is there a solution for this yet?

0 Likes