Sublime Forum

Encoding problem with Windows Ansi

#1

I may be doing something wrong, but I am not able to correctly open files encoded in Windows ANSI in ST3.

The files are CSV files exported from Excel. ST3 guesses the encoding as Western (Windows 1252), but the characters with umlauts are displayed incorrectly.

If I open the same file in Notepad++ it tells me the file is encoded in ANSI and I can successfully convert it to UTF-8. That converted file works well in ST3.

What is the right way to do this in Sublime Text (open an Excel-generated CSV, convert the encoding to UTF-8, and save it)?
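As a workaround outside the editor, the conversion itself can be sketched in a few lines of Python. This is only a minimal sketch, assuming the export really is CP1252; the function name `convert_to_utf8` and the file names in the usage note are made up for illustration.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file from source_encoding to UTF-8 (hypothetical helper)."""
    # Read raw bytes and decode explicitly, so nothing is guessed for us.
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)
    # Write bytes directly so line endings are left exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Usage would look like `convert_to_utf8("export.csv", "export-utf8.csv")`. If the source encoding is guessed wrong, the output will still be valid UTF-8 but contain the wrong characters, so it is worth checking a line with umlauts afterwards.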

1 Like

#2

There’s no such encoding as “Windows ANSI”: ANSI refers to any of the old Windows legacy encodings/character sets from the pre-Unicode, post-DOS era. Microsoft applications such as Excel seem to prefer Unicode mainly in the form of UTF-16, and use UTF-8 rarely and only with a BOM (Byte Order Mark) at the beginning. Without a BOM, an otherwise valid UTF-8 byte stream may be interpreted as “ANSI”, i.e. typically CP1252 (aka “Western (Windows 1252)”).

I would say Sublime’s behaviour is correct: it displays the specific charset instead of the vague blanket term “ANSI”, which doesn’t mean much.
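To illustrate the BOM point, here is a rough Python sketch of the kind of check a program might do. It is not how Sublime Text or Excel actually detects encodings; the function name `sniff_encoding` and the CP1252 fallback are assumptions for the example.

```python
import codecs


def sniff_encoding(raw: bytes) -> str:
    """Toy encoding guess: BOM first, then strict UTF-8, then a CP1252 fallback."""
    # A UTF-8 BOM is the three bytes EF BB BF at the start of the file.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # UTF-16 files start with FF FE (little endian) or FE FF (big endian).
    # (UTF-32 BOMs are ignored here to keep the sketch short.)
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM: accept UTF-8 only if every byte sequence is valid.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: the "ANSI" code page for Western Europe.
        return "cp1252"
```

A program that skips the strict UTF-8 step and goes straight to the fallback would show BOM-less UTF-8 as CP1252, which is the misinterpretation described above.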

But, back to the issue! What do the broken accented characters look like? Something like á or so? In that case, your file is UTF-8 and you should try opening it as that.
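For reference, that kind of garbling is easy to reproduce in Python. A small sketch, using á as the example character:

```python
original = "á"

# UTF-8 stores this character as two bytes: C3 A1.
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as CP1252 yields two separate characters.
garbled = utf8_bytes.decode("cp1252")

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
```

If the broken characters in the file look like `garbled` here, the file is most likely UTF-8 that was decoded as CP1252.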

0 Likes

#3

ST3 thinks it’s CP1252 and opens it as CP1252, but the umlauts are still wrong. See the picture, where I highlighted the first wrong characters:

  • ß => ί
  • ü => ό
  • ä => δ

Even reopening as Western (Windows 1252) doesn’t help. But in Notepad++ and Emacs I can open the file with correct umlauts and save it as UTF-8. I would like to do the same directly in ST3, but I can’t figure out how to open the file in Sublime Text without broken umlauts. Any ideas?

1 Like

#4

It might also be an ST3 or Excel bug. If I export an Excel sheet with the following content as CSV, then Sublime Text also guesses CP1252 and shows the umlauts correctly.

66;ÄÜÖß ;Manche Menschen bilden sich ein (verbilden sich), daß sie immer recht haben.
66;Hoe {{c1::komt}} het dat ze zoveel {{c1::bezwaren}} tegen ons {{c1::voorstel}} hebben?;Wie kommt es, daß sie so viel Bedenken gegen unseren Vorschlag haben?

1 Like

#5

That is really weird, as those are Greek characters, which means the bytes are being interpreted as CP1253 instead of CP1252.

0xFC is indeed ό in 1253 and ü in 1252. 0xE4 is δ in the Greek code page and ä in the Western one.

Not sure what ST3 is doing there. Software usually autodetects the code page from the frequency of certain bytes, as there is nothing else in the file that indicates what it was supposed to be. But I’ve never seen that break in ST before.
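You can check the mapping yourself with Python’s built-in codecs. A quick sketch, using the three bytes behind the characters reported above (0xDF for ß is added here alongside the two bytes I mentioned):

```python
# The same three bytes, decoded with two different legacy code pages.
raw = bytes([0xDF, 0xFC, 0xE4])

western = raw.decode("cp1252")  # Western European code page
greek = raw.decode("cp1253")    # Greek code page

# Per-byte comparison: {byte value: (cp1252 char, cp1253 char)}
table = {
    b: (bytes([b]).decode("cp1252"), bytes([b]).decode("cp1253"))
    for b in raw
}
```

`western` gives the expected umlauts and `greek` gives exactly the wrong characters from the report, so the bytes in the file are fine and only the decoding step is off.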

0 Likes

#6

Then that might be an ST3 bug. Maybe @wbond can have a look at it.

I use ST Build 3157 (latest dev build) on Windows 10 Home 64-bit, and the failure is reproducible. I have a certain CP1252-encoded file that I can open correctly in Emacs/Notepad++ but that ST3 fails to open correctly. ST3 guesses CP1252 (as seen in the screenshot above) but fails to use the correct encoding for the umlauts.

The strange thing is that if I export something different from Excel, CP1252 works.

@ralesk I tried setting the encoding to CP1253, but it doesn’t change anything.

1 Like

#7

I’m having the same issue with Sublime Text 4. Is there a solution for this yet?

0 Likes