Sublime Forum

Sublime shows UTF-8 but Notepad++ shows ANSI, why? which one is correct?

#1

Hi Everyone,

I’m using Sublime v.3.1.1 Build 3176 (Unregistered/free) on macOS High Sierra 10.13.3 and Windows 7 64-bit and Notepad++ 7.5.8 (64-bit) on Windows 7 64-bit. I open a simple text file (link below) with one line of text and Sublime (both Mac and Windows) says the encoding is UTF-8 but Notepad++ says ANSI.

Which one is correct?

https://drive.google.com/file/d/1Rb27b0CGo3c4Aa054aep61a8e4DrwZ3j/view?usp=sharing

0 Likes

#2
0 Likes

#3

The last statement in the Stack Overflow answer said “ANSI is not the same as UTF-8.”

If so, why does Sublime say my file has UTF-8 encoding? Why doesn’t it say ANSI if ANSI is not the same as UTF-8?

0 Likes

#4

UTF-8 was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well.

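As your text file doesn’t use any code points above 127, it is both valid UTF-8 and valid ASCII. The Windows ANSI code pages are ASCII supersets too, so the same bytes are also valid ANSI, and neither editor is wrong.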
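A quick way to see this for yourself is a small Python sketch. The sample string is made up, and `cp1252` is only assumed here as the typical Western Windows "ANSI" code page:

```python
# Hypothetical sample line containing only ASCII characters.
text = "Hello, world"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
ansi_bytes = text.encode("cp1252")  # assumption: cp1252 stands in for "ANSI"

# For ASCII-only text all three encodings produce identical bytes,
# so an editor cannot tell them apart from the file content alone.
same_bytes = ascii_bytes == utf8_bytes == ansi_bytes
print(same_bytes)

# As soon as a non-ASCII character appears, the encodings diverge.
accented = "é"
accented_utf8 = accented.encode("utf-8")    # two bytes in UTF-8
accented_ansi = accented.encode("cp1252")   # one byte in cp1252
print(accented_utf8, accented_ansi)
```

So the label an editor shows for an ASCII-only file is a guess about what should happen once non-ASCII characters are added, not a fact about the bytes already there.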

2 Likes

#5

If ST finds nothing but ASCII characters, it assumes UTF-8, so that you won’t run into trouble if you add Unicode characters to your text later on. Notepad++, on the other hand, relies on the BOM to decide whether to use UTF-8, UTF-16 or ANSI. I remember a recent discussion in which users said the BOM is useless, but Notepad++ is just another example of a legacy program that relies on it, because ANSI was the default 15 years ago.
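To illustrate what BOM-based detection means, here is a rough Python sketch. It is not Notepad++'s actual code. The helper name `guess_encoding_from_bom` and the `"ansi"` fallback label are made up for this example, and it ignores UTF-32 for simplicity:

```python
import codecs

def guess_encoding_from_bom(data: bytes) -> str:
    """Hypothetical helper: pick an encoding label from a leading BOM only."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM found: fall back to the legacy default (label is an assumption).
    return "ansi"

# An ASCII-only file without a BOM ends up in the fallback branch.
plain_guess = guess_encoding_from_bom(b"one line of text")

# The same text saved with a UTF-8 BOM is recognised as UTF-8.
bom_guess = guess_encoding_from_bom(codecs.BOM_UTF8 + b"one line of text")

print(plain_guess, bom_guess)
```

A detector like this reports the fallback for any file without a BOM, which would explain why an ASCII-only file is shown as ANSI.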

Today UTF-8 is the state-of-the-art default encoding for text files. On the one hand, it does not waste an extra byte for each ASCII character the way UTF-16 does. On the other hand, it can store non-ASCII Unicode characters (which use 2 or more bytes) without inserting binary 0 bytes. So UTF-8 text can be handled by all legacy text methods as well.

Finally, as long as you keep using ASCII-only characters, you don’t need to care about ANSI vs. UTF-8. If you start adding Unicode characters, you will be glad that ST saves them correctly without any extra effort.
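The size and zero-byte points can be checked with a few lines of Python. The sample strings are arbitrary, and `utf-16-le` is used so that no BOM is added to the byte counts:

```python
# Hypothetical ASCII-only sample.
ascii_text = "abc"

ascii_utf8 = ascii_text.encode("utf-8")
ascii_utf16 = ascii_text.encode("utf-16-le")

# UTF-8 needs one byte per ASCII character, UTF-16 needs two.
print(len(ascii_utf8), len(ascii_utf16))

# UTF-16 pads ASCII characters with zero bytes; UTF-8 does not.
utf16_has_zero = b"\x00" in ascii_utf16
print(utf16_has_zero)

# A non-ASCII character takes several bytes in UTF-8, none of them zero.
mixed_utf8 = "abc€".encode("utf-8")
utf8_has_zero = b"\x00" in mixed_utf8
print(len(mixed_utf8), utf8_has_zero)
```

The absence of zero bytes is what lets code written around C-style, zero-terminated strings pass UTF-8 text through unchanged.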

4 Likes

#6

Thank you to both of you guys.

0 Likes