Sublime Forum

Auto falling back to UTF-8 !?

#1

“Not all characters are representable in Western (ISO 8859-1), falling back to UTF-8”

This is the message I get when Sublime Text is unable to save the file with its existing encoding. I tried setting the default and fallback encodings in the settings, but it still does not work:

"default_encoding": "Arabic (Windows 1256)", "default_line_ending": "unix", "fallback_encoding": "Arabic (Windows 1256)",

Does anyone have a solution to prevent Sublime Text from forcing the file to be saved as UTF-8?
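For illustration, the condition behind that message can be reproduced outside the editor. This is a minimal sketch, not Sublime Text's own code: the helper name `can_encode` and the sample string are assumptions made for the example. It only shows that Arabic text cannot be encoded as ISO 8859-1, while the same text fits in Windows 1256.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample: a short Arabic word.
sample = "\u0645\u0631\u062d\u0628\u0627"

print(can_encode(sample, "iso-8859-1"))  # False
print(can_encode(sample, "cp1256"))      # True
```

If the first check fails for a file's contents, saving it as Western (ISO 8859-1) cannot succeed, whatever the default settings say.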


EncodingHelper ( Encoding on status bar, Convert to UTF8 )
#2

I also got that message from EncodingHelper:
“Not all characters are representable in Western (ISO 8859-1), falling back to UTF-8”

IMO what EncodingHelper needs is a “best fit” mapping from characters in UTF-8 to characters in Latin-1 (ISO 8859-1).

Of course, there are many characters in UTF-8 that do not have any equivalent character in Latin-1.

But on the other hand, there are many characters in UTF-8 that do have a roughly similar character in Latin-1, although they are not exactly the same, e.g. all those slightly different kinds of hyphens:

Char  Dec   Hex   Name
‐     8208  2010  HYPHEN
‑     8209  2011  NON-BREAKING HYPHEN
‒     8210  2012  FIGURE DASH
–     8211  2013  EN DASH
—     8212  2014  EM DASH
―     8213  2015  HORIZONTAL BAR
(Source: w3schools.com/charsets/ref_u … uation.asp )

It would be better if all of them were mapped to the good old Latin-1 hyphen (-), instead of simply doing nothing and falling back to UTF-8.

Another improvement for EncodingHelper would be to mark all characters that cannot be replaced during conversion, e.g. with “<?>”.

Then the user would have the possibility to manually correct them.
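The two suggestions above could be sketched roughly as follows. This is not part of EncodingHelper or Sublime Text; the table `BEST_FIT`, the constant `MARKER` and the function `to_latin1_best_fit` are hypothetical names chosen for the example, and the mapping covers only the dash variants listed above.

```python
# Hypothetical best-fit table: the dash variants listed above map to "-".
BEST_FIT = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2015": "-",  # HORIZONTAL BAR
}
MARKER = "<?>"  # marker for characters that have no replacement


def to_latin1_best_fit(text):
    """Return text in which every character is representable in ISO 8859-1."""
    out = []
    for ch in text:
        ch = BEST_FIT.get(ch, ch)  # apply the best-fit mapping first
        try:
            ch.encode("iso-8859-1")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(MARKER)  # leave a visible marker for manual correction
    return "".join(out)


converted = to_latin1_best_fit("caf\u00e9 \u2013 10\u20ac")
print(converted)  # café - 10<?>
data = converted.encode("iso-8859-1")  # encoding now succeeds
```

The accented letter survives because it exists in Latin-1, the en dash becomes a plain hyphen, and the euro sign, which Latin-1 lacks, is marked for the user to fix by hand.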

Please consider:

  • In most cases, there are only a few “weird” UTF-8 characters that make EncodingHelper fail and fall back to UTF-8

  • The “weird” UTF-8 characters are often hard to recognize (see hyphen example above)
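Since the offending characters are few and hard to spot by eye, a small script can list them instead. The following is a sketch under the assumption that the text is already available as a Python string; `find_unencodable` is a hypothetical helper, not an existing plugin command.

```python
import unicodedata


def find_unencodable(text, encoding="iso-8859-1"):
    """Yield (line, column, char, name) for each character that cannot be encoded."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                # Fall back to "UNKNOWN" for code points without a Unicode name.
                yield line_no, col_no, ch, unicodedata.name(ch, "UNKNOWN")


# Hypothetical sample: line 2 contains a NON-BREAKING HYPHEN (U+2011).
sample = "plain-hyphen\nnon\u2011breaking"
for hit in find_unencodable(sample):
    print(hit)
```

For the sample, the only hit is the non-breaking hyphen at line 2, column 4; the ordinary hyphen on line 1 is not reported.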

See also:
stackoverflow.com/questions/231 … s-with-php - How to replace UTF-8 characters with similar-looking ASCII characters with PHP?


#3

The whole idea of this package is to use UTF-8: it helps by trying to alert, detect and convert when a document is in an encoding other than UTF-8. A document in another encoding should be converted to UTF-8, not to a poorly “translated” form of the actual document.

That said, I understand your problem, and I believe adding something like that could be useful in some situations when working with legacy systems. It is still unrelated to what this package does, though.

The following message is reported by Sublime Text, not by Encoding Helper:

“Not all characters are representable in Western (ISO 8859-1), falling back to UTF-8”

In fact, Encoding Helper does not help to open “UTF-8” as “ISO 8859-1”; it helps to open “ISO 8859-1” as “UTF-8”, where ISO 8859-1 can be any encoding but the target encoding is always UTF-8.
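The direction described here, from a legacy encoding to UTF-8, can be illustrated in a few lines. This is only a sketch of the general idea and not Encoding Helper's actual implementation; `convert_to_utf8` and the sample bytes are assumptions for the example.

```python
def convert_to_utf8(raw, source_encoding="iso-8859-1"):
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical sample: "café" stored as ISO 8859-1 bytes.
legacy = b"caf\xe9"
print(convert_to_utf8(legacy))  # b'caf\xc3\xa9'
```

This direction never loses characters, because UTF-8 can represent everything ISO 8859-1 can; the reverse direction is the one that can fail.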

Examples of Encoding Helper messages on status bar:

“Detecting encoding…”
“Converted to UTF-8 from XYZ” (Encoding Helper was able to convert the document to UTF-8 when opening from xyz encoding)
“Opened as UTF8 detected ISO 8859-1 (document maybe broken)” (ST decided to open the document as UTF-8 but Encoding Helper says this document is probably in another encoding)
