I also got this message from EncodingHelper:
“Not all characters are representable in Western (ISO 8859-1), falling back to UTF-8”
IMO what EncodingHelper needs is a “best fit” mapping from characters in UTF-8 to characters in Latin-1 (ISO 8859-1).
Of course, there are many characters in UTF-8 that do not have any equivalent character in Latin-1.
On the other hand, many characters in UTF-8 do have a roughly similar, though not identical, counterpart in Latin-1, for example all these slightly different kinds of hyphens and dashes (character, decimal code point, hex code point, name):
‐ 8208 2010 HYPHEN
‑ 8209 2011 NON-BREAKING HYPHEN
‒ 8210 2012 FIGURE DASH
– 8211 2013 &ndash; EN DASH
— 8212 2014 &mdash; EM DASH
― 8213 2015 HORIZONTAL BAR
(Source: w3schools.com/charsets/ref_u … uation.asp )
It would be better if all of them were mapped to the good old Latin-1 hyphen (-) instead of doing nothing and falling back to UTF-8.
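To make the idea concrete, here is a minimal Python sketch of such a “best fit” step. This is not EncodingHelper's actual code; the names `BEST_FIT` and `to_latin1_best_fit` are made up for illustration, and the table only covers the six characters listed above.

```python
# Hypothetical best-fit table: the six hyphen/dash characters listed above
# (U+2010 .. U+2015), all mapped to the plain hyphen-minus (U+002D).
BEST_FIT = {code_point: "-" for code_point in range(0x2010, 0x2016)}


def to_latin1_best_fit(text: str) -> bytes:
    """Replace known look-alikes, then encode strictly as ISO 8859-1.

    Characters that are neither in Latin-1 nor in BEST_FIT still raise
    UnicodeEncodeError, so nothing is silently lost.
    """
    return text.translate(BEST_FIT).encode("latin-1")


sample = "pages 10\u201320 \u2014 caf\u00e9"
print(to_latin1_best_fit(sample))  # b'pages 10-20 - caf\xe9'
```

A real table would of course need more entries (quotes, ellipsis, and so on), and which replacements count as “similar enough” is a judgment call.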
Another improvement for EncodingHelper would be to mark every character that cannot be replaced during conversion, e.g. with “<?>”.
The user could then correct them manually.
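The marking could be sketched in Python with a custom codec error handler. Again, this is only an illustration under my own assumptions: the handler name `"mark"`, the constant `MARKER` and the function `to_latin1_marked` are hypothetical and not part of EncodingHelper.

```python
import codecs

MARKER = "<?>"  # hypothetical placeholder, as suggested above


def _mark_unencodable(err):
    """Error handler: emit one MARKER per character that cannot be encoded."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # err.start:err.end is the run of unencodable characters; resume after it.
    return (MARKER * (err.end - err.start), err.end)


# Register the handler under an arbitrary name so encode() can refer to it.
codecs.register_error("mark", _mark_unencodable)


def to_latin1_marked(text: str) -> bytes:
    """Encode as ISO 8859-1, marking unrepresentable characters."""
    return text.encode("latin-1", errors="mark")


print(to_latin1_marked("price: 5\u20ac"))  # b'price: 5<?>'
```

Searching the converted file for “<?>” then shows every spot that needs manual attention.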
Please consider:
- In most cases, only a few “weird” UTF-8 characters make EncodingHelper fail and fall back to UTF-8.
- The “weird” UTF-8 characters are often hard to recognize (see the hyphen example above).
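Since the offending characters are so hard to spot by eye, even a simple report would help. A rough Python sketch of what I mean (the function name `find_unencodable` is hypothetical, not something EncodingHelper provides):

```python
import unicodedata


def find_unencodable(text: str, encoding: str = "latin-1"):
    """Return (index, code point, Unicode name) for each character
    that the target encoding cannot represent."""
    hits = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


for hit in find_unencodable("A\u2011B \u2013 C"):
    print(hit)
```

With such a list the user sees immediately that the “hyphen” at position 1 is really a NON-BREAKING HYPHEN and can fix it.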
See also:
stackoverflow.com/questions/231 … s-with-php - How to replace UTF-8 characters with similar-looking ASCII characters with PHP?