Quite interesting indeed.
/tmp Σ ipython
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: with open("BOD_30079.txt", encoding="utf-8") as f:
   ...:     print(f.read())
   ...:
<<-----UTF-8 MESSAGE BOD_30079 BEGINS---->>
Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
The system works in many languages. 该系统以许多语言工作. يعمل النظام في العديد من اللغات.
���� ���� �� �������� ��� ����� � ���� ��� ��
Το σύστημα λειτουργεί σε πολλές γλώσσες.Система работает на многих языках.
Steganography is the practice of concealing messages within other non-secret text or data.
The cover media may appear unremarkable at first glance and will require close investigation.
<<-----UTF-8 MESSAGE BOD_30079 ENDS----->>
In [2]: with open("BOD_30079.txt", encoding="utf-8") as f:
   ...:     lines = f.readlines()
In [3]: lines
Out[3]:
['<<-----UTF-8 MESSAGE BOD_30079 BEGINS---->>\n',
'\n',
"Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.\n",
'\n',
'The system works in many languages. 该系统以许多语言工作. يعمل النظام في العديد من اللغات. \n',
'\U000e0048\U000e0054\U000e0042\U000e007b\U000e0074\U000e0072\U000e0031\U000e0074\U000e0068\U000e0033\U000e006d\U000e0031\U000e0075\U000e0035\U000e005f\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d���� ���� �� �������� ��� ����� � ���� ��� ��\n',
'Το σύστημα λειτουργεί σε πολλές γλώσσες.Система работает на многих языках.\n',
'\n',
'Steganography is the practice of concealing messages within other non-secret text or data.\n',
'The cover media may appear unremarkable at first glance and will require close investigation.\n',
'\n',
'<<-----UTF-8 MESSAGE BOD_30079 ENDS----->>\n']
In [4]: len(lines[5]) - 1
Out[4]: 65
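It looks like neither Python (when printing to the terminal) nor ST displays the '\U000e0048\U000e0054\U000e0042\U000e007b\U000e0074\U000e0072\U000e0031\U000e0074\U000e0068\U000e0033\U000e006d\U000e0031\U000e0075\U000e0035\U000e005f\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d'
characters. They belong to the Unicode Tags block (U+E0000 to U+E007F), which has no visible glyphs; the first one is U+E0048 TAG LATIN CAPITAL LETTER H (https://www.fileformat.info/info/unicode/char/e0048/index.htm).
The characters are not lost, though: they are still in the file after saving, and `len(lines[5]) - 1` counts them (20 invisible characters plus 45 visible ones gives 65).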
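Since each tag character sits at a fixed offset of 0xE0000 from an ASCII character, the hidden prefix can be turned back into readable text. The following is only a minimal sketch of that idea: the helper names `is_tag_char` and `decode_tag_chars` and the constant `TAG_OFFSET` are made up for illustration, and the `hidden` string is copied from `Out[3]` rather than read from the file.

```python
import unicodedata

TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset


def is_tag_char(ch):
    """True for tag characters that mirror printable ASCII (U+E0020..U+E007E)."""
    return 0xE0020 <= ord(ch) <= 0xE007E


def decode_tag_chars(text):
    """Return the ASCII string spelled out by the tag characters in text."""
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in text if is_tag_char(ch))


# The invisible prefix of lines[5], copied from Out[3] above.
hidden = (
    "\U000e0048\U000e0054\U000e0042\U000e007b"
    "\U000e0074\U000e0072\U000e0031\U000e0074\U000e0068"
    "\U000e0033\U000e006d\U000e0031\U000e0075\U000e0035"
    "\U000e005f\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d"
)

print(unicodedata.name(hidden[0]))   # official name of the first hidden character
print(unicodedata.category(hidden[0]))  # general category of that character
print(decode_tag_chars(hidden))      # the concealed ASCII message
```

Calling `decode_tag_chars(lines[5])` on the line read from the file should give the same result, because ordinary visible characters fall outside the tag range and are skipped.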