Sublime Forum

Sublime Text 3 Unicode Issue

#1

Hi, I found this while I was playing a CTF.

This is the text file I got from the CTF. It contains a Unicode string (line 6); I will upload the text file below.

I copied the whole Unicode string and it gave me 45 characters, which means 20 characters are missing. I noticed this while I was trying to decode it, and after opening the text file in other text editors, which gave me 65 chars.

https://s1.gifyu.com/images/a3bc491ba6c902fb7.gif

http://www111.zippyshare.com/v/4m1rAhbI/file.html

#2

Quite interesting indeed.

/tmp Σ ipython
Python 3.6.4 (default, Jan  5 2018, 02:35:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: with open ("BOD_30079.txt", encoding="utf-8") as f:
   ...:     print(f.read())
   ...:
<<-----UTF-8 MESSAGE BOD_30079 BEGINS---->>

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
The system works in many languages. 该系统以许多语言工作. يعمل النظام في العديد من اللغات.
󠁈󠁔󠁂󠁻󠁴󠁲󠀱󠁴󠁨󠀳󠁭󠀱󠁵󠀵󠁟󠀱󠀴󠀹󠀹󠁽���� ���� �� �������� ��� ����� � ���� ��� ��
Το σύστημα λειτουργεί σε πολλές γλώσσες.Система работает на многих языках.

Steganography is the practice of concealing messages within other non-secret text or data.
The cover media may appear unremarkable at first glance and will require close investigation.

<<-----UTF-8 MESSAGE BOD_30079 ENDS----->>

In [2]: with open ("BOD_30079.txt", encoding="utf-8") as f:
   ...:     lines = f.readlines()

In [3]: lines
Out[3]:
['<<-----UTF-8 MESSAGE BOD_30079 BEGINS---->>\n',
 '\n',
 "Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.\n",
 '\n',
 'The system works in many languages. 该系统以许多语言工作. يعمل النظام في العديد من اللغات. \n',
 '\U000e0048\U000e0054\U000e0042\U000e007b\U000e0074\U000e0072\U000e0031\U000e0074\U000e0068\U000e0033\U000e006d\U000e0031\U000e0075\U000e0035\U000e005f\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d���� ���� �� �������� ��� ����� � ���� ��� ��\n',
 'Το σύστημα λειτουργεί σε πολλές γλώσσες.Система работает на многих языках.\n',
 '\n',
 'Steganography is the practice of concealing messages within other non-secret text or data.\n',
 'The cover media may appear unremarkable at first glance and will require close investigation.\n',
 '\n',
 '<<-----UTF-8 MESSAGE BOD_30079 ENDS----->>\n']

In [4]: len(lines[5]) - 1
Out[4]: 65
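
To see where the 65 comes from, here is a rough sketch that splits a string into characters from the Unicode Tags block (U+E0000 to U+E007F) and everything else. The helper name `count_tags` is made up for this post, and the `"x" * 45` is only a stand-in for the visible rest of line 6, not the real text.

```python
# Bounds of the Unicode "Tags" block; these code points normally have no visible glyph.
TAG_START, TAG_END = 0xE0000, 0xE007F


def count_tags(text):
    """Return (tag_count, other_count) for the given string."""
    tags = sum(1 for ch in text if TAG_START <= ord(ch) <= TAG_END)
    return tags, len(text) - tags


# Hidden prefix copied from Out[3] above.
hidden = (
    "\U000e0048\U000e0054\U000e0042\U000e007b\U000e0074"
    "\U000e0072\U000e0031\U000e0074\U000e0068\U000e0033"
    "\U000e006d\U000e0031\U000e0075\U000e0035\U000e005f"
    "\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d"
)

# "x" * 45 is a placeholder for the 45 visible characters of the line.
tag_count, other_count = count_tags(hidden + "x" * 45)
print(tag_count, other_count)  # 20 45
```

So 20 of the 65 code points sit in the Tags block, which would match the 45 characters that survive a copy.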

It looks like both Python and ST leave the '\U000e0048\U000e0054\U000e0042\U000e007b\U000e0074\U000e0072\U000e0031\U000e0074\U000e0068\U000e0033\U000e006d\U000e0031\U000e0075\U000e0035\U000e005f\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d' symbols invisible when printing, rather than actually dropping them. They belong to the Unicode Tags block (U+E0000 to U+E007F), which has no visible glyphs. The first one is https://www.fileformat.info/info/unicode/char/e0048/index.htm.

When saving the file, however, these symbols are still there.
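
If you want to read what those symbols say, one approach (a sketch, not necessarily what the CTF authors intended) relies on the fact that tag characters U+E0020 to U+E007E mirror printable ASCII 0x20 to 0x7E, so subtracting 0xE0000 from each code point gives the matching ASCII character. The name `decode_tags` is just something I picked for illustration.

```python
TAG_BASE = 0xE0000  # offset between a tag character and its ASCII counterpart


def decode_tags(text):
    """Map tag characters (U+E0020..U+E007E) to ASCII; skip everything else."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


# Hidden prefix copied from Out[3] above.
hidden = (
    "\U000e0048\U000e0054\U000e0042\U000e007b\U000e0074"
    "\U000e0072\U000e0031\U000e0074\U000e0068\U000e0033"
    "\U000e006d\U000e0031\U000e0075\U000e0035\U000e005f"
    "\U000e0031\U000e0034\U000e0039\U000e0039\U000e007d"
)

print(decode_tags(hidden))
```

Since this works on the string read straight from the file, it does not depend on what the editor manages to put on the clipboard.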

#4

But I can't copy the full 65 chars of the Unicode string using Sublime :frowning:

#5

Hello! I’m sorry to say that you had the solution right in front of you. By this, I mean you can find the solution inside this post… enjoy!!
