Sublime Forum

Arabic and Unicode Typing

#3

We do not currently support RTL languages in our text layout system.

0 Likes

#4

I use 3.1.1, build 3176, the latest, as it says.
Windows 10, but it is also not working on Windows 8.
I am using OpenType fonts that support 64 languages.
The problem should be clear now:
Sublime Text doesn’t support Arabic, Persian, Urdu, etc.

0 Likes

#5

That is bad.
I like this software.
It has all the features a developer can use.

Cheers

1 Like

#6

I think it should be an easy fix for the Sublime developers.
The standard Windows library now supports writing/reading text in 64 languages.

0 Likes

#7

No, it won’t, since it requires making significant changes to our custom editor component.

We support Linux and Mac too, so we need to support all three, not just Windows. However, the bigger issue is that standard OS text drawing is too slow for our performance target, so we have a custom implementation. We cache glyphs and do the layout ourselves instead of sending full strings to the OS. At some point we need to augment our layout engine to support RTL, but it won’t be a simple or quick fix.
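As a rough illustration of the "cache glyphs and do the layout ourselves" idea, here is a minimal Python sketch. It is not Sublime Text's actual code: `glyph_advance` and `layout_ltr` are hypothetical names, and the width metric is a toy stand-in for a real font library. It only shows how such a design can bake in the assumption that the pen always moves rightwards in character order.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def glyph_advance(char):
    """Hypothetical stand-in for an expensive glyph lookup.

    A real editor would rasterise the glyph once and cache the bitmap
    and its advance width; here we only return a toy width.
    """
    return 2 if ord(char) > 0x2E7F else 1


def layout_ltr(text):
    """Place each character at an increasing x offset, left to right."""
    x = 0
    placed = []
    for ch in text:
        placed.append((ch, x))
        x += glyph_advance(ch)  # the pen only ever moves to the right
    return placed


print(layout_ltr("abc"))
```

In a layout loop like this, every selection and caret calculation can assume that a later character sits further to the right, which is exactly the assumption RTL text breaks.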

1 Like

#8

The Unicode Bidirectional Algorithm (‘BiDi’) is very well defined in Unicode® Standard Annex #9. The Unicode Consortium also offers two reference implementations, one in C and one in Java.
In addition, Microsoft has a Windows BiDi implementation library.
I am too new to Linux to know whether it offers one, but some open-source projects support BiDi, e.g. MySQL Workbench, so maybe that can be used in ST.
Apple supports many if not all RTL languages natively, so I assume they have their own BiDi implementation too.
I am sure that adding BiDi support to ST is by no means trivial. But it is rather essential. Nowadays the world is international and there are many RTL languages to support. Writing code involves typing, cutting/copying/pasting, checking, validating, comparing and otherwise processing BiDi text. So IMHO ST must support BiDi to remain relevant even to developers who work only in LTR languages, if their products are to be used internationally.
I am personally not able to contribute code or make the required changes. However, as a native speaker of an RTL language (Hebrew) I can offer advice and be an alpha, beta and whatever tester.
How about making Sublime Text BiDi support a crowdfunded project? I for one will contribute to it. Hey people, cast your votes, put your money where your interest is! Support the cause of BiDi languages and make ST usable internationally!
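For readers unfamiliar with UAX #9: the algorithm starts from a bidirectional class assigned to every character. The short Python sketch below only inspects those classes using the standard `unicodedata` module. It does not implement the algorithm itself, and `bidi_classes` is a helper name made up for this example.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for a string."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Latin letter, space, Hebrew alef, space, Arabic alef, space, digit
sample = "a \u05d0 \u0627 1"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Latin letters report `L`, Hebrew letters `R`, Arabic letters `AL`, European digits `EN` and spaces `WS`. Resolving how these classes interact on a line is the part that UAX #9 specifies.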

1 Like

#9

@MeirG

You’ve just explained it better than anybody could

Hope your message will reach the brilliant team of ST

Cheers

1 Like

#10

I think it should be an easy fix for the Sublime developers.
The standard Windows library now supports writing/reading text in 64 languages.

Definitely not, and most likely it will lead to all sorts of problems (not to mention the impact it would have on plugin development).

The issue doesn’t just boil down to supporting right-to-left writing and Arabic fonts (which in itself would be difficult for various reasons); the problem is how to introduce support for Arabic in all the various interfaces.

How should search, symbol indexing and Goto Anything functionality handle Arabic?

There are serious Unicode collation problems to consider too. For example, in collation (for sorting or fuzzy matching) should an Alif maqsurah be counted as an Alif or a Ya? And how should special characters like nabilah, which are not part of the alphabet but are used to carry a hamzah, be handled in sorting and fuzzy matching?

Optionality of vowels and other symbols (sukun, shaddah, waslah, etc.) further complicates the issue. Should vowels be discounted in fuzzy search? Just to show how complicated this can be, have a look at these dedicated libraries:

Then there is filename representation, in search & replace, rename operations, indexing, etc.

There are really lots of issues in introducing Arabic support, and these would have repercussions on user packages, plugins, etc.
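To make the vowel and Alif maqsurah questions concrete, here is a minimal Python sketch of one possible search key. The `FOLD` table and the `search_key` name are assumptions made for illustration. Folding Alif maqsurah to Ya is just one policy among several, not something Unicode mandates.

```python
import unicodedata

# Hypothetical folding policy: treat alif maqsurah (U+0649) as ya (U+064A).
# Another project might fold it to alif instead, or not fold it at all.
FOLD = {"\u0649": "\u064a"}


def search_key(text):
    """Build a fuzzy-search key by dropping combining marks and folding."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base letters and non-zero for marks such
    # as fathah, dammah, shaddah and sukun.
    base = [ch for ch in decomposed if unicodedata.combining(ch) == 0]
    return "".join(FOLD.get(ch, ch) for ch in base)


vocalised = "\u0645\u064f\u062d\u064e\u0645\u0651\u064e\u062f"  # with vowels and shaddah
plain = "\u0645\u062d\u0645\u062f"                              # same letters, unmarked
print(search_key(vocalised) == search_key(plain))
```

Note that this blunt approach also strips the hamzah from precomposed letters such as U+0623, because NFD splits them into a base letter plus a combining mark. Whether that is desirable is exactly the kind of decision the post is describing.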

If you’re only looking to add some inline Arabic in a text document, chances are you can do it by manually inserting Unicode control characters to enforce some aspects of bidirectionality, ligatures, etc. But, from my experience, if you need to do some advanced editing in Arabic you should be looking at a dedicated editor.
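The control characters mentioned above are ordinary code points, so they can be inserted programmatically. The Python sketch below wraps a fragment in a right-to-left isolate pair. `isolate_rtl` is a made-up helper name, and whether a given editor honours these controls depends entirely on its layout engine.

```python
import unicodedata

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(fragment):
    """Wrap a fragment so a BiDi-aware renderer treats it as an RTL island."""
    return f"{RLI}{fragment}{PDI}"


wrapped = isolate_rtl("\u0633\u0644\u0627\u0645")
print(unicodedata.name(wrapped[0]))
print(unicodedata.name(wrapped[-1]))
```

The wrapped string is two code points longer than the original, which matters for anything that counts characters or columns.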

@MeirG

However, as a native speaker of an RTL language (Hebrew) I can offer advice and be an alpha, beta and whatever tester.

The Hebrew alphabet poses fewer problems than Arabic because its letters are not joined together as they are in Arabic. Arabic applications that need to display classical texts employ over 400 glyphs to cover all the various graphical combinations of the letters, and some of the letter combinations even have specific Unicode code points (just think of the Lam-Alif combination, which is considered a letter in its own right). Basically, Arabic is cursive only, while Hebrew is the opposite.
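The Lam-Alif case can be checked from Python's standard `unicodedata` module. This small sketch looks up one of the dedicated ligature code points (U+FEFB) and shows that compatibility normalisation maps it back to the two underlying letters.

```python
import unicodedata

ligature = "\ufefb"
print(unicodedata.name(ligature))

# NFKC replaces the presentation form with its underlying letters.
letters = unicodedata.normalize("NFKC", ligature)
print([f"U+{ord(ch):04X}" for ch in letters])
```

So the same visible shape can be stored either as one code point or as two, and any search or comparison code has to decide whether those count as equal.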

1 Like

#11

Dear @tajmone,
We have no argument! Personally, I have very little knowledge of the Arabic language so I must admit that I was not aware how non-trivial the issue is! My arguments however are:

  1. BiDi support is essential
  2. And all your concerns are addressed, are well defined, and reference implementations of it exist
  3. IMHO the BiDi community should pay for the development of such. A few crowdfunding platforms exist, so technically it can be started right away. But we first have to find a person or a team, versed in BiDi implementation, to take up the challenge. I am working on it.

Are the core developers of ST ready to take up the challenge?

0 Likes

#12

I see that all editors support RTL, Arabic and others:
UltraEdit, Notepad++, EditPad Pro, EditPad, etc., without such complications!

And you don’t need 400 characters to support Arabic; perhaps in a full editor, but not in a text editor.
You need 128 at most.

As an Arabic user, you don’t need to worry about sukun, shaddah, waslah.
These characters appear only in a full editor like MS Word, not in any text editor.
What really can be done is to allow writing with context analysis; that library is built into MS Windows and is also available from many software vendors.

Cheers

1 Like

#13

Dear @wbond,
Where on the priority scale of TODO features is BiDi support? Anywhere?
Would Financial support raise it to the top?
On the scale of 1 to 10 where 1 is a message spelling fix and 10 is a total rewrite of the rendering engine, does BiDi support rank?

0 Likes

#14

The number of characters doesn’t really matter; the real work is rewriting the layout engine to understand the concept of RTL, and dealing with selections, caret placement, etc.
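To show why selections and caret placement get hard, here is a deliberately simplified Python sketch of logical versus visual order. It is a toy model, not the UAX #9 algorithm and not anything from Sublime Text: it assumes an LTR paragraph, reverses each run of strong RTL characters, and ignores neutrals, digits and nesting. `visual_order` is a hypothetical helper name.

```python
import unicodedata


def is_rtl(ch):
    """True for characters with a strong right-to-left class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def visual_order(text):
    """Return logical indices in left-to-right display order (toy model)."""
    order, run = [], []
    for i, ch in enumerate(text):
        if is_rtl(ch):
            run.append(i)                  # collect the RTL run
        else:
            order.extend(reversed(run))    # flush the run, mirrored
            run = []
            order.append(i)
    order.extend(reversed(run))
    return order


text = "ab\u05d0\u05d1\u05d2cd"  # two Latin, three Hebrew, two Latin letters
print(visual_order(text))  # [0, 1, 4, 3, 2, 5, 6]
```

Moving one column to the right on screen, from column 1 to column 2, jumps from logical index 1 to logical index 4. A selection that looks contiguous on screen may therefore cover non-contiguous ranges in the buffer, and the reverse is also true.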

Generally we don’t have a priority assigned to every feature enhancement or bug request. Instead we tend to sort things into stuff we are going to tackle in a dev cycle. We are working on finishing up a dev cycle now, so we wouldn’t start it now. We haven’t determined all the parts of the next cycle, but we do have at least one thing waiting to be merged until after 3.2 is released.

I wouldn’t be the one to answer that since I don’t deal with finances, but I tend to doubt it. Priorities are set by a combination of available manpower, trying to make sure a dev cycle isn’t too long, and what elements of the program seem to need the most attention, and what will impact/benefit a decent portion of users.

I didn’t write our current text layout implementation, but I have worked on related code a little bit. My hunch is that implementing bidi layout and getting it correct is probably in the vicinity of 8-10.

1 Like

#15

@MeirG

  1. BiDi support is essential

I agree that BiDi support would be a good feature to add, but as I said this wouldn’t solve all the problems related to supporting Arabic.

  2. And all your concerns are addressed, are well defined, and reference implementations of it exist

I would hardly agree with this. First, there are different solutions to these issues, not a single one. Second, the Unicode approach to Arabic hasn’t done justice to the Arabic language at all, which is the reason why classical Arabic literature tools don’t use Unicode but rather a custom system for fonts and letters in order to correctly represent the Quran, poetry or classical prose.

As a simple example, take the alif+lam with a hamzah over the alif (as used, for example, in articles). The typical glyph that you get in word processors and web pages places the hamzah over the lam instead of the alif. This is an error, and it’s due to the confusion about the lam-alif “letter”, which can be written in two different ways, one of them placing the lam on the left side of the alif, depending on whether it was written with a single stroke of the pen or two. (Sorry, I can’t provide glyphs for this, but the issue is a well-known one.)

This is a rather long subject to get into (and I don’t really want to get too much into it), but I’d like to make a point here, i.e. that when talking about Unicode Arabic we are really talking about an attempt to standardize a living language which has many variations. Consonant diacritics (which were not part of the Arabic language but were added later on) are not used consistently across the Arab world. So you’ll find that, for example, in North Africa the Qaf was traditionally (and still is, often) written with a single dot below the letter, instead of two above. These are acceptable variations on the glyphs, but the urge to standardize everything into something manageable in a clear-cut way would rather go for what is more common.

While these considerations might not affect implementing support for Arabic in an editor, they do point out the fact that universal solutions to the language should be taken with a pinch of salt. Embracing BiDi and the Unicode standard would still leave a huge amount of work when it comes to collation, search-&-replace, and other features.

As much as I’d love to see Arabic supported in ST and SM, I’m just worried that it would introduce some problems in the API that could affect user customization.

@Adamq81h

I see that all editors support RTL, Arabic and others:
UltraEdit, Notepad++, EditPad Pro, EditPad, etc., without such complications!

It’s true, Npp can handle inline Arabic text much better than ST, and you only need to copy and paste some Arabic text into both editors to see the difference. That points to better BiDi handling of text, but doesn’t address the issue at hand in ST.

Just consider, e.g., syntax definitions. How is the syntax parser going to handle a change of direction when parsing tokens? And, most important, how can BiDi be handled in syntax scopes without introducing huge complications in a syntax that is strongly RegEx driven?

And you don’t need 400 characters to support Arabic; perhaps in a full editor, but not in a text editor. You need 128 at most.

Yes, but then all packages must be able to account for these 128 Unicode code points, and the BiDi controls that come with them. In which contexts are these supposed to be usable? Most scope RegExes still rely on ASCII-only identifiers. So, it is one thing to add some Arabic inside strings and code comments, and another to expect to use it in a markup language (say, Markdown) and still have ST properly handle all the contents and their scopes.
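The ASCII-only identifier problem is easy to reproduce. The sketch below uses Python's `re` module purely as an illustration. Sublime Text syntax definitions use their own regex engine, and both patterns here are made-up examples rather than ones taken from any real package.

```python
import re

# Typical ASCII-only identifier pattern found in many syntax definitions.
ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# A Unicode-aware alternative: a non-digit word character, then word characters.
UNICODE_IDENT = re.compile(r"[^\W\d]\w*")

name = "\u0645\u062a\u063a\u064a\u0631"  # an Arabic word used as an identifier

print(bool(ASCII_IDENT.fullmatch(name)))    # False
print(bool(UNICODE_IDENT.fullmatch(name)))  # True
```

Every package whose scopes rely on the first style of pattern would need to be revisited before Arabic identifiers could be scoped correctly.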

As an Arabic user, you don’t need to worry about sukun, shaddah, waslah.
These characters appear only in a full editor like MS Word, not in any text editor.
What really can be done is to allow writing with context analysis; that library is built into MS Windows and is also available from many software vendors.

That’s exactly the point, most Arabic users (quite rightly) wouldn’t expect to have to worry about using a shaddah when doing a fuzzy search, they’d instead expect developers to take care of that by providing a smart interface.

For a native Arabic speaker it’s an obvious thought that writing a shaddah is unnecessary, but how is a search box (and its algorithm) then going to know that the looked-up word has a double consonant? This is a rather important aspect of result matching: the searched lexeme would have to be matched against results that, in their turn, might or might not contain a shaddah themselves, and Unicode collation would need to manipulate the list of candidate matches to ensure that (for example) a shaddah is placed before a vowel, and not the other way round.
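The ordering question can be seen directly with Unicode normalisation. In this Python sketch the same consonant carries a shaddah and a fathah typed in both orders. The raw strings differ, but they compare equal after NFC normalisation, because canonical ordering sorts combining marks by their combining class.

```python
import unicodedata

SHADDAH, FATHAH = "\u0651", "\u064e"
BEH = "\u0628"

typed_a = BEH + SHADDAH + FATHAH
typed_b = BEH + FATHAH + SHADDAH

print(typed_a == typed_b)
print(unicodedata.normalize("NFC", typed_a) == unicodedata.normalize("NFC", typed_b))
print([unicodedata.combining(ch) for ch in (FATHAH, SHADDAH)])  # [30, 33]
```

Note that Unicode's canonical order puts the fathah (class 30) before the shaddah (class 33). A search feature that wants the opposite convention would need its own reordering step on top of normalisation.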

There are libraries to handle these cases, but they’d have to be integrated at all levels of the API where such interactions occur.

Microsoft has thousands of employees from all over the world, and can afford outsourcing complexity to third parties if needed, but at Sublime HQ probably most developers won’t be Arabic speakers and yet will have to be able to test any new ST/SM features against Arabic if it became officially supported, and to account for it in any new feature. That’s easier said than done.

I’m not saying it’s impossible, because obviously it’s being done in a lot of software (but rarely done the right way, I dare say), but I’m objecting to the initial proposal that this would be a trivial change to implement, especially cross-platform. And I really can’t see how this wouldn’t affect user customization of ST, for it would pervade the whole API and affect all ST I/O dialogs, and I doubt that it can be handled in a manner that is fully transparent to end users either.

Definitely, I agree that when it comes to handling Arabic, ST performs much more poorly than other editors (like those mentioned by @Adamq81h) and that it should at least display Arabic (and other BiDi) text decently (for example, in code comments and strings).

3 Likes

#16

Dear @wbond

I would really like to initiate a crowdfunded project for the implementation of BiDi support. For this I need two things:

  1. An estimate for the target sum of the funds to raise, required to implement BiDi
  2. A way to advertise the crowdfunding initiative among the ST3 community.

Whom should I talk to for each of the above two?

Regards,
Meir

2 Likes

#17

Hey everyone!
I emailed Sublime HQ directly regarding my crowdfunding initiative. I asked for:

  1. A pledge to use the funds raised to implement the BiDi support
  2. An estimate of the funds required to do so
  3. A pledge to advertise the initiative on the weekly digest of the ST3 forums

The answer of Kari from Sublime HQ Pty Ltd was:
“… we are not accepting third party donations at this time as a means of developing features.”
Well, so much for my initiative…! I hope though that the “… at this time” clause leaves some hope for the future. I am here, folks, to pick up the gauntlet at any time.
BTW, Microsoft’s Visual Studio Code supports BiDi languages wonderfully; it is free and open source, performs decently and is available on all three major OSes: Windows, Linux and macOS. FYI!
Regards,
Meir

0 Likes

#18

@MeirG, I want to thank you for your initiative; if it had worked out it would have been beneficial for all ST users.

The answer of Kari from Sublime HQ Pty Ltd was:

“… we are not accepting third party donations at this time as a means of developing features.”

Interesting. I can imagine that there might be some ethical considerations behind this. I’ve seen some commercial software with lifetime licenses take that path to encourage development, but in these cases they are products over 20 years old with a well-established user base, and since they aren’t going to step back from the original promise of lifelong licenses they need some funding to keep the product growing and up to date.

I do hope that Sublime HQ might at some point consider similar offers.

BTW, Microsoft’s Visual Studio Code supports BiDi languages wonderfully; it is free and open source, performs decently and is available on all three major OSes: Windows, Linux and macOS. FYI!

With open source tools there is always the advantage that even if the main developers say “No” to a request, you can still implement a feature independently (either branching off from the original project with a new product, or just temporarily and then propose upstream integration).

I guess that if VSCode didn’t support BiDi, you could have started your crowdfunding for the feature and outsourced it to anyone willing to take it on. That’s the beauty of the open source “Bazaar”.

Thanks anyhow.

2 Likes

#19

And 22 months later!
Still no changes!
Thank you for your attention to your customers’ requests.

0 Likes

#20

Very disappointed to see this issue not resolved, despite all the time that has passed. I’m switching to another editor.

0 Likes

#21

I stumbled upon this thread while trying to figure out how to enable Arabic in Sublime …

Back in the day I was part of the team that ported a BiDi engine to ICU (both ICU4C & ICU4J), mainly the Arabic shaping engine responsible for shaping Arabic characters into Arabic words and strings. You can still find my name in the ICU open source files: https://github.com/unicode-org/icu/blob/main/icu4c/source/common/ushape.cpp

So my proposal, if someone from Sublime is interested, is that I can help add Arabic support to Sublime free of charge. I might be a bit rusty, but as long as we don’t have a due date I believe I can handle it in my free time.

Just let me know if you are interested
Regards

0 Likes

#22

Unfortunately the shaping and grouping of characters is already solved, as we use the platform libraries to do that for us. It’s the custom layout, rendering, selection and caret logic that’s not built for RTL.

0 Likes