Parsing through Arabic / RTL text from left to right

Question

Let's say I have a string in an RTL language such as Arabic with some English chucked in:

string s = "Test:لطيفة;اليوم;a;b";

Notice there are semicolons in the string. When I use the Split method, as in string[] spl = s.Split(';');, some of the strings are saved in reverse order. This is what happens:

spl[0] = "Test:لطيفة"
spl[1] = "اليوم"
spl[2] = "a"
spl[3] = "b"

The above is out of order compared to the original. Instead, I expect to get this:

spl[0] = "Test:اليوم"
spl[1] = "لطيفة"
spl[2] = "a"
spl[3] = "b"

I'm prepared to write my own split function. However, the chars in the string also parse in reverse order, so I'm back to square one. I just want to go through each character as it's shown on the screen.

Answer

As your string currently stands, the word لطيفة is stored before the word اليوم; the fact that اليوم is displayed "first" (that is, further to the left) is just a (correct) result of the Unicode Bidirectional Algorithm in displaying the text.
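
To see the storage order directly, here is a small sketch (assuming C# 9+ top-level statements) that prints every UTF-16 code unit of the string in logical order and then looks up where each Arabic word starts. The literal is typed in logical order; a code editor may still render it reordered on screen, exactly as in the question.

```csharp
using System;

string s = "Test:لطيفة;اليوم;a;b";

// Walk the string in logical (storage) order: index 0 is the first character entered.
for (int i = 0; i < s.Length; i++)
{
    Console.WriteLine($"{i,2}: U+{(int)s[i]:X4}");
}

// Ordinal search avoids any culture-specific matching rules.
int first = s.IndexOf("لطيفة", StringComparison.Ordinal);
int second = s.IndexOf("اليوم", StringComparison.Ordinal);
Console.WriteLine($"first word at {first}, second word at {second}");
```

The indices show that لطيفة starts before اليوم in memory, even though اليوم appears further to the left when the string is rendered.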

That is: the string you start with ("Test:لطيفة;اليوم;a;b") is the result of the user entering "Test:", then لطيفة, then ";", then اليوم, and then ";a;b". Thus, the way C# splits it does in fact mirror the way the string was created. It's just that the creation order is not reflected in how the string is displayed: the semicolon between the two Arabic words is a neutral character, so the algorithm treats both words and the semicolon as a single right-to-left run, which places the first-stored word (لطيفة) on the right.
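
One way to check this, again assuming C# 9+ top-level statements: concatenate the pieces in the order they were entered and compare the result with the original literal. Split then returns the pieces in that same entry order.

```csharp
using System;

// Append the pieces in the order the user typed them.
string typed = "Test:" + "لطيفة" + ";" + "اليوم" + ";a;b";
string literal = "Test:لطيفة;اليوم;a;b";

Console.WriteLine(typed == literal);        // True: same characters, same order

string[] spl = typed.Split(';');
Console.WriteLine(spl[0] == "Test:لطيفة");  // True: Split follows storage order
Console.WriteLine(spl[1] == "اليوم");       // True
```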

If you'd like a string to display Arabic words in left-to-right order with semicolons in between, while also storing the words in that same order, then you should put a Left-to-Right mark (U+200E) after the semicolon. This will effectively section off each Arabic word as its own unit, and the Bidirectional Algorithm will then treat each word separately.
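
A minimal sketch of that approach, assuming C# 9+ top-level statements. JoinWithLrm and SplitWithLrm are hypothetical helper names (not part of .NET): the first joins words with ";" followed by U+200E so every word after a semicolon starts its own bidi unit, and the second splits on ';' and strips the leading mark again.

```csharp
using System;
using System.Linq;

const char Lrm = '\u200E'; // LEFT-TO-RIGHT MARK

// Hypothetical helper: words are stored in the same order they should appear left to right.
string JoinWithLrm(params string[] parts) => string.Join(";" + Lrm, parts);

// Hypothetical helper: undo JoinWithLrm by splitting and removing the leading LRM.
string[] SplitWithLrm(string text) =>
    text.Split(';').Select(p => p.TrimStart(Lrm)).ToArray();

string s = JoinWithLrm("Test:اليوم", "لطيفة", "a", "b");
string[] spl = SplitWithLrm(s);

Console.WriteLine(string.Join(" | ", spl));
```

Putting the mark after every semicolon, not only the one between the two Arabic words, is a simplifying choice: the extra marks before "a" and "b" do not change the display, and the helpers stay uniform.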

For instance, the following code begins with a string that looks identical to yours on screen (the Arabic words are now stored in the order they are displayed, with the addition of a single Left-to-Right mark), yet it splits the string the way you expect (that is, spl[0] = "Test:اليوم" and spl[1] = "لطيفة"):

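class Program {
    static void Main(string[] args) {
        // The LRM (U+200E) after the first ';' puts each Arabic word in its own bidi run.
        string s = "Test:اليوم;\u200Eلطيفة;a;b";
        string[] spl = s.Split(';');
        // spl[0] == "Test:اليوم", spl[1] == "\u200Eلطيفة" (starts with the invisible LRM)
    }
}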
