Parsing through Arabic / RTL text from left to right

Question

Let's say I have a string in an RTL language such as Arabic with some English chucked in:

string s = "Test:لطيفة;اليوم;a;b";

Notice there are semicolons in the string. When I use the Split command like string[] spl = s.Split(';');, then some of the strings are saved in reverse order. This is what happens:

spl[0] = "Test:لطيفة"
spl[1] = "اليوم"
spl[2] = "a"
spl[3] = "b"

The above is out of order compared to the original. Instead, I expect to get this:

spl[0] = "Test:اليوم"
spl[1] = "لطيفة"
spl[2] = "a"
spl[3] = "b"

I'm prepared to write my own split function. However, the chars in the string also parse in reverse order, so I'm back to square one. I just want to go through each character as it's shown on the screen.

Recommended answer

As your string currently stands, the word لطيفة is stored prior to the word اليوم; the fact that اليوم is displayed "first" (that is, further to the left), is just a (correct) result of the Unicode Bidirectional Algorithm in displaying the text.

That is: the string you start with ("Test:لطيفة;اليوم;a;b") is the result of the user entering "Test:", then لطيفة, then ";", then اليوم, and then ";a;b". Thus, the way C# is splitting it does in fact mirror the way that the string is created. It's just that the way it is created is not reflected in the display of the string, because the two consecutive Arabic words are treated as a single unit when they are displayed.
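One way to check the stored (logical) order for yourself is to walk the string by index and print each code point, which sidesteps how an editor or console renders the text. This is only a minimal sketch: `LogicalOrderDemo` and `Describe` are hypothetical names introduced here, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

static class LogicalOrderDemo
{
    // Returns one "index: U+XXXX" entry per UTF-16 code unit, in storage (logical) order.
    public static List<string> Describe(string s)
    {
        var entries = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            entries.Add($"{i}: U+{(int)s[i]:X4}");
        }
        return entries;
    }

    public static void Main()
    {
        // The question's string, written with escapes so the source shows the stored order:
        // "Test:" + لطيفة + ";" + اليوم + ";a;b"
        string s = "Test:\u0644\u0637\u064A\u0641\u0629;\u0627\u0644\u064A\u0648\u0645;a;b";
        foreach (string entry in Describe(s))
        {
            Console.WriteLine(entry);
        }
        // Index 5 is U+0644, the first letter of لطيفة; index 11 is U+0627, the first letter of اليوم.
    }
}
```

Indexing, `foreach` over the string, and `Split` all follow this stored order; only the rendering rearranges the Arabic run.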

If you'd like a string to display Arabic words in left-to-right order with semicolons in between, while also storing the words in that same order, then you should put a Left-to-Right mark (U+200E) after the semicolon. This will effectively section off each Arabic word as its own unit, and the Bidirectional Algorithm will then treat each word separately.

For instance, the following code begins with a string that looks the same on screen as the one you use (with the addition of a single Left-to-Right mark), yet it splits the way you expect (that is, spl[0] = "Test:اليوم" and spl[1] = "لطيفة"):

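static void Main(string[] args) {
    // \u200E (Left-to-Right mark) after the first semicolon makes each Arabic word its own display unit.
    string s = "Test:اليوم;\u200Eلطيفة;a;b";
    string[] spl = s.Split(';');
    // spl[0] is "Test:اليوم"; spl[1] still begins with the invisible \u200E, followed by لطيفة.
}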
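Note that `Split` leaves the Left-to-Right mark attached to the start of the piece that follows the semicolon, so a comparison such as `spl[1] == "لطيفة"` would fail. If the pieces are used for anything other than display, one option is to trim directional marks after splitting. The sketch below assumes that is what you want; `BidiSplitDemo` and `SplitAndTrimMarks` are hypothetical names, not from the original answer.

```csharp
using System;
using System.Linq;

static class BidiSplitDemo
{
    // Left-to-Right mark and Right-to-Left mark: invisible characters that only influence display order.
    private static readonly char[] DirectionalMarks = { '\u200E', '\u200F' };

    // Hypothetical helper: split on the separator, then trim directional marks from both ends of each piece.
    public static string[] SplitAndTrimMarks(string s, char separator)
    {
        return s.Split(separator)
                .Select(part => part.Trim(DirectionalMarks))
                .ToArray();
    }

    public static void Main()
    {
        // Same string as in the answer, written with escapes so the stored order is unambiguous.
        string s = "Test:\u0627\u0644\u064A\u0648\u0645;\u200E\u0644\u0637\u064A\u0641\u0629;a;b";
        string[] spl = SplitAndTrimMarks(s, ';');
        // spl[1] now holds only the five Arabic letters of لطيفة, without the leading mark.
        Console.WriteLine(spl[1].Length);
    }
}
```

Trimming only the ends keeps any marks that sit inside a piece, which seems the safer default when the text may be shown again later.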
