How to detect whether a character belongs to a Right To Left language?

Question

What is a good way to tell whether a string contains text in a Right To Left language?

I have found this question (http://stackoverflow.com/questions/1847972/how-can-i-detect-the-flowdirection-righttoleft-or-lefttoright-automatically-in-wp), which suggests the following approach:

public bool IsArabic(string strCompare)
{
  char[] chars = strCompare.ToCharArray();
  foreach (char ch in chars)
    if (ch >= '\u0627' && ch <= '\u0649') return true;
  return false;
}

While this may work for Arabic, it doesn't seem to cover other RTL languages such as Hebrew. Is there a generic way to know that a particular character belongs to an RTL language?

Recommended answer

Unicode characters have different properties associated with them. These properties cannot be derived from the code point itself; you need a table that tells you whether a character has a certain property.

You are interested in characters with bidirectional property "R" or "AL" (RandALCat).

A RandALCat character is a character with unambiguously right-to-left directionality.

Here's the complete list as of Unicode 3.2 (from RFC 3454):


D. Bidirectional tables

D.1 Characters with bidirectional property "R" or "AL"

----- Start Table D.1 -----
05BE
05C0
05C3
05D0-05EA
05F0-05F4
061B
061F
0621-063A
0640-064A
066D-066F
0671-06D5
06DD
06E5-06E6
06FA-06FE
0700-070D
0710
0712-072C
0780-07A5
07B1
200F
FB1D
FB1F-FB28
FB2A-FB36
FB38-FB3C
FB3E
FB40-FB41
FB43-FB44
FB46-FBB1
FBD3-FD3D
FD50-FD8F
FD92-FDC7
FDF0-FDFC
FE70-FE74
FE76-FEFC
----- End Table D.1 -----

Here's some code to get the complete list as of Unicode 6.0:

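using System;
using System.Globalization;
using System.Linq;
using System.Net;

var url = "http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt";

// Each record is a semicolon-separated line; field 0 is the code point (hex)
// and field 4 is the bidirectional class.
var query = from record in new WebClient().DownloadString(url).Split('\n')
            where !string.IsNullOrEmpty(record)
            let properties = record.Split(';')
            where properties[4] == "R" || properties[4] == "AL"
            select int.Parse(properties[0], NumberStyles.AllowHexSpecifier);

// Print every RandALCat code point as at least four hex digits.
foreach (var codepoint in query)
{
    Console.WriteLine(codepoint.ToString("X4"));
}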
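
The query above prints one code point per line, whereas Table D.1 lists ranges. As a rough sketch of how the two relate, consecutive code points can be merged into inclusive ranges. The class name `CodePointRanges` and the method `Merge` are hypothetical names chosen for illustration, and the sketch assumes the input is sorted in ascending order (as the records in UnicodeData.txt are):

```csharp
using System;
using System.Collections.Generic;

static class CodePointRanges
{
    // Merges an ascending sequence of code points into inclusive
    // (First, Last) ranges. Hypothetical helper, not part of the answer.
    public static List<(int First, int Last)> Merge(IEnumerable<int> sortedCodepoints)
    {
        var ranges = new List<(int First, int Last)>();
        foreach (var cp in sortedCodepoints)
        {
            int lastIndex = ranges.Count - 1;
            if (lastIndex >= 0 && ranges[lastIndex].Last + 1 == cp)
            {
                // cp directly follows the current range, so extend it.
                ranges[lastIndex] = (ranges[lastIndex].First, cp);
            }
            else
            {
                // A gap (or the first element) starts a new range.
                ranges.Add((cp, cp));
            }
        }
        return ranges;
    }
}
```

Passing the query result to `CodePointRanges.Merge` would give a list that can be formatted the same way as the RFC 3454 table, for example with `First.ToString("X4")` and `Last.ToString("X4")`.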

Note that these values are Unicode code points. Strings in C#/.NET are UTF-16 encoded, so they need to be converted to Unicode code points first (see Char.ConvertToUtf32). Here's a method that checks whether a string contains at least one RandALCat character:

static bool IsAnyCharacterRightToLeft(string s)
{
    for (var i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
    {
        var codepoint = char.ConvertToUtf32(s, i);
        if (IsRandALCat(codepoint))
        {
            return true;
        }
    }
    return false;
}
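
The method above calls IsRandALCat, which the answer does not show. One possible sketch, assuming the Unicode 3.2 ranges from Table D.1 are good enough for your input, is a binary search over a sorted range table. The class name `RtlDetector` and the field name `RandALCatRanges` are assumptions made for this example:

```csharp
using System;

static class RtlDetector
{
    // Inclusive ranges copied from RFC 3454 Table D.1 (Unicode 3.2),
    // in ascending order. Assumption: this older table suffices.
    private static readonly (int First, int Last)[] RandALCatRanges =
    {
        (0x05BE, 0x05BE), (0x05C0, 0x05C0), (0x05C3, 0x05C3), (0x05D0, 0x05EA),
        (0x05F0, 0x05F4), (0x061B, 0x061B), (0x061F, 0x061F), (0x0621, 0x063A),
        (0x0640, 0x064A), (0x066D, 0x066F), (0x0671, 0x06D5), (0x06DD, 0x06DD),
        (0x06E5, 0x06E6), (0x06FA, 0x06FE), (0x0700, 0x070D), (0x0710, 0x0710),
        (0x0712, 0x072C), (0x0780, 0x07A5), (0x07B1, 0x07B1), (0x200F, 0x200F),
        (0xFB1D, 0xFB1D), (0xFB1F, 0xFB28), (0xFB2A, 0xFB36), (0xFB38, 0xFB3C),
        (0xFB3E, 0xFB3E), (0xFB40, 0xFB41), (0xFB43, 0xFB44), (0xFB46, 0xFBB1),
        (0xFBD3, 0xFD3D), (0xFD50, 0xFD8F), (0xFD92, 0xFDC7), (0xFDF0, 0xFDFC),
        (0xFE70, 0xFE74), (0xFE76, 0xFEFC),
    };

    // Returns true if the code point falls inside one of the ranges above.
    public static bool IsRandALCat(int codepoint)
    {
        int lo = 0, hi = RandALCatRanges.Length - 1;
        while (lo <= hi)
        {
            int mid = (lo + hi) / 2;
            var (first, last) = RandALCatRanges[mid];
            if (codepoint < first) hi = mid - 1;
            else if (codepoint > last) lo = mid + 1;
            else return true;
        }
        return false;
    }
}
```

Because the table reflects Unicode 3.2, right-to-left characters added in later Unicode versions are not recognized; regenerating the table from a newer UnicodeData.txt would address that.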
