Why does Char.IsDigit return true for chars which can't be parsed to int?

Question

I often use Char.IsDigit to check whether a char is a digit, which is especially handy in LINQ queries as a pre-check for int.Parse, as here: "123".All(Char.IsDigit).

But there are chars that are digits yet can't be parsed to int, such as ۵.

// true
bool isDigit = Char.IsDigit('۵'); 

var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
int num;
// false
bool isIntForAnyCulture = cultures
    .Any(c => int.TryParse('۵'.ToString(), NumberStyles.Any, c, out num)); 

Why is that? Is my int.Parse-precheck via Char.IsDigit thus incorrect?

There are 310 chars which are digits:

List<char> digitList = Enumerable.Range(0, char.MaxValue + 1)
   .Select(i => Convert.ToChar(i))
   .Where(c => Char.IsDigit(c))
   .ToList(); 

Here's the implementation of Char.IsDigit in .NET 4 (ILSpy):

public static bool IsDigit(char c)
{
    if (char.IsLatin1(c))
    {
        return c >= '0' && c <= '9';
    }
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}

So why are there chars that belong to the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") which can't be parsed to an int in any culture?

Solution

It's because it is checking for all digits in the Unicode "Number, Decimal Digit" category, as listed here:

http://www.fileformat.info/info/unicode/category/Nd/list.htm

It doesn't mean that it is a valid numeric character in the current locale. In fact, int.Parse() can ONLY parse the ASCII digits 0 through 9, regardless of the locale setting.

For example, this throws a FormatException:

int test = int.Parse("٣", CultureInfo.GetCultureInfo("ar"));

This fails even though ٣ is a valid Arabic digit character and "ar" is the Arabic locale identifier.
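
So a pre-check that matches what int.Parse accepts has to test for the ASCII range explicitly instead of calling Char.IsDigit. Here is a minimal sketch, assuming the input should consist of unsigned digits only. The names DigitCheck and IsAsciiDigits are made up for illustration, and the check does not guard against overflow on long inputs:

```csharp
using System;
using System.Linq;

public static class DigitCheck
{
    // True only when the string is non-empty and every char is
    // in the range '0'..'9' (U+0030 through U+0039).
    public static bool IsAsciiDigits(string s)
    {
        return !string.IsNullOrEmpty(s) && s.All(c => c >= '0' && c <= '9');
    }
}
```

With this, DigitCheck.IsAsciiDigits("123") is true while DigitCheck.IsAsciiDigits("۵") is false, even though Char.IsDigit('۵') is true. If the goal is simply to know whether a string parses, calling int.TryParse directly is probably the more robust check.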

The Microsoft article "How to: Parse Unicode Digits" states that:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters.

However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.

The return value is a double rather than an int because of characters like this:

Console.WriteLine(char.GetNumericValue('¼')); // Prints 0.25
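
To see how the different checks relate, it can help to print them side by side for a few characters. This is only an illustrative sketch; NumericInfo and Describe are hypothetical names, not part of the framework:

```csharp
using System;
using System.Globalization;

public static class NumericInfo
{
    // Summarizes how the framework classifies a single char.
    public static string Describe(char c)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "U+{0:X4} IsDigit={1} IsNumber={2} Value={3}",
            (int)c, char.IsDigit(c), char.IsNumber(c), char.GetNumericValue(c));
    }
}
```

For '٣' (U+0663) this reports IsDigit and IsNumber as true with a value of 3. For '¼' (U+00BC) IsDigit is false, IsNumber is true and the value is 0.25. For a letter such as 'x' both checks are false and GetNumericValue returns -1.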

You could use something like this to convert all numeric characters in a string into their ASCII equivalent:

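// Requires: using System.Text;
public string ConvertNumericChars(string input)
{
    StringBuilder output = new StringBuilder();

    foreach (char ch in input)
    {
        if (char.IsDigit(ch))
        {
            // Numeric value of the digit, whatever script it belongs to
            double value = char.GetNumericValue(ch);

            // Defensive check: only map whole values in the range 0..9
            if ((value >= 0) && (value <= 9) && (value == (int)value))
            {
                // Replace with the ASCII digit that has the same value
                output.Append((char)('0' + (int)value));
                continue;
            }
        }

        // Everything else is copied through unchanged
        output.Append(ch);
    }

    return output.ToString();
}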
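
Putting the pieces together, the normalized string can then be handed to int.TryParse. The following is a self-contained sketch of that idea, not a definitive implementation. DigitNormalizer, ToAsciiDigits and TryParseAnyDigits are hypothetical names, and the helper is a condensed variant of the conversion above that relies on char.IsDigit chars having a numeric value of 0 through 9:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DigitNormalizer
{
    // Maps every char in the DecimalDigitNumber category to the ASCII
    // digit with the same value; all other chars are left unchanged.
    public static string ToAsciiDigits(string input)
    {
        var output = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            output.Append(char.IsDigit(ch)
                ? (char)('0' + (int)char.GetNumericValue(ch))
                : ch);
        }
        return output.ToString();
    }

    // Normalizes the digits first, then parses with the invariant culture.
    public static bool TryParseAnyDigits(string input, out int result)
    {
        return int.TryParse(ToAsciiDigits(input), NumberStyles.Integer,
            CultureInfo.InvariantCulture, out result);
    }
}
```

For example, DigitNormalizer.TryParseAnyDigits("٣٤٥", out n) should succeed with n equal to 345. Digits outside the Basic Multilingual Plane are encoded as surrogate pairs, so this char-by-char loop leaves them untouched.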
