How do I get the decimal value of a Unicode character in C#?


Question

How do I get the numeric value of a Unicode character in C#?

For example, if the Tamil character அ (U+0B85) is given, the output should be 2949 (i.e. 0x0B85).

  • C++: How to get decimal value of a unicode character in c++
  • Java: How can I get a Unicode character's code?

Some characters require multiple code points. In these examples, encoded as UTF-16, each code unit is still in the Basic Multilingual Plane:

  • r + combining cedilla + combining caron (i.e. U+0072 U+0327 U+030C)
  • r + nine combining marks (i.e. U+0072 U+0338 U+0327 U+0316 U+0317 U+0300 U+0301 U+0302 U+0308 U+0360)
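You can check this in C# by comparing two counts. string.Length counts UTF-16 code units. System.Globalization.StringInfo.LengthInTextElements counts text elements, which roughly correspond to user-perceived characters. The sketch below assumes C# 9 or later for top-level statements. The exact text-element rules differ between .NET Framework and .NET 5+, but both treat a base letter followed by combining marks as a single element.

```csharp
using System;
using System.Globalization;

// "r" + COMBINING CEDILLA + COMBINING CARON: three code points.
string twoMarks = "r\u0327\u030C";
// "r" followed by nine combining marks (the second example above).
string nineMarks = "r\u0338\u0327\u0316\u0317\u0300\u0301\u0302\u0308\u0360";

// string.Length counts UTF-16 code units (char values), not perceived characters.
Console.WriteLine(twoMarks.Length);   // 3
Console.WriteLine(nineMarks.Length);  // 10

// StringInfo counts text elements: each string is one "character" to a reader.
Console.WriteLine(new StringInfo(twoMarks).LengthInTextElements);   // 1
Console.WriteLine(new StringInfo(nineMarks).LengthInTextElements);  // 1
```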

The larger point is that one "character" can require more than one UTF-16 code unit. It can require more than two, or even more than three.

In fact, one "character" can require dozens of Unicode code points. C# strings are UTF-16, so that means more than one char, and a single character can require 17 char values.

My question was about converting a char into its UTF-16 encoding value. Even if an entire string of 17 char values represents only one "character", I still want to know how to convert each UTF-16 code unit into a numeric value.

For example:

String s = "அ";

int i = Unicode(s[0]);

where Unicode returns the integer value, as defined by the Unicode standard, of the first character of the input expression.

Recommended answer

It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:

char c = '\u0b85';

// Implicit conversion: char is basically a 16-bit unsigned integer
int x = c;
Console.WriteLine(x); // Prints 2949
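The question gives the result both in decimal and in hex (2949 = 0x0B85). The sketch below connects the two by formatting the same value as U+XXXX with the standard .NET "X4" numeric format string. It also shows the reverse conversion, which requires an explicit cast.

```csharp
using System;

char c = '\u0b85';
int x = c;            // implicit conversion, as in the answer above
char back = (char)x;  // the reverse direction (int -> char) needs an explicit cast

// "X4" formats as uppercase hexadecimal, padded to at least four digits,
// which matches the usual U+XXXX notation.
Console.WriteLine(x);            // 2949
Console.WriteLine($"U+{x:X4}");  // U+0B85
Console.WriteLine(back == c);    // True
```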

If you've got it as part of a string, just get that single character first:

string text = GetText(); // GetText() stands for however you obtain the string
int x = text[2]; // Or whatever...
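The question asks for the value of every UTF-16 code unit, not just one. The same implicit conversion works in a loop. In this sketch, CodeUnits is a hypothetical helper name, and the input is the first combining-mark example from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the numeric value of every UTF-16 code unit in a string.
static List<int> CodeUnits(string text)
{
    var values = new List<int>(text.Length);
    foreach (char c in text)
        values.Add(c); // implicit char -> int conversion
    return values;
}

// "r" + COMBINING CEDILLA + COMBINING CARON, from the question.
foreach (int v in CodeUnits("r\u0327\u030C"))
    Console.WriteLine($"{v} (U+{v:X4})");
// 114 (U+0072)
// 807 (U+0327)
// 780 (U+030C)
```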

Note that characters outside the Basic Multilingual Plane are represented as two UTF-16 code units (a surrogate pair). .NET does support finding the full Unicode code point, but it's not simple.
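One way to get full code points is char.ConvertToUtf32(string, int), which decodes a surrogate pair into a single int. In this sketch, CodePoints is a hypothetical helper name, and U+1F600 is just an arbitrary character outside the BMP. On .NET Core 3.0 and later, string.EnumerateRunes() is an alternative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collect the code points of a string, combining surrogate pairs.
static List<int> CodePoints(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        // Returns s[i] for a BMP char, or the decoded value of the surrogate pair
        // starting at i; throws ArgumentException on an unpaired surrogate.
        int cp = char.ConvertToUtf32(s, i);
        result.Add(cp);
        if (char.IsHighSurrogate(s[i])) i++; // skip the low surrogate just consumed
    }
    return result;
}

string text = "\u0B85\U0001F600"; // Tamil letter A, then a character outside the BMP
Console.WriteLine(text.Length);   // 3 (UTF-16 code units)
foreach (int cp in CodePoints(text))
    Console.WriteLine($"U+{cp:X4} = {cp}");
// U+0B85 = 2949
// U+1F600 = 128512
```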
