How do I convert Unicode escape sequences to Unicode characters in a .NET string?

Question

Say you've loaded a text file into a string, and you'd like to convert all Unicode escapes into actual Unicode characters inside of the string.

Example:


"The following is the top half of an integral character in Unicode '\u2320', and this is the lower half '\U2321'."


Recommended answer

The answer is simple and works well with strings up to at least several thousand characters.

Example 1:

// Requires: using System.Globalization; using System.Text.RegularExpressions;
Regex rx = new Regex(@"\\[uU]([0-9A-F]{4})");
result = rx.Replace(result, match => ((char)Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString());

Example 2:

// Requires: using System.Globalization; using System.Text.RegularExpressions;
Regex rx = new Regex(@"\\[uU]([0-9A-F]{4})");
result = rx.Replace(result, delegate (Match match) { return ((char)Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString(); });
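
For reference, here is a minimal self-contained sketch that wraps Example 1 in a helper and runs it on the sentence from the question. It assumes a C# 9+ top-level program; the helper name UnescapeUnicode is hypothetical, and widening the hex class to [0-9A-Fa-f] (so lowercase escapes such as \u00e9 are also converted) is an assumption, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Pattern from the answer, widened to [0-9A-Fa-f] so lowercase hex digits are
// converted too. The widening is an assumption, not part of the original answer.
var unicodeEscape = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

// Hypothetical helper name; the body is the lambda from Example 1.
string UnescapeUnicode(string input) =>
    unicodeEscape.Replace(input, match =>
        ((char)Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString());

// A verbatim string, so each "\u2320" is six literal characters, as it would be in a loaded file.
string text = @"The following is the top half of an integral character in Unicode '\u2320', and this is the lower half '\U2321'.";
string result = UnescapeUnicode(text);

// Compare against a normal C# literal (where \u2321 is a real escape) instead of
// printing the glyphs, which some console encodings cannot display.
Console.WriteLine(result == "The following is the top half of an integral character in Unicode '\u2320', and this is the lower half '\u2321'."); // True
```

Sequences with fewer than four hex digits (for example \u12) do not match and are left unchanged.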

The first example performs the replacement with a lambda expression (C# 3.0); the second uses an anonymous method (the delegate keyword), which should also work with C# 2.0.

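To break down what's going on here, first we create a regular expression that matches a backslash, then 'u' or 'U', then exactly four hex digits (0-9 and uppercase A-F only, as written), with the four digits in a capture group: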

new Regex( @"\\[uU]([0-9A-F]{4})" );
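
To see exactly what this pattern accepts, the sketch below (a C# 9+ top-level program; the sample inputs are illustrative assumptions) runs the answer's regular expression, unchanged, against a few inputs and prints what each one matches.

```csharp
using System;
using System.Text.RegularExpressions;

// The regular expression from the answer, unchanged.
var rx = new Regex(@"\\[uU]([0-9A-F]{4})");

// Verbatim strings, so each backslash is a literal character as in a loaded text file.
string[] samples = { @"\u2320", @"\U2321", @"\u00e9", @"\U00002321", @"u2320" };

foreach (string s in samples)
{
    Match m = rx.Match(s);
    string shown = m.Success ? m.Value : "(no match)";
    Console.WriteLine($"{s,-12} -> {shown}");
}
```

With this pattern, \u2320 and \U2321 match in full; \u00e9 does not match at all because the character class lists only uppercase hex digits; \U00002321 (the 8-digit form that C# itself uses for \U) matches only its first four digits, \U0000; and u2320 without a backslash is ignored.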

Then we call Replace() with the string 'result' and a MatchEvaluator (a lambda expression in the first example, an anonymous method in the second; a regular named method would also work) that converts each match found in the string.
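
As a sketch of the "regular method" option, the following C# 9+ top-level program passes a named method group as the MatchEvaluator. The method name ConvertEscape and the sample string are hypothetical; the body is the same conversion the answer's lambda performs.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

var rx = new Regex(@"\\[uU]([0-9A-F]{4})");
string result = @"Top half: \u2320";

// A named method group converts to MatchEvaluator, just like the lambda or anonymous method.
result = rx.Replace(result, ConvertEscape);
Console.WriteLine((int)result[result.Length - 1]); // 8992 (0x2320)

// Hypothetical named method; same body as the answer's lambda.
static string ConvertEscape(Match match) =>
    ((char)Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString();
```

A named method can be handy when the conversion grows beyond one expression or needs its own unit tests.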

The Unicode escape is processed like this:

((char)Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString();

Get the string representing the number part of the escape by skipping the first two characters (the backslash and the 'u' or 'U'):

match.Value.Substring(2)

Parse that string using Int32.Parse(), passing the string plus the number style that Parse() should expect, which in this case is a hexadecimal number:

NumberStyles.HexNumber
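
A small sketch of this parsing step on its own (a C# 9+ top-level program; the sample digit strings are illustrative): NumberStyles.HexNumber reads a plain run of hex digits, and Int32.Parse accepts lowercase digits even though the answer's regular expression only captures uppercase ones.

```csharp
using System;
using System.Globalization;

// HexNumber lets Int32.Parse read a string of hexadecimal digits.
int upper = Int32.Parse("2320", NumberStyles.HexNumber);

// Lowercase digits are accepted by Parse as well.
int lower = Int32.Parse("00e9", NumberStyles.HexNumber);

Console.WriteLine(upper); // 8992 (0x2320)
Console.WriteLine(lower); // 233 (0xE9)
```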

Then we cast the resulting number to a Unicode character:

(char)

And finally we call ToString() on the character, which gives its string representation; that string is the value returned to Replace():

.ToString()
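
Putting the steps together for a single match, this sketch (a C# 9+ top-level program; the input string and variable names are illustrative) keeps each intermediate value in its own variable. Note that a C# char is one UTF-16 code unit, so four hex digits always map to exactly one char; code points above U+FFFF would need a surrogate pair and are outside what this pattern handles.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

var rx = new Regex(@"\\[uU]([0-9A-F]{4})");
Match match = rx.Match(@"lower half '\U2321'");

string digits = match.Value.Substring(2);                    // skip the backslash and the u/U
int codeUnit = Int32.Parse(digits, NumberStyles.HexNumber);  // hex text -> number
char c = (char)codeUnit;                                      // number -> UTF-16 code unit
string replacement = c.ToString();                            // one-character string for Replace()

Console.WriteLine(digits);             // 2321
Console.WriteLine(codeUnit);           // 8993
Console.WriteLine(replacement.Length); // 1
```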

Note: instead of extracting the digits with a Substring call, you could read them from the match parameter's Groups collection (a GroupCollection), using the capture group already present in the regular expression to get just the number ('2320'), but that's more complicated and less readable.
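
For comparison, a minimal sketch of the Groups-based variant (a C# 9+ top-level program; the sample string is illustrative). The parentheses in the answer's pattern already form capture group 1, so match.Groups[1].Value holds only the four hex digits.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Same pattern as the answer; ([0-9A-F]{4}) is capture group 1.
var rx = new Regex(@"\\[uU]([0-9A-F]{4})");

string result = @"Top '\u2320', bottom '\U2321'";

// Read the digits from Groups[1] instead of calling Substring(2) on the whole match.
result = rx.Replace(result, match =>
    ((char)Int32.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());

Console.WriteLine(result == "Top '\u2320', bottom '\u2321'"); // True
```

The Groups form avoids hard-coding the prefix length of 2, which matters only if the prefix part of the pattern ever changes.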
