How do I convert Unicode escape sequences to Unicode characters in a .NET string?


Question

Say you've loaded a text file into a string and you'd like to convert all Unicode escapes into actual Unicode characters inside the string.

For example:

"The following is the top half of an integral character in unicode '\u2320', and this is the lower half '\U2321'."

I found an answer that works for me, and it follows.

Recommended answer

This is the answer that I came up with. It's simple and works well with strings up to at least several thousand characters.

Example 1:

// 'result' holds the loaded text; needs System.Globalization and System.Text.RegularExpressions
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString() );

Example 2:

// Same replacement, written with an anonymous delegate instead of a lambda
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, delegate (Match match) { return ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString(); } );

The first example makes the replacement using a lambda expression (C# 3.0), and the second uses an anonymous delegate, which should work with C# 2.0.

To break down what's going on here, first we create a regular expression:

new Regex( @"\\[uU]([0-9A-F]{4})" );

Then we call Replace() with the string 'result' and an anonymous method (a lambda expression in the first example and a delegate in the second; the delegate could also be a regular method) that converts each match of the regular expression found in the string.
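To illustrate the "regular method" option mentioned above, here is a minimal sketch. The class name UnescapeWithNamedMethod and the method names DecodeEscape and Unescape are hypothetical and chosen only for this illustration; the regular expression and the parsing logic are the same as in the examples above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnescapeWithNamedMethod
{
    // Hypothetical helper; it has the MatchEvaluator shape (Match in, string out).
    private static string DecodeEscape(Match match)
    {
        // Skip the leading "\u" or "\U" and parse the four hex digits.
        int codeUnit = Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber);
        return ((char) codeUnit).ToString();
    }

    public static string Unescape(string input)
    {
        Regex rx = new Regex(@"\\[uU]([0-9A-F]{4})");
        // Wrap the named method in a MatchEvaluator delegate and pass it to Replace().
        return rx.Replace(input, new MatchEvaluator(DecodeEscape));
    }

    public static void Main()
    {
        // Verbatim string, so the backslashes are literal, as they would be in a loaded file.
        string text = @"top half '\u2320', lower half '\U2321'";
        Console.WriteLine(Unescape(text));
    }
}
```

Behavior should be the same as with the lambda or anonymous delegate; a named method is mostly a matter of taste, and it can be reused from several call sites.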

The Unicode escape is processed like this:

((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString()

Get the string representing the number part of the escape (skip the first two characters):

      match.Value.Substring(2)

Parse that string using Int32.Parse(), which takes the string and the number format that Parse() should expect, which in this case is a hex number:

      NumberStyles.HexNumber

Then we cast the resulting number to a Unicode character:

      (char)

And finally we call ToString() on the Unicode character, which gives us its string representation; this is the value passed back to Replace():

      .ToString()
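The four steps above can be traced one at a time on a single escape. This is only a sketch for illustration: the class EscapeSteps and the method Decode are hypothetical names, and the input is assumed to be exactly one escape of the form \uXXXX.

```csharp
using System;
using System.Globalization;

public static class EscapeSteps
{
    // Hypothetical helper that decodes a single escape such as \u2320.
    public static string Decode(string escape)
    {
        // Step 1: skip the first two characters (the backslash and the 'u').
        string hexDigits = escape.Substring(2);                       // "2320"

        // Step 2: parse the remaining text as a hexadecimal number.
        int number = Int32.Parse(hexDigits, NumberStyles.HexNumber);  // 0x2320 = 8992

        // Step 3: cast the number to a char.
        char character = (char) number;                               // U+2320

        // Step 4: turn the char into the string handed back to Replace().
        return character.ToString();
    }

    public static void Main()
    {
        string decoded = Decode(@"\u2320");
        Console.WriteLine(decoded.Length);    // a single char
        Console.WriteLine((int) decoded[0]);  // its numeric value
    }
}
```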

Note: instead of grabbing the text to be converted with a Substring call, you could use the match parameter's GroupCollection and a subexpression in the regular expression to capture just the number ('2320'), but that's more complicated and less readable.
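For comparison, here is a rough sketch of the GroupCollection variant described in the note. The pattern used above already wraps the four hex digits in parentheses, so the digits are available as match.Groups[1]. The class and method names (UnescapeWithGroups, Unescape) are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnescapeWithGroups
{
    public static string Unescape(string input)
    {
        // Group 1 is the parenthesized subexpression: just the four hex digits.
        Regex rx = new Regex(@"\\[uU]([0-9A-F]{4})");
        return rx.Replace(input, match =>
            ((char) Int32.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        Console.WriteLine(Unescape(@"top half '\u2320', lower half '\U2321'"));
    }
}
```

Either way, the character class [0-9A-F] matches only uppercase hex digits, so an escape written with lowercase digits (for example \u00e9) is left untouched by this pattern.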
