How do I convert Unicode escape sequences to Unicode characters in a .NET string?
Question
Say you've loaded a text file into a string and you'd like to convert all Unicode escapes into actual Unicode characters inside the string.
For example:
"The following is the top half of an integral character in unicode '\u2320', and this is the lower half '\U2321'."
I found an answer that works for me; it follows.
Recommended answer
This is the answer that I came up with. It's simple and works well with strings up to at least several thousand characters.
Example 1:
// Needs: using System.Globalization; using System.Text.RegularExpressions;
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString() );
Example 2:
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, delegate (Match match) { return ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString(); } );
The first example makes the replacement using a lambda expression (C# 3.0), and the second uses an anonymous delegate, which should work with C# 2.0.
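To try Example 1 end to end, it can be wrapped in a small helper. This is only a sketch: the class name `UnescapeDemo` and the method name `Unescape` are made up for illustration and are not part of the original answer. The pattern and the replacement lambda are the ones shown above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical wrapper around Example 1; the names are illustrative only.
public static class UnescapeDemo
{
    // Same pattern as above: a backslash, 'u' or 'U', then four uppercase hex digits.
    private static readonly Regex EscapePattern = new Regex(@"\\[uU]([0-9A-F]{4})");

    public static string Unescape(string text)
    {
        // Replace every matched escape with the character it encodes.
        return EscapePattern.Replace(
            text,
            match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString());
    }
}
```

When testing from source code, pass the input as a verbatim string such as `@"top '\u2320', bottom '\U2321'"`, so the backslashes stay literal the way they would in text read from a file. In a regular string literal the compiler would process the escapes itself before the regex ever saw them.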
To break down what's going on here, first we create a regular expression:
new Regex( @"\\[uU]([0-9A-F]{4})" );
Then we call Replace() with the string 'result' and an anonymous method (a lambda expression in the first example, a delegate in the second; the delegate could also be a regular method) that converts each match of the regular expression found in the string.
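For the "regular method" variant mentioned above, a minimal sketch might look like the following. The names `NamedEvaluatorDemo`, `EscapeToChar` and `Unescape` are assumptions made for illustration; `MatchEvaluator` is the delegate type that `Regex.Replace` expects.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical example; class and method names are illustrative only.
public static class NamedEvaluatorDemo
{
    // An ordinary method whose signature matches the MatchEvaluator delegate.
    private static string EscapeToChar(Match match)
    {
        return ((char) int.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString();
    }

    public static string Unescape(string text)
    {
        Regex rx = new Regex(@"\\[uU]([0-9A-F]{4})");
        // Wrap the named method in a MatchEvaluator and hand it to Replace.
        return rx.Replace(text, new MatchEvaluator(EscapeToChar));
    }
}
```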
The Unicode escape is processed like this:
((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString()
Get the string representing the number part of the escape (skip the first two characters, the backslash and the 'u').
match.Value.Substring(2)
Parse that string using Int32.Parse(), which takes the string and the number format it should expect; in this case, a hex number.
NumberStyles.HexNumber
Then we cast the resulting number to a Unicode character:
(char)
And finally we call ToString() on the Unicode character, which gives us its string representation; that is the value passed back to Replace().
.ToString()
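The four steps above can also be written out one per line, which may be easier to step through in a debugger. This is a sketch, not the original answer's code: the names `StepByStepDemo`, `ConvertEscape` and `Unescape`, and the local variable names, are all made up for illustration.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical example; all names here are illustrative only.
public static class StepByStepDemo
{
    public static string ConvertEscape(Match match)
    {
        // 1. Skip the backslash and the 'u': "\u2320" becomes "2320".
        string hexDigits = match.Value.Substring(2);
        // 2. Parse the digits as a hexadecimal number.
        int codeUnit = int.Parse(hexDigits, NumberStyles.HexNumber);
        // 3. Cast the number to a char (one UTF-16 code unit).
        char character = (char) codeUnit;
        // 4. Return its string form, which Replace() substitutes for the match.
        return character.ToString();
    }

    public static string Unescape(string text)
    {
        return Regex.Replace(text, @"\\[uU]([0-9A-F]{4})", new MatchEvaluator(ConvertEscape));
    }
}
```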
Note: instead of grabbing the text to be converted with a Substring call, you could use the match parameter's GroupCollection and a subexpression in the regular expression to capture just the number ('2320'), but that's more complicated and less readable.
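For comparison, a sketch of the capture-group variant is below. The parentheses already present in the pattern form group 1, so `match.Groups[1].Value` holds just the digits. One deliberate difference from the original pattern, labeled here as an assumption: the character class also accepts lowercase `a-f`, since the pattern as posted matches only uppercase hex digits. The names `GroupCaptureDemo` and `Unescape` are illustrative.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical example; class and method names are illustrative only.
public static class GroupCaptureDemo
{
    public static string Unescape(string text)
    {
        // Assumption: lowercase hex digits are accepted too (the original pattern is uppercase-only).
        Regex rx = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");
        // Groups[1] is the parenthesized subexpression: the four hex digits without "\u".
        return rx.Replace(text, match =>
            ((char) int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```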