Read UTF8/UNICODE characters from an escaped ASCII sequence
Problem description
I have the following name in a file and I need to read the string as a UTF8-encoded string, so from this:
test_\303\246\303\270\303\245.txt
I need to obtain the following:
test_æøå.txt
Do you know how to achieve this using C#?
Recommended answer
Assuming you have this string:
string input = "test_\\303\\246\\303\\270\\303\\245.txt";
i.e., literally:
test_\303\246\303\270\303\245.txt
You can do this:
using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "test_\\303\\246\\303\\270\\303\\245.txt";
Encoding iso88591 = Encoding.GetEncoding(28591); // See note at the end of answer
Encoding utf8 = Encoding.UTF8;

// Turn the octal escape sequences into characters having codepoints 0-255;
// this results in a "binary string"
string binaryString = Regex.Replace(input, @"\\(?<num>[0-7]{3})", delegate(Match m)
{
    string oct = m.Groups["num"].ToString();
    return Char.ConvertFromUtf32(Convert.ToInt32(oct, 8));
});

// Turn the "binary string" into bytes
byte[] raw = iso88591.GetBytes(binaryString);

// Read the bytes into a C# string
string output = utf8.GetString(raw);
Console.WriteLine(output);
// test_æøå.txt
By "binary string", I mean a string consisting only of characters with codepoints 0-255. It therefore amounts to a poor man's byte[], where you retrieve the codepoint of the character at index i instead of a byte value in a byte[] at index i (this is what we did in JavaScript a few years ago). Because ISO-8859-1 maps exactly the first 256 Unicode code points to a single byte each, it's perfect for converting a "binary string" into a byte[].
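The one-to-one mapping described in the note can be checked directly. The following is a minimal sketch, not part of the original answer: the class name `Latin1Demo` and both method names are hypothetical and were chosen only for illustration. It assumes code page 28591 (ISO-8859-1) is available through `Encoding.GetEncoding`, as in the answer's code.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration only.
public static class Latin1Demo
{
    // Returns true if every code point 0-255 encodes to exactly one byte
    // with the same numeric value, and decodes back to the same character.
    public static bool RoundTripsAllByteValues()
    {
        Encoding iso88591 = Encoding.GetEncoding(28591);
        for (int i = 0; i <= 255; i++)
        {
            string s = ((char)i).ToString();
            byte[] bytes = iso88591.GetBytes(s);
            if (bytes.Length != 1 || bytes[0] != i)
            {
                return false;
            }
            if (iso88591.GetString(bytes) != s)
            {
                return false;
            }
        }
        return true;
    }

    // Converts a "binary string" (characters with code points 0-255 only)
    // into bytes via ISO-8859-1, then interprets those bytes as UTF-8.
    public static string DecodeUtf8FromBinaryString(string binaryString)
    {
        byte[] raw = Encoding.GetEncoding(28591).GetBytes(binaryString);
        return Encoding.UTF8.GetString(raw);
    }
}
```

For instance, the octal escapes \303\246 from the question are the byte values 0xC3 and 0xA6, so `Latin1Demo.DecodeUtf8FromBinaryString("\u00C3\u00A6")` should yield "æ". The sketch deliberately says nothing about characters above code point 255; they do not belong in a "binary string", and how the encoding handles them depends on its fallback settings.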