Read UTF8/UNICODE characters from an escaped ASCII sequence

Question

I have the following name in a file, and I need to read it as a UTF-8-encoded string. So from this:

test_\303\246\303\270\303\245.txt

I need to obtain the following:

test_æøå.txt

Do you know how to achieve this using C#?

Recommended answer

Assuming you have this string:

string input = "test_\\303\\246\\303\\270\\303\\245.txt";

i.e., literally:

test_\303\246\303\270\303\245.txt

You can do this:

using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "test_\\303\\246\\303\\270\\303\\245.txt";
Encoding iso88591 = Encoding.GetEncoding(28591); // ISO-8859-1; see note at the end of answer
Encoding utf8 = Encoding.UTF8;

// Turn the octal escape sequences into characters with code points 0-255.
// This results in a "binary string".
// The first digit is limited to [0-3] so the value stays within one byte (octal 000-377).
string binaryString = Regex.Replace(input, @"\\(?<num>[0-3][0-7]{2})", delegate(Match m)
{
    string oct = m.Groups["num"].Value;
    return Char.ConvertFromUtf32(Convert.ToInt32(oct, 8));
});

// Turn the "binary string" into bytes
byte[] raw = iso88591.GetBytes(binaryString);

// Decode the bytes as UTF-8 into a C# string
string output = utf8.GetString(raw);
Console.WriteLine(output);
// test_æøå.txt

By "binary string", I mean a string consisting only of characters with code points 0-255. It amounts to a poor man's byte[], where you retrieve the code point of the character at index i instead of the byte value at index i of a byte[] (this is what we did in JavaScript a few years ago). Because ISO-8859-1 maps exactly the first 256 Unicode code points to a single byte each, it is perfect for converting a "binary string" into a byte[].
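
Since the "binary string" only stands in for a byte[], the intermediate string can also be skipped. The following is a minimal sketch of that variant, not the method from the answer above: `OctalEscapes` and `Decode` are hypothetical names, and it assumes every escape is a backslash followed by exactly three octal digits. Unlike the answer's code, it encodes unescaped characters as UTF-8 rather than ISO-8859-1, which is an assumption about how non-ASCII literals in the input should be treated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class OctalEscapes
{
    private static bool IsOctal(char c)
    {
        return c >= '0' && c <= '7';
    }

    // Decodes \ooo octal escapes directly into bytes, then reads all bytes as UTF-8.
    public static string Decode(string input)
    {
        var bytes = new List<byte>(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // A backslash followed by three octal digits represents one escaped byte.
            if (input[i] == '\\' && i + 3 < input.Length
                && IsOctal(input[i + 1]) && IsOctal(input[i + 2]) && IsOctal(input[i + 3]))
            {
                int value = Convert.ToInt32(input.Substring(i + 1, 3), 8);
                if (value > 255)
                {
                    throw new FormatException("Octal escape out of byte range: " + input.Substring(i, 4));
                }
                bytes.Add((byte)value);
                i += 4;
            }
            else
            {
                // Assumption: unescaped text is encoded as UTF-8 as-is.
                // Surrogate pairs are kept together so they encode correctly.
                int length = char.IsSurrogatePair(input, i) ? 2 : 1;
                bytes.AddRange(Encoding.UTF8.GetBytes(input.Substring(i, length)));
                i += length;
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

With the input from the question, `OctalEscapes.Decode("test_\\303\\246\\303\\270\\303\\245.txt")` returns `test_æøå.txt`: the octal values 303 246 are the bytes 0xC3 0xA6, which is the UTF-8 encoding of æ, and likewise for ø and å.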
