Read UTF8/UNICODE characters from an escaped ASCII sequence

Views: 321 | Published: 2015/11/25 16:18:11 | Tags: c# .net unicode encoding utf-8

Problem description

I have the following name in a file, and I need to read it as a UTF-8-encoded string. So from this:

test_\303\246\303\270\303\245.txt

I need to obtain the following:

test_æøå.txt

Do you know how to achieve this using C#?

Recommended answer

Assuming you have this string:

string input = "test_\\303\\246\\303\\270\\303\\245.txt";

i.e. literally

test_\303\246\303\270\303\245.txt

You can do this:

using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "test_\\303\\246\\303\\270\\303\\245.txt";
Encoding iso88591 = Encoding.GetEncoding(28591); // See the note at the end of the answer
Encoding utf8 = Encoding.UTF8;

// Turn the octal escape sequences into characters with code points 0-255;
// this results in a "binary string"
string binaryString = Regex.Replace(input, @"\\(?<num>[0-7]{3})", delegate(Match m)
{
    string oct = m.Groups["num"].Value;
    return Char.ConvertFromUtf32(Convert.ToInt32(oct, 8));
});

// Turn the "binary string" into bytes
byte[] raw = iso88591.GetBytes(binaryString);

// Decode the bytes as UTF-8 into a C# string
string output = utf8.GetString(raw);
Console.WriteLine(output);
// test_æøå.txt

By "binary string", I mean a string consisting only of characters with code points 0-255. It therefore amounts to a poor man's byte[], where you retrieve the code point of the character at index i instead of the byte value at index i of a byte[] (this is what we did in JavaScript a few years ago). Because ISO-8859-1 maps exactly the first 256 Unicode code points to a single byte each, it is perfect for converting a "binary string" into a byte[].
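To make the note concrete: octal 303 is 0xC3 and octal 246 is 0xA6, and the byte pair C3 A6 is the UTF-8 encoding of U+00E6 (æ). The sketch below is not part of the original answer. It first checks the claim that ISO-8859-1 maps code points 0-255 to bytes of the same value, then shows a variant that skips the intermediate "binary string" and collects bytes directly. The class name OctalEscapeDemo and its method names are hypothetical. The variant also assumes that only escapes in the range \000-\377 should be decoded, and that text between escapes should be encoded as UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration; the names are not from the answer.
public static class OctalEscapeDemo
{
    // Returns true if ISO-8859-1 (code page 28591) encodes every code point
    // 0-255 as the single byte with the same numeric value.
    public static bool Latin1IsIdentity()
    {
        Encoding iso88591 = Encoding.GetEncoding(28591);
        for (int codePoint = 0; codePoint <= 255; codePoint++)
        {
            byte[] encoded = iso88591.GetBytes(new[] { (char)codePoint });
            if (encoded.Length != 1 || encoded[0] != codePoint)
                return false;
        }
        return true;
    }

    // Variant without the intermediate "binary string": each \ooo escape
    // (assumed range \000-\377) becomes one byte, the text between escapes is
    // encoded as UTF-8, and the combined bytes are decoded as UTF-8.
    public static string Decode(string input)
    {
        var raw = new List<byte>();
        int position = 0;
        foreach (Match m in Regex.Matches(input, @"\\([0-3][0-7]{2})"))
        {
            // Copy the unescaped text that precedes this escape sequence
            raw.AddRange(Encoding.UTF8.GetBytes(input.Substring(position, m.Index - position)));
            // Parse the three octal digits as a single byte value
            raw.Add(Convert.ToByte(m.Groups[1].Value, 8));
            position = m.Index + m.Length;
        }
        // Copy whatever follows the last escape sequence
        raw.AddRange(Encoding.UTF8.GetBytes(input.Substring(position)));
        return Encoding.UTF8.GetString(raw.ToArray());
    }
}
```

Under these assumptions, OctalEscapeDemo.Decode(@"test_\303\246\303\270\303\245.txt") should return test_æøå.txt. Unlike the snippet in the answer, this variant leaves escapes above \377 untouched and passes characters above code point 255 through unchanged instead of routing them through ISO-8859-1.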
