How should I decode a UTF-8 string
Question
I have a string like:
About \xee\x80\x80John F Kennedy\xee\x80\x81\xe2\x80\x99s Assassination . unsolved mystery \xe2\x80\x93 45 years later. Over the last decade, a lot of individuals have speculated on conspiracy theories that ...
I understand that \xe2\x80\x93 is a dash character. But how should I decode the above string in C#?
Recommended answer
Scan the input string char-by-char. Convert each run of values starting with \x from string to byte[] and back to string using a UTF-8 decoder, leaving all other characters unchanged:
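// Required usings (place at the top of the file):
using System;
using System.Collections.Generic;
using System.Text;

static string Decode(string input)
{
    var sb = new StringBuilder();
    int position = 0;
    // Bytes collected from consecutive \xHH escapes. They are decoded
    // together so that multi-byte UTF-8 sequences stay intact.
    var bytes = new List<byte>();
    while (position < input.Length)
    {
        char c = input[position++];
        if (c == '\\')
        {
            if (position < input.Length)
            {
                c = input[position++];
                if (c == 'x' && position <= input.Length - 2)
                {
                    // Throws FormatException if the next two chars are not hex digits.
                    var b = Convert.ToByte(input.Substring(position, 2), 16);
                    position += 2;
                    bytes.Add(b);
                }
                else
                {
                    // Not a \xHH escape: flush pending bytes, keep the text as-is.
                    AppendBytes(sb, bytes);
                    sb.Append('\\');
                    sb.Append(c);
                }
                continue;
            }
        }
        // Ordinary character (or a trailing backslash): flush, then copy it.
        AppendBytes(sb, bytes);
        sb.Append(c);
    }
    // Flush any bytes left over when the input ends with an escape.
    AppendBytes(sb, bytes);
    return sb.ToString();
}

// Decodes the pending bytes as UTF-8, appends the result, and clears the list.
private static void AppendBytes(StringBuilder sb, List<byte> bytes)
{
    if (bytes.Count != 0)
    {
        var str = Encoding.UTF8.GetString(bytes.ToArray());
        sb.Append(str);
        bytes.Clear();
    }
}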
Output:
About John F Kennedy’s Assassination . unsolved mystery – 45 years later. Over the last decade, a lot of individuals have speculated on conspiracy theories that ...
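For comparison, the same idea can be sketched more compactly with a regular expression. This is only an illustrative alternative, not the method from the answer above: the names EscapedUtf8, EscapeRun and Demo are hypothetical, and the sketch assumes every escape has the exact form \x followed by two hex digits. Unlike the loop above, an escape with non-hex digits is left in the text unchanged instead of raising a FormatException.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is an assumption for this sketch.
static class EscapedUtf8
{
    // Matches a run of one or more consecutive \xHH escapes.
    private static readonly Regex EscapeRun =
        new Regex(@"(?:\\x[0-9A-Fa-f]{2})+", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        return EscapeRun.Replace(input, match =>
        {
            // Each escape is 4 chars long ("\xHH"); the hex digits start at offset 2.
            byte[] bytes = Enumerable.Range(0, match.Length / 4)
                .Select(i => Convert.ToByte(match.Value.Substring(i * 4 + 2, 2), 16))
                .ToArray();
            // Decode the whole run at once so multi-byte sequences stay intact.
            return Encoding.UTF8.GetString(bytes);
        });
    }
}

static class Demo
{
    static void Main()
    {
        // Verbatim string: the backslashes are literal characters here.
        string raw = @"unsolved mystery \xe2\x80\x93 45 years later";
        string decoded = EscapedUtf8.Decode(raw);
        // \xe2\x80\x93 is the UTF-8 encoding of U+2013 (en dash).
        Console.WriteLine(decoded == "unsolved mystery \u2013 45 years later"); // prints "True"
    }
}
```

In the sample string from the question, \xe2\x80\x99 decodes to U+2019 (right single quotation mark), while \xee\x80\x80 and \xee\x80\x81 decode to U+E000 and U+E001. Those two are private-use code points, so they are present in the decoded string but usually show no visible glyph in the output.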