Text encoding problem in C# - encoding.getchars() yields weird results

Problem description

I'm working on a project that involves Japanese text encoded in Shift-JIS. I'm trying to decode useful characters from arrays of bytes like so:

Encoding encoding = Encoding.GetEncoding(932);
string test = new string(encoding.GetChars(byteArray));



This works for 99% of the data, but I'm getting weird results for a handful of characters, for example the character '¨'. If I obtain the bytes corresponding to that character like this:

byte[] getBytesTest = encoding.GetBytes("¨");



The results are exactly what I expect. Two bytes, 0x81 and 0x4E, which is what that character is supposed to be in Shift-JIS.
If I try to convert it back to a string, though:

string getCharsTest = new string(encoding.GetChars(getBytesTest));



...I get the default error output "?".

So tl;dr, I'm trying to figure out why this:

byte[] getBytesTest = encoding.GetBytes("¨");
string getCharsTest = new string(encoding.GetChars(getBytesTest));



yields "?" instead of "¨" as a result.

What I have tried:

Passing the results of GetBytes() to GetChars() and manually passing the bytes in like this:

encoding.GetChars(new byte[] {0x81, 0x4E})



both fail to produce the expected result.

Not sure where to go with this.

Recommended answer

Your code is OK. When I try this on Windows 10 in VS2017, it works:
System.Text.Encoding encoding = System.Text.Encoding.GetEncoding(932);
byte[] getBytesTest = encoding.GetBytes("¨");
string getCharsTest = new string(encoding.GetChars(getBytesTest));

Maybe you need to update Windows?
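
To check whether the decode itself is failing, independent of how the result is displayed, it may help to look at the decoded code points rather than the rendered text. The following is a minimal sketch, not a definitive diagnosis: the class and method names (ShiftJisRoundTrip, DescribeDecoded) are made up for illustration, and the RegisterProvider call assumes .NET Core or .NET 5+, where code page 932 is only available through CodePagesEncodingProvider. On .NET Framework, where the code page is built in, that line can likely be dropped.

```csharp
using System;
using System.Linq;
using System.Text;

public static class ShiftJisRoundTrip
{
    static ShiftJisRoundTrip()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 932
        // must be made available by registering this provider first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes Shift-JIS bytes and returns each UTF-16 code unit as "U+XXXX".
    public static string DescribeDecoded(byte[] bytes)
    {
        Encoding encoding = Encoding.GetEncoding(932);
        string decoded = encoding.GetString(bytes);
        return string.Join(" ", decoded.Select(c => $"U+{(int)c:X4}"));
    }

    public static void Main()
    {
        // The two bytes from the question.
        Console.WriteLine(DescribeDecoded(new byte[] { 0x81, 0x4E }));
    }
}
```

If this prints U+00A8 (the code point of '¨'), the string in memory is correct and only the display is wrong. If it prints U+003F, the decoder really did substitute a question mark.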


The problem here is an encoding that the console doesn't understand. If you write the content to a file instead of to the console, you'll get the text as expected.

File.WriteAllText("file.txt", getCharsTest);
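
If console output is still wanted, one option that may help is to set the console's output encoding explicitly before writing. This is a sketch under assumptions: the class and helper names (OutputDemo, WriteAndReadBack) are hypothetical, UTF-8 is chosen here only as an example, and whether the glyph actually renders still depends on the console font.

```csharp
using System;
using System.IO;
using System.Text;

public static class OutputDemo
{
    // Writes the text to a file as UTF-8 and reads it back, so the
    // round trip can be checked without involving the console at all.
    public static string WriteAndReadBack(string text, string path)
    {
        File.WriteAllText(path, text, Encoding.UTF8);
        return File.ReadAllText(path, Encoding.UTF8);
    }

    public static void Main()
    {
        // Ask the console to emit UTF-8 instead of its default code page.
        Console.OutputEncoding = Encoding.UTF8;

        string text = "\u00A8"; // the '¨' character from the question
        Console.WriteLine(text);

        // Compare the file round trip against the original string.
        Console.WriteLine(WriteAndReadBack(text, "file.txt") == text);
    }
}
```

Passing the encoding to File.WriteAllText explicitly also documents which encoding the file uses, which makes it easier to open correctly in an editor.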

