Text encoding problem in C# - encoding.getchars() yields weird results
Problem description
I'm working on a project that involves Japanese text encoded in Shift-JIS. I'm trying to decode useful characters from arrays of bytes like so:
Encoding encoding = Encoding.GetEncoding(932);
string test = new string(encoding.GetChars(byteArray));
This works for 99% of the data, but I'm getting weird results for just a handful of characters, the character '¨' for example. If I obtain the bytes corresponding to that character like this:
byte[] getBytesTest = encoding.GetBytes("¨");
The results are exactly what I expect. Two bytes, 0x81 and 0x4E, which is what that character is supposed to be in Shift-JIS.
If I try to convert it back to a string, though:
string getCharsTest = new string(encoding.GetChars(getBytesTest));
...I get the default error output "?".
So tl;dr, I'm trying to figure out why this:
byte[] getBytesTest = encoding.GetBytes("¨");
string getCharsTest = new string(encoding.GetChars(getBytesTest));
yields "?" instead of "¨" as a result.
What I have tried:
Passing the results of GetBytes() to GetChars() and manually passing the bytes in like this:
encoding.GetChars(new byte[] {0x81, 0x4E})
both fail to produce the expected result.
Not sure where to go with this.
Recommended answer
Your code is OK. When I try this on Windows 10 in VS2017, it works:
System.Text.Encoding encoding = System.Text.Encoding.GetEncoding(932);
byte[] getBytesTest = encoding.GetBytes("¨");
string getCharsTest = new string(encoding.GetChars(getBytesTest));
Maybe you need to update Windows?
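Before changing anything, it may help to check whether the decode itself fails or only the display does. The following is a minimal sketch, not a definitive diagnosis. The names `RoundTripCheck`, `ToCodePoints` and `DecodeStrict` are made up for illustration. The `Encoding.RegisterProvider` call is an assumption about the runtime: it is only needed on .NET Core / .NET 5+, where code page 932 is not available by default.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripCheck
{
    // Hypothetical helper: formats each UTF-16 code unit as U+XXXX,
    // so the result can be inspected without relying on the console font.
    public static string ToCodePoints(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    // Hypothetical helper: decodes Shift-JIS (code page 932) bytes and throws
    // on undecodable input instead of silently substituting '?'.
    public static string DecodeStrict(byte[] bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered first. Calling this more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding sjis = Encoding.GetEncoding(
            932,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        return sjis.GetString(bytes);
    }
}
```

Printing `RoundTripCheck.ToCodePoints(RoundTripCheck.DecodeStrict(new byte[] { 0x81, 0x4E }))` shows what the string really contains. A result of `U+00A8` means the decode worked and only the display is wrong. A result of `U+003F`, or a `DecoderFallbackException`, points at the decoding step.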
The problem here is the encoding, which the console doesn't understand. If you write the content to a file instead of the console, you'll get the text as expected.
File.WriteAllText("file.txt", getCharsTest);
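If the console really is the culprit, two things may be worth trying: writing the file with an explicit encoding, so there is no doubt about what ends up on disk, and asking the console to emit UTF-8 before printing. This is only a sketch under assumptions. `OutputDemo`, `WriteUtf8` and `PrintUtf8` are hypothetical names, and whether the glyph actually renders also depends on the console font, which this code does not control.

```csharp
using System;
using System.IO;
using System.Text;

static class OutputDemo
{
    // Hypothetical helper: writes the text as UTF-8. Encoding.UTF8 emits a
    // byte order mark, which helps editors detect the encoding.
    public static void WriteUtf8(string path, string text) =>
        File.WriteAllText(path, text, Encoding.UTF8);

    // Hypothetical helper: switches the console output to UTF-8, then prints.
    // The default console code page may not contain the character at all.
    public static void PrintUtf8(string text)
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(text);
    }
}
```

For example, `OutputDemo.WriteUtf8("file.txt", getCharsTest)` followed by opening the file in an editor is a font-independent way to confirm the string's content.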