Why is the result of File.ReadAllBytes different from File.ReadAllText?
Problem description
I have a text file (UTF-8 encoding) with the contents "test". I read the file into a byte array and convert it to a string, but the result contains one strange character. I use the following code:
using System;
using System.IO;
using System.Text;

var path = @"C:\Users\Tester\Desktop\test\test.txt"; // UTF-8
var bytes = File.ReadAllBytes(path);
var contents1 = Encoding.UTF8.GetString(bytes);
var contents2 = File.ReadAllText(path);
Console.WriteLine(contents1); // result is "?test"
Console.WriteLine(contents2); // result is "test"
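A self-contained way to reproduce this without the desktop file is to write the bytes yourself and dump them in hex. The sketch below assumes the original file starts with a UTF-8 BOM (EF BB BF), which is what the answer concludes; the temp file name `bom-test.txt` is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical temp file standing in for test.txt: UTF-8 BOM (EF BB BF) + "test".
var path = Path.Combine(Path.GetTempPath(), "bom-test.txt");
File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'t', (byte)'e', (byte)'s', (byte)'t' });

var bytes = File.ReadAllBytes(path);            // all 7 raw bytes, BOM included
var contents1 = Encoding.UTF8.GetString(bytes); // decodes every byte
var contents2 = File.ReadAllText(path);         // detects and skips the BOM

// Show the raw bytes and the first character of each string as a code point.
Console.WriteLine(BitConverter.ToString(bytes));
Console.WriteLine($"contents1: length {contents1.Length}, first U+{(int)contents1[0]:X4}");
Console.WriteLine($"contents2: length {contents2.Length}, first U+{(int)contents2[0]:X4}");

File.Delete(path);
```

This prints `EF-BB-BF-74-65-73-74`, then `contents1: length 5, first U+FEFF`, then `contents2: length 4, first U+0074`: the "strange character" is U+FEFF.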
contents1 and contents2 are different. Why?
Recommended answer
The File.ReadAllText documentation says: "This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected."
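To make "detect the encoding based on byte order marks" concrete, here is a simplified sketch of BOM sniffing. It is not the actual .NET implementation; the helper `DetectBom` is hypothetical, and the UTF-16 checks (FF FE, FE FF) go beyond the quoted documentation and should be treated as an assumption.

```csharp
using System;
using System.Text;

// Hypothetical helper: returns the encoding implied by a leading BOM and the BOM length.
// Simplified illustration only, not the real .NET detection code.
static (Encoding Enc, int BomLength) DetectBom(byte[] b)
{
    // UTF-32 LE must be tested before UTF-16 LE, because both start with FF FE.
    if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return (Encoding.UTF32, 4);
    if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return (new UTF32Encoding(bigEndian: true, byteOrderMark: true), 4);
    if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return (Encoding.UTF8, 3);
    if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return (Encoding.Unicode, 2);          // UTF-16 LE (assumption, see lead-in)
    if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return (Encoding.BigEndianUnicode, 2); // UTF-16 BE (assumption, see lead-in)
    // No BOM found: fall back to UTF-8, the default File.ReadAllText uses.
    return (new UTF8Encoding(false), 0);
}

var bytes = new byte[] { 0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74 }; // BOM + "test"
var (enc, bomLength) = DetectBom(bytes);

// Decode only the bytes after the BOM, so no U+FEFF ends up in the string.
var text = enc.GetString(bytes, bomLength, bytes.Length - bomLength);
Console.WriteLine($"{enc.WebName}, BOM length {bomLength}: \"{text}\"");
```

For the bytes in the question this prints `utf-8, BOM length 3: "test"`.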
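So the file contains a BOM (byte order mark). ReadAllText recognizes it and strips it, while the first approach just reads the raw bytes without interpreting them at all. For UTF-8 the BOM is the three bytes EF BB BF, and Encoding.UTF8.GetString decodes them into the character U+FEFF at the start of contents1; that is the extra "?" you see in the console output.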
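If you already have the byte array, two ways to get "test" out of it are sketched below, assuming .NET Core 2.1 or later (for `Encoding.Preamble` and the span overload of `GetString`). Option 1 lets `StreamReader` do the BOM detection; option 2 strips the UTF-8 preamble by hand.

```csharp
using System;
using System.IO;
using System.Text;

// Same bytes as in the question: UTF-8 BOM (EF BB BF) followed by "test".
var bytes = new byte[] { 0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74 };

// Option 1: let StreamReader detect and consume the BOM.
string viaReader;
using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8,
                                     detectEncodingFromByteOrderMarks: true))
{
    viaReader = reader.ReadToEnd();
}

// Option 2: skip the UTF-8 preamble manually, then decode the rest.
ReadOnlySpan<byte> span = bytes;
if (span.StartsWith(Encoding.UTF8.Preamble))
    span = span.Slice(Encoding.UTF8.Preamble.Length);
var viaSlice = Encoding.UTF8.GetString(span);

Console.WriteLine($"\"{viaReader}\" \"{viaSlice}\"");
```

Option 1 also handles other BOMs, while option 2 only knows about UTF-8; which one fits depends on whether the input encoding is known in advance.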
The documentation for Encoding.GetString says only that it:
Decodes *all the bytes* in the specified byte array into a string
(emphasis mine). This is, of course, not entirely conclusive, but your example shows that it is to be taken literally: the BOM bytes are decoded like any other bytes instead of being skipped.
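One way to see that the decoding really is literal: even a `UTF8Encoding` constructed with `encoderShouldEmitUTF8Identifier: false` still turns EF BB BF into U+FEFF, because that flag only controls the preamble written when encoding. A minimal sketch:

```csharp
using System;
using System.Text;

var bytes = new byte[] { 0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74 }; // BOM + "test"

// Encoding.UTF8 emits a BOM when writing; this instance does not.
var noBomUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

// GetString decodes every byte, so the BOM becomes U+FEFF with either instance.
var withBomFlag = Encoding.UTF8.GetString(bytes);
var withoutBomFlag = noBomUtf8.GetString(bytes);

Console.WriteLine($"{withBomFlag.Length} U+{(int)withBomFlag[0]:X4}");
Console.WriteLine($"{withoutBomFlag.Length} U+{(int)withoutBomFlag[0]:X4}");
// The flag only changes the preamble: 3 bytes for Encoding.UTF8, 0 here.
Console.WriteLine($"{Encoding.UTF8.GetPreamble().Length} {noBomUtf8.GetPreamble().Length}");
```

Both decodes print `5 U+FEFF`, and the last line prints `3 0`.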