Encoding differences between using WebClient and WebRequest?

Question

When fetching the index page of a Spanish newspaper with WebRequest, I don't get the diacritics properly; they come out as this odd character: �. When I download the response from the same URI using WebClient, I get the correct text.

Why is there a difference?

var client = new WebClient();
string html = client.DownloadString(endpoint);

versus

WebRequest request = WebRequest.Create(endpoint);
using (WebResponse response = request.GetResponse())
{
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    string html = reader.ReadToEnd();
}

Recommended answer

When you create the StreamReader without explicitly setting an encoding, you are simply assuming that the entity is in UTF-8. You should examine the CharacterSet property of the HttpWebResponse (it is not exposed by the WebResponse base class) and open the StreamReader with the appropriate encoding.
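A minimal sketch of that advice, assuming the server reports its charset in the Content-Type header. The class name CharsetAwareFetch and the helper ResolveEncoding are hypothetical names introduced here for illustration, and falling back to UTF-8 when the charset is missing or unrecognized is an assumption rather than anything the answer prescribes. On .NET Core and later, some legacy code pages (for example windows-1252) may additionally need CodePagesEncodingProvider to be registered before Encoding.GetEncoding can find them.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class CharsetAwareFetch
{
    // Hypothetical helper: maps a charset name to an Encoding, returning the
    // supplied fallback when the name is missing or not recognized.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            // Some servers quote the charset value, so strip quotes first.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return fallback;
        }
    }

    public static string Download(string endpoint)
    {
        WebRequest request = WebRequest.Create(endpoint);
        using (WebResponse response = request.GetResponse())
        {
            // CharacterSet lives on HttpWebResponse, not on the WebResponse base class.
            var http = response as HttpWebResponse;
            Encoding encoding = ResolveEncoding(http?.CharacterSet, Encoding.UTF8);

            using (Stream stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling CharsetAwareFetch.Download(endpoint) in place of the second snippet in the question should then decode the page with whatever charset the response declares.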

Otherwise, if the reader decodes something that is not UTF-8 as though it were, it will come across octet sequences that are invalid in UTF-8, and the best it can do is substitute the U+FFFD replacement character (�).
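The effect can be reproduced without any network access. The sketch below assumes the page was served as ISO-8859-1, which is a guess about the newspaper's site rather than something stated in the question; ReplacementCharDemo and its methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Encodes text as ISO-8859-1 bytes, the way a Latin-1 server would send it.
    public static byte[] ToLatin1Bytes(string text)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(text);
    }

    // Decodes the bytes as UTF-8, mimicking a StreamReader with the default encoding.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decodes the bytes with the encoding that was actually used to produce them.
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    public static void Run()
    {
        // "Espa\u00F1a" is "España"; the n-with-tilde becomes the single byte 0xF1 in Latin-1.
        byte[] bytes = ToLatin1Bytes("Espa\u00F1a");

        // 0xF1 followed by 'a' is not a valid UTF-8 sequence, so the decoder
        // substitutes U+FFFD for it.
        string wrong = DecodeAsUtf8(bytes);
        string right = DecodeAsLatin1(bytes);

        Console.WriteLine(wrong.Contains("\uFFFD")); // mis-decoded text carries the replacement character
        Console.WriteLine(right == "Espa\u00F1a");   // correct encoding round-trips the text
    }
}
```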

WebClient does pretty much this for you. Where WebRequest and its derived classes let you work at a lower level, DownloadString is a higher-level method: a single call that means "send a GET request to the URI, examine the headers to see what content encoding is in use in case the body needs to be un-gzipped or otherwise decompressed, see what character encoding is in place, set up a text reader with that encoding and the stream, and then call ReadToEnd()". The usual pros and cons of high-level, big-chunk instructions versus low-level, small-chunk instructions apply.
