Is this HttpWebRequest correct?


Question

I am currently using HttpWebRequest and HttpWebResponse to pull
webpages down from a few URLs.

string url = "some url";
HttpWebRequest httpWebRequest =
    (HttpWebRequest)WebRequest.Create(url);

using (HttpWebResponse httpWebResponse =
    (HttpWebResponse)httpWebRequest.GetResponse())
{
    string html = string.Empty;

    StreamReader responseReader = new
        StreamReader(httpWebResponse.GetResponseStream(), Encoding.UTF7);
    html = responseReader.ReadToEnd();
}

My code works, but my question is: am I doing it the right way
(especially the encoding part)? Some of the websites I pull content
from contain characters that do not exist in the English alphabet,
and currently the only way for my StreamReader to read them correctly
is to use UTF7 encoding. Is this really the only way?

Before I move forward in the project I would like to understand
whether this is indeed the way to do it or whether I am missing
anything.

Any help is appreciated.

Thanks

Recommended answer

Nightcrawler wrote:

I am currently using HttpWebRequest and HttpWebResponse to pull
webpages down from a few URLs.

string url = "some url";
HttpWebRequest httpWebRequest =
    (HttpWebRequest)WebRequest.Create(url);

using (HttpWebResponse httpWebResponse =
    (HttpWebResponse)httpWebRequest.GetResponse())
{
    string html = string.Empty;

    StreamReader responseReader = new
        StreamReader(httpWebResponse.GetResponseStream(), Encoding.UTF7);
    html = responseReader.ReadToEnd();
}

My code works, but my question is: am I doing it the right way
(especially the encoding part)? Some of the websites I pull content
from contain characters that do not exist in the English alphabet,
and currently the only way for my StreamReader to read them correctly
is to use UTF7 encoding. Is this really the only way?



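You should check the HTTP response header Content-Type for a charset
parameter and use that to create the stream reader. For instance, if
the server sends the header

  Content-Type: text/html; charset=Windows-1252

then you would use

  new StreamReader(httpWebResponse.GetResponseStream(),
      Encoding.GetEncoding("Windows-1252"))

On the other hand, on the wild web, servers often do not send a
charset parameter, and authors of HTML documents include the charset
in a meta element instead, e.g.

  <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

so user agents such as browsers go to great lengths to read enough of
the document to find and parse that element, and only then are they
able to decode the rest of the document.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/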
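
The header-based approach described in the answer could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name CharsetHelper, the method FromContentType, and the idea of passing in a fallback encoding are all assumptions made for the example. Note also that on .NET Core and later, code-page encodings such as Windows-1252 are only available after registering a code-pages encoding provider; without it this sketch simply returns the fallback.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the original thread): picks an Encoding
// from the charset parameter of a Content-Type header value.
static class CharsetHelper
{
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return fallback; // the server sent no Content-Type at all
        }

        // A header value looks like "text/html; charset=Windows-1252".
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string name = p.Substring("charset=".Length).Trim('"', '\'', ' ');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    return fallback; // unknown or unsupported charset name
                }
            }
        }

        return fallback; // no charset parameter in the header
    }
}
```

With the code from the question, this might be used as CharsetHelper.FromContentType(httpWebResponse.ContentType, Encoding.UTF8), passing the result to the StreamReader constructor in place of Encoding.UTF7. Choosing UTF-8 as the fallback is a judgment call for this sketch, not something the thread prescribes.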


So what you are basically saying is that my best bet is to look for
the meta tags in the page to determine the encoding to use and not
rely on the HTTP response header.

Most of the sites I read using the StreamReader say:

  <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />

but there are a few that do not have that meta tag included in their
code. How should I approach those? Is there a way for the StreamReader
to detect what encoding the page is using?

Thanks for your help!
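
On the detection question: as far as the built-in behaviour goes, StreamReader can only detect an encoding from a byte order mark at the start of the stream; it does not look at meta elements. A rough sketch of the meta-element lookup discussed above, assuming the response body has already been read into a byte array, might look like this. The class name MetaCharsetSniffer, the 4096-byte prefix limit, and the regular expression are all assumptions made for the example, and a regex is far less robust than what browsers actually do.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original thread): looks for a
// charset declaration in a <meta> element near the start of a document.
static class MetaCharsetSniffer
{
    // Matches both <meta http-equiv=... content="...; charset=X">
    // and the shorter <meta charset="X"> form.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    public static Encoding Detect(byte[] body, Encoding fallback)
    {
        // Decode only a prefix, as ISO-8859-1, so that every byte maps to
        // exactly one char and the ASCII markup is preserved unchanged.
        int length = Math.Min(body.Length, 4096); // assumed prefix size
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(body, 0, length);

        Match m = MetaCharset.Match(head);
        if (!m.Success)
        {
            return fallback; // no meta charset found in the prefix
        }

        try
        {
            return Encoding.GetEncoding(m.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            return fallback; // declared charset is unknown or unsupported
        }
    }
}
```

The detected encoding can then be used to decode the full byte array with GetString. For pages that declare no charset anywhere, this sketch just returns whatever fallback the caller supplies; it does not handle documents encoded in UTF-16, where the markup itself is not ASCII-compatible.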


What is even more annoying is that one of the websites I read states
that it is using UTF-8, and my StreamReader still does not translate
the characters correctly. I get little square boxes instead of the
characters.

