Problem in getting HTML code


Question

Hi!

The problem:
I want to scrape some data from a certain web page (I have administrative access) and store some information in a database for later analysis.
Sounds easy, right?
I've decided to make a simple console prototype, and the code looks something like this:

string uri =  @"http://s7.iqstreaming.com:8044/admin.cgi";
HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
                     
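if (request == null)
{
     // WebRequest.Create did not return an HttpWebRequest; stop instead of dereferencing null below.
     Console.WriteLine(":( This shouldn't happen!");
     Console.ReadKey();
     return;
}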

request.ContentType = @"text/html";
request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Credentials = new NetworkCredential("myID", "myPass");

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
     // Print the response body line by line; both objects are disposed on exit.
     string line;
     while ((line = reader.ReadLine()) != null)
     {
         Console.WriteLine(line);
     }
}



This code works on most other sites, but here I get a 404 error (most of the time), a 502 error, or a timeout.
I've consulted Firebug (I took the Accept and compression settings from there), but to no avail.
Using WinForms and the WebBrowser control as an alternative is not an option (at least for now).

P.S.
The same thing happens when I try to get the HTML from http://s7.iqstreaming.com:8044/index.html (which doesn't need credentials).

(P.S. added later.)

Recommended answer

I posted this question on another popular web site for coders, and one of the members there knew about this issue.
The answer seems very simple (as usual): the problem was the UserAgent! The request was being sent without a User-Agent header.

When I add this line of code before calling GetResponse(), everything works as expected.

request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11";
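
For anyone who wants the whole thing in one place, here is a minimal sketch of the request with the User-Agent set. The class name PageFetcher and the methods BuildRequest and Fetch are hypothetical names made up for this example; the URL, the credentials placeholder and the header values are the ones from the question. It assumes the classic HttpWebRequest API (on newer .NET versions WebRequest.Create is marked obsolete but should still work).

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class, named only for this example.
static class PageFetcher
{
    // Builds the request without sending it, so the headers can be inspected first.
    public static HttpWebRequest BuildRequest(string uri, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        // The fix from the answer: send a browser-like User-Agent header.
        request.UserAgent = userAgent;
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Fetch(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would look roughly like this (set Credentials only for pages that need a login, such as admin.cgi):

var request = PageFetcher.BuildRequest(
    "http://s7.iqstreaming.com:8044/admin.cgi",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11");
request.Credentials = new NetworkCredential("myID", "myPass");
Console.WriteLine(PageFetcher.Fetch(request));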

