How do I use HttpWebRequest and HttpWebResponse to log in to a website and subsequently scrape a page?


Problem description





Hi,

Can someone help me figure out how to log in to a page using HttpWebRequest and subsequently scrape a page? The code I am using doesn't seem to work.


HttpWebRequest request;
HttpWebResponse response;
CookieContainer cookies;


string url = string.Format("http://control.shaboshabo.com/login-action.php ?username={0}&password={1}", "xxx", "yyy");
request = (HttpWebRequest)WebRequest.Create(url);
request.AllowAutoRedirect = true;
request.Method = "POST";
request.CookieContainer = new CookieContainer();
response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode != HttpStatusCode.Found)
{
    //ToDo: if the page wasn't found raise Exception

    //instead of this textmessage

    Console.WriteLine("Something Wrong");
    response.Close();
    request.KeepAlive = false;
    return;
}
cookies = request.CookieContainer;
response.Close();
request = (HttpWebRequest)WebRequest.Create("http://control.shaboshabo.com/controlpanel.php");
request.AllowAutoRedirect = false;
request.CookieContainer = cookies;
response = (HttpWebResponse)request.GetResponse();
//string markup = null;
var rStream = response.GetResponseStream();
string markup;

using (var sr = new StreamReader(rStream))
{
    markup = sr.ReadToEnd();
}
//using (Stream s = response.GetResponseStream())
//{
//    StreamReader sr = new StreamReader(s);

//    while (!sr.EndOfStream)
//    {
//        markup = sr.ReadToEnd();
//    }
//    sr.Close();
//}

Console.WriteLine(markup);
Console.ReadLine();

Recommended answer

A few points that may help you resolve your problem.

1) Remove the extra space between login-action.php and ? in your URL.
string url = string.Format("http://control.shaboshabo.com/login-action.php?username={0}&password={1}", "xxx", "yyy");



2) If I type the URL below into a browser, it shows the message "Wrong username or password".

http://control.shaboshabo.com/login-action.php?username=xxx&password=yyy

That means the PHP website does authenticate credentials passed in the query string, but these particular credentials are wrong. Try passing valid credentials.

3) Use GET as the method type instead of POST, because you are passing the details in the URL (query string) and are not actually posting any data.
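The three points above can be combined into a small sketch. It assumes, as observed in point 2, that the site accepts credentials in the query string of a GET request. The class and method names (LoginSketch, BuildLoginUrl, Get) are hypothetical and exist only for illustration. Escaping the credentials with Uri.EscapeDataString is an extra precaution not mentioned in the question.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class, for illustration only.
static class LoginSketch
{
    // Builds the login URL with no stray whitespace and escapes each value,
    // so characters such as '&' or ' ' in a credential cannot corrupt the query string.
    public static string BuildLoginUrl(string baseUrl, string username, string password)
    {
        return string.Format("{0}?username={1}&password={2}",
            baseUrl,
            Uri.EscapeDataString(username),
            Uri.EscapeDataString(password));
    }

    // Sends a GET request and returns the response body as text.
    // Cookies set by the server are stored in the shared container and
    // replayed on later requests that use the same container.
    public static string Get(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

To use it, create one CookieContainer, call LoginSketch.Get with the URL returned by BuildLoginUrl for login-action.php, then call LoginSketch.Get again with the controlpanel.php URL and the same container. Reusing the container is what carries the session cookie from the login request to the page you want to scrape.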

Updated: I tested your code against the first URL and it returns the status "OK". Change your condition as shown below.

if (response.StatusCode != HttpStatusCode.OK)
{
    //ToDo: if the page wasn't found raise Exception

    //instead of this textmessage

    Console.WriteLine("Something Wrong");
    response.Close();
    request.KeepAlive = false;
    return;
}






HttpStatusCode.OK indicates that your HTTP request succeeded and the page was reachable. It does not necessarily mean that the PHP site authenticated the credentials successfully. For that you will have to pass valid credentials.

Have a look at the link below for more information on HttpStatusCode.

http://msdn.microsoft.com/en-us/library/system.net.httpstatuscode.aspx
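Since a 200 OK status does not prove that the login worked, one option is to inspect the returned markup as well. The sketch below assumes the site reports a failed login with the text "Wrong username or password", as seen in the browser test above. The class and method names (LoginCheck, LooksLikeFailedLogin) are hypothetical. A site that words its error differently would need a different marker.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class LoginCheck
{
    // Assumed failure text, taken from the message the site showed in the browser.
    public const string FailureMarker = "Wrong username or password";

    // Returns true when the markup contains the failure text (case-insensitive).
    // A null body is treated as "no failure text found".
    public static bool LooksLikeFailedLogin(string markup)
    {
        return markup != null &&
               markup.IndexOf(FailureMarker, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Calling LoginCheck.LooksLikeFailedLogin on the markup read from the login response, before requesting controlpanel.php, lets the program stop early with a clear message instead of scraping a page it is not authorized to see.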


I cannot answer your question directly. However, I recently needed to do some web scraping myself and used the WebBrowser class, which seems to work OK. I did have to spend quite a long time with my browser debugger to figure out how to get at different pieces of information in the returned pages, but it works reasonably well.

