Scraping secured sites


Question

I use C#'s WebRequest and WebResponse classes to scrape data from sites. This works well for unauthenticated sites. Now I want to scrape details from an authenticated site, which prompts for login details before letting you in. The site is 'https://www.dandh.ca/v4/view?pageReq=dhMainE'. How do I pass the credentials when requesting the page?

I used the code below, but it doesn't work:

NetworkCredential credential = new NetworkCredential("XXXX", "XXXX"); // username, password
HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(target_url);
wr.Credentials = credential;
wr.Method = "POST";

HttpWebResponse wrs = (HttpWebResponse)wr.GetResponse();
StreamReader sr = new StreamReader(wrs.GetResponseStream(), Encoding.GetEncoding("ISO-8859-1"), false);
source = sr.ReadToEnd();



I also tried setting 'DefaultCredentials=true' and keeping the site open in my Firefox browser.

Please help me solve this problem.

Thanks & Regards,
Lavanya.

Recommended answer

I worked on this problem for a while. First of all, analyze every request the site makes in an HTTP analyzer and track how the ones you need behave. The hardest part is implementing authorization, because it can work differently on every site. Usually it relies on cookies, where the login data is stored after logging in. You need to repeat this exchange in your code, and then you will be able to access the page.
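As a rough illustration of the cookie approach, the sketch below posts a login form and then requests the protected page with the same CookieContainer, so whatever session cookies the login response sets are sent back automatically. The class and method names, the login URL, and the shape of the form body are all assumptions made for the example; the real values have to be read from the traffic captured in the HTTP analyzer, and the site may require extra steps such as hidden form fields or tokens.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class CookieSessionSketch
{
    // Creates a request bound to the given cookie container, so cookies set by
    // one response are sent automatically with later requests that share it.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // Posts an already URL-encoded form body to a (hypothetical) login URL,
    // then fetches the protected page using the same cookies.
    public static string LoginAndFetch(string loginUrl, string formBody, string pageUrl)
    {
        var cookies = new CookieContainer();

        HttpWebRequest login = CreateRequest(loginUrl, cookies);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        login.ContentLength = payload.Length;
        using (Stream body = login.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        // Reading the response lets the container pick up any session cookies.
        using (login.GetResponse()) { }

        HttpWebRequest page = CreateRequest(pageUrl, cookies);
        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(),
                                             Encoding.GetEncoding("ISO-8859-1")))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The key design point is that one CookieContainer instance is shared by both requests; creating a fresh container (or none) for the second request would drop the session and the site would ask for the login again.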

Login data can be sent in different ways too: as query-string parameters or as POST parameters, not only through the ready-made NetworkCredential object.
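To make the two variants concrete, here is a small sketch that encodes the same login fields once and uses the result either as a query string appended to the URL or as the body of a POST request. The helper names and the field names "user" and "pass" are hypothetical; the actual parameter names are whatever the site's login form sends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LoginDataSketch
{
    // Encodes name/value pairs as name=value&name=value with percent-escaping.
    // Spaces become %20 here; browsers submitting forms usually send '+' instead,
    // and servers generally accept both.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Variant 1: the login data travels in the URL as a query string.
    public static string AsQueryString(string baseUrl,
                                       IEnumerable<KeyValuePair<string, string>> fields)
    {
        string separator = baseUrl.Contains("?") ? "&" : "?";
        return baseUrl + separator + Encode(fields);
    }

    // Variant 2: the same encoded string is written to the request stream of a
    // POST request with Content-Type application/x-www-form-urlencoded.
}
```

For example, with the fields user=XXXX and pass=XXXX, AsQueryString on the URL from the question appends "&user=XXXX&pass=XXXX", because that URL already contains a query string.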

Before implementing scraping, look for a developer API for the site, because it can greatly simplify development.

