HtmlAgilityPack and Authentication


Question

I have a method that gets the ids and XPaths for a given URL. How do I pass a username and password with the request so that I can scrape a URL that requires them?

using HtmlAgilityPack;

_web = new HtmlWeb();

internal Dictionary<string, string> GetidsAndXPaths(string url)
{
    var webidsAndXPaths = new Dictionary<string, string>();
    var doc = _web.Load(url);
    var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
    if (nodes == null) return webidsAndXPaths;
    // code to get all the xpaths and ids
    return webidsAndXPaths;
}

Should I use a web request to get the page source and then pass that file into the method above?

var wc = new WebClient();
wc.Credentials = new NetworkCredential("UserName", "Password");
wc.DownloadFile("http://somewebsite.com/page.aspx", @"C:\localfile.html");

Solution

HtmlWeb.Load has a number of overloads. These accept either a NetworkCredential instance or a username and password passed in directly.

Overload                                            Description
Load(String)                                        Gets an HTML document from an Internet resource.
Load(String, String)                                Loads an HTML document from an Internet resource.
Load(String, String, WebProxy, NetworkCredential)   Loads an HTML document from an Internet resource.
Load(String, String, Int32, String, String)         Loads an HTML document from an Internet resource.

You do not need to set up a WebProxy instance of your own for these overloads; alternatively, you can pass in the system default one.
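
As a rough sketch of the NetworkCredential overload listed above, something like the following may work. It assumes the site uses an authentication scheme that NetworkCredential can answer (for example Basic or Windows authentication) rather than an HTML login form. The class and method names are hypothetical, and the empty `new WebProxy()` is only an assumption standing in for "no particular proxy"; how the proxy argument is treated may differ between HtmlAgilityPack versions, so check it against the version you use.

```csharp
using System.Net;
using HtmlAgilityPack;

internal static class CredentialLoadExample
{
    // Hypothetical helper: loads a page through the
    // Load(String, String, WebProxy, NetworkCredential) overload.
    internal static HtmlDocument LoadWithCredentials(string url, string userName, string password)
    {
        var web = new HtmlWeb();
        var credentials = new NetworkCredential(userName, password);

        // Assumption: a WebProxy with no address stands in for "no specific proxy".
        var proxy = new WebProxy();

        // "GET" is the HTTP method used for the request.
        return web.Load(url, "GET", proxy, credentials);
    }
}
```

A call would then look like `CredentialLoadExample.LoadWithCredentials("http://somewebsite.com/page.aspx", "UserName", "Password")`, using the placeholder values from the question.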

Alternatively, you can wire up the HtmlWeb.PreRequestHandler and set up the credentials for the request there.

htmlWeb.PreRequestHandler += (request) => {
    request.Credentials = new NetworkCredential(...);
    return true;
};
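
To show how the handler could fit into the method from the question, here is a minimal sketch. The class name `AuthenticatedScraper`, its constructor parameters, the `ApplyCredentials` helper, and the choice to store each node's XPath under its id are assumptions made for illustration; they are not part of the original code. It also assumes the site accepts credentials supplied through NetworkCredential rather than a login form.

```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

internal class AuthenticatedScraper
{
    private readonly HtmlWeb _web = new HtmlWeb();

    // Hypothetical constructor: attaches the credentials to every request made by _web.
    internal AuthenticatedScraper(string userName, string password)
    {
        var credentials = new NetworkCredential(userName, password);
        _web.PreRequestHandler += request => ApplyCredentials(request, credentials);
    }

    // Sets the credentials on the outgoing request; returning true lets the request proceed.
    internal static bool ApplyCredentials(HttpWebRequest request, ICredentials credentials)
    {
        request.Credentials = credentials;
        return true;
    }

    internal Dictionary<string, string> GetidsAndXPaths(string url)
    {
        var webidsAndXPaths = new Dictionary<string, string>();
        var doc = _web.Load(url);
        var nodes = doc.DocumentNode.SelectNodes("//*[@id]");
        if (nodes == null) return webidsAndXPaths;

        foreach (var node in nodes)
        {
            // Assumption: map each id to the node's XPath; a repeated id overwrites the earlier entry.
            webidsAndXPaths[node.GetAttributeValue("id", string.Empty)] = node.XPath;
        }
        return webidsAndXPaths;
    }
}
```

With this arrangement the credentials are supplied once, when the scraper is constructed, and `GetidsAndXPaths` keeps the same signature as in the question.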
