Get web page using HtmlAgilityPack.NETCore

Question

I used HtmlAgilityPack to work with HTML pages. Previously I did this:

HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
var nodes = document.DocumentNode.SelectNodes("necessary node");

But now I need to use HtmlAgilityPack.NETCore, where HtmlWeb is absent. What should I use instead of HtmlWeb to get the same result?

Recommended answer

Use HttpClient, the newer way to interact with remote resources over HTTP.

As for your solution, you probably want to use the async methods here so that your thread is not blocked, rather than calling .Result. Also note that HttpClient (introduced in .NET 4.5) is designed to be used from multiple threads, so you should not recreate it for each request:

// instance or static variable
HttpClient client = new HttpClient();

// get answer in non-blocking way
using (var response = await client.GetAsync(url))
{
    using (var content = response.Content)
    {
        // read answer in non-blocking way
        var result = await content.ReadAsStringAsync();
        var document = new HtmlDocument();
        document.LoadHtml(result);
        var nodes = document.DocumentNode.SelectNodes("Your nodes");
        //Some work with page....
    }
}
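
For context, the fragment above can be wrapped into a complete class. This is only a minimal sketch, not part of the original answer: the class name PageLoader and the helpers SelectFromHtml and LoadNodesAsync are hypothetical names chosen for illustration, and it assumes the package exposes HtmlDocument.LoadHtml and SelectNodes as used above. It also guards against SelectNodes returning null when the XPath matches nothing.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageLoader
{
    // One shared HttpClient for the whole application, as the answer recommends.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: parses an HTML string and returns the nodes matching the XPath.
    public static IReadOnlyList<HtmlNode> SelectFromHtml(string html, string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so normalize to an empty list.
        var nodes = document.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }

    // Hypothetical helper: downloads the page without blocking the caller, then parses it.
    public static async Task<IReadOnlyList<HtmlNode>> LoadNodesAsync(string url, string xpath)
    {
        using (var response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            var html = await response.Content.ReadAsStringAsync();
            return SelectFromHtml(html, xpath);
        }
    }
}
```

A caller inside an async method might then write something like: var links = await PageLoader.LoadNodesAsync("https://example.com", "//a[@href]"); where both the URL and the XPath are placeholders. Keeping the parsing step separate from the download makes it possible to exercise the XPath logic against a literal HTML string without any network access.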

A great article about async/await: Async/Await - Best Practices in Asynchronous Programming by @StephenCleary | March 2013
