Get web page using HtmlAgilityPack.NETCore

Problem description

I used HtmlAgilityPack to work with HTML pages. Previously I did this:

HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(url);
var nodes = document.DocumentNode.SelectNodes("necessary node");

But now I need to use HtmlAgilityPack.NETCore, where HtmlWeb is absent. What should I use instead of HtmlWeb to get the same result?

Recommended answer

Use HttpClient, the newer way to interact with remote resources over HTTP.

As for your solution, you probably want to use the async methods here so that your thread is not blocked, instead of calling .Result. Also note that HttpClient, available since .NET 4.5, is meant to be shared across threads, so you should not recreate it for each request:

// instance or static field, created once and reused for every request
HttpClient client = new HttpClient();

// send the request without blocking the calling thread
using (var response = await client.GetAsync(url))
{
    using (var content = response.Content)
    {
        // read the response body without blocking
        var result = await content.ReadAsStringAsync();
        var document = new HtmlDocument();
        document.LoadHtml(result);
        // SelectNodes returns null when nothing matches the XPath
        var nodes = document.DocumentNode.SelectNodes("Your nodes");
        // Some work with the page...
    }
}
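
For context, the snippet above only compiles inside an async method. A minimal sketch of how it might be wrapped is shown here; the class name PageLoader and the methods Parse and LoadAsync are hypothetical names chosen for illustration, and the call to EnsureSuccessStatusCode is an addition that the original answer does not include.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper class; not part of HtmlAgilityPack.
public static class PageLoader
{
    // One shared HttpClient for the whole application, as the answer recommends.
    private static readonly HttpClient Client = new HttpClient();

    // Parses an HTML string into a document. Kept separate from the
    // download step so it can be exercised without network access.
    public static HtmlDocument Parse(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document;
    }

    // Downloads the page at `url` without blocking the calling thread,
    // then parses the response body.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        using (var response = await Client.GetAsync(url))
        {
            // Assumption: a non-2xx status should fail loudly.
            // This throws HttpRequestException in that case.
            response.EnsureSuccessStatusCode();
            var html = await response.Content.ReadAsStringAsync();
            return Parse(html);
        }
    }
}
```

A caller inside another async method could then write `var document = await PageLoader.LoadAsync(url);` followed by `document.DocumentNode.SelectNodes(...)`, checking the result for null before iterating, since SelectNodes returns null when no node matches.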

A great article about async/await: Async/Await - Best Practices in Asynchronous Programming by @StephenCleary, March 2013
