Html Agility Pack: how to get dynamically generated content after the page loads


Question

I am trying to get information from "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys", specifically the div with class "c-ProductList row ss-targeted", but nothing seems to be retrieved. Any clues?

var test = page.DocumentNode.SelectNodes("//div[@class='c-ProductList row ss-targeted']");

Recommended answer

The content you want is generated after the page loads, using JavaScript and Ajax. HAP cannot get it unless it runs a browser in the background and executes the scripts on the page.
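To see why the original query comes back empty, here is a minimal offline sketch. The markup string is hypothetical: it only stands in for the kind of HTML a server sends before any script has run, and is not the real page source. HAP parses that string as-is, so the product list that a script would add later is simply not in the document.

```csharp
using System;
using HtmlAgilityPack;

class StaticHtmlDemo
{
    static void Main()
    {
        // Hypothetical server response: an empty container plus a script
        // that would fill it in a real browser. HAP never runs the script.
        string html =
            "<html><body>" +
            "<div class='ss-results'></div>" +
            "<script>/* would insert the product list later */</script>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='c-ProductList row ss-targeted']");

        Console.WriteLine(nodes == null ? "no match" : nodes.Count + " match(es)");
        // prints "no match"
    }
}
```

The null result is the same symptom described in the question, which is why the approaches below put a real browser engine in front of HAP.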

.NET Core 2.0

Prerequisites: you need the Chrome web browser installed on your PC.

  1. Create a console application

  2. Install the NuGet packages:

     Install-Package HtmlAgilityPack
     Install-Package Selenium.WebDriver
     Install-Package Selenium.Chrome.WebDriver

  3. Replace the Main method with the following code:

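    // Requires at the top of the file:
    //   using System;
    //   using HtmlAgilityPack;
    //   using OpenQA.Selenium;
    //   using OpenQA.Selenium.Chrome;
    static void Main(string[] args)
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        // chromedriver.exe is expected in the current directory
        var browser = new ChromeDriver(Environment.CurrentDirectory);

        // Element lookups retry for up to 30 seconds, giving the page scripts time to run
        browser.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
        browser.Navigate().GoToUrl(url);

        // FindElement(By.ClassName(...)) also works on Selenium versions
        // that no longer provide FindElementByClassName
        var results = browser.FindElement(By.ClassName("ss-results"));
        var doc = new HtmlDocument();
        doc.LoadHtml(results.GetAttribute("innerHTML"));

        // Show results; SelectSingleNode and SelectNodes return null when nothing matches
        var list = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row ss-targeted']");
        var titles = list?.SelectNodes(".//h2[@class='c-ProductListItem__title ng-binding']");
        if (titles != null)
        {
            foreach (var title in titles)
            {
                Console.WriteLine(title.InnerText);
            }
        }
        Console.ReadLine();

        // Close the browser and stop the chromedriver process
        browser.Quit();
    }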
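As a variation on the Selenium approach, the sketch below waits explicitly for the product titles instead of relying on the implicit wait. It is not part of the original answer. It assumes `WebDriverWait` from the `OpenQA.Selenium.Support.UI` namespace is available (on Selenium 3.x that means also installing the `Selenium.Support` package), and it assumes the page still uses the class names quoted in the question, which may have changed since.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class ExplicitWaitSketch
{
    static void Main()
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        // 'using' disposes the driver, which closes the browser when done.
        using (var browser = new ChromeDriver(Environment.CurrentDirectory))
        {
            browser.Navigate().GoToUrl(url);

            // Poll for up to 30 seconds until at least one product title exists.
            // Until throws WebDriverTimeoutException if that never happens.
            var wait = new WebDriverWait(browser, TimeSpan.FromSeconds(30));
            wait.Until(d => d.FindElements(
                By.CssSelector(".c-ProductList .c-ProductListItem__title")).Count > 0);

            // Hand the rendered DOM to HAP for parsing.
            var doc = new HtmlDocument();
            doc.LoadHtml(browser.PageSource);

            var titles = doc.DocumentNode.SelectNodes(
                "//h2[contains(@class, 'c-ProductListItem__title')]");
            if (titles == null)
            {
                Console.WriteLine("No titles found.");
                return;
            }

            foreach (var title in titles)
            {
                Console.WriteLine(title.InnerText.Trim());
            }
        }
    }
}
```

Waiting on the elements you actually need tends to be more predictable than a global implicit wait, because the condition is stated once and a timeout surfaces as an exception rather than as a missing node later.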

.NET Framework 4.6

  1. Create a console application

  2. Install the NuGet package:

     Install-Package HtmlAgilityPack

  3. In Solution Explorer, add a reference to System.Windows.Forms

  4. Add using statements as required

  5. Replace the Main method with the following code:

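    // Requires at the top of the file:
    //   using System;
    //   using System.Windows.Forms;
    //   using HtmlAgilityPack;
    [STAThread]
    static void Main(string[] args)
    {
        string url = "https://www.sideshow.com/collectibles?manufacturer=Hot+Toys";

        var web = new HtmlWeb();
        web.BrowserTimeout = TimeSpan.FromSeconds(30);

        // LoadFromBrowser drives a WebBrowser control and keeps calling the
        // callback until it returns true or the timeout expires
        var doc = web.LoadFromBrowser(url, o =>
        {
            var webBrowser = (WebBrowser)o;

            // Wait until the list shows up; Body can still be null while the page loads
            var html = webBrowser.Document?.Body?.InnerHtml;
            return html != null && html.Contains("c-ProductList");
        });

        // Show results; SelectSingleNode and SelectNodes return null when nothing matches
        var list = doc.DocumentNode.SelectSingleNode("//div[@class='c-ProductList row ss-targeted']");
        var titles = list?.SelectNodes(".//h2[@class='c-ProductListItem__title ng-binding']");
        if (titles != null)
        {
            foreach (var title in titles)
            {
                Console.WriteLine(title.InnerText);
            }
        }
        Console.ReadLine();
    }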

Displays a list starting with:

Iron Man Mark L

John Wick

John Wick

The Punisher War Machine Armor

Wonder Woman Deluxe Version
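One detail worth noting about the XPath used above: `@class='c-ProductListItem__title ng-binding'` compares the whole attribute string, so it stops matching if the page reorders or adds classes. The offline sketch below, which runs on a hypothetical sample string rather than the real page, contrasts that exact match with a token-based match built from standard XPath 1.0 functions.

```csharp
using System;
using HtmlAgilityPack;

class ClassTokenMatchDemo
{
    static void Main()
    {
        // Hypothetical sample: same classes on both headings, different order.
        string html =
            "<div class='c-ProductList row ss-targeted'>" +
            "<h2 class='c-ProductListItem__title ng-binding'>Sample A</h2>" +
            "<h2 class='ng-binding c-ProductListItem__title'>Sample B</h2>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Exact string comparison: only the first heading matches.
        var exact = doc.DocumentNode.SelectNodes(
            "//h2[@class='c-ProductListItem__title ng-binding']");
        Console.WriteLine("exact: " + (exact == null ? 0 : exact.Count));
        // prints "exact: 1"

        // Token match: pad the class list with spaces and look for the padded token.
        var byToken = doc.DocumentNode.SelectNodes(
            "//h2[contains(concat(' ', normalize-space(@class), ' '), ' c-ProductListItem__title ')]");
        Console.WriteLine("token: " + (byToken == null ? 0 : byToken.Count));
        // prints "token: 2"
    }
}
```

If the list comes back empty even though the browser has rendered the page, a changed class attribute is one of the first things to check.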
