Using HtmlAgilityPack to parse web page information in C#


Question

I'm trying to use HtmlAgilityPack to parse information from a web page. This is my code:

using System;
using HtmlAgilityPack;

namespace htmparsing
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            string url = "https://bugs.eclipse.org";
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load(url);
            foreach(HtmlNode node in doc){
                //do something here with "node"
            }               
        }
    }
}

But when I try to access doc.DocumentElement.SelectNodes, I cannot see DocumentElement in the list. I added HtmlAgilityPack.dll to the references, but I don't know what the problem is.

Recommended answer

I have an article that demonstrates scraping DOM elements with HAP (HTML Agility Pack) in ASP.NET. It walks through the whole process step by step. You can have a look and try it.

Scraping HTML DOM elements using HtmlAgilityPack (HAP) in ASP.NET: http://www.codeproject.com/Articles/659019/Scraping-HTML-DOM-elements-using-HtmlAgilityPack-H

As for your code, the approach works fine for me. I tried it the same way you did, with a single change:

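string url = "https://www.google.com";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
// DocumentNode is the root node; "//a" selects every anchor element.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a"))
{
    // outputLabel is presumably a Label control on the ASP.NET page.
    outputLabel.Text += node.InnerHtml;
}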

I got the output as expected. The problem is that you are asking for DocumentElement on the HtmlDocument object, when the property is actually named DocumentNode. Here is a response from a developer of HtmlAgilityPack about the problem you are facing:

HTMLDocument.DocumentElement not in object browser
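
To make the DocumentNode fix concrete, here is a minimal sketch that parses an inline HTML string instead of a live URL, so it can be tried without network access. The class name LinkExtractor, the method ExtractHrefs, and the sample markup are hypothetical names chosen for illustration; they are not part of the original question or answer. It also guards against SelectNodes returning null, which by default happens when the XPath matches nothing.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href of every <a> element that has one.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string; no network access needed

        var hrefs = new List<string>();

        // DocumentNode (not DocumentElement) is the root of the parsed tree.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            // By default SelectNodes returns null when nothing matches.
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample markup, assumed for illustration only.
        string html = "<html><body><a href=\"/one\">One</a><p>text</p>"
                    + "<a href=\"/two\">Two</a></body></html>";

        foreach (string href in LinkExtractor.ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

The same null check applies when the document comes from HtmlWeb.Load, as in the answer above: a page with no matching elements would otherwise make the foreach throw a NullReferenceException.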
