C# - reading HTML?


Question

I'm developing a program in C# and I need some help. I'm trying to create an array or a list of the items that are displayed on a certain website. What I want to do is read each anchor's text and its href. For example, this is the HTML:

<div class="menu-1">
    <div class="items">
        <div class="minor">
            <ul>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-1" href="/?item=1">Item 1</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-2" href="/?item=2">Item 2</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-3" href="/?item=3">Item 3</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-4" href="/?item=4">Item 4</a>
                </li>
                <li class="menu-item">
                    <a class="menu-link" title="Item-1" id="menu-item-5" href="/?item=5">Item 5</a>
                </li>
            </ul>
        </div>
    </div>
</div>

So from that HTML I would like to read this:

string[,] array = {{"Item 1", "/?item=1"}, {"Item 2", "/?item=2"}, {"Item 3", "/?item=3"}, {"Item 4", "/?item=4"}, {"Item 5", "/?item=5"}};

The HTML is an example I wrote; the actual site does not look like that.

Recommended answer

As others have said, HtmlAgilityPack is the best option for HTML parsing. Also be sure to download HAP Explorer from the HtmlAgilityPack site and use it to test your selects. In any case, this SelectNodes call gets all anchors that have an id starting with menu-item- :

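  using System;
  using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

  HtmlDocument doc = new HtmlDocument();
  doc.Load(htmlFile); // htmlFile: path to the saved HTML page
  // Select every <a> whose id attribute starts with "menu-item-"
  var myNodes = doc.DocumentNode.SelectNodes("//a[starts-with(@id,'menu-item-')]");
  if (myNodes != null) // SelectNodes returns null when nothing matches
  {
    foreach (HtmlNode node in myNodes)
    {
      Console.WriteLine(node.Id);
    }
  }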
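
To get from the selected nodes to the text/href pairs asked for in the question, something along these lines may work. This is only a sketch: the class name `MenuLinkReader`, the method `ExtractLinks`, and the shortened sample HTML are made up for illustration, and it assumes the HtmlAgilityPack NuGet package and C# 7 or later (for tuples). It returns a list of pairs rather than the `string[,]` from the question, since a list is easier to build when the number of items is not known in advance.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class MenuLinkReader // hypothetical helper, not part of HtmlAgilityPack
{
    // Returns (anchor text, href) pairs for every <a> whose id starts with "menu-item-".
    public static List<(string Text, string Href)> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        var links = new List<(string Text, string Href)>();
        var nodes = doc.DocumentNode.SelectNodes("//a[starts-with(@id,'menu-item-')]");
        if (nodes == null)
        {
            return links; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            // Decode entities such as &amp; in the visible text and trim whitespace.
            string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            // Fall back to an empty string if the anchor has no href attribute.
            string href = node.GetAttributeValue("href", string.Empty);
            links.Add((text, href));
        }
        return links;
    }

    public static void Main()
    {
        // Shortened version of the sample HTML from the question.
        const string html = @"<ul>
  <li class=""menu-item""><a class=""menu-link"" id=""menu-item-1"" href=""/?item=1"">Item 1</a></li>
  <li class=""menu-item""><a class=""menu-link"" id=""menu-item-2"" href=""/?item=2"">Item 2</a></li>
</ul>";

        foreach (var (text, href) in ExtractLinks(html))
        {
            Console.WriteLine($"{text} -> {href}");
        }
    }
}
```

Since the real site's markup differs from the sample, the XPath expression is the part that would need adjusting; the rest of the loop should stay the same.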
