Parsing HTML with Html Agility Pack

Views: 131 | Published: 2016/9/22 13:43:33 | Tags: c#, html, html-agility-pack


Problem description

I want to collect all the anchor tags from this div, but I don't know the best way to do it with XPath.

<div class="biz_info">
    <h3><a href="/profil/78122/s%C3%B8rby-rehab/">Sørby Rehab</a></h3>
    <table class="string_14">
        <tbody>
            <tr>
               <td>Postadr.:</td> 
               <td class="tab_space">Rognerudveien 8 B, 0681 Oslo</td> 
            </tr>

            <tr>
                <td>Telefon:</td> 
                <td class="tab_space">928 70 700</td>
            </tr>

            <tr>
                <td>Nettside:</td> 
                <td class="tab_space"><a href="http://www.sorby-rehab.no" target="_blank">www.sorby-rehab.no</a></td>
            </tr>
        </tbody>
    </table>
</div>

My code currently looks like this (and it does not work well):

HtmlDocument doc = new HtmlDocument();
doc.Load(new StringReader(result));
HtmlNode root = doc.DocumentNode;

List<string> anchorTags = new List<string>();

foreach (HtmlNode link in root.SelectNodes("//@class=biz_info"))
{
    string att = link.OuterHtml;
    anchorTags.Add(att);
}

Can someone experienced with XPath help me?

Solution

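// Requires: using System.IO; using System.Linq; using HtmlAgilityPack;
HtmlDocument html = new HtmlDocument();
html.Load(new StringReader(result));
// Select every <a> nested anywhere inside <div class="biz_info">.
var anchorTags = html.DocumentNode.SelectNodes("//div[@class='biz_info']//a")
                     .Select(a => a.OuterHtml)
                     .ToList();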

That gives you a list of the anchor tags' HTML. If you only need the URLs:

// Keep only anchors that have a non-empty href, then read the attribute value.
var urls = html.DocumentNode.SelectNodes("//div[@class='biz_info']//a[@href!='']")
               .Select(a => a.Attributes["href"].Value)
               .ToList();
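
One caveat with the approach above: in Html Agility Pack, SelectNodes returns null rather than an empty collection when the XPath matches nothing, so chaining .Select directly onto it throws a NullReferenceException on pages that have no matching div. The following is a minimal sketch of a null-safe variant, not a definitive implementation. It assumes result holds markup like the sample in the question; the inline sample string and the Program class are illustrative only and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Illustrative sample, trimmed from the markup in the question.
        const string result = @"<div class=""biz_info"">
  <h3><a href=""/profil/78122/s%C3%B8rby-rehab/"">Sørby Rehab</a></h3>
  <table><tr><td><a href=""http://www.sorby-rehab.no"" target=""_blank"">www.sorby-rehab.no</a></td></tr></table>
</div>";

        var html = new HtmlDocument();
        html.LoadHtml(result); // parse directly from a string

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            html.DocumentNode.SelectNodes("//div[@class='biz_info']//a[@href]");

        List<string> urls = nodes == null
            ? new List<string>()
            : nodes.Select(a => a.GetAttributeValue("href", string.Empty))
                   .Where(href => href.Length > 0) // skip empty href values
                   .ToList();

        foreach (string url in urls)
            Console.WriteLine(url);
    }
}
```

GetAttributeValue with a default is used here instead of indexing Attributes["href"], so a missing attribute yields an empty string rather than a null reference.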
