Html Agility Pack - how to extract the value after a <strong> tag


Question

I am looking for help extracting the value "Editor's picks" from the following HTML data source:

<P align=justify>Editor's picks<BR><A href="/Article.asp?PUB=250&ISS=22792&SID=52855&TS=1&article=Idiosyncratic risk" name="" target=_blank>Idiosyncratic risk?</A>





So far, I have created the following function, which currently returns null:

public static string getHTMLTags()
    {

        string url = "";

        string data = storyMethod();

        HtmlDocument html = new HtmlDocument();
        html.LoadHtml(data);

        var nodes = html.DocumentNode.SelectNodes("//p[@align=justify]//strong[1]");

        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                string Description = node.InnerHtml;
                return Description;
            }
        }

        return null;

    }








I would appreciate any pointers to methods or properties in the Html Agility Pack that could help me solve this task.

Expected output:
Editor's picks


Thank you for any further assistance.

Recommended answer


The problem is that your <P> tag isn't closed, so HAP is treating the <strong> tag as a sibling element, not a child element.

The solution is buried in the discussions on the CodePlex site:



Now, you can tweak the HTML agility pack to better suit what you expect using the HtmlNode.ElementsFlags static property ... What you can do is tell it you don't want to support unclosed <p> tags:

HtmlNode.ElementsFlags.Remove("p"); // remove the Empty and Closed flags
HtmlDocument doc = new HtmlDocument();
doc.Load(...);






You're also missing quotes around the attribute value, and you only need a single / (direct child) rather than // to reach the <strong> node:

HtmlNode.ElementsFlags.Remove("p");
HtmlDocument html = new HtmlDocument();
html.LoadHtml(data);

var nodes = html.DocumentNode.SelectNodes("//p[@align='justify']/strong[1]");
return nodes == null ? null : nodes.Select(n => n.InnerHtml).FirstOrDefault();

// Result:
// Editor's picks<br><a href="/Article.asp?PUB=250&ISS=22792&SID=52855&TS=1&article=Idiosyncratic risk" name="" target="_blank">Idiosyncratic risk?</a>
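
Because HtmlNode.ElementsFlags is a static dictionary, removing the "p" entry affects every document parsed afterwards in the same process. A minimal sketch of one way to limit that side effect is shown below: it saves the current entry, parses, and restores the entry in a finally block. The class name StrongTagExtractor and the method name GetFirstStrongHtml are hypothetical, chosen only for illustration; the sketch is not thread-safe, since the flag is still changed globally while the parse runs.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper (names are illustrative, not part of Html Agility Pack).
public static class StrongTagExtractor
{
    // Returns the inner HTML of the first <strong> directly under
    // <p align="justify">, or null when no such node exists.
    public static string GetFirstStrongHtml(string html)
    {
        // Remember the current "p" entry (if any) so it can be restored later.
        HtmlElementFlag savedFlags;
        bool hadFlags = HtmlNode.ElementsFlags.TryGetValue("p", out savedFlags);

        // Treat <p> as an ordinary container element while parsing.
        HtmlNode.ElementsFlags.Remove("p");
        try
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Quoted attribute value, single / for the direct child.
            HtmlNode node = doc.DocumentNode.SelectSingleNode(
                "//p[@align='justify']/strong[1]");

            return node == null ? null : node.InnerHtml;
        }
        finally
        {
            // Put the global setting back the way it was found.
            if (hadFlags)
            {
                HtmlNode.ElementsFlags["p"] = savedFlags;
            }
        }
    }
}
```

Calling StrongTagExtractor.GetFirstStrongHtml(data) would stand in for the body of getHTMLTags() from the question. If all parsing in the application should treat <p> this way, removing the flag once at startup, as in the answer above, is simpler.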

