Problem parsing children of a node with HtmlAgilityPack

Question

I'm having a problem parsing the input tags that are children of a form in HTML. I can select them from the document root using //input[@type], but not as children of a specific form node.

Here's some code that illustrates the problem:

using System.Diagnostics;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Wrapper class added so the snippet compiles; the name is illustrative.
public class ParserTests
{
    private const string HTML_CONTENT =
        "<html>" +
        "<head>" +
        "<title>Test Page</title>" +
        "<link href='site.css' rel='stylesheet' type='text/css' />" +
        "</head>" +
        "<body>" +
        "<form id='form1' method='post' action='http://www.someplace.com/input'>" +
        "<input type='hidden' name='id' value='test' />" +
        "<input type='text' name='something' value='something' />" +
        "</form>" +
        "<a href='http://www.someplace.com'>Someplace</a>" +
        "<a href='http://www.someplace.com/other'><img src='http://www.someplace.com/image.jpg' alt='Someplace Image'/></a>" +
        "<form id='form2' method='post' action='/something/to/do'>" +
        "<input type='text' name='secondForm' value='this should be in the second form' />" +
        "</form>" +
        "</body>" +
        "</html>";

    public void Parser_Test()
    {
        var htmlDoc = new HtmlDocument
        {
            OptionFixNestedTags = true,
            OptionUseIdAttribute = true,
            OptionAutoCloseOnEnd = true,
            OptionAddDebuggingAttributes = true
        };

        // Load the test markup from an in-memory UTF-8 stream.
        byte[] byteArray = Encoding.UTF8.GetBytes(HTML_CONTENT);
        var stream = new MemoryStream(byteArray);
        htmlDoc.Load(stream, Encoding.UTF8, true);

        // Find every form anywhere in the document.
        var nodeCollection = htmlDoc.DocumentNode.SelectNodes("//form");
        if (nodeCollection != null && nodeCollection.Count > 0)
        {
            foreach (var form in nodeCollection)
            {
                var id = form.GetAttributeValue("id", string.Empty);
                if (!form.HasChildNodes)
                    Debug.WriteLine(string.Format("Form {0} has no children", id));

                // Relative XPath: direct <input> children of this form that have a type attribute.
                var childCollection = form.SelectNodes("input[@type]");
                if (childCollection != null && childCollection.Count > 0)
                {
                    Debug.WriteLine("Got some child nodes");
                }
                else
                {
                    Debug.WriteLine("Unable to find input nodes as children of Form");
                }
            }

            // Absolute XPath: every <input> in the document, regardless of parent.
            var inputNodes = htmlDoc.DocumentNode.SelectNodes("//input");
            if (inputNodes != null && inputNodes.Count > 0)
            {
                Debug.WriteLine(string.Format("Found {0} input nodes when parsed from root", inputNodes.Count));
            }
        }
        else
        {
            Debug.WriteLine("Found no forms");
        }
    }
}

The output is:

Form form1 has no children
Unable to find input nodes as children of Form
Form form2 has no children
Unable to find input nodes as children of Form
Found 3 input nodes when parsed from root

What I would expect is that form1 and form2 both have children, and that input[@type] finds 2 nodes for form1 and 1 node for form2.

Is there a specific configuration setting or method that I'm not using that I should be? Any ideas?

Thanks,

Steve

Recommended answer

Well, I've given up on HtmlAgilityPack for now. Seems like there is still more work to do in that library to get everything working. To solve this problem I've moved the code over to use the SGMLReader library from here: http://developer.mindtouch.com/SgmlReader
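
As a rough illustration of that switch, here is a minimal sketch of counting each form's input children with SgmlReader. The answer does not show its actual code, so the class name SgmlFormInputCounter and the method CountInputsPerForm are hypothetical, and the sketch assumes the Sgml namespace from the SgmlReader package is referenced.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using Sgml;

// Hypothetical helper, not taken from the original answer.
public static class SgmlFormInputCounter
{
    // Returns a map of form id -> number of direct <input type=...> children.
    public static Dictionary<string, int> CountInputsPerForm(string html)
    {
        var counts = new Dictionary<string, int>();

        using (var sgml = new SgmlReader())
        {
            sgml.DocType = "HTML";                   // parse with the built-in HTML DTD
            sgml.CaseFolding = CaseFolding.ToLower;  // normalise element names for XPath
            sgml.InputStream = new StringReader(html);

            // SgmlReader is an XmlReader, so the result is an ordinary XML DOM.
            var doc = new XmlDocument();
            doc.Load(sgml);

            foreach (XmlElement form in doc.SelectNodes("//form"))
            {
                // Relative XPath, evaluated against this form only.
                counts[form.GetAttribute("id")] = form.SelectNodes("input[@type]").Count;
            }
        }

        return counts;
    }
}
```

Because the output is a standard XmlDocument, the same relative XPath from the question can be reused unchanged.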

Using this library all my unit tests pass properly and the sample code works as expected.
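
For readers who would rather stay with HtmlAgilityPack, a commonly cited workaround (not part of the answer above) is to remove the "form" entry from the static HtmlNode.ElementsFlags table before parsing. The sketch below assumes the installed version registers form there as an element that does not hold its children, which would explain the output in the question; the class name FormInputCounter and the method CountInputsPerForm are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper illustrating the ElementsFlags workaround.
public static class FormInputCounter
{
    // Returns a map of form id -> number of direct <input type=...> children.
    public static Dictionary<string, int> CountInputsPerForm(string html)
    {
        // Assumption: the parser special-cases <form> via this static table.
        // The table is shared process-wide, so this affects every later parse,
        // and it must run before the document is loaded.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var counts = new Dictionary<string, int>();
        var forms = doc.DocumentNode.SelectNodes("//form");
        if (forms == null)
        {
            return counts; // SelectNodes returns null when nothing matches
        }

        foreach (var form in forms)
        {
            var id = form.GetAttributeValue("id", string.Empty);
            var inputs = form.SelectNodes("input[@type]");
            counts[id] = inputs == null ? 0 : inputs.Count;
        }

        return counts;
    }
}
```

Since the flag table is static, it is probably safest to make this change once at application start-up rather than on every call.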
