HtmlAgilityPack Drops Option End Tags


Problem Description

I am using HtmlAgilityPack. I create an HtmlDocument and LoadHtml with the following string:

<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One</option><option value="2">Two</option></select>

This does some unexpected things. First, it gives two parser errors, EndTagNotRequired. Second, the select node has 4 children - two for the option tags and two more for the inner text of the option tags. Last, the OuterHtml is like this:

<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One<option value="2">Two</select>

So basically it is deciding for me to drop the closing tags on the options. Let's leave aside for a moment whether it is proper and desirable to do that. I am using HtmlAgilityPack to test HTML generation code, so I don't want it to make any decision for me or give any errors unless the HTML is truly malformed. Is there some way to make it behave how I want? I tried setting some of the options for HtmlDocument, specifically:

 doc.OptionAutoCloseOnEnd = false;
 doc.OptionCheckSyntax = false;
 doc.OptionFixNestedTags = false;

This is not working. If HtmlAgilityPack cannot do what I want, can you recommend something that can?

Solution

The exact same error is reported on the HAP home page's discussion, but it looks like no meaningful fixes have been made to the project in a few years. Not encouraging.

A quick browse of the source suggests the error might be fixable by commenting out line 92 of HtmlNode.cs:

// they sometimes contain, and sometimes they don't...
ElementsFlags.Add("option", HtmlElementFlag.Empty);

(Actually no, they always contain label text, although a blank string would also be valid text. A careless author might omit the end-tag, but then that's true of any element.)

Addendum

An equivalent solution is to call HtmlNode.ElementsFlags.Remove("option"); before any use of the library. This avoids modifying the library source code.
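
As a rough illustration of that workaround, here is a minimal sketch that removes the "option" entry and then round-trips the markup from the question. The class name OptionEndTagDemo and its two helper methods are hypothetical names made up for this example; only HtmlNode.ElementsFlags, HtmlDocument.LoadHtml, ParseErrors and OuterHtml come from the library. Whether the "option" entry is registered by default may depend on the HtmlAgilityPack version in use, so treat this as something to verify against your own build rather than a guaranteed result.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class OptionEndTagDemo
{
    // Hypothetical helper: drops the special-case flag for <option>.
    // ElementsFlags is a static dictionary, so this affects every document
    // parsed afterwards in the process. Remove is harmless if the key is absent.
    public static void DisableOptionSpecialCase()
    {
        HtmlNode.ElementsFlags.Remove("option");
    }

    // Hypothetical helper: parses the markup, reports how many parse errors
    // were recorded, and returns the re-serialized document.
    public static string RoundTrip(string html, out int parseErrorCount)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        parseErrorCount = doc.ParseErrors.Count();
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Must run before the first LoadHtml call that should be affected.
        DisableOptionSpecialCase();

        const string html =
            "<select id=\"foo_Bar\" name=\"foo.Bar\">" +
            "<option selected=\"selected\" value=\"1\">One</option>" +
            "<option value=\"2\">Two</option>" +
            "</select>";

        string output = RoundTrip(html, out int errors);

        // Compare these with the behavior described in the question.
        Console.WriteLine("Parse errors: " + errors);
        Console.WriteLine(output);
    }
}
```

Because the flag table is shared by all documents, a test suite would probably want to call DisableOptionSpecialCase once in its setup code rather than before each parse.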
