HtmlAgilityPack Drops Option End Tags

Question

I am using HtmlAgilityPack. I create an HtmlDocument and call LoadHtml with the following string:

<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One</option><option value="2">Two</option></select>

This does some unexpected things. First, it reports two parse errors with the code EndTagNotRequired. Second, the select node has four child nodes: two for the option tags and two more for the options' inner text. Finally, OuterHtml looks like this:

<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One<option value="2">Two</select>

In other words, it decides on my behalf to drop the closing tags of the options. Let's leave aside for a moment whether that is proper or desirable. I am using HtmlAgilityPack to test HTML generation code, so I don't want it to make any decisions for me or report any errors unless the HTML is truly malformed. Is there some way to make it behave the way I want? I tried setting some of the HtmlDocument options, specifically:

 doc.OptionAutoCloseOnEnd = false;
 doc.OptionCheckSyntax = false;
 doc.OptionFixNestedTags = false;

This is not working. If HtmlAgilityPack cannot do what I want, can you recommend something that can?
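
To observe the behavior described above, a small console program along these lines loads the same string and prints the parse errors, the number of children under the select node, and whether the markup survives a round trip through OuterHtml. It is only a sketch: OptionRepro, Parse, and Input are illustrative names, it assumes the HtmlAgilityPack package is referenced, and the exact output depends on the HtmlAgilityPack version in use.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper that reproduces the parse described in the question.
public static class OptionRepro
{
    public const string Input =
        "<select id=\"foo_Bar\" name=\"foo.Bar\">" +
        "<option selected=\"selected\" value=\"1\">One</option>" +
        "<option value=\"2\">Two</option></select>";

    // Parses the markup with default HtmlDocument settings.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    public static void Main()
    {
        HtmlDocument doc = Parse(Input);

        // Each parse error carries a code (e.g. EndTagNotRequired) and a position.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine($"{error.Code} at line {error.Line}, column {error.LinePosition}");
        }

        // Direct children of <select>; the question reports four here
        // (two <option> elements plus two separate text nodes).
        HtmlNode select = doc.DocumentNode.SelectSingleNode("//select");
        Console.WriteLine($"select child nodes: {select.ChildNodes.Count}");

        // A test of HTML generation code would usually expect an exact round trip.
        string roundTrip = doc.DocumentNode.OuterHtml;
        Console.WriteLine(roundTrip == Input
            ? "round trip preserved the markup"
            : "round trip changed the markup: " + roundTrip);
    }
}
```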

Recommended answer

The exact same bug is reported in the discussions on the HAP home page, but it looks like no meaningful fixes have been made to the project in several years. Not encouraging.

A quick browse of the source suggests the error might be fixable by commenting out line 92 of HtmlNode.cs:

// they sometimes contain, and sometimes they don't...
ElementsFlags.Add("option", HtmlElementFlag.Empty);

(Actually no, they always contain label text, although a blank string would also be valid text. A careless author might omit the end-tag, but then that's true of any element.)

Addendum

An equivalent solution, without modifying the library source code, is to call HtmlNode.ElementsFlags.Remove("option"); before any use of the library.
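
A minimal sketch of that workaround, assuming the HtmlAgilityPack package is referenced: the flag for "option" is removed before the document is loaded, after which each option should keep its text as a child node and its end tag in OuterHtml. OptionWorkaround, Parse, and Input are hypothetical names used only for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical wrapper showing the ElementsFlags workaround.
public static class OptionWorkaround
{
    public const string Input =
        "<select id=\"foo_Bar\" name=\"foo.Bar\">" +
        "<option selected=\"selected\" value=\"1\">One</option>" +
        "<option value=\"2\">Two</option></select>";

    public static HtmlDocument Parse(string html)
    {
        // ElementsFlags is a static table shared by every HtmlDocument in the
        // process. Removing "option" stops the parser from treating <option>
        // as an empty element. Removing a key that is not present is a no-op,
        // so calling this more than once is harmless.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    public static void Main()
    {
        HtmlDocument doc = Parse(Input);
        HtmlNode select = doc.DocumentNode.SelectSingleNode("//select");

        // Without the flag, <select> should hold just the two <option> elements.
        Console.WriteLine($"select child nodes: {select.ChildNodes.Count}");
        Console.WriteLine(select.OuterHtml);
    }
}
```

Because the table is global, the removal belongs in test setup or program startup, before any parsing happens; changing it while other threads are parsing is not safe.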
