HtmlAgility ParseErrors Property


Question

Which errors can I expect the HtmlAgility library to fix? From my own experience, I know it can repair an unterminated closing tag, such as:

<car>Nissan</car

When I call Load or LoadHtml, it fixes this to:

<car>Nissan</car>

I also know that the ParseErrors collection reports details for each error, such as Reason, stream position, etc.

Is there a list of the errors it handles (or can you tell from your own experience)? How reliable is HtmlAgility at fixing errors, and which errors can it not fix?

Recommended answer

Historically, Html Agility Pack was never designed to fix HTML, but rather to load it, modify it, and save it back, even if the HTML contains errors.

This means it fixes the errors that browsers generally fix automatically, like the one shown in your question. The list of errors was determined experimentally, and you can browse the source for deeper insight. That said, the library was designed back in 2000/2001, so things may have changed in that area :-)

The ParseErrors collection contains HtmlParseError objects, each with a Code. The code is a documented enum:

    public enum HtmlParseErrorCode
    {
        /// A tag was not closed.
        TagNotClosed,

        /// A tag was not opened.
        TagNotOpened,

        /// There is a charset mismatch between stream and declared (META) encoding.
        CharsetMismatch,

        /// An end tag was not required.
        EndTagNotRequired,

        /// An end tag is invalid at this position.
        EndTagInvalidHere
    }
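
To see which of these codes a given document triggers, you can walk the ParseErrors collection after loading. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name `ParseErrorDemo` and the helper `DescribeErrors` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ParseErrorDemo
{
    // Hypothetical helper: loads the markup and returns one description per parse error.
    public static List<string> DescribeErrors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var descriptions = new List<string>();
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            // Code is the enum value; Reason is the human-readable explanation.
            descriptions.Add(
                $"{error.Code} at line {error.Line}, position {error.LinePosition}: {error.Reason}");
        }
        return descriptions;
    }
}
```

Calling `ParseErrorDemo.DescribeErrors(html)` on your own problem markup and printing the result is probably the quickest way to learn which errors the parser detects for your input, since the exact set depends on the library version.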

There is also an OptionFixNestedTags property on HtmlDocument (default value is false) that can fix LI, TR, TH, and TD tags when nesting errors are detected. If it encounters a closing TR without all the needed closing TDs, it closes them automatically. Again, this is what a browser does with malformed HTML.
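
A rough way to observe the effect of OptionFixNestedTags is to load the same markup with the option off and on, then compare the serialized output. This is only a sketch under the assumption that the HtmlAgilityPack NuGet package is referenced; `NestedTagDemo` and `Normalize` are hypothetical names, and the sample markup in the note below is invented for illustration.

```csharp
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class NestedTagDemo
{
    // Hypothetical helper: parses the markup with the option set as requested
    // and returns the document serialized back to a string.
    public static string Normalize(string html, bool fixNestedTags)
    {
        var doc = new HtmlDocument
        {
            // Must be set before loading, because the option affects parsing.
            OptionFixNestedTags = fixNestedTags
        };
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }
}
```

For example, passing `"<table><tr><td>A<td>B</tr></table>"` once with `false` and once with `true` and diffing the two strings shows what the option changes for your version of the library.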
