Losing the 'less than' sign in HtmlAgilityPack LoadHtml


Question

I recently started experimenting with the HtmlAgilityPack. I am not familiar with all of its options, so I think I am doing something wrong.

I have a string with the following content:

string s = "<span style=\"color: #0000FF;\"><</span>";

As you can see, the span contains a 'less than' sign. I process this string with the following code:

HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(s);



But when I take a quick and dirty look at the span like this:

htmlDocument.DocumentNode.ChildNodes[0].InnerHtml

I see that the span is empty.

What option do I need to set to keep the 'less than' sign? I have already tried this:

htmlDocument.OptionAutoCloseOnEnd = false;
htmlDocument.OptionCheckSyntax = false;
htmlDocument.OptionFixNestedTags = false;



but with no success.

I know it is invalid HTML. I am using this to fix invalid HTML and to HTML-encode the 'less than' signs.

Please point me in the right direction. Thanks in advance.

Recommended answer

The Html Agility Pack detects this as an error and creates an HtmlParseError instance for it. You can read all errors using the ParseErrors property of the HtmlDocument class. So, if you run this code:

    string s = "<span style=\"color: #0000FF;\"><</span>";
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(s);
    doc.Save(Console.Out);

    Console.WriteLine();
    Console.WriteLine();

    foreach (HtmlParseError err in doc.ParseErrors)
    {
        Console.WriteLine("Error");
        Console.WriteLine(" code=" + err.Code);
        Console.WriteLine(" reason=" + err.Reason);
        Console.WriteLine(" text=" + err.SourceText);
        Console.WriteLine(" line=" + err.Line);
        Console.WriteLine(" pos=" + err.StreamPosition);
        Console.WriteLine(" col=" + err.LinePosition);
    }



It will display this (the corrected text first, then the details about the error):

<span style="color: #0000FF;"></span>

Error
 code=EndTagNotRequired
 reason=End tag </> is not required
 text=<
 line=1
 pos=30
 col=31



So you can try to fix this error yourself, since you have all the required information (line, column, and stream position). Keep in mind, though, that fixing errors in HTML (as opposed to detecting them) is very complex in the general case.
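For the narrow case in the question, one possible workaround is to encode stray '<' characters before the string ever reaches LoadHtml. The following is only a minimal sketch under stated assumptions: `HtmlPreprocessor` and `EscapeStrayLessThan` are hypothetical names, not part of the Html Agility Pack, and the heuristic simply treats any '<' that is not followed by a letter, '/', '!' or '?' as text rather than the start of a tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class HtmlPreprocessor
{
    // Matches a '<' that cannot start a tag, end tag, comment/doctype,
    // or processing instruction (heuristic assumption, not a full HTML parser).
    private static readonly Regex StrayLessThan = new Regex(@"<(?![A-Za-z/!?])");

    // Replaces every stray '<' with the entity "&lt;".
    public static string EscapeStrayLessThan(string html)
    {
        return StrayLessThan.Replace(html, "&lt;");
    }
}

public static class Program
{
    public static void Main()
    {
        string s = "<span style=\"color: #0000FF;\"><</span>";
        string escaped = HtmlPreprocessor.EscapeStrayLessThan(s);

        // prints: <span style="color: #0000FF;">&lt;</span>
        Console.WriteLine(escaped);
    }
}
```

The escaped string can then be passed to `LoadHtml` in place of the original. This heuristic misses a '<' that is directly followed by a letter (for example `a<b`), and it does not special-case script blocks or attribute values, so it is not a general repair for invalid HTML.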
