Possible to get HtmlNode's position & length within original input?
Question
Consider the following HTML fragment (_ is used for whitespace):
<head>
...
<link ... ___/>
<!-- ... -->
...
</head>
I'm using Html Agility Pack (HAP) to read HTML files/fragments and to strip out links. What I want to do is find the LINK (and some other) elements and then replace them with whitespace, like so:
<head>
...
____________
<!-- ... -->
...
</head>
The parsing part seems to be working so far: I get the nodes I'm looking for. However, HAP tries to fix the HTML content, while I need everything to stay exactly the same except for the changes I'm making. Plus, HAP seems to have quite a few bugs when writing back content that it read in previously, so the approach I want to take is to let HAP parse the input and then go back to the original input and replace the content I don't want.
The problem is, HtmlNode doesn't seem to have an input length property. It has StreamPosition, which seems to indicate where reading of the node's content started within the input, but I couldn't find a length property that would tell me how many characters were consumed to build the node.
I tried using the OuterHtml property but, unfortunately, HAP tries to fix the LINK by removing the ___/ part (a LINK element is not supposed to be closed). Because of this, OuterHtml.Length returns the wrong length.
Is there a way in HAP to get this information?
Recommended answer
I ended up modifying the code of HtmlAgilityPack to expose a new property that returns the private _outerlength field of HtmlNode.
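// Added inside the HtmlNode class of a locally modified HtmlAgilityPack build.
// Exposes the private _outerlength field as a read-only property.
public virtual int OuterLength
{
    get
    {
        return _outerlength;
    }
}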
This seems to be working fine so far.
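Once a start position and a length are known, the replacement described in the question can be done directly on the original input string. The following is only a minimal sketch of that step, not part of HAP: SpanBlanker and BlankSpan are hypothetical names, and it assumes that position is a 0-based character offset into the input (as StreamPosition appears to be) and that length is the value returned by the OuterLength property added above.

```csharp
using System;
using System.Text;

public static class SpanBlanker
{
    // Returns a copy of input in which 'length' characters starting at
    // 'position' are replaced by spaces. Line breaks inside the span are
    // kept, and nothing outside the span is touched.
    // Assumption: 'position' is a 0-based character offset into 'input'.
    public static string BlankSpan(string input, int position, int length)
    {
        if (input == null)
            throw new ArgumentNullException(nameof(input));
        if (position < 0 || length < 0 || position > input.Length - length)
            throw new ArgumentOutOfRangeException(nameof(position));

        var sb = new StringBuilder(input);
        for (int i = position; i < position + length; i++)
        {
            if (sb[i] != '\r' && sb[i] != '\n')
                sb[i] = ' ';
        }
        return sb.ToString();
    }
}
```

With the modified build, a call might look like SpanBlanker.BlankSpan(originalHtml, node.StreamPosition, node.OuterLength). Because the result has the same length as the input, the offsets of other nodes stay valid, so several nodes can be blanked in any order.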