Using html.ParseFragment in a generic way
Question
Using the experimental code.google.com/p/go.net/html package (since moved to golang.org/x/net/html), we can use ParseFragment to parse a sub-section of an HTML document.
Like so:
var s = `
<option id="foo">first</option>
<option Class="tester">second</option>
<option>third</option>
`
doc, err := html.ParseFragment(strings.NewReader(s), &html.Node{
    Type:     html.ElementNode,
    Data:     "body",
    DataAtom: atom.Body,
})
This works fine for most elements, but not when certain elements sit at the root of the fragment, such as tbody, tr, and td (and perhaps others, I'm not sure). The parser simply ignores those tags and returns only the text content.
This can be remedied by providing the semantically correct parent instead of atom.Body, but that requires knowing in advance what the HTML will be.
I'd hoped there was a generic root like atom.DocumentFragment, but I don't see one. So is there some way to use ParseFragment so that it works with any arbitrary HTML fragment?
Recommended answer
ParseFragment is always context-sensitive because it follows the HTML5 fragment-parsing algorithm. That algorithm is designed for implementing the DOM innerHTML property, and the correct tree to generate from a given innerHTML string depends on the surrounding context (especially whether or not the context is inside a table).
So the html package has no way to parse an HTML fragment independently of its context.
If you need more information about how the parsing depends on the context, see http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately
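Since the parser itself cannot be context-free, one possible workaround (not something the answer proposes) is to guess a plausible context from the fragment's first start tag before calling ParseFragment. The sketch below is a heuristic only: contextFor is a hypothetical helper, its tag-to-parent table is an assumption covering just the table-related elements mentioned in the question, and it does not handle fragments that begin with a comment or that mix elements needing different parents.

```go
package main

import (
	"fmt"
	"strings"
)

// contextFor (hypothetical helper) guesses a context element name for an
// HTML fragment by inspecting its first start tag. Anything that is not a
// known table-related tag falls back to "body".
func contextFor(fragment string) string {
	s := strings.TrimSpace(fragment)
	if !strings.HasPrefix(s, "<") {
		return "body" // fragment starts with text
	}
	// Cut the tag name out of "<name ...>" or "<name>".
	s = s[1:]
	end := strings.IndexAny(s, " \t\r\n/>")
	if end < 0 {
		end = len(s)
	}
	switch strings.ToLower(s[:end]) {
	case "td", "th":
		return "tr"
	case "tr":
		return "tbody"
	case "tbody", "thead", "tfoot", "caption", "colgroup":
		return "table"
	case "col":
		return "colgroup"
	default:
		return "body"
	}
}

func main() {
	fragments := []string{
		`<tr><td>a</td></tr>`,
		`<td>a</td>`,
		`<tbody><tr><td>a</td></tr></tbody>`,
		`<option id="foo">first</option>`,
	}
	for _, f := range fragments {
		fmt.Printf("%s -> %s\n", f, contextFor(f))
	}
}
```

The returned name can then be used to build the context node, for example &html.Node{Type: html.ElementNode, Data: name, DataAtom: atom.Lookup([]byte(name))}, which keeps Data and DataAtom consistent as ParseFragment requires.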