Using html.ParseFragment in a generic way

Question

Using the experimental code.google.com/p/go.net/html package (since moved to golang.org/x/net/html), we can use ParseFragment to parse a sub-section of an HTML document.

For example:

var s = `
    <option id="foo">first</option>
    <option Class="tester">second</option>
    <option>third</option>
`
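// ParseFragment parses s as if it were the innerHTML of the context node
// (here <body>) and returns the resulting top-level nodes.
doc, err := html.ParseFragment(strings.NewReader(s), &html.Node{
    Type:     html.ElementNode,
    Data:     "body",
    DataAtom: atom.Body,
})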

This works fine for most elements. But it does not work when certain elements sit at the root of the fragment, such as tbody, tr, and td (and perhaps others; I'm not sure). The parser simply ignores those tags and returns only the text content.

This can be remedied by passing the semantically correct parent as the context node instead of atom.Body, but that requires knowing in advance what the HTML will contain.

I had hoped there was a generic root like atom.DocumentFragment, but I don't see one. So is there some way to use ParseFragment so that it works with any arbitrary HTML fragment?

Recommended answer

ParseFragment is always context-sensitive because it follows the HTML5 fragment-parsing algorithm. That algorithm was designed to implement the DOM innerHTML property, and the correct tree for a given innerHTML string depends on the surrounding context, especially on whether that context is inside a table.

So the html package has no way to parse an HTML fragment independently of its context.

If you need more information about how the parsing depends on the context, see the "reset the insertion mode appropriately" step of the specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately
