Python lxml changes tag hierarchy?

Views: 219 | Published: 2018/6/23 14:09:15 | Tags: python html xml lxml


Problem description

I'm having a small issue with lxml. I'm converting an XML doc into an HTML doc. The original XML looks like this (it looks like HTML, but it's in the XML doc):

<p>Localization - Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>

When I do this (item is the string above)

lxml.html.tostring(lxml.html.fromstring(item))

I get this:

<div><p>Localization - Eiffel tower? Paris or Vegas </p><p>Bayes theorem p(A|B)</p></div>

I don't have any problem with the <div>s, but the fact that the 'Bayes theorem' paragraph is no longer nested within the outer paragraph is a problem.

Does anyone know why lxml is doing this and how to stop it? Thanks.

Solution

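lxml does this because lxml.html.fromstring uses an HTML parser, which repairs invalid markup while building the tree instead of storing it as written. <p> elements cannot be nested in HTML, so the parser closes the outer <p> as soon as the inner one opens. The HTML 4 specification says: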

The P element represents a paragraph. It cannot contain block-level elements (including P itself).
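
As for how to stop it: since the source is described as an XML document, one option is to parse the fragment with lxml's XML parser (lxml.etree) rather than lxml.html. The following is a minimal sketch under that assumption, not necessarily what the original answer went on to recommend. The variable names are illustrative, and it only works if the input is well-formed XML.

```python
from lxml import etree, html

item = "<p>Localization - Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>"

# HTML parser: applies HTML rules, so the outer <p> is closed
# before the inner <p> starts and the two end up as siblings.
html_root = html.fromstring(item)
html_nested = [child for p in html_root.iter("p") for child in p if child.tag == "p"]

# XML parser: applies no HTML content rules, so the nesting is kept as written.
xml_root = etree.fromstring(item)
xml_nested = xml_root.find("p")  # the inner <p>, a direct child of the outer <p>

print(len(html_nested))   # number of <p> elements nested inside another <p>
print(xml_nested.text)    # text of the nested paragraph
print(etree.tostring(xml_root, encoding="unicode"))
```

Keep in mind that the serialized result still contains a <p> inside a <p>, which is not valid HTML, so a browser or any other HTML parser will flatten it again when it reads the output. If the nesting matters in the final HTML, converting the outer element to something that may contain paragraphs (a <div>, for example) during the XML-to-HTML conversion is probably the more durable fix.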
