XPath: extract the current node's content, including all child nodes

Views: 1852 Published: 2020/5/4 8:37:31 Tags: python xpath lxml

Problem description

I ran into a problem while extracting the content of the current node, including all of its child nodes.

As in the code below, I want to get the string abcdefg<b>b1b2b3</b> from inside the pre tag.

However, "child::*" does not return that string, and if I use "/text()" I lose the b tag formatting. Please help me out.

# -*- coding: utf-8 -*-
from lxml import html
import lxml.etree as le

input = "<pre>abcdefg<b>b1b2b3</b></pre>"
input_xpath = "//pre/child::*"
tree = html.fromstring(input)
result = tree.xpath(input_xpath)
# encoding="unicode" makes tostring() return str rather than bytes
result1 = [le.tostring(item, encoding="unicode") for item in result]
result2 = ''.join(result1)
print(result2)

Output: <b>b1b2b3</b>
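
To make the behaviour described in the question concrete, the following minimal sketch compares three XPath node tests on the same input. The variable names (`elements`, `texts`, `nodes`) are illustrative and not part of the original post; `node()` is shown only for comparison, as the node test that matches both text nodes and elements.

```python
from lxml import html

tree = html.fromstring("<pre>abcdefg<b>b1b2b3</b></pre>")

# child::* matches element children only, so the leading text is skipped.
elements = tree.xpath("//pre/child::*")
# text() matches text-node children only, so the <b> element is skipped.
texts = tree.xpath("//pre/text()")
# node() matches every child node: text nodes and elements alike.
nodes = tree.xpath("//pre/node()")

print([e.tag for e in elements])
print(texts)
print(len(nodes))
```

This is why "child::*" yields only the b element, while "/text()" yields only the plain string.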

Recommended answer

To get an XML node's content markup (sometimes referred to as "innerXML"), start by selecting the node itself, rather than its children or its text content:

from lxml import html
import lxml.etree as le

input = "<pre>abcdefg<b>b1b2b3</b></pre>"
tree = html.fromstring(input)
node = tree.xpath("//pre")[0]

Then combine the node's text content with the markup of all its child nodes:

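# node.text is None when there is no leading text; encoding="unicode" returns str
result = (node.text or '') + ''.join(le.tostring(e, encoding="unicode") for e in node)
print(result)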

Output:

abcdefg<b>b1b2b3</b>
