How to search for content in XPath in multiline text using Python?
Question
When I use contains to search for data in the text() of an element, it works for plain data, but not when the element content contains carriage returns, newlines, or tags. How can I make //td[contains(text(), "")] work in this case? Thank you!
XML:
<table>
<tr>
<td>
Hello world <i> how are you? </i>
Have a wonderful day.
Good bye!
</td>
</tr>
<tr>
<td>
Hello NJ <i>, how are you?
Have a wonderful day.</i>
</td>
</tr>
</table>
Python:
>>> import lxml.html as lh
>>> tdout=open('tdmultiplelines.htm', 'r')
>>> tdouthtml=lh.parse(tdout)
>>> tdout.close()
>>> tdouthtml
<lxml.etree._ElementTree object at 0x2aaae0024368>
>>> tdouthtml.xpath('//td/text()')
['\n Hello world ', '\n Have a wonderful day.\n Good bye!\n ', '\n Hello NJ ', '\n ']
>>> tdouthtml.xpath('//td[contains(text(),"Good bye")]')
[] ##-> But *Good bye* is already in the `td` contents, though as a list.
>>> tdouthtml.xpath('//td[text() = "\n Hello world "]')
[<Element td at 0x2aaae005c410>]
Recommended answer
Use:
//td[text()[contains(.,'Good bye')]]
Explanation:
The reason for the problem is not that a text node's string value is a multiline string; the real reason is that the td element has more than one text-node child.
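To make the "more than one text-node child" point concrete, here is a minimal standard-library sketch. It assumes that ElementTree's text/tail attributes are an acceptable stand-in for XPath text nodes; the helper name text_node_children is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET


def text_node_children(element):
    """Collect the direct text-node children of element, in document order.

    ElementTree stores the text before the first child in .text and the
    text following each child in that child's .tail (an approximation of
    the XPath data model, used here only for illustration).
    """
    nodes = []
    if element.text:
        nodes.append(element.text)
    for child in element:
        if child.tail:
            nodes.append(child.tail)
    return nodes


# The first td from the question, without the original indentation.
td = ET.fromstring(
    "<td>\nHello world <i> how are you? </i>\n"
    "Have a wonderful day.\nGood bye!\n</td>"
)

nodes = text_node_children(td)
for index, node in enumerate(nodes, start=1):
    # repr() makes the embedded newlines visible.
    print(index, repr(node))
```

The i element splits the td content into two separate text nodes, and "Good bye" sits in the second one.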
In the provided expression:
//td[contains(text(),"Good bye")]
the first argument passed to the function contains() is a node-set containing more than one text node.
As per the XPath 1.0 specification (in XPath 2.0 this simply raises a type error), when a function that expects a string argument is passed a node-set instead, it uses the string value of only the first node in the node-set (in document order).
In this case, the first text node in the passed node-set has the string value:
"
Hello world "
which does not contain "Good bye", so the contains() test fails and the wanted td element isn't selected.
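The difference between the two expressions can be checked from Python. The following is a minimal sketch, assuming lxml is installed; it parses the XML from the question with lxml.etree from an inline string (the name DOC is illustrative) instead of reading the tdmultiplelines.htm file.

```python
from lxml import etree

# The XML document from the question, inlined for a self-contained example.
DOC = """<table>
<tr>
<td>
Hello world <i> how are you? </i>
Have a wonderful day.
Good bye!
</td>
</tr>
<tr>
<td>
Hello NJ <i>, how are you?
Have a wonderful day.</i>
</td>
</tr>
</table>"""

root = etree.fromstring(DOC)

# contains() is handed the whole node-set text(); XPath 1.0 converts it to
# the string value of its first node only, so "Good bye" is never examined.
first_node_only = root.xpath('//td[contains(text(), "Good bye")]')

# Here the inner predicate is evaluated once per text node, so every
# text-node child of each td is tested.
per_text_node = root.xpath('//td[text()[contains(., "Good bye")]]')

print(len(first_node_only))
print(len(per_text_node))
```

The first query should return an empty list, and the second should return only the first td.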
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<table>
<tr>
<td>
Hello world <i> how are you? </i>
Have a wonderful day.
Good bye!
</td>
</tr>
<tr>
<td>
Hello NJ <i>, how are you?
Have a wonderful day.</i>
</td>
</tr>
</table>
the XPath expression is evaluated and the selected nodes (in this case just one) are copied to the output:
<td>
Hello world <i> how are you? </i>
Have a wonderful day.
Good bye!
</td>