Get data between two tags in Python

Question

<h3>
<a href="article.jsp?tp=&arnumber=16">
Granular computing based
<span class="snippet">data</span>
<span class="snippet">mining</span>
in the views of rough set and fuzzy set
</a>
</h3>

Using Python, I want to get the text from the anchor tag, which should be: Granular computing based data mining in the views of rough set and fuzzy set

I tried using lxml:

from io import StringIO
from lxml import etree

# html holds the <h3> snippet shown above
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)
xpath1 = "//h3/a/child::text() | //h3/a/span/child::text()"
rawResponse = tree.xpath(xpath1)
print(rawResponse)

and got the following output:

['\r\n\t\t','\r\n\t\t\t\t\t\t\t\t\tgranular computing based','data','mining','in the view of roughset and fuzzyset\r\n\t\t\t\t\t\t']

Recommended answer

You can use the text_content method:

import lxml.html as LH

html = '''<h3>
<a href="article.jsp?tp=&arnumber=16">
Granular computing based
<span class="snippet">data</span>
<span class="snippet">mining</span>
in the views of rough set and fuzzy set
</a>
</h3>'''

root = LH.fromstring(html)
for elt in root.xpath('//a'):
    print(elt.text_content())

yields

Granular computing based
data
mining
in the views of rough set and fuzzy set

Or, to collapse the whitespace, you could use

print(' '.join(elt.text_content().split()))

to get

Granular computing based data mining in the views of rough set and fuzzy set
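
The two steps above (text_content, then split and join) can be wrapped in a small helper. This is only a sketch: the function name anchor_texts is hypothetical and not part of lxml, and it assumes every a element in the input is of interest.

```python
import lxml.html as LH


def anchor_texts(html):
    """Return the whitespace-normalized text of every <a> element in html.

    anchor_texts is a hypothetical helper name, not an lxml function.
    """
    root = LH.fromstring(html)
    # text_content() concatenates the text of the element and all of its
    # descendants; split() followed by ' '.join() collapses each run of
    # spaces, tabs, and newlines into a single space.
    return [' '.join(a.text_content().split()) for a in root.xpath('//a')]


html = '''<h3>
<a href="article.jsp?tp=&arnumber=16">
Granular computing based
<span class="snippet">data</span>
<span class="snippet">mining</span>
in the views of rough set and fuzzy set
</a>
</h3>'''

titles = anchor_texts(html)
print(titles[0])
```

Returning a list keeps the helper usable on pages that contain several matching links.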

Here is another option which you might find useful:

print(' '.join([elt.strip() for elt in root.xpath('//a/descendant-or-self::text()')]))

yields

Granular computing based data  mining in the views of rough set and fuzzy set

(Note, however, that it leaves an extra space between data and mining, because the whitespace-only text node between the two span elements strips to an empty string that still gets joined.)

'//a/descendant-or-self::text()' is a more generalized version of "//a/child::text() | //a/span/child::text()". It selects the text nodes of all children, grandchildren, and deeper descendants of the a element.
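
To illustrate the difference, here is a minimal sketch using a made-up variant of the markup with extra nesting (the b and i tags and the clean helper are assumptions for the example, not part of the original question). Dropping strings that are empty after stripping also avoids the extra space noted earlier.

```python
import lxml.html as LH

# Hypothetical markup: same idea as the question, but with deeper nesting.
html = '''<h3><a href="#">
Granular <b>computing <i>based</i></b>
<span class="snippet">data</span>
<span class="snippet">mining</span>
</a></h3>'''

root = LH.fromstring(html)
a = root.xpath('//a')[0]

# Only text directly under <a> or directly under <a>/<span>:
# the text inside <b> and <i> is not selected.
narrow = a.xpath('child::text() | span/child::text()')

# Every text node under <a>, at any depth.
broad = a.xpath('descendant-or-self::text()')


def clean(nodes):
    # Strip each text node and skip the ones that were whitespace only,
    # so no doubled spaces appear in the joined result.
    return ' '.join(t.strip() for t in nodes if t.strip())


narrow_text = clean(narrow)
broad_text = clean(broad)
print(narrow_text)
print(broad_text)  # Granular computing based data mining
```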
