BeautifulSoup - How to extract text after a specified string


Problem Description

I have HTML similar to:

<tr>
    <td>Title:</td>
    <td>Title value</td>
</tr>

I have to specify after which <td> (identified by its text) I want to grab the text of the second <td>. Something like: grab the text of the first <td> that follows the <td> containing the text Title:. The result should be: Title value

I have some basic understanding of Python and BeautifulSoup, and I have no idea how I can do this when there is no class to specify.

I have tried this:

row =  soup.find_all('td', string='Title:')
text = str(row.nextSibling)
print(text)

and I receive the error: AttributeError: 'ResultSet' object has no attribute 'nextSibling'

Recommended Answer

First of all, soup.find_all() returns a ResultSet, which contains all the elements with the tag td whose string is Title:.
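
To see why the original code fails, note that a ResultSet is a list of matches, so sibling navigation only works on its individual elements. A minimal check, using the HTML from the question:

from bs4 import BeautifulSoup

s = "<tr><td>Title:</td><td>Title value</td></tr>"
soup = BeautifulSoup(s, 'html.parser')

row = soup.find_all('td', string='Title:')
print(type(row))           # <class 'bs4.element.ResultSet'>
# row.nextSibling would raise the AttributeError from the question
print(row[0].nextSibling)  # <td>Title value</td> - navigate from an element instead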

For each such element in the result set, you will need to get the nextSibling separately (also, you should loop through siblings until you find one with the tag td, since other elements, like a NavigableString, can appear in between).

Example:

>>> from bs4 import BeautifulSoup
>>> s="""<tr>
...     <td>Title:</td>
...     <td>Title value</td>
... </tr>"""
>>> soup = BeautifulSoup(s,'html.parser')
>>> row =  soup.find_all('td', string='Title:')
>>> for r in row:
...     nextSib = r.nextSibling
...     while nextSib is not None and nextSib.name != 'td':
...             nextSib = nextSib.nextSibling
...     if nextSib is not None:
...             print(nextSib.text)
...
Title value
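
BeautifulSoup also provides find_next_sibling(), which performs this skip-over-NavigableStrings search for you; a shorter equivalent of the loop above:

from bs4 import BeautifulSoup

s = """<tr>
    <td>Title:</td>
    <td>Title value</td>
</tr>"""

soup = BeautifulSoup(s, 'html.parser')
for td in soup.find_all('td', string='Title:'):
    # find_next_sibling('td') returns the next sibling tag named td,
    # skipping any intervening NavigableStrings
    sibling = td.find_next_sibling('td')
    if sibling is not None:
        print(sibling.text)  # Title value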


Alternatively, you could use another library that supports XPath, with which this can be done easily, such as lxml or xml.etree.
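
For example, a minimal sketch with lxml (assuming the snippet is wrapped in a <table> so the HTML parser keeps the row structure):

from lxml import html

s = "<table><tr><td>Title:</td><td>Title value</td></tr></table>"
tree = html.fromstring(s)

# Select the first <td> sibling that follows the <td> whose text is 'Title:'
result = tree.xpath("//td[text()='Title:']/following-sibling::td[1]/text()")
print(result)  # ['Title value']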
