BeautifulSoup - How to extract text after a specified string

Views: 482 Published: 2020/9/20 7:34:50 Tags: python python-3.x beautifulsoup extract


Problem description

I have HTML like this:

<tr>
    <td>Title:</td>
    <td>Title value</td>
</tr>

I need to pick a <td> by its text and then grab the text of the <td> that follows it. Something like: grab the text of the first <td> after the <td> that contains the text Title:. The result should be: Title value

I have a basic understanding of Python and BeautifulSoup, but I have no idea how to do this when there is no class to select on.

I have tried this:

row =  soup.find_all('td', string='Title:')
text = str(row.nextSibling)
print(text)

and I receive the error: AttributeError: 'ResultSet' object has no attribute 'nextSibling'

Recommended answer

First of all, soup.find_all() returns a ResultSet, a list-like object containing every td element whose string is Title:. A ResultSet is not a single element, so it has no nextSibling attribute, which is why the AttributeError is raised.
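To make the difference concrete, here is a minimal sketch that compares find_all() with find() on the HTML from the question. The variable names (html_text, matches, first) are illustrative only and do not come from the original answer.

```python
from bs4 import BeautifulSoup

html_text = """<tr>
    <td>Title:</td>
    <td>Title value</td>
</tr>"""
soup = BeautifulSoup(html_text, "html.parser")

# find_all() returns a list-like ResultSet, even when only one element matches
matches = soup.find_all("td", string="Title:")
# find() returns the first matching Tag, or None when nothing matches
first = soup.find("td", string="Title:")

print(type(matches).__name__, len(matches))
print(type(first).__name__)
# Sibling navigation is available on individual elements such as matches[0]
print(matches[0].get_text())
```

Indexing into the ResultSet, looping over it, or calling find() instead all give a single element on which sibling navigation works.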

For each element in the result set, you need to get the next sibling separately (nextSibling, spelled next_sibling in current Beautiful Soup 4). You should also keep stepping through the siblings until you reach one whose tag is td, because other nodes can sit in between, such as a NavigableString holding the whitespace between the two cells.

Example:

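>>> from bs4 import BeautifulSoup
>>> s = """<tr>
...     <td>Title:</td>
...     <td>Title value</td>
... </tr>"""
>>> soup = BeautifulSoup(s, 'html.parser')
>>> row = soup.find_all('td', string='Title:')
>>> for r in row:
...     # next_sibling is the current Beautiful Soup 4 spelling of nextSibling
...     nextSib = r.next_sibling
...     # Test for None first, then skip non-<td> nodes such as whitespace strings
...     while nextSib is not None and nextSib.name != 'td':
...         nextSib = nextSib.next_sibling
...     if nextSib is not None:
...         print(nextSib.text)
...
Title value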
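As a shorter alternative to the manual loop, Beautiful Soup's find_next_sibling() skips intervening text nodes on its own. The sketch below is not part of the original answer; the helper name value_after is hypothetical, and it assumes only the first matching label cell is of interest.

```python
from bs4 import BeautifulSoup

html_text = """<tr>
    <td>Title:</td>
    <td>Title value</td>
</tr>"""
soup = BeautifulSoup(html_text, "html.parser")


def value_after(soup, label):
    """Return the text of the first <td> sibling after the <td> whose
    string equals `label`, or None if either cell is missing.
    (Hypothetical helper, for illustration only.)"""
    cell = soup.find("td", string=label)
    if cell is None:
        return None
    # find_next_sibling() ignores whitespace strings and returns the next <td> tag
    value_cell = cell.find_next_sibling("td")
    if value_cell is None:
        return None
    return value_cell.get_text(strip=True)


print(value_after(soup, "Title:"))  # Title value
```

Returning None for a missing label or missing value cell keeps the caller from hitting an AttributeError on None.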

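Alternatively, you can use another library that supports XPath, which makes this easy. Options include lxml or xml.etree (note that xml.etree supports only a limited subset of XPath).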
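A minimal sketch of the XPath route with lxml, assuming lxml is installed. The row is wrapped in a <table> element here as an assumption, so that the fragment parses as an ordinary table; the variable names are illustrative.

```python
from lxml import html

# Assumption: the <tr> from the question is wrapped in <table> for parsing
doc = html.fromstring("""<table><tr>
    <td>Title:</td>
    <td>Title value</td>
</tr></table>""")

# Select the <td> whose text is "Title:", then take the text of the
# first <td> sibling that follows it
values = doc.xpath(
    "//td[normalize-space()='Title:']/following-sibling::td[1]/text()"
)
print(values)  # ['Title value']
```

xpath() returns a list, so an empty list means the label cell or its following cell was not found.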
