Get an element before a string with Beautiful Soup

Problem description

I'm using Beautiful Soup to search a website for a set of integer values and produce a list of them, matched to names. The problem is that the website uses a very generic class name ("list-item") for the elements I need, and the same class is reused on other elements that I don't want to grab. So far my code looks like:

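from bs4 import BeautifulSoup as bs
import requests

url = "http://beautifulnumberssite.com/"
html = requests.get(url).text
# Name the parser explicitly; without it bs4 picks one itself and emits a warning
soup = bs(html, "html.parser")

# find_all is the current name for the older findAll alias
names = soup.find_all("h1", class_="th1")
stats = soup.find_all("li", class_="list-item")

print(names, stats)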

However, this also returns a whole bunch of stuff I don't want. Is there a way to make Beautiful Soup return only the contents of elements that are followed by a certain string? So, if the web page contains a section like:

<li class='list-item'>
<strong>65</strong>
Important Values
</li>
<li class='list-item'>
<strong>49</strong>
Useless Values
</li>

I would like to be able to set Beautiful Soup/Python to parse for a string like "Important Values" and get the element directly before it (ignoring any line breaks or white-space), or better yet the value contained within the element. So in this case Beautiful Soup would either print:

<strong>65</strong>

or, preferably, just:

65

Is this possible?

Recommended answer

Just iterate over the elements with that class and check whether their contents contain your important string:

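for list_item in soup.find_all('li', class_='list-item'):
    # Keep only the items whose text mentions the target label
    if 'Important Values' in list_item.get_text():
        # .contents would print the list ['65']; get_text() gives the bare string
        print(list_item.find('strong').get_text(strip=True))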
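
An alternative that works from the string back to the element, as the question asks, is to locate the text node first and then step to the tag before it. The following is only a sketch under assumptions: the HTML is the snippet from the question wrapped in a <ul>, and value_before is a hypothetical helper name, not part of Beautiful Soup.

```python
import re
from bs4 import BeautifulSoup

# Sample markup taken from the question (the <ul> wrapper is an assumption)
html = """
<ul>
<li class='list-item'>
<strong>65</strong>
Important Values
</li>
<li class='list-item'>
<strong>49</strong>
Useless Values
</li>
</ul>
"""

def value_before(soup, label):
    """Return the text of the <strong> preceding the text node that contains label, or None."""
    # string=<regex> searches text nodes rather than tags
    text_node = soup.find(string=re.compile(re.escape(label)))
    if text_node is None:
        return None
    # Nearest preceding sibling that is a <strong> tag; whitespace-only text nodes are skipped
    tag = text_node.find_previous_sibling("strong")
    return tag.get_text(strip=True) if tag else None

soup = BeautifulSoup(html, "html.parser")
print(value_before(soup, "Important Values"))
```

This version does not depend on the "list-item" class at all, which may help when that class is reused elsewhere on the page.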
