Understand the Find() function in Beautiful Soup


Question


I know what I'm trying to do is simple, but it's causing me grief. I'd like to pull data from HTML using BeautifulSoup. To do that, I need to use the .find() function properly. Here's the HTML I'm working with:

<div class="audit">
    <div class="profile-info">
        <img class="profile-pic" src="https://pbs.twimg.com/profile_images/471758097036226560/tLLeiOiL_normal.jpeg" />
        <h4>Ed Boon</h4>
        <span class="screen-name"><a href="http://www.twitter.com/noobde" target="_blank">@noobde</a></span>
    </div>
    <div class="followers">
        <div class="pie"></div>
        <div class="pie-data">
            <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
            <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
            <h6>Followers</h6>
        </div>
    </div>
    <div class="score">
        <img src="//twitteraudit-prod.s3.amazonaws.com/dist/f977287de6281fe3e1ef36d48d996fb83dd6a876/img/audit-result-good.png" />
        <div class="percentage good">
            69%
        </div>
        <h6>Audit score</h6>
    </div>
</div>

The values I want are 73599 from data-value=73599, 32452 from data-value=32452, and the 69% from percentage good.

Using past code and online examples, this is what I have so far:

RealValue = soup.find("div", {"class":"real number"})['data-value']
FakeValue = soup.find("audit", {"class":"fake number"})['data-value']

Neither has worked so far. I'm also not sure how to craft the find call to pull the 69% number.

Solution

soup.find("div", {"class":"real number"})['data-value']

Here you are searching for a div element, but in your example HTML it is the span that has the "real number" class. Try this instead:

soup.find("span", {"class": "real number", "data-value": True})['data-value']

Here we are also checking for the presence of the data-value attribute.
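To see the difference end to end, here is a minimal sketch that runs both lookups against a trimmed copy of the markup from the question. It assumes the built-in html.parser backend; the variable names (html, missing, real_value, fake_value) are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; only the follower counts are kept.
html = """
<div class="pie-data">
  <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
  <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
  <h6>Followers</h6>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Wrong tag name: no <div> carries the "real number" class, so find() returns None
# and subscripting the result with ['data-value'] would raise a TypeError.
missing = soup.find("div", {"class": "real number"})

# Right tag name, plus a check that the data-value attribute is present.
real_value = soup.find("span", {"class": "real number", "data-value": True})["data-value"]
fake_value = soup.find("span", {"class": "fake number", "data-value": True})["data-value"]

print(missing, real_value, fake_value)
```

Attribute values come back as strings, so wrap them in int() if you need numbers.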


To find elements having the "real number" or "fake number" classes, you can use a CSS selector:

for elm in soup.select(".real.number,.fake.number"):
    print(elm.get("data-value"))
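If you want the two numbers under their labels rather than just printed, something like the following sketch could work. It assumes the first class on each span ("real" or "fake") is a usable key, which holds for the markup in the question but is not guaranteed in general; counts and label are illustrative names.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup.
html = """
<div class="pie-data">
  <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
  <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
</div>
"""

soup = BeautifulSoup(html, "html.parser")

counts = {}
# ".real.number" only matches elements carrying both classes,
# so the plain <span class="real"> label is skipped.
for elm in soup.select(".real.number, .fake.number"):
    label = elm["class"][0]          # class is a list, e.g. ["real", "number"]
    counts[label] = int(elm["data-value"])

print(counts)
```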


To get the 69% value:

soup.find("div", {"class": "percentage good"}).get_text(strip=True)

Or, a CSS selector:

soup.select_one(".percentage.good").get_text(strip=True)
soup.select_one(".score .percentage").get_text(strip=True)

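Or, locate the h6 element with the text Audit score and then get its preceding sibling div. find_previous_sibling() is safer than .previous_sibling here because, when the markup is indented as in your example, .previous_sibling is the whitespace text node between the two tags. Recent Beautiful Soup versions also call the argument string; text is the older name:

soup.find("h6", string="Audit score").find_previous_sibling("div").get_text(strip=True)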
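As a rough check that these approaches agree, here is a small sketch against a trimmed copy of the score block. The img src is shortened to a placeholder file name, html.parser is assumed as the backend, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the score block; the image path is a shortened placeholder.
html = """
<div class="score">
  <img src="audit-result-good.png" />
  <div class="percentage good">
    69%
  </div>
  <h6>Audit score</h6>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. find() with the exact class string
by_find = soup.find("div", {"class": "percentage good"}).get_text(strip=True)

# 2. CSS selector: any .percentage element inside .score
by_css = soup.select_one(".score .percentage").get_text(strip=True)

# 3. Start from the heading and walk back to the nearest preceding <div> sibling.
#    find_previous_sibling("div") skips the whitespace text node between the tags.
heading = soup.find("h6", string="Audit score")
by_sibling = heading.find_previous_sibling("div").get_text(strip=True)

# Drop the percent sign if a number is needed.
score = int(by_find.rstrip("%"))

print(by_find, by_css, by_sibling, score)
```

strip=True matters here because the text inside the div is surrounded by newlines and indentation.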
