BeautifulSoup "Scraping" using their name and their id


Question

I'm using BeautifulSoup, but I'm unsure how to correctly use find, find_all, and the other functions.

If I have:

<div class="hey"></div>

Using: soup.find_all("div", class_="hey")

Will correctly find the div in question. However, I do not know how to do the same for the following:

<h3 id="me"></h3> # Find this one via "h3" and "id"

<li id="test1"></li> # Find this one via "li" and "id"

<li custom="test2321"></li> # Find this one via "li" and "custom"

<li id="test1" class="tester"></li> # Find this one via "li" and "class"

<ul class="here"></ul> # Find this one via "ul" and "class"

Any ideas would be much appreciated :)

Recommended answer

Take a look at the following code:

from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")

# Look at all h3 tags and keep the ones whose id is "me".
# Because ids are supposed to be unique, soup.find_all(id="me")
# would be enough on its own.
one = soup.find_all("h3", {"id": "me"})
print(one)

# Same as above: if an element has an id, the id alone should be enough.
# This HTML reuses id="test1", so two elements match.
two = soup.find_all("li", {"id": "test1"})
print(two)

# Look at all li tags and keep the ones with the custom attribute
three = soup.find_all("li", {"custom": "test2321"})
print(three)

# Match on both id and class; with unique ids, the id alone would have been enough
four = soup.find_all("li", {"id": "test1", "class": "tester"})
print(four)

# Look at ul tags and keep the ones whose class attribute is "here"
five = soup.find_all("ul", {"class": "here"})
print(five)

Output (the order of attributes within a tag may differ depending on the parser and Python version):

[<h3 id="me"></h3>]
[<li id="test1"></li>, <li class="tester" id="test1"></li>]
[<li custom="test2321"></li>]
[<li class="tester" id="test1"></li>]
[<ul class="here"></ul>]
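
The dictionary filters used above can also be written as keyword arguments, and find returns the first match (or None) rather than a list. The following is a minimal sketch of those variants, assuming the same HTML as in the answer and the built-in "html.parser"; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches
heading = soup.find("h3", id="me")
missing = soup.find("h3", id="nope")

# "class" is a reserved word in Python, so the keyword is spelled class_
tester_items = soup.find_all("li", class_="tester")

# attrs accepts any attribute name, including ones that are not
# valid Python keyword arguments (for example data-* attributes)
custom_items = soup.find_all("li", attrs={"custom": "test2321"})

print(heading)
print(missing)
print(tester_items)
print(custom_items)
```

Since find can return None, it is worth checking the result before indexing into it, for example with `if heading is not None:`.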

This should provide the required documentation.
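
As an alternative to find_all, the same five lookups can be expressed as CSS selectors through select and select_one. This is only a sketch under the assumption that a reasonably recent bs4 is installed (selector support is provided by the soupsieve package that ships with it); the HTML is the one from the question.

```python
from bs4 import BeautifulSoup

html = """
<h3 id="me"></h3>
<li id="test1"></li>
<li custom="test2321"></li>
<li id="test1" class="tester"></li>
<ul class="here"></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# tag#id selects by tag name and id
h3_by_id = soup.select("h3#me")
li_by_id = soup.select("li#test1")

# tag[attr="value"] selects by an arbitrary attribute
li_by_custom = soup.select('li[custom="test2321"]')

# tag.class selects by tag name and class
li_by_class = soup.select("li.tester")

# select_one() returns the first match, or None if there is none
first_ul = soup.select_one("ul.here")

print(h3_by_id)
print(li_by_id)
print(li_by_custom)
print(li_by_class)
print(first_ul)
```

select always returns a list, so it plays the role of find_all, while select_one corresponds to find.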
