Scraping data- attributes from a web page
Problem description
I need some help using Python to scrape some data- attributes from a site. I have tried using lxml and requests with no luck, and when I looked online I found some articles about using Beautiful Soup. The only problem is that I am not sure how.
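Before adding more packages, it may help to know that Python's standard-library html.parser module can already read data- attributes. The sketch below is an alternative, not the answer's method: DataVarCollector is a hypothetical helper name, and the markup is the card-entry snippet from this question, assumed to already be in a string.

```python
from html.parser import HTMLParser


class DataVarCollector(HTMLParser):
    """Hypothetical helper: collects one attribute's value from every start tag that has it."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lower-cases the names
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">...</div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">...</div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">...</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">...</div>
"""

collector = DataVarCollector("data-var5")
collector.feed(html)
collector.close()
print(collector.values)  # ['9', '7', '3', '9']
```

The values come back as strings; wrap them in int() if numbers are needed.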
Here is what I would like to scrape.
<div class="card-body ">
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div>
I am trying to get the data-var5 value out, but I have no idea how. Hope someone can help.
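Since lxml was already tried, here is a minimal sketch of how it can do this on its own with an XPath attribute step. It assumes the HTML is already in a string (the requests download step is left out), and the variable names are illustrative.

```python
from lxml import html as lxml_html

page = """
<div class="card-body ">
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">...</div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">...</div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">...</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">...</div>
</div>
"""

tree = lxml_html.fromstring(page)
# The trailing /@data-var5 step makes xpath() return attribute values (strings), not elements
raw_values = tree.xpath("//div[@data-var5]/@data-var5")
values = [int(v) for v in raw_values]
print(values)  # [9, 7, 3, 9]
```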
Regards,
Hazza
Solution
You can use Beautiful Soup's select() method with a CSS attribute selector. Try this:
from bs4 import BeautifulSoup
html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div>
"""
soup = BeautifulSoup(html, "lxml")
data_var = soup.select('div[data-var5]')
for data in data_var:
    print("data-var5: " + data['data-var5'])
Output will be:
data-var5: 9
data-var5: 7
data-var5: 3
data-var5: 9
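If more than data-var5 is needed, a small extension of the answer collects every data- attribute of each card into a dict. This sketch assumes the same markup and switches to Beautiful Soup's built-in "html.parser" backend so lxml is not required; the cards variable is illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">...</div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">...</div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">...</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">...</div>
"""

soup = BeautifulSoup(html, "html.parser")

cards = []
for tag in soup.select("div.card-entry"):
    # tag.attrs is a dict of all attributes; keep only the data- ones
    cards.append({k: v for k, v in tag.attrs.items() if k.startswith("data-")})

print(cards[0]["data-var5"])  # 9
print([int(card["data-var5"]) for card in cards])  # [9, 7, 3, 9]
```

Attribute values are strings; data.get('data-var5') returns None instead of raising KeyError when a tag lacks the attribute.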