Scraping data- attributes from web page


Problem description

I need some help using Python to scrape some data- attributes from a site. I have tried lxml and requests with no luck. Searching online, I found some articles about Beautiful Soup, but I am not sure how to use it.

Here is what I would like to scrape.

<div class="card-body ">

<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div> 

I am trying to get the data-var5 value out but I have no idea how. Hope someone can help.

Regards,

Hazza

Solution

You can use BeautifulSoup's select() method with a CSS attribute selector. Try this:

from bs4 import BeautifulSoup
html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div> 
"""

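# "lxml" requires the lxml package; the built-in "html.parser" also works here
soup = BeautifulSoup(html, "lxml")
# CSS attribute selector: every <div> that has a data-var5 attribute
data_var = soup.select('div[data-var5]')

for data in data_var:
    # Tags support dict-style access to their attributes
    print("data-var5: " + data['data-var5'])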

Output will be:

data-var5: 9
data-var5: 7
data-var5: 3
data-var5: 9
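If you would rather not depend on BeautifulSoup or lxml, the same extraction can be sketched with Python's standard-library html.parser module. This is a swapped-in technique, not part of the original answer: DataAttrCollector and extract_data_attr are hypothetical names chosen for illustration, and the sample HTML is the snippet from the answer. Like the div[data-var5] selector, it only collects the attribute from <div> tags.

```python
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Hypothetical helper: collect one attribute's value from every <div> that has it."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Mirror the 'div[data-var5]' selector: only look at <div> tags
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs with lower-cased names
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                self.values.append(value)


def extract_data_attr(html, attr_name="data-var5"):
    """Return the values of attr_name on all <div> tags, in document order."""
    parser = DataAttrCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.values


html = """
<div class="card-entry" data-var1="0" data-var2="1" data-var3="20" data-var4="3" data-var5="9">… </div>
<div class="card-entry" data-var1="1" data-var2="2" data-var3="9" data-var4="2" data-var5="7">… </div>
<div class="card-entry" data-var1="2" data-var2="3" data-var3="1" data-var4="3" data-var5="3">…</div>
<div class="card-entry" data-var1="3" data-var2="4" data-var3="5" data-var4="2" data-var5="9">…</div>
"""

for value in extract_data_attr(html):
    print("data-var5: " + value)
```

This prints the same four lines as the BeautifulSoup version. It does not understand general CSS selectors, though, so for anything beyond matching a single tag and attribute, select() is the more convenient choice.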
