BeautifulSoup extract data within a div

Problem description

I've browsed the previous questions for an hour and tried various solutions but I can't get this to work. I've extracted the results I want from a website, now I just have to mine these divs for the specific information I want.

The results are isolated like this:

import re
items = soup.find_all(id=re.compile("itembase"))

For each item, I want to extract, for example, the lat and lon from this piece of HTML:

<div id="itembase29" class="result-item -result unselected clearfix even"
data-part="fl_base" data-lat="51.9006" data-lon="-8.51008" data-number="29"
is-local="true" data-customer="32060963" data-addrid="1"
data-id="4b00fae498e3cc370133e8a14fd75160">
<div class="arrow">
</div>

How would I do this? Thanks.

Recommended answer

  1. Pass your HTML object into Beautiful Soup.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

  • Find the div.

    div = soup.div
    

  • Get the attributes you're looking for from the div.

    lat, lon = div.attrs['data-lat'], div.attrs['data-lon']
    

  • Print.

    >>> print(lat, lon)
    51.9006 -8.51008
    

  • I left .attrs in there for clarity, but more generally you can access the attributes of any element like a dictionary, without going through the .attrs attribute at all: div['data-lon']. This doesn't work on a list of divs; for that you need to iterate over the list.

    for div in divs:
        print(div['data-lon'], div['data-lat'])
    

    Or use a list comprehension:

    [(div['data-lon'], div['data-lat']) for div in divs]
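
Putting the steps above together with the selection from the question, a minimal end-to-end sketch might look like the following. The sample markup is an assumption: the first item is trimmed from the question's snippet and the second item (itembase30 and its coordinates) is invented purely for illustration. The use of .get() to skip items without coordinates and the conversion to float are also additions, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the second item is hypothetical.
html = """
<div id="itembase29" class="result-item" data-lat="51.9006" data-lon="-8.51008" data-number="29">
  <div class="arrow"></div>
</div>
<div id="itembase30" class="result-item" data-lat="51.8985" data-lon="-8.4756" data-number="30">
  <div class="arrow"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selection as in the question: every tag whose id matches "itembase".
items = soup.find_all(id=re.compile("itembase"))

# item.get(...) returns None instead of raising KeyError when an attribute
# is missing, so items without coordinates are skipped rather than crashing.
coords = [
    (float(item["data-lat"]), float(item["data-lon"]))
    for item in items
    if item.get("data-lat") and item.get("data-lon")
]
print(coords)
```

Attribute values come back as strings, so the float() conversion is only needed if you intend to do arithmetic on the coordinates.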
    
