BeautifulSoup: Scraping different data sets having same set of attributes in the source code

Views: 186  Published: 2016/8/5 19:19:24  Tags: python python-2.7 beautifulsoup python-requests web-mining


Question

I'm using the BeautifulSoup module to scrape the total number of followers and the total number of tweets from a Twitter account. However, when I inspected the elements for these two fields on the web page, I found that both are enclosed in the same set of HTML attributes:

Followers

<a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav u-textUserColor" data-nav="followers" href="/IAmJericho/followers" data-original-title="2,469,681 Followers">
          <span class="ProfileNav-label">Followers</span>
          <span class="ProfileNav-value" data-is-compact="true">2.47M</span>
</a>

Tweet count

<a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav" data-nav="tweets" tabindex="0" data-original-title="21,769 Tweets">
          <span class="ProfileNav-label">Tweets</span>
          <span class="ProfileNav-value" data-is-compact="true">21.8K</span>
</a>

This is the script I wrote for the extraction:

import requests
import urllib2
from bs4 import BeautifulSoup

link = "https://twitter.com/iamjericho"
r = urllib2.urlopen(link)
src = r.read()
res = BeautifulSoup(src)
followers = ''
for e in res.findAll('span', {'data-is-compact':'true'}):
    followers = e.text

print followers

However, since both values, the total tweet count and the total number of followers, are enclosed in the same set of HTML attributes, i.e. inside a span tag with class="ProfileNav-value" and data-is-compact="true", running the above script returns only the total number of followers.

How can I extract two sets of information enclosed in similar HTML attributes with BeautifulSoup?
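One detail worth noting about the script above: the loop reassigns `followers` on every iteration, so only the last matching span survives. The following is a minimal sketch of that behaviour, not the original script. It assumes Python 3 syntax and the built-in `html.parser` backend, and it uses trimmed copies of the two fragments quoted in the question as inline data instead of fetching the live page. The names `last_value` and `all_values` are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copies of the two fragments quoted in the question (assumption:
# tweets appear before followers in the page, as the answer states).
html = """
<a data-nav="tweets" data-original-title="21,769 Tweets">
  <span class="ProfileNav-label">Tweets</span>
  <span class="ProfileNav-value" data-is-compact="true">21.8K</span>
</a>
<a data-nav="followers" data-original-title="2,469,681 Followers">
  <span class="ProfileNav-label">Followers</span>
  <span class="ProfileNav-value" data-is-compact="true">2.47M</span>
</a>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", {"data-is-compact": "true"})

# Reassigning inside the loop keeps only the final match.
last_value = ""
for span in spans:
    last_value = span.text

# Collecting into a list keeps every match, in document order.
all_values = [span.text for span in spans]

print(last_value)
print(all_values)
```

Both spans are matched by the selector; it is the reassignment, not the search, that drops the first value.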

Recommended answer

In this case, one way to do it is to check that data-is-compact="true" appears only twice, once for each piece of data you want to extract. You also know that tweets come first and followers second, so you can keep a list with those titles in the same order and use zip to pair each title with its value and print both together, like:

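import urllib2
from bs4 import BeautifulSoup

# Labels listed in the same order as the values appear on the page.
profile = ['Tweets', 'Followers']

link = "https://twitter.com/iamjericho"
r = urllib2.urlopen(link)
src = r.read()
# Name the parser explicitly so the result does not depend on which parsers are installed.
res = BeautifulSoup(src, 'html.parser')
# Pair each label with the span found at the same position.
for p, d in zip(profile, res.find_all('span', {'data-is-compact': "true"})):
    print p, d.text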

It produces:

Tweets 21,8K
Followers 2,47M
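The approach above depends on the order of the spans. As an alternative sketch, the HTML quoted in the question shows that each enclosing `a` tag carries a distinguishing `data-nav` attribute ("tweets", "followers") and an exact count in `data-original-title`, so the values can be keyed by name instead of by position. This assumes Python 3 syntax, the built-in `html.parser` backend, and trimmed inline copies of the quoted fragments rather than the live page, whose markup may differ. The `stats` dictionary and its keys are illustrative names.

```python
from bs4 import BeautifulSoup

# Trimmed copies of the two fragments quoted in the question.
html = """
<a data-nav="tweets" data-original-title="21,769 Tweets">
  <span class="ProfileNav-label">Tweets</span>
  <span class="ProfileNav-value" data-is-compact="true">21.8K</span>
</a>
<a data-nav="followers" data-original-title="2,469,681 Followers">
  <span class="ProfileNav-label">Followers</span>
  <span class="ProfileNav-value" data-is-compact="true">2.47M</span>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

stats = {}
# attrs={"data-nav": True} matches any <a> tag that has the attribute at all.
for link in soup.find_all("a", attrs={"data-nav": True}):
    value = link.find("span", class_="ProfileNav-value")
    if value is None:
        continue  # skip navigation links that carry no value span

    # The tooltip text looks like "21,769 Tweets"; take the leading number.
    digits = link.get("data-original-title", "").split(" ")[0].replace(",", "")
    stats[link["data-nav"]] = {
        "compact": value.text.strip(),
        "exact": int(digits) if digits.isdigit() else None,
    }

print(stats["tweets"])
print(stats["followers"])
```

Keying on `data-nav` keeps the result correct even if the page lists the fields in a different order or adds further stats with the same span attributes.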
