Scraping data from a dynamic graph using Python + BeautifulSoup4

Question

I need to implement a data scraping task that extracts data from a dynamic graph. The graph updates over time, much like the chart of a company's stock price. I am using the requests and beautifulsoup4 libraries in Python, but so far I have only figured out how to scrape text and link data. I can't work out how to get the values of the graph into a CSV file.

The graph in question can be found at http://www.apptrace.com/app/instagram/id389801252/ranks/topfreeapplications/36

Recommended answer

@Oliver W. provided a good answer already, but using requests avoids having to note the network call and is overall a much nicer package than urllib.

If you want to be a bit more flexible with your code, you can write a function that takes the country name and the start and end dates.

import json
from io import StringIO

import pandas as pd
import requests

def load_data(country='', start_date='2014-08-09', end_date='2014-11-01'):
    """Fetch the daily ranks for one country and return them as a DataFrame."""
    base = "http://www.apptrace.com/api/app/389801252/rankings/country/"
    extra = "?country={0}&start_date={1}&end_date={2}&device=iphone&list_type=normal&chart_subtype=iphone"
    addr = base + extra.format(country, start_date, end_date)

    page = requests.get(addr)
    page.raise_for_status()  # stop early on an HTTP error
    json_data = page.json()  # parse the JSON body of the response
    ranks = json_data['rankings'][0]['ranks']
    ranks = json.dumps(ranks)  # re-serialize the rank records as a JSON string
    # Wrap in StringIO: recent pandas versions deprecate passing a raw JSON string
    df = pd.read_json(StringIO(ranks), orient='records')
    return df

Change the settings on the web page to see what other values the country parameter accepts (Canada is 'CAN', for example). The empty string is for the USA.
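To check which URL a given country value produces before sending any request, the query string can be assembled separately. The sketch below is only an illustration: `build_url` is a hypothetical helper that is not part of the original answer, it reuses the parameter names from `load_data`, and it uses the standard library's `urlencode` in place of string formatting. Only the empty string (USA) and 'CAN' come from the answer above; any other country code is something to confirm on the web page.

```python
from urllib.parse import urlencode

# Endpoint taken from load_data above; whether it still responds is not verified here.
BASE = "http://www.apptrace.com/api/app/389801252/rankings/country/"

def build_url(country="", start_date="2014-08-09", end_date="2014-11-01"):
    """Return the rankings URL for one country and date range (hypothetical helper)."""
    params = {
        "country": country,          # '' is the USA, 'CAN' is Canada
        "start_date": start_date,
        "end_date": end_date,
        "device": "iphone",
        "list_type": "normal",
        "chart_subtype": "iphone",
    }
    # urlencode escapes any characters that are not safe in a query string
    return BASE + "?" + urlencode(params)

# Build one URL per country code without touching the network
urls = {code: build_url(code) for code in ("", "CAN")}
for code, url in urls.items():
    print(repr(code), url)
```

Printing the URLs first makes it easy to compare them with the request the page itself sends when you change the country.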

The df looks like this:

    date        rank
0   2014-08-09  10
1   2014-08-10  10
2   2014-08-11  9
3   2014-08-12  8
4   2014-08-13  8
5   2014-08-14  7
6   2014-08-15  6
7   2014-08-16  8

With the pandas dataframe in hand, you can export it to CSV, or combine many dataframes before you export.

df = load_data()
df.to_csv("file_name.csv")
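Combining several dataframes before exporting could look roughly like the sketch below. It is an illustration under stated assumptions: `combine_rankings` is a hypothetical helper, the `country` column is added only so rows stay distinguishable after stacking, and the two small frames are made-up sample data standing in for the results of `load_data()`.

```python
import pandas as pd

def combine_rankings(frames):
    """Stack per-country frames into one, tagging each row with its country.

    `frames` maps a label (for example 'USA') to a DataFrame with
    'date' and 'rank' columns, as returned by load_data().
    """
    tagged = [df.assign(country=name) for name, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)

# Made-up sample data standing in for load_data() and load_data('CAN')
usa = pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [10, 10]})
canada = pd.DataFrame({"date": ["2014-08-09", "2014-08-10"], "rank": [12, 11]})

combined = combine_rankings({"USA": usa, "CAN": canada})

# With no path, to_csv returns the CSV text; pass a file name to write a file instead
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Passing `index=False` leaves the row index out of the export, which is usually what you want once the frames have been stacked.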
