Web-scraping futbin.com


Problem description

I am trying to collect a dataset of time-series price data for FIFA Ultimate Team players from futbin.com. I found a script on GitHub, https://github.com/darkyin87/futbin-scraper, that scrapes the current price of each player in a list of players/IDs:

import requests

domain = 'https://www.futbin.com'
version = 19
page = 'playerPrices'

player_ids = {
  'Arturo Vidal': 181872,
  'Pierre-Emerick Aubameyang': 188567,
  'Robert Lewandowski': 188545,
  'Jerome Boateng': 183907,
  'Sergio Ramos': 155862,
  'Antoine Griezmann': 194765,
  'David Alaba': 197445,
  'Paulo Dybala': 211110,
  'Radja Nainggolan': 178518
}

def fetch_prices():
  """Return {player name: current PS price} for every player in player_ids."""
  ret_val = {}
  # dict.items() replaces Python 2's iteritems()
  for name, player_id in player_ids.items():
    url = "%s/%s/%s?player=%s" % (domain, version, page, player_id)
    response = requests.get(url)
    data = response.json()
    ret_val[name] = data[str(player_id)]['prices']['ps']['LCPrice']
  return ret_val

if __name__ == "__main__":
  prices = fetch_prices()

But the information I am looking for is not the current price but the price history (specifically the PS price), which is shown as a graph at the bottom of the player page: https://www.futbin.com/19/player/143/Cristiano%20Ronaldo/

I tried a few things, but I have not been able to parse or extract this information. Could someone help me out or give me a hint? Thanks in advance.

Solution

It is hard to get the data by parsing the page itself. If you check your browser's network tools, you can see that the data behind the chart comes from a separate HTTP request, which you can call directly. Don't abuse it, of course.

import requests
from datetime import datetime, timezone

player_ids = {
  'Arturo Vidal': 181872,
  'Pierre-Emerick Aubameyang': 188567,
  'Robert Lewandowski': 188545,
  'Jerome Boateng': 183907,
  'Sergio Ramos': 155862,
  'Antoine Griezmann': 194765,
  'David Alaba': 197445,
  'Paulo Dybala': 211110,
  'Radja Nainggolan': 178518
}

for name, player_id in player_ids.items():
    r = requests.get('https://www.futbin.com/19/playerGraph?type=daily_graph&year=19&player={0}'.format(player_id))
    data = r.json()

    print(name)
    print("-" * 20)
    # Change 'ps' to 'xbox' or 'pc' to get other prices
    for point in data['ps']:
        # Each point is [timestamp, price]; the timestamp is in milliseconds,
        # so divide by 1000 to get seconds.
        date = datetime.fromtimestamp(point[0] / 1000, tz=timezone.utc).strftime('%Y-%m-%d')
        price = point[1]
        print(date, price)

This will give you output like:

Arturo Vidal
--------------------
2018-09-21 8450
2018-09-22 9318
2018-09-23 10820
2018-09-24 13288
2018-09-25 13346
2018-09-26 17235
2018-09-27 19092
2018-09-28 15960
2018-09-29 14283
2018-09-30 14967
2018-10-01 15380
2018-10-02 15367
2018-10-03 13192
Pierre-Emerick Aubameyang
--------------------
2018-09-21 136000
2018-09-22 160673
2018-09-23 205474
2018-09-24 216344
2018-09-25 244750
2018-09-26 277007
2018-09-27 288659
2018-09-28 259007
2018-09-29 261799
2018-09-30 270771
2018-10-01 274245
2018-10-02 281057
2018-10-03 275606
Robert Lewandowski
--------------------
2018-09-21 73000
2018-09-22 79961
2018-09-23 94827
2018-09-24 117893
2018-09-25 125310
2018-09-26 144630
2018-09-27 159224
2018-09-28 135122
2018-09-29 132696
2018-09-30 137728
2018-10-01 143130
2018-10-02 150968
2018-10-03 144250

And the list goes on.
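
Since the goal in the question is a time-series dataset rather than printed output, the pairs returned by the request above could be collected into rows and written to CSV. The following is a minimal sketch under stated assumptions: the helper names `parse_daily_graph` and `write_history_csv` are hypothetical, the response is assumed to have the shape used in the answer (a list of `[timestamp in milliseconds, price]` pairs per platform), and the sample payload is hand-made (prices copied from the output above, timestamps assumed to fall at midnight UTC) so that the sketch runs without a network call.

```python
import csv
import io
from datetime import datetime, timezone


def parse_daily_graph(payload, platform="ps"):
    """Turn one playerGraph response into a list of (date, price) tuples.

    Assumption: payload[platform] is a list of
    [timestamp_in_milliseconds, price] pairs, as in the answer above.
    """
    rows = []
    for point in payload.get(platform, []):
        day = datetime.fromtimestamp(point[0] / 1000, tz=timezone.utc)
        rows.append((day.strftime("%Y-%m-%d"), point[1]))
    return rows


def write_history_csv(histories, out_file):
    """Write {player name: [(date, price), ...]} as long-format CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["player", "date", "price"])
    for name, rows in histories.items():
        for date, price in rows:
            writer.writerow([name, date, price])


# Hand-made sample in the assumed response shape, not captured API output.
sample = {"ps": [[1537488000000, 8450], [1537574400000, 9318]]}
histories = {"Arturo Vidal": parse_daily_graph(sample)}

# Write to an in-memory buffer here; a real run would pass an open file.
buffer = io.StringIO()
write_history_csv(histories, buffer)
print(buffer.getvalue())
```

In a real run, `sample` would be replaced by `r.json()` for each player in the loop above, and a short pause between requests (for example with `time.sleep`) would help keep the load on the site low.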
