How to scrape real-time streaming data with Python?


Question

I am trying to scrape the number of flights shown on this page: https://www.flightradar24.com/56.16,-49.51

The number is highlighted in the picture below:

The number is updated every 8 seconds.

This is what I tried with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

# Fetch the page HTML as served, before any JavaScript runs
r = requests.get("https://www.flightradar24.com/56.16,-49.51")
soup = BeautifulSoup(r.content, "html.parser")

# Look up the counter element by its CSS class
value = soup.find_all("span", {"class": "choiceValue"})
print(value)

But this always returns 0:

[<span class="choiceValue" id="menuPlanesValue">0</span>]

View source also shows 0, so I understand why BeautifulSoup returns 0 as well.

Does anyone know another way to get the current value?

Recommended answer

The problem with your approach is that the page first loads a static view, then issues periodic requests to refresh the data. The HTML that requests downloads is only that initial view, which is why the counter reads 0. If you look at the Network tab of the developer console in Chrome (for example), you'll see the requests to https://data-live.flightradar24.com/zones/fcgi/feed.js?bounds=59.09,52.64,-58.77,-47.71&faa=1&mlat=1&flarm=1&adsb=1&gnd=1&air=1&vehicles=1&estimated=1&maxage=7200&gliders=1&stats=1

The response is plain JSON:

{
  "full_count": 11879,
  "version": 4,
  "afefdca": [
    "A86AB5",
    56.4288,
    -56.0721,
    233,
    38000,
    420,
    "0000",
    "T-F5M",
    "B763",
    "N641UA",
    1473852497,
    "LHR",
    "ORD",
    "UA929",
    0,
    0,
    "UAL929",
    0
  ],
  ...
  "aff19d9": [
    "A12F78",
    56.3235,
    -49.3597,
    251,
    36000,
    436,
    "0000",
    "F-EST",
    "B752",
    "N176AA",
    1473852497,
    "DUB",
    "JFK",
    "AA291",
    0,
    0,
    "AAL291",
    0
  ],
  "stats": {
    "total": {
      "ads-b": 8521,
      "mlat": 2045,
      "faa": 598,
      "flarm": 152,
      "estimated": 464
    },
    "visible": {
      "ads-b": 0,
      "mlat": 0,
      "faa": 6,
      "flarm": 0,
      "estimated": 3
    }
  }
}
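To make the shape of that response concrete, here is a minimal sketch that summarizes a feed payload. The function name `summarize_feed` and the `META_KEYS` set are hypothetical, introduced only for illustration. The sketch assumes that every top-level key other than `full_count`, `version`, and `stats` is a flight entry, which is inferred from the sample above and not from any documentation. Which of the three figures corresponds to the counter on the page is also not established here. The embedded sample contains only the two entries shown above, since the rest of the response was elided.

```python
import json

# Top-level keys that look like metadata rather than flight entries
# (an assumption inferred from the sample response).
META_KEYS = {"full_count", "version", "stats"}


def summarize_feed(feed):
    """Return a few candidate counts from a decoded feed response."""
    # Treat every remaining list-valued key as one flight entry.
    flights = {
        key: value
        for key, value in feed.items()
        if key not in META_KEYS and isinstance(value, list)
    }
    visible = feed.get("stats", {}).get("visible", {})
    return {
        "full_count": feed.get("full_count"),
        "flights_in_response": len(flights),
        "visible_total": sum(visible.values()),
    }


# The two entries shown in the sample response; the elided ones are omitted.
sample = json.loads("""
{
  "full_count": 11879,
  "version": 4,
  "afefdca": ["A86AB5", 56.4288, -56.0721, 233, 38000, 420, "0000", "T-F5M",
              "B763", "N641UA", 1473852497, "LHR", "ORD", "UA929", 0, 0,
              "UAL929", 0],
  "aff19d9": ["A12F78", 56.3235, -49.3597, 251, 36000, 436, "0000", "F-EST",
              "B752", "N176AA", 1473852497, "DUB", "JFK", "AA291", 0, 0,
              "AAL291", 0],
  "stats": {
    "total": {"ads-b": 8521, "mlat": 2045, "faa": 598, "flarm": 152,
              "estimated": 464},
    "visible": {"ads-b": 0, "mlat": 0, "faa": 6, "flarm": 0, "estimated": 3}
  }
}
""")

print(summarize_feed(sample))
# {'full_count': 11879, 'flights_in_response': 2, 'visible_total': 9}
```

Comparing these three values against the number displayed on the page for the same bounds should show which one to track.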

I'm not sure whether this API is protected in any way, but I can access it with curl without any issues.
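The same request can be made from Python and repeated on the page's 8-second cadence. The following is a rough sketch, not a tested client: the helper names `build_url`, `fetch_feed`, and `poll` are hypothetical, the browser-like `User-Agent` header is an assumption (some servers reject the default one), and the endpoint may change or refuse such requests. It uses the standard library's `urllib` so that it has no dependencies; `requests.get(url).json()` would serve equally well for the fetch step.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

FEED_URL = "https://data-live.flightradar24.com/zones/fcgi/feed.js"

# Query parameters copied from the request seen in the browser's network tab.
PARAMS = {
    "bounds": "59.09,52.64,-58.77,-47.71",
    "faa": 1, "mlat": 1, "flarm": 1, "adsb": 1, "gnd": 1, "air": 1,
    "vehicles": 1, "estimated": 1, "maxage": 7200, "gliders": 1, "stats": 1,
}


def build_url(params=PARAMS):
    """Assemble the feed URL, keeping the commas in `bounds` unescaped."""
    return FEED_URL + "?" + urlencode(params, safe=",")


def fetch_feed(url, timeout=10):
    """Download and decode one feed response."""
    # Assumption: a browser-like User-Agent avoids being rejected outright.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def poll(fetch=fetch_feed, interval=8.0, iterations=3, sleep=time.sleep):
    """Fetch the feed `iterations` times and collect `full_count` each time."""
    url = build_url()
    counts = []
    for i in range(iterations):
        feed = fetch(url)
        counts.append(feed.get("full_count"))
        if i < iterations - 1:
            sleep(interval)  # wait before the next refresh
    return counts
```

Calling `poll()` with its defaults would hit the live endpoint three times, 8 seconds apart, and return the three `full_count` values. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.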

More info:

  • aviation.stackexchange - Is there an API to get real-time FAA flight data?
  • Flightradar24 Forum - API access (meaning your use case is probably discouraged)
