Python WebSocket: interpreting (reading) data

Question

I'm experimenting with sockets to learn more about using them for web scraping.

I am trying to stream information from a website via WebSockets. I am able to receive data, but I'd like to know the correct approach to reading and interpreting the incoming data.

I am using Python 3.7. I was able to set up the connection using an example from https://towardsdatascience.com/scraping-in-another-dimension-7c6890a156da . I am trying to get the stock price data that is displayed on https://finance.yahoo.com/quote/BTC-USD/chart via sockets. This is the code I am using:

```python
import json

# pip install websocket-client
from websocket import create_connection

# websocket-client generates the handshake headers itself (Host, Upgrade,
# Connection, Sec-WebSocket-Key, Sec-WebSocket-Version), so only extra headers
# are passed. They must be a dict (or list) given as `header=`; a JSON string
# passed as `headers=` is not the keyword websocket-client reads.
headers = {
    'Accept-Language': 'en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,zh-CN;q=0.6',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'),
}

ws = create_connection('wss://streamer.finance.yahoo.com/',
                       header=headers,
                       origin='https://finance.yahoo.com')

# Subscribe to a list of tickers; the server then pushes updates for them.
tickers = ["^GSPC", "^DJI", "^IXIC", "^RUT", "CL=F", "GC=F", "SI=F", "EURUSD=X",
           "^TNX", "^VIX", "GBPUSD=X", "JPY=X", "BTC-USD", "^FTSE", "^N225"]
ws.send(json.dumps({"subscribe": tickers}))

try:
    while True:
        result = ws.recv()  # one text message per update
        print(result)
except KeyboardInterrupt:
    pass  # stop with Ctrl+C
finally:
    ws.close()  # never reached in an unconditional `while True` loop
```

This gives me results like these:

```
CgReREpJFebCzkYYwJHv8LZbKgNESkkwCTgBRWYd6D5I7tDaigFlAOHuQtgBBA==
CgVKUFk9WBUX2ddCGMCR7/C2WyoDQ0NZMA44AUUVH9w+ZQCM7D7YAQg=
CghFVVJVU0Q9WBVA2Yw/GMCR7/C2WyoDQ0NZMA44AUXuDJI+ZQAgTTvYAQg=
CghHQlBVU0Q9WBUQO58/GMCR7/C2WyoDQ0NZMA44AUXz/fY/ZcDrwDzYAQg=
CgReVklYFYXrkUEYgKOB8LZbKgNXQ0IwCTgBRcRWCcBlwMzMvtgBBA==
CghHQlBVU0Q9WBUVOp8/GJCh7/C2WyoDQ0NZMA44AUWcrfY/ZQCtwDzYAQg=
CgVKUFk9WBUv3ddCGJCh7/C2WyoDQ0NZMA44AUVQ7t8+ZQCk8D7YAQg=
CghFVVJVU0Q9WBU424w/GJCh7/C2WyoDQ0NZMA44AUWi2pQ+ZQAQUTvYAQg=
```

I'm not sure how to interpret the data I am receiving, or how the web browser interprets it. The browser does seem to be receiving the same data that I am.

Recommended answer

My guess is that this is Protobuf-encoded data. If you look at the JavaScript source code for the Yahoo Finance page, you can see that once a ticker has been subscribed, the replies are handled by a decoding routine in this worker script:

https://finance.yahoo.com/__finStreamer-worker.js

In the following snippet from that file, there is a clear conversion from the base64 text to bytes, and then to a JavaScript object (of type PricingData). Note the mention of protobuf.

```javascript
QuoteStreamer.prototype.handleWebSocketUpdate = function (event) {
    try {
        var PricingData = protobuf.roots.default.quotefeeder.PricingData;
        var buffer = base64ToArray(event.data); // decode from base64
        var data = PricingData.decode(buffer); // decode using protobuf
        data = PricingData.toObject(data, { // convert to a JS object
            enums: String
        });
        // ... (excerpt ends here; the rest of the handler is not shown)
```
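The first step of that handler, `base64ToArray(event.data)`, corresponds to `base64.b64decode` in Python. The sketch below decodes one message copied from the question's output and reads the first protobuf key by hand, following the published protobuf wire-format rules. It uses only the standard library and does not use any Yahoo code. The variable names are illustrative.

```python
import base64

# One message copied from the question's output.
msg = "CgReREpJFebCzkYYwJHv8LZbKgNESkkwCTgBRWYd6D5I7tDaigFlAOHuQtgBBA=="

# Equivalent of base64ToArray(event.data): base64 text -> raw bytes.
raw = base64.b64decode(msg)
print(len(raw), raw[:6])

# Every protobuf field starts with a key = (field_number << 3) | wire_type.
key = raw[0]
print("field", key >> 3, "wire type", key & 0x7)

# Wire type 2 is length-delimited. The length is a varint, which fits in a
# single byte for values below 128, followed by that many payload bytes.
length = raw[1]
ticker = raw[2:2 + length]
print(ticker)
```

For this message the output is `46 b'\n\x04^DJI'`, then `field 1 wire type 2`, then `b'^DJI'`. The ticker that appears as the first field is consistent with the protobuf guess.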

Next you need to figure out the Protobuf schema used by Yahoo (which would let you generate a decoder in Python), but I'm not sure it is public. However, you can inspect the Protobuf JavaScript code they generated to perform the decoding, and either port it directly to Python or use it to make a guess at the protobuf schema.
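Without the schema, you can still walk the protobuf wire format to start guessing at it. The sketch below, which uses only the standard library, splits a message into (field number, wire type, value) tuples. It is not Yahoo's decoder. Reading 32-bit fields as `float` and 64-bit fields as `double` is an assumption, since the same wire types can also carry integers. The helper names `read_varint`, `decode_fields` and `zigzag` are hypothetical.

```python
import base64
import struct


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(buf):
    """Walk a protobuf message without a schema and return a list of
    (field_number, wire_type, value) tuples in wire order."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint (int32/int64/sint64/bool/enum ...)
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit; assumed to be a little-endian double
            value = struct.unpack_from("<d", buf, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes or sub-message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # 32-bit; assumed to be a little-endian float
            value = struct.unpack_from("<f", buf, pos)[0]
            pos += 4
        else:                 # groups (3/4) are not handled in this sketch
            raise ValueError(f"unsupported wire type {wire_type} in field {field}")
        fields.append((field, wire_type, value))
    return fields


def zigzag(n):
    """Undo ZigZag encoding, which protobuf uses for sint32/sint64 fields."""
    return (n >> 1) ^ -(n & 1)


msg = "CgReREpJFebCzkYYwJHv8LZbKgNESkkwCTgBRWYd6D5I7tDaigFlAOHuQtgBBA=="
for field, wire_type, value in decode_fields(base64.b64decode(msg)):
    print(field, wire_type, value)
```

For the first sample message, the decoder gives these values:

- field 1 is `b'^DJI'`
- field 2 is `26465.44921875`, plausibly the price
- field 3 is `3141468408000`; after `zigzag()` it becomes `1570734204000`, a plausible millisecond timestamp (2019-10-10 19:03:24 UTC)
- field 5 is `b'DJI'`
- field 27 is `4`

These meanings are guesses from a single message. The generated decoder code is the place to confirm field names and types, for example whether a varint field is really `sint64`.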

The JavaScript decoder is here: https://finance.yahoo.com/__finStreamer-proto.js