如何理解雅虎的原始 HTML!使用 Python 检索数据时的财务? [英] How to understand this raw HTML of Yahoo! Finance when retrieving data using Python?

查看:23
本文介绍了如何理解雅虎的原始 HTML!使用 Python 检索数据时的财务?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在尝试从 Yahoo! 检索股票价格金融,例如 Apple Inc..我的代码是这样的:(使用Python 2)

I've been trying to retrieve stock price from Yahoo! Finance, like for Apple Inc.. My code is like this:(using Python 2)

import requests
from bs4 import BeautifulSoup as bs

html='http://finance.yahoo.com/quote/AAPL/profile?p=AAPL'
r = requests.get(html)
soup = bs(r.text)

问题是当我看到这个网页后面的原始 HTML 时,类是动态的,见下图.这使得 BeautifulSoup 很难获得标签.如何理解类,如何获取数据?

The problem is when I see raw HTML behind this webpage, the class is dynamic, see figure below. This makes it hard for BeautifulSoup to get tags. How to understand the class and how to get data?

雅虎的 HTML!财务页面

PS:1) 我知道pandas_datareader.data,但那是历史数据.我要实时股票数据;

PS: 1) I know pandas_datareader.data, but that's for historical data. I want the real-time stock data;

2) 我不想使用 selenium 打开一个新的浏览器窗口.

2) I don't want to use selenium to open a new browser window.

推荐答案

数据显然是使用 reactjs 填充的,因此您将无法使用类名等可靠地解析它.您可以从 root.App.main 脚本的页面源中获取 json 格式的所有数据:

The data is obviously populated using reactjs so you won't be able to parse it reliably using class names etc.. You can get all the data in json format from the page source from the root.App.main script :

import  requests
from bs4 import BeautifulSoup
import re
from json import loads

soup = BeautifulSoup(requests.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL").content)
script = soup.find("script",text=re.compile("root.App.main")).text
data = loads(re.search("root.App.mains+=s+({.*})", script).group(1))
print(data)

它为您提供了大量的 json,您可以浏览数据并选择您需要的内容,如下所示:

Which gives you a whole load of json, you can go through the data and pick what you need like below :

stores = data["context"]["dispatcher"]["stores"]
from  pprint import pprint as pp

pp(stores[u'QuoteSummaryStore']) 

这给了你:

{u'price': {u'averageDailyVolume10Day': {u'fmt': u'63.06M',
                                         u'longFmt': u'63,056,525',
                                         u'raw': 63056525},
            u'averageDailyVolume3Month': {u'fmt': u'36.53M',
                                          u'longFmt': u'36,527,196',
                                          u'raw': 36527196},
            u'currency': u'USD',
            u'currencySymbol': u'$',
            u'exchange': u'NMS',
            u'exchangeName': u'NasdaqGS',
            u'longName': u'Apple Inc.',
            u'marketState': u'PRE',
            u'maxAge': 1,
            u'openInterest': {},
            u'postMarketChange': {u'fmt': u'0.11', u'raw': 0.11000061},
            u'postMarketChangePercent': {u'fmt': u'0.10%',
                                         u'raw': 0.0009687416},
            u'postMarketPrice': {u'fmt': u'113.66', u'raw': 113.66},
            u'postMarketSource': u'DELAYED',
            u'postMarketTime': 1474502277,
            u'preMarketChange': {u'fmt': u'0.42', u'raw': 0.41999817},
            u'preMarketChangePercent': {u'fmt': u'0.37%',
                                        u'raw': 0.0036987949},
            u'preMarketPrice': {u'fmt': u'113.97', u'raw': 113.97},
            u'preMarketSource': u'FREE_REALTIME',
            u'preMarketTime': 1474536411,
            u'quoteType': u'EQUITY',
            u'regularMarketChange': {u'fmt': u'-0.02', u'raw': -0.019996643},
            u'regularMarketChangePercent': {u'fmt': u'-0.02%',
                                            u'raw': -0.00017607327},
            u'regularMarketDayHigh': {u'fmt': u'113.99', u'raw': 113.989},
            u'regularMarketDayLow': {u'fmt': u'112.44', u'raw': 112.441},
            u'regularMarketOpen': {u'fmt': u'113.82', u'raw': 113.82},
            u'regularMarketPreviousClose': {u'fmt': u'113.57',
                                            u'raw': 113.57},
            u'regularMarketPrice': {u'fmt': u'113.55', u'raw': 113.55},
            u'regularMarketSource': u'FREE_REALTIME',
            u'regularMarketTime': 1474488000,
            u'regularMarketVolume': {u'fmt': u'31.57M',
                                     u'longFmt': u'31,574,028.00',
                                     u'raw': 31574028},
            u'shortName': u'Apple Inc.',
            u'strikePrice': {},
            u'symbol': u'AAPL',
            u'underlyingSymbol': None},
 u'price,summaryDetail': {},
 u'quoteType': {u'exchange': u'NMS',
                u'headSymbol': None,
                u'longName': u'Apple Inc.',
                u'market': u'us_market',
                u'messageBoardId': u'finmb_24937',
                u'quoteType': u'EQUITY',
                u'shortName': u'Apple Inc.',
                u'symbol': u'AAPL',
                u'underlyingExchangeSymbol': None,
                u'underlyingSymbol': None,
                u'uuid': u'8b10e4ae-9eeb-3684-921a-9ab27e4d87aa'},
 u'summaryDetail': {u'ask': {u'fmt': u'114.00', u'raw': 114},
                    u'askSize': {u'fmt': u'100',
                                 u'longFmt': u'100',
                                 u'raw': 100},
                    u'averageDailyVolume10Day': {u'fmt': u'63.06M',
                                                 u'longFmt': u'63,056,525',
                                                 u'raw': 63056525},
                    u'averageVolume': {u'fmt': u'36.53M',
                                       u'longFmt': u'36,527,196',
                                       u'raw': 36527196},
                    u'averageVolume10days': {u'fmt': u'63.06M',
                                             u'longFmt': u'63,056,525',
                                             u'raw': 63056525},
                    u'beta': {u'fmt': u'1.52', u'raw': 1.51744},
                    u'bid': {u'fmt': u'113.68', u'raw': 113.68},
                    u'bidSize': {u'fmt': u'400',
                                 u'longFmt': u'400',
                                 u'raw': 400},
                    u'dayHigh': {u'fmt': u'113.99', u'raw': 113.989},
                    u'dayLow': {u'fmt': u'112.44', u'raw': 112.441},
                    u'dividendRate': {u'fmt': u'2.28', u'raw': 2.28},
                    u'dividendYield': {u'fmt': u'2.01%', u'raw': 0.0201},
                    u'exDividendDate': {u'fmt': u'2016-08-04',
                                        u'raw': 1470268800},
                    u'expireDate': {},
                    u'fiftyDayAverage': {u'fmt': u'108.61',
                                         u'raw': 108.608284},
                    u'fiftyTwoWeekHigh': {u'fmt': u'123.82', u'raw': 123.82},
                    u'fiftyTwoWeekLow': {u'fmt': u'89.47', u'raw': 89.47},
                    u'fiveYearAvgDividendYield': {},
                    u'forwardPE': {u'fmt': u'12.70', u'raw': 12.701344},
                    u'marketCap': {u'fmt': u'611.86B',
                                   u'longFmt': u'611,857,399,808',
                                   u'raw': 611857399808},
                    u'maxAge': 1,
                    u'navPrice': {},
                    u'open': {u'fmt': u'113.82', u'raw': 113.82},
                    u'openInterest': {},
                    u'payoutRatio': {u'fmt': u'24.80%', u'raw': 0.248},
                    u'previousClose': {u'fmt': u'113.57', u'raw': 113.57},
                    u'priceToSalesTrailing12Months': {u'fmt': u'2.78',
                                                      u'raw': 2.777534},
                    u'regularMarketDayHigh': {u'fmt': u'113.99',
                                              u'raw': 113.989},
                    u'regularMarketDayLow': {u'fmt': u'112.44',
                                             u'raw': 112.441},
                    u'regularMarketOpen': {u'fmt': u'113.82', u'raw': 113.82},
                    u'regularMarketPreviousClose': {u'fmt': u'113.57',
                                                    u'raw': 113.57},
                    u'regularMarketVolume': {u'fmt': u'31.57M',
                                             u'longFmt': u'31,574,028',
                                             u'raw': 31574028},
                    u'strikePrice': {},
                    u'totalAssets': {},
                    u'trailingAnnualDividendRate': {u'fmt': u'2.13',
                                                    u'raw': 2.13},
                    u'trailingAnnualDividendYield': {u'fmt': u'1.88%',
                                                     u'raw': 0.018754954},
                    u'trailingPE': {u'fmt': u'13.24', u'raw': 13.240438},
                    u'twoHundredDayAverage': {u'fmt': u'102.39',
                                              u'raw': 102.39367},
                    u'volume': {u'fmt': u'31.57M',
                                u'longFmt': u'31,574,028',
                                u'raw': 31574028},
                    u'yield': {},
                    u'ytdReturn': {}},
 u'symbol': u'AAPL'}

这篇关于如何理解雅虎的原始 HTML!使用 Python 检索数据时的财务?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆