python lxml etree applet information from yahoo


Question


Yahoo finance updated their website. I had an lxml/etree script that used to extract the analyst recommendations. Now, however, the analyst recommendations are there, but only as a graphic. You can see an example on this page. The graph called Recommendation Trends on the right hand column shows the number of analyst reports showing strong buy, buy, hold, underperform, and sell.


My guess is that yahoo will make a few adjustments to the page over the coming little while, but it got me wondering whether such data was extractable in any reasonable way?

  1. I mean, is there a way to get at the graphic to work with?
  2. And even if that succeeds, is there a reasonable way to extract the data from the graphic?


I used to get the source like this:

import urllib.request
from lxml import etree

url = 'https://finance.yahoo.com/quote/'+code+'/analyst?p='+code
tree = etree.HTML(urllib.request.urlopen(url).read())


and then find the data in the html tree. But obviously that's impossible now.

Answer


As the comments say, they have moved to ReactJS, so lxml is no longer to the point because there's no data in the HTML page. Now you need to look around and find the endpoint they pull the data from. In the case of Recommendation Trends, it's there.

#!/usr/bin/env python3


import json
from pprint import pprint
from urllib.request import urlopen
from urllib.parse import urlencode


def parse():
    host   = 'https://query2.finance.yahoo.com'
    path   = '/v10/finance/quoteSummary/CSX'
    params = {
        'formatted' : 'true',
        'lang'      : 'en-US',
        'region'    : 'US',
        'modules'   : 'recommendationTrend'
    }

    response = urlopen('{}{}?{}'.format(host, path, urlencode(params)))
    data = json.loads(response.read().decode())

    pprint(data)


if __name__ == '__main__':
    parse()

The output looks like this:

{
  'quoteSummary': {
    'error': None,
    'result': [{
      'recommendationTrend': {
        'maxAge': 86400,
        'trend': [{
            'buy': 0,
            'hold': 0,
            'period': '0w',
            'sell': 0,
            'strongBuy': 0,
            'strongSell': 0
          },
          {
            'buy': 0,
            'hold': 0,
            'period': '-1w',
            'sell': 0,
            'strongBuy': 0,
            'strongSell': 0
          },
          {
            'buy': 5,
            'hold': 12,
            'period': '0m',
            'sell': 2,
            'strongBuy': 6,
            'strongSell': 1
          },
          {
            'buy': 5,
            'hold': 12,
            'period': '-1m',
            'sell': 2,
            'strongBuy': 7,
            'strongSell': 1
          },
          {
            'buy': 6,
            'hold': 11,
            'period': '-2m',
            'sell': 2,
            'strongBuy': 8,
            'strongSell': 1
          },
          {
            'buy': 6,
            'hold': 11,
            'period': '-3m',
            'sell': 2,
            'strongBuy': 8,
            'strongSell': 1
          }]
        }
    }]
  }
}
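Once you have the JSON, getting the counts out is plain dictionary navigation. A minimal sketch (the `sample` payload below is abbreviated from the output above, and `trend_counts` is a hypothetical helper, not part of the original answer):

```python
# Navigate a quoteSummary payload down to the per-period recommendation
# counts. The sample dict mirrors the structure shown above (abbreviated).
sample = {
    'quoteSummary': {
        'error': None,
        'result': [{
            'recommendationTrend': {
                'maxAge': 86400,
                'trend': [
                    {'buy': 5, 'hold': 12, 'period': '0m',
                     'sell': 2, 'strongBuy': 6, 'strongSell': 1},
                    {'buy': 5, 'hold': 12, 'period': '-1m',
                     'sell': 2, 'strongBuy': 7, 'strongSell': 1},
                ],
            },
        }],
    },
}


def trend_counts(payload):
    """Map each reporting period to its recommendation counts."""
    trend = payload['quoteSummary']['result'][0]['recommendationTrend']['trend']
    return {row['period']: {k: v for k, v in row.items() if k != 'period'}
            for row in trend}


counts = trend_counts(sample)
print(counts['0m'])  # {'buy': 5, 'hold': 12, 'sell': 2, 'strongBuy': 6, 'strongSell': 1}
```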



How to look for data

What I did was roughly:

  1. Find some unique token in the target widget (say, a chart value or the Trend string)
  2. Open the source of the page (use some formatter for HTML and JS)
  3. Look for the token there (on the page there is a section that starts with /* -- Data -- */)
  4. Search for ".js" to get the script tags (or programmatic inclusions, e.g. require.js) and look for the token there
  5. Open the network tab in Firebug or Chromium Developer Tools and inspect the XHR requests
  6. Then use Postman (or curl if you prefer the terminal) to strip extra parameters and see how the endpoint reacts
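Step 6 can also be sketched in Python instead of Postman/curl: drop each query parameter in turn and see whether the endpoint still answers. The loop below only builds the candidate URLs (add the actual request yourself to probe the live API, which needs network access):

```python
# Probe which query parameters the endpoint actually requires by
# dropping them one at a time and constructing the trimmed URL.
from urllib.parse import urlencode

base = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/CSX'
params = {'formatted': 'true', 'lang': 'en-US', 'region': 'US',
          'modules': 'recommendationTrend'}

for dropped in params:
    trimmed = {k: v for k, v in params.items() if k != dropped}
    url = '{}?{}'.format(base, urlencode(trimmed))
    print('without', dropped, '->', url)
    # urlopen(url) here to see whether the endpoint still responds
```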

