Using Python 3 to Retrieve Stock Information From Yahoo Finance Site

Problem description

I have been trying to port a script that requests fundamental data from the Yahoo Finance site, but I would like to look up specific items, such as the price-to-book ratio, instead of the entire report. To do that, I followed a tutorial from Sentdex. The problem is that the example code is written for Python 2.7, and I am trying to make it work in Python 3 and, of course, expand on it by adding more features.

Here is how it is looking so far:

import time
import urllib
import urllib.request


sp500short = ['a', 'aa', 'aapl', 'abbv', 'abc', 'abt', 'ace', 'aci', 'acn', 'act', 'adbe', 'adi', 'adm', 'adp']


def yahooKeyStats(stock):

    try:
        sourceCode = urllib.request.urlopen('http://finance.yahoo.com/q/ks?s='+stock).read()
        pbr = sourceCode.split('Price/Book (mrq):</td><td class="yfnc_tabledata1">')[1].split('</td>')[0]       
        print ('price to book ratio:'),stock,pbr

    except Exception as e:
        print ('failed in the main loop'),str(e)


for eachStock in sp500short:
    yahooKeyStats(eachStock)
    time.sleep(1)

I'm almost sure the problem is in the definition of the pbr variable, specifically in the splitting part. The:

 Price/Book (mrq):</td><td class="yfnc_tabledata1">

And...:

</td>

...are just delimiters: the actual value I'm looking for sits between those two items. But so far, running the script only gives me the exception message.

Any help will be much appreciated. Cheers,

Solution

It looks like urllib.request.urlopen(...).read() returns data of type bytes.

From the Python docs:

Note that urlopen returns a bytes object. This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the http server. In general, a program will decode the returned bytes object to string once it determines or guesses the appropriate encoding.
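As a rough illustration of "determines or guesses the appropriate encoding": the response returned by urlopen exposes its HTTP headers as an email.message.Message-style object, which can report the charset declared in Content-Type. The sketch below is only one possible approach; the helper name decode_body and the UTF-8 fallback are assumptions made for illustration, not something the docs prescribe.

```python
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode raw response bytes using the charset from Content-Type.

    `decode_body` and the `default` fallback are hypothetical choices
    for this sketch.
    """
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    # errors="replace" avoids an exception on bytes that do not fit the charset.
    return body.decode(charset, errors="replace")


# Stand-in for response.headers so the example runs without a network call.
sample_headers = Message()
sample_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
sample_text = decode_body("café".encode("latin-1"), sample_headers)
```

With a real request, the same helper would be called as decode_body(response.read(), response.headers).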

The split method is failing here because you are trying to split sourceCode, which is of type bytes, using a str delimiter. Try appending .decode() after .read(): decoding sourceCode converts it from bytes to str. Alternatively, you could .encode() both of your delimiters.

bytes.decode
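
Both options can be sketched offline as follows. SAMPLE_PAGE is a made-up fragment standing in for what .read() returns (the value 3.21 is invented), the function names are hypothetical, and UTF-8 is an assumed encoding rather than something confirmed for the Yahoo page.

```python
# Hypothetical sample standing in for the bytes returned by .read().
SAMPLE_PAGE = (
    b'<tr><td class="yfnc_tablehead1">Price/Book (mrq):</td>'
    b'<td class="yfnc_tabledata1">3.21</td></tr>'
)

START = 'Price/Book (mrq):</td><td class="yfnc_tabledata1">'
END = '</td>'


def extract_by_decoding(raw: bytes) -> str:
    """Option 1: decode the page to str, then split with str delimiters."""
    text = raw.decode("utf-8")  # assumption: the page is UTF-8
    return text.split(START)[1].split(END)[0]


def extract_by_encoding(raw: bytes) -> bytes:
    """Option 2: keep the page as bytes and encode both delimiters."""
    return raw.split(START.encode("utf-8"))[1].split(END.encode("utf-8"))[0]


def split_without_decoding(raw: bytes) -> str:
    """Reproduce the failure: splitting bytes by a str raises TypeError."""
    try:
        raw.split(START)
    except TypeError:
        return "TypeError"
    return "no error"


print('price to book ratio:', 'aapl', extract_by_decoding(SAMPLE_PAGE))
```

Note that the print call here passes every value as an argument. In Python 3, a line written as print ('price to book ratio:'),stock,pbr prints only the label and discards the rest, so the question's print lines would likely need the same change.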
