Using regex to get multiple values from a single line when scraping stocks from Yahoo


Problem description


import urllib
import re

stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

for i in range(len(stocks_symbols)):
    htmlfile = urllib.urlopen("https://finance.yahoo.com/q?s=" + stocks_symbols[i])
    htmltext = htmlfile.read(htmlfile)
    regex = '<span id="yfs_l84_' + stocks_symbols[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)

    regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
    pattern1 = re.compile(regex1)
    name1 = re.findall(pattern1, htmltext)
    print "Price of", stocks_symbols[i].upper(), name1, "is", price[0]

I guess the problem is in regex1:

regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'

I tried reading the documentation but was unable to figure it out.

In this program I am trying to scrape the stock name and stock price, given a list of stock symbols as input.

What I think I am doing is passing two (.+?) groups in one pattern, which seems incorrect.

Output:

Traceback (most recent call last):
  File "C:\Py\stock\stocks.py", line 14, in <module>
    pattern1 = re.compile(regex1)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
error: nothing to repeat 

Solution

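^ is an anchor that matches the start of a string, so putting the quantifier ? right after it is not a legal regex; that is what raises "nothing to repeat". If you change your regex to regex1 = '<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>' it should work. Note that you also had one closing parenthesis too many. Using two (.+?) groups in one pattern is fine: re.findall then returns a list of tuples with one element per group.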
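To make the fix concrete, here is a minimal sketch in Python 3 (the question itself uses Python 2). The HTML fragment, its id suffix, and the company name are made up for illustration and are not taken from the real Yahoo page.

```python
import re

# Hypothetical HTML fragment standing in for the downloaded page;
# the id suffix and the heading text are invented for this example.
sample_html = '<h2 id="yui_3_9_1_9_1400000000000_42">Apple Inc. (AAPL)</h2>'

broken = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'  # pattern from the question
fixed = '<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>'    # '^' replaced by '+', extra ')' dropped

# Compiling the broken pattern fails before any matching happens.
try:
    re.compile(broken)
    broken_error = None
except re.error as exc:
    broken_error = str(exc)

# With two capture groups, findall returns one tuple per match.
matches = re.findall(fixed, sample_html)
id_suffix, name = matches[0]

print("broken pattern error:", broken_error)
print("name:", name)
```

Because the pattern has two groups, each match is a tuple of (id suffix, name); indexing the tuple picks out the stock name, so a separate pattern per value is not needed.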

Furthermore, there is a better way to get Yahoo's stock information. You can query a lot of tables (including stock info) with YQL, and there is also a YQL Console where you can try out your queries.

The result you get from there is JSON or XML, both of which can be handled well by Python libraries.
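As a rough illustration of handling a JSON result, the sketch below parses a response body with the standard json module. The payload, its field names, and the price value are all hypothetical placeholders; the actual response schema is not shown in this thread, and extract_quotes is a helper name invented for the example.

```python
import json

# Hypothetical response body: the nesting, field names, and the price
# are assumptions for illustration, not a documented schema or real data.
sample_response = (
    '{"quotes": [{"symbol": "AAPL", "name": "Apple Inc.", "price": "100.00"}]}'
)


def extract_quotes(body):
    """Return (symbol, name, price) tuples from a JSON response body."""
    data = json.loads(body)
    return [(q["symbol"], q["name"], float(q["price"])) for q in data["quotes"]]


for symbol, name, price in extract_quotes(sample_response):
    print("Price of", symbol, name, "is", price)
```

Reading named fields from parsed JSON avoids depending on generated HTML ids, which is the main reason structured output is easier to work with than scraped markup.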
