Using Regex to get multiple data on single line by scraping stocks from yahoo
Problem description
import urllib
import re

stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

for i in range(len(stocks_symbols)):
    htmlfile = urllib.urlopen("https://finance.yahoo.com/q?s=" + stocks_symbols[i])
    htmltext = htmlfile.read(htmlfile)
    regex = '<span id="yfs_l84_' + stocks_symbols[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)
    regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
    pattern1 = re.compile(regex1)
    name1 = re.findall(pattern1, htmltext)
    print "Price of", stocks_symbols[i].upper(), name1, "is", price[0]
I guess the problem is in regex1:
regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
I tried reading the documentation but was unable to figure it out.
In this program I am trying to scrape the stock name and stock price, taking a list of stock symbols as input.
What I think I am doing is passing two (.+?) groups in one pattern, which seems incorrect.
Output:
Traceback (most recent call last):
  File "C:\Py\stock\stocks.py", line 14, in <module>
    pattern1 = re.compile(regex1)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
error: nothing to repeat
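In the group (.^?), the ^ matches the start of a string, and a ? directly after it is not a legal regex: an anchor gives the ? nothing to repeat, which is exactly what the error message says. If you change that group to a valid one, for example regex1 = '<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>', it should work. Note that you also had one closing parenthesis too many. Having two (.+?) groups in one pattern is not a problem in itself; re.findall then returns one tuple per match, with one element per group.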
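The regex fix can be checked without touching the network. The following is a minimal sketch, written for Python 3 (the question's code is Python 2), and it assumes a made-up sample_html string rather than real Yahoo markup. The helper name compiles is hypothetical and exists only for this illustration.

```python
import re

# The pattern from the question and a corrected variant.
broken = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
fixed = '<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>'


def compiles(pattern):
    """Return True if the pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical markup, invented for this example (not copied from Yahoo).
sample_html = '<h2 id="yui_3_9_1_9_1400000000000_42">Apple Inc. (AAPL)</h2>'

# With two capture groups, findall returns a list of 2-tuples.
matches = re.findall(fixed, sample_html)
id_suffix, name = matches[0]

print(compiles(broken))  # the broken pattern is rejected
print(compiles(fixed))   # the corrected pattern is accepted
print(name)
```

Because the pattern has two groups, each match is a tuple, so the stock name is the second element rather than the whole match.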
Furthermore, there is a better way to get Yahoo's stock information. You can query a lot of tables (including stock info) with YQL, and there is also a YQL Console where you can try out your queries.
The result you get from there is JSON or XML, both of which can be handled quite well with existing Python libraries.
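As a rough illustration of why JSON is easier to work with than scraped HTML, here is a small Python 3 sketch using the standard json module. The payload in sample_json is hypothetical: its key names and nesting are assumptions made for this example, the prices are invented, and the function name extract_quotes is also made up. A real response may be structured differently.

```python
import json

# Hypothetical response body: the keys, nesting, and values below are
# assumptions for illustration, not a documented response format.
sample_json = '''
{"query": {"results": {"quote": [
  {"symbol": "AAPL", "Name": "Apple Inc.", "LastTradePriceOnly": "95.22"},
  {"symbol": "MSFT", "Name": "Microsoft Corporation", "LastTradePriceOnly": "41.68"}
]}}}
'''


def extract_quotes(text):
    """Map each symbol to a (name, price) pair from the assumed payload shape."""
    data = json.loads(text)
    quotes = data["query"]["results"]["quote"]
    return {q["symbol"]: (q["Name"], float(q["LastTradePriceOnly"])) for q in quotes}


quotes = extract_quotes(sample_json)
for symbol, (name, price) in quotes.items():
    print("Price of", symbol, name, "is", price)
```

No regular expression is needed here: the name and price are reached by key, so a change in page layout would not break the extraction.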