Web Scraping with BeautifulSoup or lxml.html
Question
I have watched some webcasts and need help with the following. I have been using lxml.html, but Yahoo recently changed the structure of its pages.
Target page:
http://finance.yahoo.com/quote/IBM/options?date=1469750400&straddle=true
Using the inspector in Chrome, I see the data in:
//*[@id="main-0-Quote-Proxy"]/section/section/div[2]/section/section/table
followed by some more code.
How do I get this data out into a list?
How do I change to another stock, for example from "LLY" to "MSFT"?
How do I switch between dates and get all months?
Recommended answer
Basing this answer on the one by @hoju:
import calendar
from datetime import datetime

import lxml.html

exDate = "2014-11-22"  # option expiry date
symbol = "LLY"         # change this to scrape another ticker, e.g. "MSFT"

# The date parameter is a Unix timestamp for midnight UTC on the expiry date.
dt = datetime.strptime(exDate, '%Y-%m-%d')
ym = calendar.timegm(dt.utctimetuple())

url = 'http://finance.yahoo.com/q/op?s=%s&date=%s' % (symbol, ym)
doc = lxml.html.parse(url)
table = doc.xpath('//table[@class="details-table quote-table Fz-m"]/tbody/tr')

# Collect the cell text of every row, dropping thousands separators.
rows = []
for tr in table:
    d = [td.text_content().strip().replace(',', '') for td in tr.xpath('./td')]
    rows.append(d)

print(rows)
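To switch between stocks and dates, the symbol and expiry date in the code above can be turned into parameters. The following is only a minimal sketch under a few assumptions: the helper names `expiry_timestamp` and `option_urls` are hypothetical, the URL pattern is simply the one used in the answer (Yahoo may no longer serve it), and the list of expiry dates has to be supplied by you, since this sketch does not discover which months are available.

```python
import calendar
from datetime import datetime
from urllib.parse import urlencode

# URL pattern copied from the answer above; assumed, Yahoo may have changed it.
BASE_URL = "http://finance.yahoo.com/q/op"


def expiry_timestamp(date_str):
    """Convert a 'YYYY-MM-DD' expiry date to a Unix timestamp (midnight UTC)."""
    dt = datetime.strptime(date_str, "%Y-%m-%d")
    return calendar.timegm(dt.utctimetuple())


def option_urls(symbols, expiry_dates):
    """Return a dict mapping (symbol, date) to the option-chain URL."""
    urls = {}
    for symbol in symbols:
        for date_str in expiry_dates:
            query = urlencode({"s": symbol, "date": expiry_timestamp(date_str)})
            urls[(symbol, date_str)] = "%s?%s" % (BASE_URL, query)
    return urls


if __name__ == "__main__":
    # Hypothetical inputs: two tickers and one known expiry date.
    for key, url in option_urls(["LLY", "MSFT"], ["2016-07-29"]).items():
        print(key, url)
```

Each URL produced this way could then be passed to lxml.html.parse and the same row-extraction loop as above, giving one list of rows per symbol and expiry date.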