使用 urllib 抓取网页 [英] Webscraping with urllib

查看:42
本文介绍了使用 urllib 抓取网页的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望从

I am looking to get some information off the CME website Namely I want to get the Futures Yield and the Futures DV01 for the 10y Treasury Note Future. Found this little snippet on an old thread:

import urllib.request
class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"
opener = AppURLopener()
fh = opener.open('http://www.cmegroup.com/tools-information/quikstrike/treasury-analytics.html')

It throws a deprecation warning and I am not quite sure how I get the info from the website. Can someone enlighten me please what the new syntax should be and and how to get the info. Thanks

解决方案

Run the script when you are done installing selenium.

from selenium import webdriver ; from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://www.cmegroup.com/tools-information/quikstrike/treasury-analytics.html")

driver.switch_to_frame(driver.find_element_by_tag_name("iframe"))
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

table = soup.select('table.grid')[0]
list_of_rows = [[t_data.text for t_data in item.select('th,td')]
                for item in table.select('tr')]

for data in list_of_rows:
    print(data)

I think, this is the table [partial picture] you are after:

这篇关于使用 urllib 抓取网页的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆