Scraping multiple web pages has the same results as the first page using Python

Views: 61 Published: 2020/9/20 8:10:36 Tags: python-3.x web-scraping beautifulsoup

Question

I am trying to get the product names from the CME Group website. However, the code does not seem to reach the next page even though I change the URL in the loop. Why is that? Any ideas or opinions? Thanks in advance.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

for i in range(1, 6):
    url = 'http://www.cmegroup.com/trading/products/#pageNumber=' + str(i) + '&sortAsc=false'

    # Send a browser-like User-Agent with the request
    CMEacess = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    print(url)
    print('page: ' + str(i))

    CMEpage = urlopen(CMEacess).read()
    CMEsoup = BeautifulSoup(CMEpage, 'html.parser')

    # attrs must be a dict mapping the attribute name to its value
    namelist = CMEsoup.find_all('th', attrs={'class': 'cmeTableLeft'})

    for name in namelist:
        print(name.get_text())

    print('\n')

Recommended answer

You could try using the requests library rather than urllib. I just accessed page 5 successfully using code similar to yours, with that one difference.

Note that the literal 'D3' appears on page five but not on page one.

>>> import requests
>>> i = 5
>>> url='http://www.cmegroup.com/trading/products/#pageNumber='+str(i)+'&sortAsc=false'
>>> page = requests.get(url).content
>>> import bs4
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> soup.find_all(string='D3')
['D3', 'D3']
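
Separately from the choice of library, it may be worth checking what the loop's URLs look like once the fragment is removed. Everything after `#` in a URL is a fragment, and HTTP clients do not send the fragment to the server. The sketch below uses only the standard library and makes no network request; `BASE` and the helper name `request_targets` are made up for illustration. If the site applies `pageNumber` in the browser with JavaScript (an assumption, not something confirmed here), each iteration would download the same HTML.

```python
from urllib.parse import urldefrag

# Hypothetical constant for illustration: the part of the URL before '#'
BASE = 'http://www.cmegroup.com/trading/products/'


def request_targets(pages):
    """Return (fragment-free URL, fragment) for each page number."""
    targets = []
    for i in pages:
        url = BASE + '#pageNumber=' + str(i) + '&sortAsc=false'
        # urldefrag splits off everything after '#'; the fragment stays client-side
        stripped, fragment = urldefrag(url)
        targets.append((stripped, fragment))
    return targets


targets = request_targets(range(1, 6))
for stripped, fragment in targets:
    print(stripped, '| fragment:', fragment)

# Count the distinct URLs that remain once the fragment is dropped
distinct = {stripped for stripped, _ in targets}
print(len(distinct))
```

If the count printed at the end is 1, the five URLs differ only in their fragments, so comparing the downloaded HTML between two pages would be a reasonable next check.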
