Web scrape using BeautifulSoup brings different content

Views: 103 · Posted: 2020/9/20 8:16:51 · Tags: python html beautifulsoup html-parsing


Problem description


If you visit http://www.imdb.com/title/tt2375692/episodes?season=1, you will see that the airdate of season 1, episode 1 is 25 Jan. 2014.


This is the code I am using to scrape.

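    import urllib2  # Python 2; in Python 3 this is urllib.request
    from bs4 import BeautifulSoup
    # (runs inside a method of the scraper class, hence `self`)
    req = urllib2.Request('http://www.imdb.com/title/tt2375692/episodes?season=1')
    self.diziPage = urllib2.urlopen(req).read()
    self.diziSoup = BeautifulSoup(self.diziPage, from_encoding="utf8")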


After I scrape the site, everything is fine except the airdate: episode 1's airdate comes out as 20 April 2014, which is not what the page shows when I visit it in a browser. All of the other information comes out correct.


I thought it might be caused by the request headers, so I did some experiments, but that didn't work.
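For anyone repeating the header experiments, the sketch below shows one way to send an explicit `Accept-Language` header with Python 3's `urllib.request` (the successor to `urllib2`). It assumes, without confirmation from the question, that the server may vary what it returns by language or region; the question reports that header changes did not help, so treat this as a way to make such experiments reproducible, not as a fix. The function name `build_request`, the default language string, and the User-Agent value are hypothetical choices for illustration.

```python
import urllib.request

# Episode-list URL from the question.
URL = "http://www.imdb.com/title/tt2375692/episodes?season=1"


def build_request(url, accept_language="en-US,en;q=0.9"):
    """Build a GET request that states its language preference explicitly.

    Hypothetical helper: the header values are illustrative assumptions,
    not values known to make the site return a particular airdate.
    """
    headers = {
        "Accept-Language": accept_language,
        "User-Agent": "Mozilla/5.0",  # hypothetical browser-like UA string
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request(URL)
    # urllib stores header names via str.capitalize(), hence "Accept-language".
    print(req.get_header("Accept-language"))
    # To actually fetch and parse (requires network access and bs4):
    # from bs4 import BeautifulSoup
    # html = urllib.request.urlopen(req).read()
    # soup = BeautifulSoup(html, "html.parser")
```

Because `urllib` normalizes header names with `str.capitalize()`, lookups on the request object must use `"Accept-language"` and `"User-agent"` rather than the spelling passed in.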
