Missing HTML in response using Python requests and BeautifulSoup4

Question

When I view the page source in my browser, the HTML I am after appears there. However, when I make a request using Python requests, that HTML doesn't appear in the response.

The URL I'm trying to scrape is http://dota2lounge.com/match?m=13362, and the specific HTML I am after in the page is:

<div class="full">
    <a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
    <a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
    <a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
    <br><div id="toma" class="full" style="background: #444;line-height: 2.5rem;border: 1px solid #333;text-align: center;">Whole Match</div>
</div>

I'd like to get the 'onclick' values of the buttons. So far I've tried:

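import requests
from bs4 import BeautifulSoup as bs

# Fetch the raw HTML and look for the button links in it
r = requests.get('http://dota2lounge.com/match?m=13268')
soup = bs(r.content, 'lxml')
buttons = soup.find_all('a', class_='button')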

None of this works.

r.content

doesn't appear to contain the HTML either.

Recommended answer

It looks like the elements you want are added by JavaScript, which isn't run when you make the request in Python. Check out this question.
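To illustrate the point, here is a minimal sketch that runs the same parser over two inputs. It uses only the standard library's html.parser, so it runs without network access or third-party packages; BeautifulSoup is limited in the same way, because it too only parses the markup it is given and never executes scripts. RAW_RESPONSE is a hypothetical, simplified stand-in for what the server might send (it is not the site's actual markup), RENDERED_DOM is the snippet quoted in the question, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class ButtonOnclickParser(HTMLParser):
    """Collect the onclick attribute of every <a class="button"> tag."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "button" in classes:
            self.onclicks.append(attr_map.get("onclick"))


def button_onclicks(html):
    """Return the onclick values of all button links found in the markup."""
    parser = ButtonOnclickParser()
    parser.feed(html)
    parser.close()
    return parser.onclicks


# ASSUMPTION: a simplified stand-in for the raw server response. The buttons
# exist only as text inside a <script>, which a parser never executes.
RAW_RESPONSE = """
<html><body>
<div id="events"></div>
<script>
document.getElementById("events").innerHTML =
    '<a class="button" onclick="ChoseEvent(13362)">Match</a>';
</script>
</body></html>
"""

# The markup from the question, i.e. what the browser shows after scripts run.
RENDERED_DOM = """
<div class="full">
    <a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
    <a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
    <a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
</div>
"""

raw_result = button_onclicks(RAW_RESPONSE)
rendered_result = button_onclicks(RENDERED_DOM)

print(raw_result)       # no buttons: the script was never run
print(rendered_result)  # the three onclick strings from the question
```

The parser finds the buttons only in the second input, so the practical fix is to obtain the HTML after the scripts have run (a saved copy from the browser, for instance) rather than to change the parsing code.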

If you're only scraping this once (i.e. you just want the data and aren't building a bot to play the game for you), the quickest option is often to create a .htm file containing only links to every page you want to scrape (put each link in an <a> tag; you don't even need link text). Then you can use a tool like DownThemAll in Firefox to save a local copy of every page with the proper formatting.
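The link page described above can be generated rather than typed by hand. This is a rough sketch under stated assumptions: build_link_page is a hypothetical helper, and the two match IDs are simply the ones that appear in the question, so replace them with the pages you actually need.

```python
from html import escape


def build_link_page(urls):
    """Return a minimal HTML page holding one empty <a> tag per URL."""
    links = "\n".join(
        f'<a href="{escape(url, quote=True)}"></a>' for url in urls
    )
    return f"<!DOCTYPE html>\n<html>\n<body>\n{links}\n</body>\n</html>\n"


# ASSUMPTION: match IDs taken from the question, used only as sample input.
match_ids = [13362, 13268]
urls = [f"http://dota2lounge.com/match?m={match_id}" for match_id in match_ids]

page = build_link_page(urls)
print(page)
```

Saving the printed output as a .htm file and opening it in the browser gives the download tool a list of links to work through.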
