Python - Urllib2 Wait for page to load to scrape data

Question

Firstly, I'd like to say that I do not want to use any libraries that are not provided with Python 2.7.10. The same question was posted on Stack Overflow but was answered with the Requests library.

I have a script that logs into Roblox.com using urllib2. To check whether there is a captcha before I try to log in, I wanted to do check_captcha = re.findall('recaptcha_image', newlogin), but Roblox needs to redirect to the captcha login page AND the captcha has to load onto the page.

So how can I make Python wait for the redirect and for the page to load fully before I go ahead and .read() and scrape it?

Recommended answer

This waits 10 seconds after opening the URL before reading the response:

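import time
import urllib2

url = 'Roblox url'  # placeholder from the original answer; substitute the real page URL
response = urllib2.urlopen(url)  # sends the request and follows HTTP redirects
time.sleep(10)  # pause 10 seconds before reading
data = response.read()  # read the full response body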
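
As a complement to the snippet above, here is a minimal sketch of the captcha check itself, using only the standard library. It relies on two documented behaviours: urllib2.urlopen() follows HTTP redirects by default, and read() blocks until the whole body has been downloaded. The helper names fetch and has_captcha are hypothetical, and the sketch assumes the 'recaptcha_image' marker is present in the HTML the server returns. urllib2 does not execute JavaScript, so if the captcha is injected by a script in the browser, the marker will not appear in read() no matter how long the script sleeps.

```python
import re

try:
    import urllib2  # Python 2.7, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback so the sketch runs there too

CAPTCHA_MARKER = 'recaptcha_image'  # marker string taken from the question


def has_captcha(html):
    """Return True if the captcha marker appears in the page source."""
    return re.search(CAPTCHA_MARKER, html) is not None


def fetch(url, timeout=10):
    """Open url and return (final_url, body).

    HTTP redirects are followed automatically by urlopen(); geturl()
    reports the URL that was finally retrieved. `timeout` is in seconds.
    """
    response = urllib2.urlopen(url, timeout=timeout)
    try:
        body = response.read()  # blocks until the full body has arrived
        final_url = response.geturl()
    finally:
        response.close()
    if not isinstance(body, str):
        # Python 3 returns bytes; decode so the regex search works on text
        body = body.decode('utf-8', 'replace')
    return final_url, body


# Usage (the URL is a placeholder, as in the answer above):
#     final_url, newlogin = fetch('Roblox url')
#     check_captcha = has_captcha(newlogin)
```

Comparing final_url with the URL that was requested shows whether a redirect to the captcha login page actually happened, which may be a more direct signal than a fixed delay.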
