Python: How do I parse the HTML of a webpage that requires being logged in?

Question

I'm trying to parse the HTML of a webpage that requires being logged in. I can get the HTML of a webpage using this script:

from urllib.request import urlopen  # Python 3 (the original used urllib2 from Python 2)
from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

webpage = urlopen('https://www.example.com')
soup = BeautifulSoup(webpage, 'html.parser')
print(soup)
# This prints the source of example.com

But getting the source of a page that I'm logged into is harder. I tried replacing 'https://www.example.com' with 'https://user:pass@example.com', but I got an Invalid URL error.

Anyone know how I could do this? Thanks in advance.
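
A note on the user:pass@ attempt: that form is HTTP Basic authentication syntax, and urlopen does not turn user:pass@ in a URL into credentials, which is likely where the Invalid URL error comes from. If the site really uses Basic auth (an assumption; a site with a login page usually uses an HTML form plus a session cookie instead, which is what the answer below deals with), the standard library can send the credentials for you. A minimal Python 3 sketch; the helper names are hypothetical and the URL and credentials are placeholders:

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Value for an HTTP Basic 'Authorization' header (base64 of 'user:password')."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """A Request that sends Basic credentials up front instead of embedding them in the URL."""
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )


def basic_auth_opener(url: str, user: str, password: str) -> urllib.request.OpenerDirector:
    """An opener that answers a server's 401 Basic challenge for `url`."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)  # None: use for any realm at this URL
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))


if __name__ == "__main__":
    req = basic_auth_request("https://www.example.com", "user", "pass")
    print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz
    # With network access, either of these would fetch the page:
    # html = urllib.request.urlopen(req).read()
    # html = basic_auth_opener("https://www.example.com", "user", "pass").open("https://www.example.com").read()
```

The explicit header sends credentials on the first request; the handler-based opener waits for the server to ask with a 401 response.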

Answer

Selenium WebDriver (http://seleniumhq.org/projects/webdriver/) may suit your needs here. Because it drives a real browser, you can log in to the page and then print the HTML. Here's an example:

from selenium import webdriver
from selenium.webdriver.common.by import By

# start a browser session, in this case Firefox
driver = webdriver.Firefox()
driver.get("http://example.com")  # go to the login page

# locate the login form fields (replace ... with each field's name attribute)
username_field = driver.find_element(By.NAME, ...)  # the username field
password_field = driver.find_element(By.NAME, ...)  # the password field

# log in
username_field.send_keys("username")  # enter your username
password_field.send_keys("password")  # enter your password
password_field.submit()  # submit the form

# print the HTML of the page shown after logging in
# (if that page loads slowly, you may need a WebDriverWait before reading it)
html = driver.page_source
print(html)

driver.quit()  # close the browser
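
driver.page_source is an ordinary string, so parsing it is a separate step from logging in: you can pass it to BeautifulSoup exactly as in the question (BeautifulSoup(html, 'html.parser')). As a dependency-free sketch, the standard library's html.parser can handle simple extraction too. Pulling out link targets is only an illustrative choice, and LinkCollector and extract_links are hypothetical names:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list:
    """Return the href targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # stand-in for driver.page_source after logging in
    sample = '<html><body><a href="/inbox">Inbox</a> <a href="/logout">Log out</a></body></html>'
    print(extract_links(sample))  # ['/inbox', '/logout']
```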

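If the login page is a plain HTML form that works without JavaScript, a browser may not be needed at all: submit the form once through an opener that keeps cookies, then fetch protected pages with that same opener so the session cookie goes along. This is a different technique from the Selenium approach above, sketched with the standard library. The URLs, the field names username and password, and the helper names are all hypothetical; read the real form action URL and input names (including any hidden fields such as a CSRF token) from the login page's HTML:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Opener that stores cookies from responses and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url: str, fields: dict) -> urllib.request.Request:
    """POST request carrying the login form's fields, URL-encoded as a browser would send them."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        login_url,
        data=data,  # supplying data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_logged_in(login_url: str, fields: dict, page_url: str) -> str:
    """Log in, then return the HTML of a page that requires the session."""
    opener, _jar = make_session_opener()
    # the login response should set the session cookie in the jar
    opener.open(build_login_request(login_url, fields)).close()
    with opener.open(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Hypothetical URL and field names; fetch_logged_in needs network access to a real site.
    req = build_login_request("https://www.example.com/login",
                              {"username": "user", "password": "pass"})
    print(req.get_method(), req.data)  # POST b'username=user&password=pass'
```

The returned HTML can then go to BeautifulSoup as in the question. If the form relies on JavaScript, this approach will not work and Selenium is the safer choice.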