Python: How do I parse HTML of a webpage that requires being logged in?
Question
I'm trying to parse the HTML of a webpage that requires being logged in. I can get the HTML of a webpage using this script:
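from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup  # Beautiful Soup 4 replaces the old BeautifulSoup module

# Fetch the page and parse it
webpage = urlopen('https://www.example.com')
soup = BeautifulSoup(webpage, 'html.parser')
print(soup)  # prints the source of example.com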
However, getting the source of a webpage that I'm logged into is proving more difficult. I tried replacing 'https://www.example.com' with 'https://user:pass@example.com', but I got an Invalid URL error.
Anyone know how I could do this? Thanks in advance.
Recommended answer
Selenium WebDriver (http://seleniumhq.org/projects/webdriver/) might suit your needs here. It drives a real browser, so you can log in to the page through its form and then print the HTML of the resulting page. Here's an example:
from selenium import webdriver
from selenium.webdriver.common.by import By

# start the browser
driver = webdriver.Firefox()  # create a driver, in this case Firefox
driver.get("http://example.com")  # go to the url

# locate the login form (replace ... with each field's name attribute)
username_field = driver.find_element(By.NAME, ...)  # get the username field
password_field = driver.find_element(By.NAME, ...)  # get the password field

# log in
username_field.send_keys("username")  # enter your username
password_field.send_keys("password")  # enter your password
password_field.submit()  # submit the form that contains the field

# print HTML
html = driver.page_source
print(html)
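Once the page source is in a string, it can be parsed like any other HTML, which is what the question was after. Below is a minimal sketch, assuming Python 3 and using only the standard library's html.parser (BeautifulSoup, as in the question, would work just as well). The LinkCollector class, the extract_links function and the sample markup are hypothetical names and data made up for illustration; they are not part of Selenium or of the original answer.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Stand-in for driver.page_source; the markup is invented for this example.
sample_html = """
<html><body>
  <h1>Welcome back</h1>
  <a href="/account">Account</a>
  <a href="/logout">Log out</a>
</body></html>
"""

print(extract_links(sample_html))
```

With the Selenium session from the answer, the call would be extract_links(driver.page_source) instead of passing the sample string.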