Scraping a website with Python 3 that requires login

Views: 90 Published: 2021/4/15 19:06:25 Tags: python python-3.x web-scraping beautifulsoup mechanicalsoup

Problem description

Just a question about authentication when scraping. Using BeautifulSoup:

# import requests and BeautifulSoup
import requests
from bs4 import BeautifulSoup

# fetch the login page
page = requests.get("http://localhost:8080/login?from=%2F")
# parse the returned HTML
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

I think this part of the output is the important bit:

 <table>
   <tr>
    <td>
     User:
    </td>
    <td>
     <input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/>
    </td>
   </tr>
   <tr>
    <td>
     Password:
    </td>
    <td>
     <input name="j_password" type="password"/>
    </td>
   </tr>
   <tr>
    <td align="right">
     <input id="remember_me" name="remember_me" type="checkbox"/>
    </td>
    <td>
     <label for="remember_me">
      Remember me on this computer
     </label>
    </td>
   </tr>
  </table>

This scrapes the page fine, but the site requires a login. Here I am using the mechanicalsoup library:

import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open("http://localhost:8080/login?from=%2F")
browser.get_url()
browser.get_current_page()
browser.get_current_page().find_all('form')
browser["j_username"] = "admin"
browser["j_password"] = "password"
browser.launch_browser()

However, it still won't let me log in.

Has anyone used a scraping tool for Python 3 that lets them scrape a site that requires authentication?
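A possible reason the mechanicalsoup attempt above does not log in, offered as a guess from the snippet alone: no form is selected before the fields are assigned, and nothing submits the form (launch_browser() only opens the current page in a real browser for inspection). A minimal sketch of the usual select, fill, submit sequence follows. The function name log_in is hypothetical, and it assumes the login form is the first form on the page; the browser is passed in so the function can be exercised without a live server.

```python
def log_in(browser, login_url, username, password):
    """Open the login page, fill in the first form, and submit it.

    `browser` is expected to be a mechanicalsoup.StatefulBrowser.
    Assumption: the login form is the first <form> on the page.
    """
    browser.open(login_url)
    browser.select_form("form")        # pick the first form on the page
    browser["j_username"] = username   # field names taken from the question's HTML
    browser["j_password"] = password
    return browser.submit_selected()   # actually sends the form


# Usage (needs the mechanicalsoup package and the server from the question):
#   import mechanicalsoup
#   browser = mechanicalsoup.StatefulBrowser()
#   response = log_in(browser, "http://localhost:8080/login?from=%2F",
#                     "admin", "password")
#   print(response.status_code, browser.get_url())
```

After a successful submit, the same browser object keeps the session cookies, so later browser.open(...) calls should be made with it rather than with a fresh browser.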

Recommended answer

I see you're using requests. If the site accepts HTTP Basic authentication, you can pass the credentials through the auth parameter:

import requests

page = requests.get(
    "http://localhost:8080/login?from=%2F",
    auth=('username', 'password'),
)
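As background on the call above: a (username, password) tuple passed as auth makes requests use HTTP Basic authentication, which sends the credentials in an Authorization header. It does not fill in an HTML login form. The sketch below shows what that header looks like for plain ASCII credentials; the helper name basic_auth_header is hypothetical and only the standard library is used.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# The credentials are only base64-encoded, not encrypted.
print(basic_auth_header("admin", "password"))
```

Whether this is enough depends on the server: a site that only offers a form like the one in the question may ignore the header entirely.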

Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
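If the site uses a login form like the one shown in the question rather than HTTP Basic authentication, another option is to post the form fields through a session so the login cookie is kept for later requests. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names LoginFormParser, build_login_request and login are hypothetical, the login form is assumed to be the first form on the page, and the form's action URL is read from the page because it is not shown in the question's output. Parsing uses the standard library here; BeautifulSoup would work just as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action and input fields of the first <form> on a page."""

    SKIPPED_TYPES = {"submit", "button", "image", "reset", "file"}

    def __init__(self):
        super().__init__()
        self.action = None   # form action attribute, None if no form was seen
        self.fields = {}     # input name -> default value
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            if kind in self.SKIPPED_TYPES:
                return
            # Unchecked checkboxes and radio buttons are not submitted.
            if kind in {"checkbox", "radio"} and "checked" not in attrs:
                return
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(html, page_url, username, password,
                        user_field="j_username", password_field="j_password"):
    """Return (action_url, payload) for submitting the login form."""
    parser = LoginFormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found on the login page")
    payload = dict(parser.fields)       # keeps hidden fields such as tokens
    payload[user_field] = username
    payload[password_field] = password
    return urljoin(page_url, parser.action), payload


def login(session, login_url, username, password):
    """Fetch the login page and post the filled-in form with `session`.

    `session` is expected to be a requests.Session.
    """
    page = session.get(login_url)
    page.raise_for_status()
    action_url, payload = build_login_request(
        page.text, login_url, username, password)
    response = session.post(action_url, data=payload)
    response.raise_for_status()
    return response


# Usage (needs the requests package and the server from the question):
#   import requests
#   with requests.Session() as session:
#       login(session, "http://localhost:8080/login?from=%2F",
#             "admin", "password")
#       page = session.get("http://localhost:8080/")
```

A 200 response does not by itself prove the login worked, since many sites return the login page again on failure; checking the final URL or looking for a logged-in marker in the page is a safer test.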
