Can't BeautifulSoup show me the content of the website?


Question

I want to scrape the contents of a website using the BeautifulSoup library.

Code:

from bs4 import BeautifulSoup
from urllib.request import urlopen
html_http_response = urlopen("http://www.airlinequality.com/airport-reviews/jeddah-airport/")
data = html_http_response.read()
soup = BeautifulSoup(data, "html.parser")
print(soup.prettify())

Output:

<html style="height:100%">
 <head>
  <meta content="NOINDEX, NOFOLLOW" name="ROBOTS"/>
  <meta content="telephone=no" name="format-detection"/>
  <meta content="initial-scale=1.0" name="viewport"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
 </head>
 <body style="margin:0px;height:100%">
  <iframe frameborder="0" height="100%" marginheight="0px" marginwidth="0px" src="/_Incapsula_Resource?CWUDNSAI=9&amp;xinfo=9-57435048-0%200NNN%20RT%281512733380259%202%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B12%284%2c315%2c0%29%20U19&amp;incident_id=466002040110357581-305794245507288265&amp;edet=12&amp;cinfo=04000000" width="100%">
   Request unsuccessful. Incapsula incident ID: 466002040110357581-305794245507288265
  </iframe>
 </body>
</html>

The body contains an iframe tag instead of the content I see when inspecting the page in the browser.
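As a rough illustration of the symptom, the check below tests whether a response looks like the block page shown in the output above. It is only a sketch: the names `IframeSrcCollector` and `looks_like_incapsula_block` are hypothetical, and it assumes the block page can be recognised by an iframe whose `src` contains `_Incapsula_Resource`, as in that output. It uses the standard library's `html.parser` so that it runs without extra dependencies; the same test could be written against a BeautifulSoup object.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def looks_like_incapsula_block(html_text):
    """Return True if any iframe points at an Incapsula resource URL."""
    collector = IframeSrcCollector()
    collector.feed(html_text)
    return any("_Incapsula_Resource" in src for src in collector.sources)


# Shortened stand-ins for a blocked response and a normal one.
blocked_page = (
    '<html><body><iframe src="/_Incapsula_Resource?CWUDNSAI=9">'
    "Request unsuccessful.</iframe></body></html>"
)
normal_page = "<html><body><h1>Airport reviews</h1></body></html>"

print(looks_like_incapsula_block(blocked_page))
print(looks_like_incapsula_block(normal_page))
```

Running such a check before parsing makes it clear whether the scraper received the real page or the block page.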

Recommended answer

This website uses cookies to validate requests. If you visit the website for the first time, you need to check the "I'm not a robot" option. After that, the browser sends the incap_ses_415_965359, PHPSESSID, visid_incap_965359, _ga and _gid values in the headers of its requests.

So I copied the cookies from Chrome DevTools and saved them in a dictionary.

from bs4 import BeautifulSoup
import requests

# Cookie values copied from the browser session; replace them with your own.
cookies = {
    'incap_ses_415_965359': 'djRha9OqhshstDcXvPV8cmHCBQGBKloAAAAAN3/D9dvoqwEc7GPEwefkhQ==',
    'PHPSESSID': 'fjmr7plc0dmocm8roq7togcp92',
    'visid_incap_965359': 'akteT8lDT1iyST7XJO7wdQGBKloAAAns;aAAQkIPAAAAAACAWbWAAQ6Ozzrln35KG6DhLXMRYnMjxOmY',
    '_ga': 'GA1.2.894579844.151uus2734989',
    '_gid': 'GA1.2.1055878562.1598994989',
}
# Send the cookies with the request, as the browser would.
html_http_response = requests.get("http://www.airlinequality.com/airport-reviews/jeddah-airport", cookies=cookies)
data = html_http_response.text
soup = BeautifulSoup(data, "html.parser")
print(soup.prettify())

Get the cookie values from your own browser and update the dictionary with them.
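Copying each cookie by hand is error-prone, so one option is to copy the whole Cookie request header from the browser's dev tools and split it into a dictionary. The helper below is a minimal sketch of that idea: `cookie_header_to_dict` is a hypothetical name, the header string is a made-up placeholder rather than real session data, and it assumes the header has the usual `name=value; name2=value2` shape.

```python
def cookie_header_to_dict(cookie_header):
    """Turn a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        # Split on the first '=' only, so values ending in '==' stay intact.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder header; paste the value copied from your own browser instead.
copied_header = "PHPSESSID=abc123; visid_incap_965359=xyz==; _ga=GA1.2.1.2"

cookies = cookie_header_to_dict(copied_header)
print(cookies)
```

The resulting dictionary can be passed to `requests.get(url, cookies=cookies)` in the same way as the hand-written one above.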
