Can't BeautifulSoup show me the content of the website?
Question
I want to scrape the contents of a website using the BeautifulSoup library.
Code:
from bs4 import BeautifulSoup
from urllib.request import urlopen
html_http_response = urlopen("http://www.airlinequality.com/airport-reviews/jeddah-airport/")
data = html_http_response.read()
soup = BeautifulSoup(data, "html.parser")
print(soup.prettify())
Output:
<html style="height:100%">
<head>
<meta content="NOINDEX, NOFOLLOW" name="ROBOTS"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="initial-scale=1.0" name="viewport"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
</head>
<body style="margin:0px;height:100%">
<iframe frameborder="0" height="100%" marginheight="0px" marginwidth="0px" src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=9-57435048-0%200NNN%20RT%281512733380259%202%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B12%284%2c315%2c0%29%20U19&incident_id=466002040110357581-305794245507288265&edet=12&cinfo=04000000" width="100%">
Request unsuccessful. Incapsula incident ID: 466002040110357581-305794245507288265
</iframe>
</body>
</html>
The body contains an iframe tag with a "Request unsuccessful" message from Incapsula, instead of the content I see when inspecting the page in the browser.
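A quick way to tell from code whether a response is this block page rather than the real content is to look for the two strings that appear in the output above. The following is only a minimal sketch: the helper name `looks_blocked` and the sample strings are hypothetical, and the markers are simply copied from this one response, so other block pages may look different.

```python
# Markers copied from the block page shown in the output above.
BLOCK_MARKERS = ("_Incapsula_Resource", "Incapsula incident ID")


def looks_blocked(page) -> bool:
    """Return True if the page text contains one of the Incapsula markers."""
    if isinstance(page, bytes):
        # urlopen(...).read() returns bytes; decode leniently for the check.
        page = page.decode("utf-8", errors="replace")
    return any(marker in page for marker in BLOCK_MARKERS)


# Shortened stand-ins for a blocked and a normal response (placeholder data).
blocked_sample = (
    '<iframe src="/_Incapsula_Resource?CWUDNSAI=9">'
    "Request unsuccessful. Incapsula incident ID: 0</iframe>"
)
normal_sample = "<html><body><h1>Airport reviews</h1></body></html>"

print(looks_blocked(blocked_sample))  # True
print(looks_blocked(normal_sample))   # False
```

Calling `looks_blocked(data)` before handing `data` to BeautifulSoup makes it obvious that the problem is the response itself and not the parser.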
Recommended answer
This website uses cookies to validate requests. If you visit the website for the first time, you need to check the "I'm not a robot" option. After that, the browser passes the incap_ses_415_965359, PHPSESSID, visid_incap_965359, _ga and _gid cookie values in the headers of its requests.
So I copied the cookies from Chrome DevTools and saved them in a dictionary.
from bs4 import BeautifulSoup
import requests
cookies = {
    'incap_ses_415_965359': 'djRha9OqhshstDcXvPV8cmHCBQGBKloAAAAAN3/D9dvoqwEc7GPEwefkhQ==',
    'PHPSESSID': 'fjmr7plc0dmocm8roq7togcp92',
    'visid_incap_965359': 'akteT8lDT1iyST7XJO7wdQGBKloAAAns;aAAQkIPAAAAAACAWbWAAQ6Ozzrln35KG6DhLXMRYnMjxOmY',
    '_ga': 'GA1.2.894579844.151uus2734989',
    '_gid': 'GA1.2.1055878562.1598994989',
}
html_http_response = requests.get("http://www.airlinequality.com/airport-reviews/jeddah-airport", cookies=cookies)
data = html_http_response.text
soup = BeautifulSoup(data, "html.parser")
print(soup.prettify())
The values above come from my own browser session, so get the cookie values from your browser and update the dictionary with them.
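If it is easier to copy the whole Cookie request header from the browser's network tab than to copy each value separately, the header string can be split into a dictionary. This is a minimal sketch, not part of the original answer: the helper name `cookie_header_to_dict` is hypothetical and the sample header uses placeholder values, not real cookies.

```python
def cookie_header_to_dict(header: str) -> dict:
    """Turn 'name=value; name2=value2' into {'name': 'value', ...}."""
    cookies = {}
    for pair in header.split(";"):
        # Split on the first '=' only, so values ending in '==' stay intact.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder header; paste the real one copied from your browser instead.
sample_header = "PHPSESSID=abc123; visid_incap_965359=xyz==; _ga=GA1.2.1.2"

cookies = cookie_header_to_dict(sample_header)
print(cookies)
```

The resulting dictionary can be passed as the `cookies=` argument of `requests.get`, in the same way as the hand-written dictionary above.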