A specific site returns a different response in Python and in Chrome
Question
I am trying to access a specific site using Python, and no matter which library I use, I just can't seem to access it.
I have tried Selenium with PhantomJS, and I have tried requests and urllib.
Whenever I access the site from the browser I get a JSON file, but whenever I access it from a Python script I get an HTML file (which contains a huge minified script).
I suspect the site detects that I'm sending the request headlessly and blocks my requests, but I can't figure out how.
The site address is: http://www.yesplanet.co.il/presentationsJSON
I would very much appreciate it if anyone could point me in the right direction. Thanks!
Here is my Selenium code:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://www.yesplanet.co.il/presentationsJSON")
source = driver.page_source
At this point I print the source and see that it is not what I expected.
Here is a requests implementation that also does not work:
import requests
res = requests.get("http://www.yesplanet.co.il/presentationsJSON")
source = res.content
The result is the same here.
Recommended answer
It works for me if I set a bunch of headers, including sending a cookie.
curl -H "Cookie:rbzid=d29SMXE1Rktrdm5kS2x0YW5EdVZwUzNpYVhWdUlJSndlVzEvUU9vWG5OU2dRSVNnWTc3TWYwaHN4V2REVGJyNFBMSFl1bXErMGFLNXNtUGxVb0ZwS3dVRDRhajEwczFMMmE3cUc1blBmaTEzeFZFWGhrbHgrUXhNeHRhZnhWNjBib1pTenM5bjFvOUhVRVoxOTNGRHBYQXQwVzVsYXdSSXliME5LeUZjU0Rhb2tHa09ycUNVYmJyOUVjMERJN3daaUlFUGhwUHpvT0dDblcwU0wwMEM3NlJZRGw1K1pXZ2NKNkJRTWhvNUtaZz1AQEAxOTVAQEAtNjY2NjY2NjYwNjA-" -H "Accept-Language: en-US,en;q=0.8,ja;q=0.6" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" http://www.yesplanet.co.il/presentationsJSON
I'm not sure which of the other headers are important.
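For reference, the same request could be sketched in Python with urllib from the standard library. This is only an illustration under assumptions: build_request and fetch are hypothetical helper names, the header values are copied from the curl command, and the rbzid cookie value is assumed to be tied to a browser session, so it would have to be copied fresh from the browser rather than reused from this post.

```python
import urllib.request

URL = "http://www.yesplanet.co.il/presentationsJSON"

# Header values copied from the curl command in the answer.
BROWSER_HEADERS = {
    "Accept-Language": "en-US,en;q=0.8,ja;q=0.6",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/webp,*/*;q=0.8",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/56.0.2924.87 Safari/537.36",
}


def build_request(url, rbzid_cookie):
    """Build a GET request carrying browser-like headers plus the rbzid cookie.

    rbzid_cookie is the cookie value copied from the browser's dev tools
    (assumption: the site only requires this one cookie).
    """
    headers = dict(BROWSER_HEADERS)
    headers["Cookie"] = "rbzid=" + rbzid_cookie
    return urllib.request.Request(url, headers=headers)


def fetch(url, rbzid_cookie, timeout=10):
    """Send the request and return the raw response body as bytes."""
    request = build_request(url, rbzid_cookie)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch(URL, "<cookie value from the browser>") would then return the body. Whether the site accepts it depends on the cookie still being valid, which this sketch cannot guarantee.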
I found out which headers Chrome was sending by checking the Network panel in the dev tools.
From that I can also see that Chrome made 2 requests.
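One possible reading of the two requests, which the answer does not confirm, is that the first response is the HTML page with the minified script and the second one returns the JSON. Under that assumption, a script would need to tell the two kinds of response apart, for example to notice when a copied cookie has stopped working. A minimal sketch, where classify_body is a hypothetical helper name:

```python
import json


def classify_body(body):
    """Classify a response body (bytes) as "json", "html", or "unknown"."""
    text = body.decode("utf-8", errors="replace").strip()
    try:
        json.loads(text)
        return "json"
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        pass
    head = text[:200].lower()
    if head.startswith("<!doctype html") or "<html" in head:
        return "html"
    return "unknown"
```

If the function returns "html" for a response from the JSON URL, the request was probably served the script page instead of the data, and the headers or cookie would be the first thing to re-check.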