A specific site returns a different response in Python than in Chrome

Question

I am trying to access a specific site using python, and no matter which lib I use I just can't seem to access it.

I have tried Selenium + PhantomJS, and I have tried requests and urllib.

Whenever I access the site from the browser I get a JSON file, and whenever I access it from a Python script I get an HTML file (which has a huge minified script inside it).

I suspect this site is detecting I'm sending the request headlessly and is blocking my requests, but I can't figure out how.
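
One concrete difference is easy to check: neither Python client sends a browser-like User-Agent by default. requests identifies itself as python-requests/<version>, and urllib uses Python-urllib/<major>.<minor>. The sketch below only inspects urllib's default opener (no network access). It does not prove that the User-Agent is what this particular site checks; the cookie discussed in the answer is another candidate.

```python
import urllib.request

# build_opener() creates the same kind of opener that urllib.request.urlopen()
# uses when no custom opener is installed. Its addheaders list holds headers
# that are added to every request that does not already set them.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like {'User-agent': 'Python-urllib/3.11'}; nothing in it
# resembles Chrome, so a server can tell the clients apart from this header alone.
print(default_headers)
```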

The site URL is: http://www.yesplanet.co.il/presentationsJSON

I would very much appreciate if anyone can point me in the right direction. Thanks!

Here is my Selenium code:

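from selenium import webdriver

# PhantomJS support was deprecated and later removed from Selenium; this needs Selenium 3.x.
driver = webdriver.PhantomJS()
driver.get("http://www.yesplanet.co.il/presentationsJSON")
source = driver.page_source  # expected JSON, but this is an HTML page
print(source)
driver.quit()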

At this point I print the source and see it is not what I expected.

Here is a requests implementation that also does not work:

import requests

res = requests.get("http://www.yesplanet.co.il/presentationsJSON")
source = res.text  # expected JSON, but this is an HTML page with a large minified script
print(source)

Same result here.
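
Since the symptom is "JSON in the browser, HTML from Python", a small check that tries to decode the body as JSON makes it obvious which response came back. looks_like_json is a hypothetical helper name, and it assumes nothing about the structure of the site's JSON.

```python
import json


def looks_like_json(body: str) -> bool:
    """Hypothetical helper: True if the body decodes as JSON.

    An HTML page (such as the one with the large minified script)
    fails to decode, so it returns False.
    """
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


# Usage with the requests code above (needs network access):
# if not looks_like_json(res.text):
#     print("Got HTML instead of JSON")
```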

Answer

It works for me if I set a bunch of headers including sending a cookie.

curl -H "Cookie: rbzid=d29SMXE1Rktrdm5kS2x0YW5EdVZwUzNpYVhWdUlJSndlVzEvUU9vWG5OU2dRSVNnWTc3TWYwaHN4V2REVGJyNFBMSFl1bXErMGFLNXNtUGxVb0ZwS3dVRDRhajEwczFMMmE3cUc1blBmaTEzeFZFWGhrbHgrUXhNeHRhZnhWNjBib1pTenM5bjFvOUhVRVoxOTNGRHBYQXQwVzVsYXdSSXliME5LeUZjU0Rhb2tHa09ycUNVYmJyOUVjMERJN3daaUlFUGhwUHpvT0dDblcwU0wwMEM3NlJZRGw1K1pXZ2NKNkJRTWhvNUtaZz1AQEAxOTVAQEAtNjY2NjY2NjYwNjA-" \
     -H "Accept-Language: en-US,en;q=0.8,ja;q=0.6" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
     -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" \
     http://www.yesplanet.co.il/presentationsJSON

I'm not sure which of the other headers are important.
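
For comparison, here is a sketch of the same request built with the standard library's urllib.request. curl's -A flag sets the User-Agent header and each -H adds one header, so they map directly onto a headers dict. The rbzid value is a placeholder: the cookie in the answer came from one browser session and will presumably expire, so a fresh value would have to be copied from the browser. The request is only constructed here, not sent.

```python
import urllib.request

URL = "http://www.yesplanet.co.il/presentationsJSON"

# Header values copied from the curl command in the answer.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8,ja;q=0.6",
}


def build_request(rbzid: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the browser headers and cookie."""
    headers = dict(BROWSER_HEADERS, Cookie=f"rbzid={rbzid}")
    return urllib.request.Request(URL, headers=headers)


# Placeholder value: paste the rbzid cookie from your own browser session.
req = build_request("PASTE-RBZID-VALUE-HERE")

# To actually send it (needs network access):
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```

Because the request already carries a User-Agent, urllib does not add its default Python-urllib value on top of it. Note that urllib.request.Request stores header names via str.capitalize(), so they are looked up as "User-agent" and "Accept-language".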

I looked at which headers Chrome was sending by checking the Network panel in the DevTools.
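
Copying headers out of the DevTools Network panel by hand is error-prone. When the request headers are viewed as raw text they are "Name: value" lines, and a small parser turns them into a dict that requests or urllib can take. parse_raw_headers is a hypothetical helper, not part of either library; it skips HTTP/2 pseudo-headers such as :authority, which are not regular headers.

```python
def parse_raw_headers(raw: str) -> dict:
    """Hypothetical helper: turn 'Name: value' lines into a header dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(":"):
            # Skip blank lines and HTTP/2 pseudo-headers like ':authority'.
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


raw = """Accept: text/html
Accept-Language: en-US,en;q=0.8
:authority: www.yesplanet.co.il

Referer: http://www.yesplanet.co.il/
Cookie: rbzid=abc"""
headers = parse_raw_headers(raw)
```

If the result is used with urllib, leave out Accept-Encoding: urllib does not decompress gzip responses, whereas requests does.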

From that I could also see that Chrome made two requests.
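
The two requests fit a common bot-protection pattern, although the answer does not confirm it: the first response is the HTML page whose script sets a cookie (presumably rbzid here), and the browser then repeats the request with that cookie. If the server sets the cookie with a Set-Cookie header, a requests.Session keeps it between requests automatically; if the minified JavaScript computes it, no plain HTTP client can reproduce that, and you either copy the value from the browser as the answer does or drive a real browser. The sketch below takes a copied value; make_session and the placeholder cookie are assumptions, and the request is only prepared, not sent.

```python
import requests

URL = "http://www.yesplanet.co.il/presentationsJSON"
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
)


def make_session(rbzid: str) -> requests.Session:
    """Hypothetical helper: a session with browser-like headers and the rbzid cookie."""
    session = requests.Session()
    # Session headers are merged into every request made through this session.
    session.headers.update({
        "User-Agent": CHROME_UA,
        "Accept-Language": "en-US,en;q=0.8,ja;q=0.6",
    })
    # Cookies in session.cookies are sent on matching requests, and cookies
    # the server sets via Set-Cookie are stored in the same jar.
    session.cookies.set("rbzid", rbzid, domain="www.yesplanet.co.il")
    return session


# Placeholder value: paste the rbzid cookie from your own browser session.
session = make_session("PASTE-RBZID-VALUE-HERE")
prepared = session.prepare_request(requests.Request("GET", URL))
# session.send(prepared) or session.get(URL) would actually perform the request.
```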
