How to use Python requests to fake a browser visit?


Problem description


I want to get the content from the website below. With a browser such as Firefox or Chrome I get the real page I want, but when I fetch it with the Python requests package (or the wget command), it returns a totally different HTML page. I suspect the site's developer has put some blocking in place for this, so the question is:

How do I fake a browser visit using Python requests or the wget command?

http://www.ichangtou.com/#company:data_000008.html

Solution

Provide a User-Agent header:

import requests

url = 'http://www.ichangtou.com/#company:data_000008.html'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response.content)
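When several pages are fetched from the same site, it can be convenient to set the header once on a `requests.Session` instead of passing `headers=` on every call. The following is a minimal sketch under some assumptions: the helper name `make_browser_session` and the extra `Accept` and `Accept-Language` values are illustrative and do not come from the original answer, and nothing here guarantees that a given site will accept the request. The request is only prepared, not sent, so the headers can be inspected without network access.

```python
import requests

# Illustrative browser-like headers. Only the User-Agent string comes from the
# answer above; the Accept and Accept-Language values are assumptions.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/39.0.2171.95 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_browser_session():
    """Hypothetical helper: return a Session whose default headers look browser-like."""
    session = requests.Session()
    # Overrides the default "python-requests/x.y.z" User-Agent for every
    # request made through this session.
    session.headers.update(BROWSER_HEADERS)
    return session


session = make_browser_session()

# Build the request without sending it, to see which headers would go out.
request = requests.Request("GET", "http://www.ichangtou.com/")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
```

To actually fetch a page, call `session.get(url)`; the session headers are applied automatically.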

FYI, lists of User-Agent strings for different browsers are available online.


As a side note, there is a pretty useful third-party package called fake-useragent that provides a nice abstraction layer over user agents:

fake-useragent: "Up to date simple useragent faker with real world database"

Demo:

>>> from fake_useragent import UserAgent
>>> ua = UserAgent()
>>> ua.chrome
u'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36'
>>> ua.random
u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36'
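The two pieces can be combined by passing a string from `fake_useragent` as the `User-Agent` header. This is a rough sketch, not the package's recommended usage: the helper `pick_user_agent` and the fallback list are hypothetical, added here so that the example still runs when `fake_useragent` is not installed or cannot load its data. In that case it simply picks one of the two strings shown in the demo above with `random.choice`.

```python
import random

import requests

# Fallback strings copied from the demo output above. They are used only when
# fake_useragent is unavailable (an assumption made for this sketch).
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
]


def pick_user_agent():
    """Hypothetical helper: return a random browser User-Agent string."""
    try:
        from fake_useragent import UserAgent
        return UserAgent().random
    except Exception:
        # Package missing or its database could not be loaded.
        return random.choice(FALLBACK_USER_AGENTS)


headers = {"User-Agent": pick_user_agent()}

# Prepare the request without sending it, so the header can be checked offline.
prepared = requests.Request(
    "GET", "http://www.ichangtou.com/", headers=headers
).prepare()
print(prepared.headers["User-Agent"])
```

Replacing the prepare step with `requests.get(url, headers=headers)` sends the request with the chosen string.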
