Effects of passing headers in a request?
Question
I want to know what difference it makes when you pass headers to requests.get, i.e. the difference between requests.get(url, headers=headers) and requests.get(url).
I have these two pieces of code:
from lxml import html
from lxml import etree
import requests
import re
url = "http://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"
# Request sent without any custom headers
page = requests.get(url)
tree = html.fromstring(page.text)
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@src'
image_source = tree.xpath(XPATH_IMAGE_SOURCE)
print('type:', type(image_source[0]))
print(image_source[0])
Its output is a URL, as you'd expect. But this:
from lxml import html
from lxml import etree
import requests
import re
url = "http://www.amazon.in/SanDisk-micro-USB-connector-OTG-enabled-Android/dp/B00RBGYGMO"
# Same request, but with a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url, headers=headers)
tree = html.fromstring(page.text)
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@src'
image_source = tree.xpath(XPATH_IMAGE_SOURCE)
print('type:', type(image_source[0]))
print(image_source[0])
has an output that starts with data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAoHBwgHBgoIC
I'm guessing this is the actual image, not rendered, just plain data. Any idea how I could get it in URL form instead? In what other ways does the presence of headers affect the response we get?
Thanks
Recommended answer
Save the first snippet's response to an HTML file and open it in your browser:
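One way to do that is sketched below. The helper names save_html and fetch_and_open and the file name response_no_headers.html are made up for illustration; only requests.get and page.text come from the question.

```python
import webbrowser
from pathlib import Path


def save_html(text, path):
    """Write a response body to disk as UTF-8 and return the resulting Path."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out


def fetch_and_open(url, filename="response_no_headers.html"):
    """Fetch url without custom headers, save the body, open it in a browser.

    Hypothetical helper: the file name is an arbitrary choice.
    """
    import requests  # third-party, same library as in the question

    page = requests.get(url)  # no headers, as in the first snippet
    saved = save_html(page.text, filename)
    webbrowser.open(saved.resolve().as_uri())
    return saved
```

Calling fetch_and_open(url) with the URL from the question should show the page that Amazon actually returned for the header-less request.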
As you can see, Amazon blocks the request when it is sent without headers.
Use this XPath:
XPATH_IMAGE_SOURCE = '//*[@id="main-image-container"]//img/@data-old-hires'
Output:
type: <class 'lxml.etree._ElementStringResult'>
http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg
This is the raw HTML data:
<img alt=".." src=" data:image/webp;base64,UklGRuYIAABXRUJQVlA4INoIAACQQQCdASosAcsAPrFWpEqkIqQhIxN6gIgWCek6r4bUf/..."
data-old-hires="http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg"
The image URL is in the data-old-hires attribute.
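To make the attribute choice concrete, here is a small dependency-free sketch that prefers data-old-hires and skips inline data: URIs. It uses the standard library's html.parser rather than lxml. The names ImageUrlParser and image_urls are made up, and the sample markup is a shortened stand-in for the real page with a placeholder base64 payload.

```python
from html.parser import HTMLParser


class ImageUrlParser(HTMLParser):
    """Collect one real URL per <img>, ignoring inline data: URIs."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        hires = attr.get("data-old-hires")
        src = (attr.get("src") or "").strip()
        if hires:
            # Prefer the attribute that holds the plain image URL.
            self.urls.append(hires)
        elif src and not src.startswith("data:"):
            # Fall back to src only when it is not embedded base64 data.
            self.urls.append(src)


def image_urls(markup):
    parser = ImageUrlParser()
    parser.feed(markup)
    parser.close()
    return parser.urls


# Shortened stand-in for the raw HTML above (placeholder base64 payload).
sample = (
    '<div id="main-image-container">'
    '<img alt=".." src=" data:image/webp;base64,AAAA" '
    'data-old-hires="http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg">'
    '</div>'
)
print(image_urls(sample))
# prints ['http://ecx.images-amazon.com/images/I/617TjMIouyL._SL1274_.jpg']
```

With lxml, the XPath above does the same job in one expression. The fallback to src here only covers pages that do not provide data-old-hires.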