Referer missing in HTTP header of Selenium request


Question

I'm writing some tests with Selenium and noticed that Referer is missing from the headers. I wrote the following minimal example to test this with https://httpbin.org/headers:

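import selenium.webdriver
import selenium.webdriver.support.ui  # needed for WebDriverWait below

options = selenium.webdriver.FirefoxOptions()
options.add_argument('--headless')

profile = selenium.webdriver.FirefoxProfile()
# Disable Firefox's JSON viewer so the response is rendered as plain text
profile.set_preference('devtools.jsonview.enabled', False)

# Selenium 3 keyword arguments, as used in the question (newer Selenium releases use options= instead)
driver = selenium.webdriver.Firefox(firefox_options=options, firefox_profile=profile)
wait = selenium.webdriver.support.ui.WebDriverWait(driver, 10)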

driver.get('http://www.python.org')
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
wait.until(lambda driver: driver.current_url == url)
print(driver.page_source)

driver.close()

Which prints:

<html><head><link rel="alternate stylesheet" type="text/css" href="resource://content-accessible/plaintext.css" title="Wrap Long Lines"></head><body><pre>{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.5", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}
</pre></body></html>
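
To check for the header programmatically instead of reading the printed page, the JSON can be pulled out of the page source. The following is a minimal sketch, assuming the response is rendered inside a single <pre> element as in the output above; the helper names extract_headers and has_referer are hypothetical and not part of Selenium or httpbin.

```python
import html
import json
import re


def extract_headers(page_source):
    """Return the "headers" dict from an httpbin.org/headers page source.

    Assumes the JSON body is wrapped in one <pre> element, as Firefox does
    when its JSON viewer is disabled.
    """
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL)
    if match is None:
        raise ValueError("no <pre> block found in page source")
    # The browser may HTML-escape characters such as & inside the <pre> text.
    body = html.unescape(match.group(1))
    return json.loads(body)["headers"]


def has_referer(page_source):
    """Report whether the echoed request headers include Referer."""
    return "Referer" in extract_headers(page_source)
```

In the test script this could replace the final print, for example `assert has_referer(driver.page_source)`.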

So there is no Referer. However, if I browse to any page and manually execute

window.location.href = "https://httpbin.org/headers"

in the Firefox console, Referer does appear as expected.

As pointed out in the comments below, when using

driver.get("javascript: window.location.href = '{}'".format(url))

instead of

driver.execute_script("window.location.href = '{}';".format(url))

the request does include Referer. Also, when using Chrome instead of Firefox, both methods include Referer.

So the main question still stands: why is Referer missing from the request when it is sent with Firefox as described above?

Recommended answer

Referer

The Referer request header contains the address of the previous web page from which a link to the currently requested page was followed. The Referer header allows servers to identify where people are visiting them from, and they may use that data for analytics, logging, or optimized caching, for example.

Important: Although this header has many innocent uses, it can have undesirable consequences for user security and privacy.

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer

However:

A Referer header is not sent by browsers if:

  • The referring resource is a local "file" or "data" URI.
  • An unsecured HTTP request is used and the referring page was received with a secure protocol (HTTPS).

Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer
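
The two conditions quoted above can be written as a small check. This is only a sketch of those two rules, not a model of real browser behavior (browsers also apply Referrer-Policy and other settings); the function name browser_omits_referer is hypothetical.

```python
from urllib.parse import urlsplit


def browser_omits_referer(referring_url, target_url):
    """Apply only the two quoted rules for omitting the Referer header."""
    referring = urlsplit(referring_url)
    target = urlsplit(target_url)
    # Rule 1: the referring resource is a local "file" or "data" URI.
    if referring.scheme in ("file", "data"):
        return True
    # Rule 2: the referring page was received over HTTPS and the request is plain HTTP.
    if referring.scheme == "https" and target.scheme == "http":
        return True
    return False
```

For the navigation in the question, from https://www.python.org/ to https://httpbin.org/headers, neither rule applies, so these two conditions alone do not account for the missing header.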

There are some privacy and security risks associated with the Referer HTTP header:

The Referer header contains the address of the previous web page from which a link to the currently requested page was followed, which can be further used for analytics, logging, or optimized caching.

Source: https://developer.mozilla.org/zh-CN/docs/Web/Security/Referer_header:_privacy_and_security_concerns#The_referrer_problem

From the Referer header perspective, the majority of security risks can be mitigated with the following steps:

  • Referrer-Policy: Use the Referrer-Policy header on your server to control what information is sent through the Referer header. Again, a directive of no-referrer would omit the Referer header entirely.
  • The referrerpolicy attribute on HTML elements that are in danger of leaking such information (such as <img> and <a>). This can, for example, be set to no-referrer to stop the Referer header from being sent altogether.
  • The rel attribute set to noreferrer on HTML elements that are in danger of leaking such information (such as <img> and <a>).
  • The Exit Page Redirect technique: The only method that should currently work without flaw is to have an exit page that you don't mind appearing in the Referer header. Many websites implement this method, including Google and Facebook. If implemented correctly, the referrer data shows only the website the user came from instead of private information. Instead of the referrer data appearing as http://example.com/user/foobar, the new referrer data will appear as http://example.com/exit?url=http%3A%2F%2Fexample.com. The method works by having all external links on your website go to an intermediary page that then redirects to the final page. For a link to the website example.com, the full URL is URL-encoded and added to the url parameter of the exit page.

Sources:

  • https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#How_can_we_fix_this
  • https://geekthis.net/post/hide-http-referer-headers/#exit-page-redirect
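
The exit-page link described above can be built with the standard library. This is a minimal sketch, assuming an exit page at http://example.com/exit that reads its destination from a url query parameter; the function name build_exit_link is hypothetical.

```python
from urllib.parse import urlencode


def build_exit_link(exit_page, destination):
    """Return an exit-page URL that carries the URL-encoded destination."""
    # urlencode percent-encodes ':' and '/' in the destination URL.
    return "{}?{}".format(exit_page, urlencode({"url": destination}))


print(build_exit_link("http://example.com/exit", "http://example.com"))
# prints http://example.com/exit?url=http%3A%2F%2Fexample.com
```

The exit page itself would then redirect the visitor to the decoded destination, so the final site sees only the exit page in the Referer header.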

I have executed your code through both the GeckoDriver/Firefox and ChromeDriver/Chrome combinations:

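from selenium.webdriver.support.ui import WebDriverWait

# driver is the Firefox or Chrome WebDriver instance under test
driver.get('http://www.python.org')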
assert 'Python' in driver.title

url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
WebDriverWait(driver, 10).until(lambda driver: driver.current_url == url)
print(driver.page_source)

Observation:

  • Using GeckoDriver/Firefox, the Referer: "https://www.python.org/" header was missing, as follows:

          {
            "headers": {
              "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
              "Accept-Encoding": "gzip, deflate, br", 
              "Accept-Language": "en-US,en;q=0.5", 
              "Host": "httpbin.org", 
              "Upgrade-Insecure-Requests": "1", 
              "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
            }
          }
      

  • Using ChromeDriver/Chrome, the Referer: "https://www.python.org/" header was present, as follows:

          {
            "headers": {
              "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
              "Accept-Encoding": "gzip, deflate, br", 
              "Accept-Language": "en-US,en;q=0.9", 
              "Host": "httpbin.org", 
              "Referer": "https://www.python.org/", 
              "Upgrade-Insecure-Requests": "1", 
              "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
            }
          }
      

  • It seems to be an issue with GeckoDriver/Firefox in handling the Referer header.

      Referrer Policy
