"Lighthouse was unable to download a robots.txt file" despite the file being accessible


Problem description

I have a NodeJS/NextJS app running at http://www.schandillia.com. The project has a robots.txt file accessible at http://www.schandillia.com/robots.txt. As of now, the file is bare-bones for testing purposes:

User-agent: *
Allow: /

However, when I run a Lighthouse audit on my site, it throws a Crawling and Indexing error saying it couldn't download a robots.txt file. I repeat, the file is available at http://www.schandillia.com/robots.txt.
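
For what it's worth, the file is also reachable from outside a browser. A rough check (a hypothetical snippet, not part of the app, assuming Node 18+ with the built-in fetch) should print a 200 status and the two lines above:

// Hypothetical check: fetch robots.txt from Node, where no browser CSP applies,
// and print the response status and body.
fetch('http://www.schandillia.com/robots.txt')
  .then(res => res.text().then(body => console.log(res.status, body)))
  .catch(err => console.error('request failed:', err));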

The project's codebase, should you need to take a look, is up at https://github.com/amitschandillia/proost. The robots.txt file is located at proost/web/static/ but accessible at root thanks to the following in my Nginx config:

# ... the rest of your configuration
  location = /robots.txt {
    proxy_pass http://127.0.0.1:3000/static/robots.txt;
  }

The complete config file is available for review on GitHub at https://github.com/amitschandillia/proost/blob/master/.help_docs/configs/nginx.conf.

Please advise if there's something I'm overlooking.

Solution

TL;DR: Your robots.txt is served fine, but Lighthouse cannot fetch it properly because its audit cannot work with the connect-src directive of your site's Content Security Policy, due to a known limitation that was tracked as issue #4386 and has since been fixed in Chrome 92.


Explanation: Lighthouse attempts to fetch the robots.txt file by way of a script run from the document served by the root of your site. Here is the code it uses to perform this request (found in lighthouse-core):

const response = await fetch(new URL('/robots.txt', location.href).href);

If you try to run this code from your site, you will notice that a "Refused to connect" error is thrown.
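
As a rough illustration (this is not Lighthouse's actual harness, just a hypothetical snippet to paste into the DevTools console on your page), the same fetch is rejected under connect-src 'none':

// Hypothetical reproduction: run in the DevTools console on http://www.schandillia.com/.
// With connect-src 'none', the request is blocked and the promise rejects
// (typically with a TypeError in Chrome).
const robotsUrl = new URL('/robots.txt', location.href).href;
fetch(robotsUrl)
  .then(res => console.log('robots.txt status:', res.status))
  .catch(err => console.error('blocked by CSP:', err));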

This error happens because the browser enforces the Content Security Policy restrictions from the headers served by your site (split on several lines for readability):

content-security-policy:
    default-src 'self';
    script-src 'self' *.google-analytics.com;
    img-src 'self' *.google-analytics.com;
    connect-src 'none';
    style-src 'self' 'unsafe-inline' fonts.googleapis.com;
    font-src 'self' fonts.gstatic.com;
    object-src 'self';
    media-src 'self';
    frame-src 'self'

Notice the connect-src 'none'; part. Per the CSP spec, it means that no URL can be loaded using script interfaces from within the served document. In practice, any fetch is refused.
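
If you want to confirm which policy your pages actually send, one hypothetical way (again assuming Node 18+ with the built-in fetch, run outside the browser so the policy is not enforced) is to read the response header directly:

// Hypothetical check: print the Content-Security-Policy header served by the site root.
fetch('http://www.schandillia.com/')
  .then(res => console.log(res.headers.get('content-security-policy')))
  .catch(err => console.error('request failed:', err));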

This header is explicitly sent by the server layer of your Next.js application, because of the way you configured your Content Security Policy middleware (from commit a6aef0e):

import csp from 'helmet-csp';

server.use(csp({
  directives: {
    defaultSrc: ["'self'"],
    scriptSrc: ["'self'", '*.google-analytics.com'],
    imgSrc: ["'self'", '*.google-analytics.com'],
    connectSrc: ["'none'"],
    styleSrc: ["'self'", "'unsafe-inline'", 'maxcdn.bootstrapcdn.com'], // Remove unsafe-inline for better security
    fontSrc: ["'self'"],
    objectSrc: ["'self'"],
    mediaSrc: ["'self'"],
    frameSrc: ["'self'"]
  }
}));


Solution/Workaround: To solve the problem in the audit report, you can either:

  • wait for (or submit) a fix in Lighthouse
  • use the connect-src 'self' directive, which will have the side effect of allowing HTTP requests from the browser side of your Next.js app (see the sketch below)
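
If you go with the second option, a minimal sketch of the change (the same middleware as in commit a6aef0e, with only connectSrc switched to 'self'; adapt the remaining directives as needed) would look like this:

import csp from 'helmet-csp';

server.use(csp({
  directives: {
    defaultSrc: ["'self'"],
    scriptSrc: ["'self'", '*.google-analytics.com'],
    imgSrc: ["'self'", '*.google-analytics.com'],
    connectSrc: ["'self'"], // allow same-origin fetch/XHR, so /robots.txt can be requested
    styleSrc: ["'self'", "'unsafe-inline'", 'maxcdn.bootstrapcdn.com'],
    fontSrc: ["'self'"],
    objectSrc: ["'self'"],
    mediaSrc: ["'self'"],
    frameSrc: ["'self'"]
  }
}));

With connect-src 'self', scripts served from your origin (including the one Lighthouse injects) may open connections back to that same origin, which is all the robots.txt audit needs.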
