How to crawl a local HTML file with Scrapy


Problem description

I tried to crawl a local HTML file stored on my desktop with the code below, but I encounter the following errors before the crawling procedure starts, such as "No such file or directory: '/robots.txt'".

  • Is it possible to crawl a local HTML file on my local machine (Mac)?
  • If so, how should I set parameters such as "allowed_domains" and "start_urls"?
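As a side note on the second question: a `start_urls` entry for a local file must be a `file://` URI, and the standard library can build one from a plain path. A minimal sketch, reusing the path from the question:

```python
from pathlib import PurePosixPath

# Convert an absolute local path into a file:// URI suitable for start_urls
local_file = PurePosixPath("/Users/Name/Desktop/test/test.html")
file_uri = local_file.as_uri()
print(file_uri)  # file:///Users/Name/Desktop/test/test.html
```

`as_uri()` also percent-encodes characters such as spaces, which a hand-built `'file://' + path` string would not.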

[Scrapy command]

$ scrapy crawl test -o test01.csv

[Scrapy spider]

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = []
    start_urls = ['file:///Users/Name/Desktop/test/test.html']

[Error]

2018-11-16 01:57:52 [scrapy.core.engine] INFO: Spider opened
2018-11-16 01:57:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-16 01:57:52 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-11-16 01:57:52 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET file:///robots.txt> (failed 1 times): [Errno 2] No such file or directory: '/robots.txt'
2018-11-16 01:57:56 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET file:///robots.txt> (failed 2 times): [Errno 2] No such file or directory: '/robots.txt'

Recommended answer

When working with Scrapy locally, I never specify `allowed_domains`. Try taking that line of code out and see if it works.

In your error log, Scrapy is testing the 'empty' domain that you have given it.
