Scrapy Python: Set Up User Agent
Question
I tried to override the user agent of my CrawlSpider by adding an extra line to the project configuration file. Here is the file:
[settings]
default = myproject.settings
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"
[deploy]
#url = http://localhost:6800/
project = myproject
But when I run the crawler against my own website, the spider does not pick up my customized user agent and instead sends the default one, "Scrapy/0.18.2 (+http://scrapy.org)". Can anyone explain what I have done wrong?
Notes:
(1). It works when I override the user agent globally from the command line:
scrapy crawl myproject.com -o output.csv -t csv -s USER_AGENT="Mozilla...."
(2). When I remove the line "default = myproject.settings" from the configuration file and run scrapy crawl myproject.com, it says "cannot find spider..", so I assume the default setting should not be removed in this case.
Thanks a lot for the help in advance.
Recommended answer
Move your USER_AGENT line into the settings.py file instead of the scrapy.cfg file. If you created the project with the scrapy startproject command, settings.py should be at the same level as items.py; in your case it should be something like myproject/settings.py.
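As a minimal sketch of what the answer describes, the settings module might look like the block below. The [settings] section of scrapy.cfg only tells Scrapy which Python module holds the settings (here myproject.settings), so the USER_AGENT value has to live in that module. The BOT_NAME, SPIDER_MODULES and NEWSPIDER_MODULE values are assumptions based on what scrapy startproject typically generates for a project named myproject; only the USER_AGENT line comes from the question.

```python
# myproject/settings.py
# Sketch of a project settings module; the first three values are assumed
# defaults for a project created with "scrapy startproject myproject".

BOT_NAME = "myproject"

SPIDER_MODULES = ["myproject.spiders"]
NEWSPIDER_MODULE = "myproject.spiders"

# User agent string taken from the question; Scrapy reads this setting
# from the settings module, not from scrapy.cfg.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"
)
```

With this in place, scrapy.cfg keeps only the "default = myproject.settings" line under [settings], and the -s USER_AGENT=... command-line option remains available for one-off overrides.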