Scrapy Python: Set Up User Agent


Question

I tried to override the user agent of my CrawlSpider by adding an extra line to the project configuration file (scrapy.cfg). Here is the code:

[settings]
default = myproject.settings
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"


[deploy]
#url = http://localhost:6800/
project = myproject

But when I run the crawler against my own website, I notice the spider does not pick up my customized user agent but the default one, "Scrapy/0.18.2 (+http://scrapy.org)". Can anyone explain what I have done wrong?

Notes:

(1) It works when I try to override the user agent globally on the command line:

scrapy crawl myproject.com -o output.csv -t csv -s USER_AGENT="Mozilla...."

(2) When I remove the line "default = myproject.settings" from the configuration file and run scrapy crawl myproject.com, it says "cannot find spider...", so I feel the default setting should not be removed in this case.

Thanks a lot for the help in advance.

Accepted Answer

Move your USER_AGENT line into the settings.py file, not your scrapy.cfg file. If you used the scrapy startproject command, settings.py sits at the same level as items.py; in your case it should be something like myproject/settings.py.
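For illustration, a minimal settings.py sketch: the project name myproject comes from the question, the BOT_NAME and spider module values are assumptions about a typical scrapy startproject layout, and the user-agent string is just the example from the question.

# myproject/settings.py -- Scrapy project settings module (minimal sketch)

BOT_NAME = "myproject"                   # assumed project name, taken from the question
SPIDER_MODULES = ["myproject.spiders"]   # default layout created by scrapy startproject
NEWSPIDER_MODULE = "myproject.spiders"

# The user agent belongs here, not in scrapy.cfg; scrapy.cfg only tells Scrapy
# which settings module to load (default = myproject.settings).
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"

With the setting in place, you can confirm it by checking the request headers in the crawl log, or still override it for a single run with the -s USER_AGENT="..." switch shown above.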

