Setting Scrapy start_urls from a Script
Question
I have a working Scrapy spider and I'm able to run it through a separate script following the example here. I have also created a wxPython GUI for my script that simply contains a multi-line TextCtrl for users to input a list of URLs to scrape and a button to submit. Currently the start_urls are hardcoded into my spider. How can I pass the URLs entered in my TextCtrl to the start_urls list in my spider? Thanks in advance for the help!
Answer
Just set start_urls on your Spider instance:
spider = FollowAllSpider(domain=domain)
spider.start_urls = ['http://google.com']