Setting Scrapy start_urls from a Script
Question
I have a working Scrapy spider and I'm able to run it through a separate script following the example here. I have also created a wxPython GUI for my script that simply contains a multi-line TextCtrl for users to input a list of URLs to scrape and a button to submit. Currently the start_urls are hardcoded into my spider. How can I pass the URLs entered in my TextCtrl to the start_urls list in my spider? Thanks in advance for the help!
Answer
Just set start_urls on your Spider instance:
spider = FollowAllSpider(domain=domain)
spider.start_urls = ['http://google.com']