Python Scrapy - populate start_urls from MySQL

Views: 213 | Published: 2020/5/15 0:44:25 | Tags: python, mysql, scrapy, web-crawler

Question

I am trying to populate start_urls with a SELECT from a MySQL table using spider.py. When I run "scrapy runspider spider.py" I get no output; it just finishes with no error.

I have tested the SELECT query in a standalone Python script, and there start_urls does get populated with the entries from the MySQL table.

spider.py

from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import MySQLdb


class ProductsSpider(BaseSpider):
    name = "Products"
    allowed_domains = ["test.com"]
    start_urls = []

    def parse(self, response):
        print self.start_urls

    def populate_start_urls(self, url):
        conn = MySQLdb.connect(
                user='user',
                passwd='password',
                db='scrapy',
                host='localhost',
                charset="utf8",
                use_unicode=True
                )
        cursor = conn.cursor()
        cursor.execute(
            'SELECT url FROM links;'
            )
        rows = cursor.fetchall()

        for row in rows:
            self.start_urls.append(row[0])
        conn.close()
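Nothing in this spider ever calls populate_start_urls, so start_urls is still the empty list when the crawl starts. Scrapy's default start_requests simply yields one request per entry in start_urls, so with an empty list nothing is scheduled, parse never runs, and the spider closes normally without an error, which matches the silent run described in the question. The stdlib-only model below illustrates that chain; SpiderModel and FakeRequest are hypothetical stand-ins written for illustration, not Scrapy's own classes, and the URL is made up.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; it only records the URL."""

    url: str
    dont_filter: bool = False


class SpiderModel:
    """Simplified model of a spider's default start-up logic (not Scrapy's code)."""

    start_urls = []

    def start_requests(self):
        # Default behaviour: exactly one request per entry in start_urls.
        for url in self.start_urls:
            yield FakeRequest(url, dont_filter=True)


class ProductsSpider(SpiderModel):
    start_urls = []

    def populate_start_urls(self):
        # Plays the role of the MySQL query; nothing in the crawl calls it.
        self.start_urls.append("http://test.com/page")


scheduled = list(ProductsSpider().start_requests())
print(len(scheduled))  # 0: nothing is scheduled, so parse() never runs
```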

Recommended answer

A better approach is to override the start_requests method.

This can query your database, much like populate_start_urls, and return a sequence of Request objects.

You would just need to rename your populate_start_urls method to start_requests and modify the following lines:

for row in rows:
    yield self.make_requests_from_url(row[0])
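Put together, the overridden start_requests might look roughly like the sketch below. It rests on several assumptions: an in-memory sqlite3 database stands in for MySQLdb (both follow the DB-API 2.0 interface, so cursor(), execute(), fetchall() and close() are used the same way), connect() is a hypothetical helper that fills a links table with made-up sample URLs, and a small hypothetical Request namedtuple stands in for scrapy.Request so the sketch runs without Scrapy installed. In a real project the class would subclass scrapy.Spider and yield scrapy.Request(url, callback=self.parse); make_requests_from_url, used in the answer, has been deprecated since Scrapy 1.4, so yielding Request objects directly is the more portable choice.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for scrapy.Request so this sketch runs without Scrapy.
Request = namedtuple("Request", ["url", "callback"])


def connect():
    """Hypothetical connection factory; a real spider would call
    MySQLdb.connect(user=..., passwd=..., db='scrapy', ...) here instead."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT)")
    conn.executemany(
        "INSERT INTO links (url) VALUES (?)",
        [("http://test.com/a",), ("http://test.com/b",)],
    )
    return conn


class ProductsSpider:  # real project: class ProductsSpider(scrapy.Spider)
    name = "Products"
    allowed_domains = ["test.com"]

    def start_requests(self):
        # Query once and close the connection before handing out requests.
        conn = connect()
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT url FROM links")
            rows = cursor.fetchall()
        finally:
            conn.close()
        # One request per row; the callback receives each response.
        for row in rows:
            yield Request(row[0], callback=self.parse)

    def parse(self, response):
        print(response.url)


for request in ProductsSpider().start_requests():
    print(request.url)
```

Fetching all rows and closing the connection before the first yield keeps the database connection from staying open while Scrapy consumes start_requests lazily; for a very large links table, iterating over the cursor instead of calling fetchall() would trade that for lower memory use.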