Scrapy Pipeline doesn't insert into MySQL


I'm trying to build a small app for a university project with Scrapy. The spider is scraping the items, but my pipeline is not inserting the data into the MySQL database. To test whether the pipeline or the pymysql implementation is at fault, I wrote a test script:

Code Start

#!/usr/bin/python3

import pymysql

str1 = "hey"
str2 = "there"
str3 = "little"
str4 = "script"

db = pymysql.connect("localhost","root","**********","stromtarife" )

cursor = db.cursor()

cursor.execute("SELECT * FROM vattenfall")
cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (str1, str2, str3, str4))
cursor.execute("SELECT * FROM vattenfall")
data = cursor.fetchone()
print(data)
db.commit()
cursor.close()

db.close()

Code End

After I run this script my database has a new record, so it's not my pymysql.connect() function that is broken.

I'll provide my Scrapy code:

vattenfall_form.py

# -*- coding: utf-8 -*-
import scrapy
from scrapy.crawler import CrawlerProcess
from stromtarife.items import StromtarifeItem

from scrapy.http import FormRequest

class VattenfallEasy24KemptenV1500Spider(scrapy.Spider):
    name = 'vattenfall-easy24-v1500-p87435'

    def start_requests(self):
        return [
            FormRequest(
                "https://www.vattenfall.de/de/stromtarife.htm",
                formdata={"place": "87435", "zipCode": "87435", "cityName": "Kempten",
                      "electricity_consumptionprivate": "1500", "street": "", "hno": ""},
            callback=self.parse
        ),
    ]

    def parse(self, response):
        item = StromtarifeItem()
        item['jahrespreis'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[3]/td[2]/text()').extract_first()
        item['treuebonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[2]/td/strong/text()').extract_first()
        item['sofortbonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[1]/td/strong/text()').extract_first()
        item['tarif'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[1]/h2/span/text()').extract_first()
        yield item



class VattenfallEasy24KemptenV2500Spider(scrapy.Spider):
    name = 'vattenfall-easy24-v2500-p87435'

    def start_requests(self):
        return [
                    FormRequest(
                    "https://www.vattenfall.de/de/stromtarife.htm",
                    formdata={"place": "87435", "zipCode": "87435", "cityName": "Kempten",
                              "electricity_consumptionprivate": "2500", "street": "", "hno": ""},
                    callback=self.parse
                ),
    ]

    def parse(self, response):
        item = StromtarifeItem()
        item['jahrespreis'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[3]/td[2]/text()').extract_first()
        item['treuebonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[2]/td/strong/text()').extract_first()
        item['sofortbonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[1]/td/strong/text()').extract_first()
        item['tarif'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[1]/h2/span/text()').extract_first()
        yield item



process = CrawlerProcess()
process.crawl(VattenfallEasy24KemptenV1500Spider)
process.crawl(VattenfallEasy24KemptenV2500Spider)
process.start()

pipelines.py

import pymysql
from stromtarife.items import StromtarifeItem


class StromtarifePipeline(object):
    def __init__(self):
        self.connection = pymysql.connect("localhost","root","**********","stromtarife")
        self.cursor = self.connection.cursor()


    def process_item(self, item, spider):
        self.cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis']))
        self.connection.commit()
        self.cursor.close()
        self.connection.close()

settings.py (I changed only this line)

ITEM_PIPELINES = {
   'stromtarife.pipelines.StromtarifePipeline': 300,
}

So what is wrong with my code? I couldn't figure it out and would be really happy if someone sees something I'm missing. Thanks in advance!

Solution

You should not close your pymysql connection every time you process an item.

You should write the close_spider function in your pipeline like this, so the connection is closed just once, at the end of the execution:

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

Moreover, you need to return your item at the end of process_item.

Your file pipelines.py should look like this:

import pymysql
from stromtarife.items import StromtarifeItem


class StromtarifePipeline(object):
    def __init__(self):
        self.connection = pymysql.connect("localhost","root","**********","stromtarife")
        self.cursor = self.connection.cursor()


    def process_item(self, item, spider):
        self.cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis']))
        self.connection.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

UPDATE:

I tried your code; the problem is in the pipeline. There are two issues:

  • You try to insert the euro symbol (€) and I think MySQL does not like it.
  • Your query string is not well built.

I managed to get things done by writing the pipeline like this:

def process_item(self, item, spider):
    query = """INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)""" % ("1", "2", "3", "4")
    self.cursor.execute(query)
    self.connection.commit()
    return item
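For real item values (rather than the hard-coded "1".."4" above), a parameterized query is the usual fix for a badly built query string: the driver handles quoting and encoding for you. A minimal runnable sketch, using the standard-library sqlite3 as a stand-in for pymysql (the item values here are made up; pymysql's cursor API is similar but uses %s placeholders instead of sqlite3's ?):

```python
import sqlite3

# An in-memory SQLite database stands in for the MySQL server so this
# sketch runs anywhere without a database setup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE vattenfall "
    "(tarif TEXT, sofortbonus TEXT, treuebonus TEXT, jahrespreis TEXT)"
)

# Hypothetical scraped item; the field names match the question's spider.
item = {"tarif": "Easy24", "sofortbonus": "50", "treuebonus": "25", "jahrespreis": "400"}

# Parameterized query: values are passed separately from the SQL text,
# so special characters in the values cannot break the statement.
cur.execute(
    "INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) "
    "VALUES (?, ?, ?, ?)",
    (item["tarif"], item["sofortbonus"], item["treuebonus"], item["jahrespreis"]),
)
conn.commit()

cur.execute("SELECT tarif, jahrespreis FROM vattenfall")
print(cur.fetchone())  # ('Easy24', '400')
```

With pymysql the same pattern is `cursor.execute("INSERT ... VALUES (%s, %s, %s, %s)", values)`, which is what the question's pipeline already does, so the string-building problem mainly concerns the `%`-formatted variant shown above.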

I think you should remove the € (euro sign) from the prices you try to insert.
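A minimal sketch of such a cleanup step, assuming the scraped prices look like "123,45 €" (the helper name is hypothetical, not part of the original code):

```python
def clean_price(raw):
    """Strip the euro sign and surrounding whitespace from a scraped price string."""
    if raw is None:  # extract_first() returns None when the XPath matched nothing
        return None
    return raw.replace("€", "").strip()

print(clean_price("123,45 €"))  # -> 123,45
print(clean_price(None))        # -> None
```

The pipeline could apply this to each price field in process_item before building the INSERT.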

Hope this helps, let me know.
