Scrapy - How to crawl a website & store the data in a Microsoft SQL Server database?
Question
I'm trying to extract content from a website created by our company. I've created a table in MSSQL Server for the Scrapy data, and I've set up Scrapy and configured Python to crawl and extract webpage data. My question is: how do I export the data crawled by Scrapy into my local MSSQL Server database?
This is the Scrapy spider code that extracts the data:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }
Answer
You can use the pymssql module to send data to SQL Server from an item pipeline, something like this:
import pymssql


class DataPipeline(object):
    def __init__(self):
        # Connect once when the pipeline is created; replace the
        # placeholder credentials with your own
        self.conn = pymssql.connect(host='host', user='user',
                                    password='passwd', database='db')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            # 'tags' is a list, so serialize it into a single string column
            self.cursor.execute(
                "INSERT INTO MYTABLE (text, author, tags) VALUES (%s, %s, %s)",
                (item['text'], item['author'], ', '.join(item['tags'])))
            self.conn.commit()
        except pymssql.Error as e:
            spider.logger.error("Failed to insert item: %s", e)
        return item
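Note that the `tags` field yielded by the spider is a Python list, while the `tags` column in the table is a single text field, so the pipeline serializes it before inserting. A minimal sketch of that step (the tag values below are illustrative examples from quotes.toscrape.com, not required names):

```python
# Serialize a list of tags into one comma-separated string
# suitable for a single varchar/nvarchar column
tags = ['change', 'deep-thoughts', 'thinking', 'world']
tags_str = ', '.join(tags)
print(tags_str)  # → change, deep-thoughts, thinking, world
```

If you need to recover the individual tags later, splitting on `', '` reverses the join, or you can store tags in a separate table instead.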
Also, you will need to add 'spider_name.pipelines.DataPipeline': 300 to the ITEM_PIPELINES dict in your project's settings.
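The settings entry looks like this ('spider_name' is a placeholder for your actual Scrapy project package; the number 300 is the pipeline's priority, with lower values running first):

```python
# settings.py — register the pipeline so Scrapy routes every
# yielded item through DataPipeline.process_item
ITEM_PIPELINES = {
    'spider_name.pipelines.DataPipeline': 300,
}
```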