SQLAlchemy: Dynamically create table from Scrapy item

Question

I'm working with SQLAlchemy 1.1 and Scrapy. I'm currently using a pipeline to store extracted data in an SQLite table via SQLAlchemy. I'd like to dynamically create a table to accommodate the item being scraped.

My static pipeline element looks like:

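import os

from sqlalchemy import Column, Integer, MetaData, Table, Text, create_engine
from sqlalchemy.exc import IntegrityError

from myproject import settings  # hypothetical module path; must define SETTINGS_PATH

table_name = "stack_items"  # hypothetical name (not given in the question)

class SQLlitePipeline(object):

    def __init__(self):
        db_path = "sqlite:///" + os.path.join(settings.SETTINGS_PATH, "data.db")
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        # Columns are hard-coded to match Filtered_Item's fields
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("value", Text),
                             Column("value2", Text))
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item, spider):
        try:
            ins_query = self.stack_items.insert().values(
                value=item['value'],
                value2=item['value2'],
            )
            self.connection.execute(ins_query)
        except IntegrityError:
            print('THIS IS A DUP')
        return item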

items.py:

import scrapy

class Filtered_Item(scrapy.Item):

    value = scrapy.Field()
    value2 = scrapy.Field()

How can I modify the pipeline above to dynamically create the table and insert the filtered item's values, instead of having the columns hard-coded as they are now?
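For comparison, one possible way to make the pipeline dynamic with SQLAlchemy alone is to build the `Table` at runtime from the item's declared fields. This is a minimal sketch under stated assumptions: every field is stored as `Text`, each item class gets its own table named after the lower-cased class name, and no field is called `id`. `DynamicSQLitePipeline`, `_table_for` and the `sqlite:///data.db` URL are hypothetical, and only calls available in both SQLAlchemy 1.x and 2.x are used.

```python
from sqlalchemy import Column, Integer, MetaData, Table, Text, create_engine


class DynamicSQLitePipeline(object):
    """Hypothetical pipeline: one table per item class, columns taken from the item's fields."""

    def __init__(self, db_url="sqlite:///data.db"):
        self.engine = create_engine(db_url)
        self.metadata = MetaData()

    def _table_for(self, item):
        # Table name derived from the item class, e.g. Filtered_Item -> "filtered_item"
        name = type(item).__name__.lower()
        if name not in self.metadata.tables:
            # scrapy.Item subclasses expose their declared fields as a class-level
            # `fields` dict; for other mappings fall back to the item's own keys.
            field_names = getattr(type(item), "fields", item).keys()
            columns = [Column("id", Integer, primary_key=True)]
            # Every field stored as Text; sorted() only keeps the column order stable
            columns += [Column(field, Text) for field in sorted(field_names)]
            table = Table(name, self.metadata, *columns)
            # Issues CREATE TABLE only if the table is not already in the database
            table.create(self.engine, checkfirst=True)
        return self.metadata.tables[name]

    def process_item(self, item, spider):
        table = self._table_for(item)
        # engine.begin() opens a connection and commits when the block exits
        with self.engine.begin() as conn:
            conn.execute(table.insert().values(**dict(item)))
        return item
```

Because `checkfirst=True` skips tables that already exist, changing an item's fields later will not add columns to an existing table; the sketch only covers creating a table the first time an item class is seen.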

Recommended answer

There's actually a package out there that can help you out with this.

Check out dataset: databases for lazy people

Here's an excerpt from the page:

Features

Automatic schema:

If a table or column is written that does not exist in the database, it will be created automatically.
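To make the "automatic schema" idea concrete, here is a small illustration using Python's built-in `sqlite3` module rather than dataset itself; it is not dataset's implementation. `insert_auto` is a hypothetical helper, and it assumes table and column names come from trusted code (such as Scrapy field names), since they are pasted into the SQL text.

```python
import sqlite3


def insert_auto(conn, table, row):
    """Insert `row` (a dict) into `table`, creating the table and any missing columns first.

    Stdlib illustration of the "automatic schema" idea, not dataset's own code.
    Table and column names are pasted into the SQL, so they must be trusted.
    """
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (id INTEGER PRIMARY KEY)' % table)
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column
    existing = {info[1] for info in conn.execute('PRAGMA table_info("%s")' % table)}
    for column in row:
        if column not in existing:
            # New columns are added as TEXT; existing rows get NULL for them
            conn.execute('ALTER TABLE "%s" ADD COLUMN "%s" TEXT' % (table, column))
    if not row:
        conn.execute('INSERT INTO "%s" DEFAULT VALUES' % table)
        return
    columns = ", ".join('"%s"' % c for c in row)
    placeholders = ", ".join("?" for _ in row)
    # Values go through ? placeholders, so scraped data is escaped by sqlite3
    conn.execute(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, columns, placeholders),
        list(row.values()),
    )
```

In a pipeline this could be called as `insert_auto(conn, "filtered_item", dict(item))` followed by `conn.commit()`.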

Records are either created or updated, depending on whether an existing version can be found.

Query helpers for simple queries such as all rows in a table or all distinct values across a set of columns.
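The create-or-update behavior described above is an alternative to catching `IntegrityError` for duplicates, as the question's pipeline does. A rough sketch with Python's built-in `sqlite3` module follows; it is not dataset's code, and the helper name `upsert` and its `keys` parameter (the non-empty list of columns that identify an existing record) are assumptions for illustration.

```python
import sqlite3


def upsert(conn, table, row, keys):
    """Update the row whose `keys` columns match `row`, or insert `row` if none does.

    Stdlib illustration of create-or-update; not dataset's own code. `keys` must be
    non-empty, and table/column names are pasted into the SQL, so they must be trusted.
    """
    where = " AND ".join('"%s" = ?' % k for k in keys)
    key_values = [row[k] for k in keys]
    found = conn.execute(
        'SELECT 1 FROM "%s" WHERE %s LIMIT 1' % (table, where), key_values
    ).fetchone()
    if found:
        # Existing record: overwrite the non-key columns only
        others = [c for c in row if c not in keys]
        if others:
            assignments = ", ".join('"%s" = ?' % c for c in others)
            conn.execute(
                'UPDATE "%s" SET %s WHERE %s' % (table, assignments, where),
                [row[c] for c in others] + key_values,
            )
        return "updated"
    # No match: insert a new record
    columns = ", ".join('"%s"' % c for c in row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, columns, placeholders),
        list(row.values()),
    )
    return "created"
```

Checking first and then writing is not atomic, so this sketch assumes a single writer, which matches a single Scrapy pipeline instance.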

Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.

Data can be exported based on a scripted configuration, making the process easy and replicable.
