Nonblocking Scrapy pipeline to database

Views: 53  Published: 2021/6/8 18:52:11  Tags: python, sqlalchemy, scrapy, twisted, nonblocking

Problem description

I have a web scraper in Scrapy that collects data items. I also want to insert them into a database asynchronously.

For example, I have a transaction that inserts parts of each item into my database using SQLAlchemy Core:

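from sqlalchemy import insert

def process_item(self, item, spider):
    # Both inserts run inside one (blocking) transaction.
    with self.connection.begin() as conn:
        conn.execute(insert(table1).values(item['part1']))
        conn.execute(insert(table2).values(item['part2']))
    return item  # Scrapy expects process_item to return the item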

I understand that SQLAlchemy Core can be used asynchronously with Twisted through alchimia. The code example from alchimia's documentation is below.

What I don't understand is how to use my code above within the alchimia framework. How can I set up process_item to use a reactor?

Can I do something like this?

@inlineCallbacks
def process_item(self, item, spider):
    with self.connection.begin() as conn:
        yield conn.execute(insert(table1).values(item['part1']))
        yield conn.execute(insert(table2).values(item['part2']))

How do I write the reactor part?

Or is there a simpler way to do nonblocking database inserts in a Scrapy pipeline?

For reference, here is the code example from alchimia's documentation:

from alchimia import TWISTED_STRATEGY

from sqlalchemy import (
    create_engine, MetaData, Table, Column, Integer, String
)
from sqlalchemy.schema import CreateTable

from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react


@inlineCallbacks
def main(reactor):
    engine = create_engine(
        "sqlite://", reactor=reactor, strategy=TWISTED_STRATEGY
    )

    metadata = MetaData()
    users = Table("users", metadata,
        Column("id", Integer(), primary_key=True),
        Column("name", String()),
    )

    # Create the table
    yield engine.execute(CreateTable(users))

    # Insert some users
    yield engine.execute(users.insert().values(name="Jeremy Goodwin"))
    yield engine.execute(users.insert().values(name="Natalie Hurley"))
    yield engine.execute(users.insert().values(name="Dan Rydell"))
    yield engine.execute(users.insert().values(name="Casey McCall"))
    yield engine.execute(users.insert().values(name="Dana Whitaker"))

    result = yield engine.execute(users.select(users.c.name.startswith("D")))
    d_users = yield result.fetchall()
    # Print out the users
    for user in d_users:
        print("Username: %s" % user[users.c.name])

if __name__ == "__main__":
    react(main, [])
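
On the "simpler way" part of the question, one possible direction is to keep the blocking transaction as it is and run it in a worker thread. The sketch below is not taken from the alchimia or Scrapy documentation. It assumes a Scrapy version that accepts `async def process_item` and a project configured with the asyncio reactor (the `TWISTED_REACTOR` setting). The standard-library `sqlite3` module stands in for the SQLAlchemy Core transaction so that the example is self-contained. The names `insert_item`, `ThreadedInsertPipeline`, `db_path`, and the single `value` column are hypothetical.

```python
import asyncio
import sqlite3


def insert_item(db_path, item):
    """Blocking helper: write both parts of an item in one transaction."""
    # The connection is opened inside the worker thread that uses it,
    # because sqlite3 connections are bound to their creating thread.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO table1 (value) VALUES (?)", (item["part1"],))
            conn.execute("INSERT INTO table2 (value) VALUES (?)", (item["part2"],))
    finally:
        conn.close()


class ThreadedInsertPipeline:
    """Hypothetical pipeline; db_path and the table layout are assumptions."""

    def __init__(self, db_path="items.db"):
        self.db_path = db_path

    async def process_item(self, item, spider):
        # Hand the blocking insert to the default thread pool so the
        # event loop can keep processing other requests and items.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, insert_item, self.db_path, item)
        return item
```

In a project that stays on a plain Twisted reactor, the same idea can be expressed by returning `twisted.internet.threads.deferToThread(insert_item, self.db_path, item)` from `process_item` and chaining a callback that returns the item. Either way the database call itself is still blocking. It only runs off the reactor thread, so throughput is bounded by the size of the thread pool.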
