Pyramid: Multi-threaded Database Operation

Question

My application receives one or more URLs (typically 3-4) from the user, scrapes certain data from those URLs and writes that data to the database. However, because scraping takes a little while, I was thinking of running each scrape in a separate thread so that the scraping and the database writes can continue in the background and the user does not have to wait.

To implement that, I have (relevant parts only):

@view_config(route_name="add_movie", renderer="templates/add_movie.jinja2")
def add_movie(request):
    post_data = request.POST

    if "movies" in post_data:
        movies = post_data["movies"].split(os.linesep)

        for movie_id in movies:        
            movie_thread = Thread(target=store_movie_details, args=(movie_id,))
            movie_thread.start()

    return {}
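One detail worth checking in the view above: browsers normally submit textarea line breaks as CRLF ("\r\n"), so splitting on os.linesep ("\n" on Linux) can leave a trailing "\r" on each ID. A minimal sketch of more tolerant parsing, assuming the IDs arrive one per line; parse_movie_ids is a hypothetical helper, not part of Pyramid:

```python
def parse_movie_ids(raw):
    # str.splitlines() splits on "\n", "\r\n" and "\r" alike, unlike
    # split(os.linesep), which only matches the server's own line ending.
    # Blank lines and surrounding whitespace are dropped.
    return [line.strip() for line in raw.splitlines() if line.strip()]


print(parse_movie_ids("tt0111161\r\ntt0068646\r\n\r\n  tt0071562 "))
# ['tt0111161', 'tt0068646', 'tt0071562']
```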

def store_movie_details(movie_id):

    movie_details = scrape_data(movie_id)
    new_movie = Movie(**movie_details) # Movie is my model.

    print new_movie  # Works fine.

    print DBSession.add(movies(**movie_details))  # Returns None.

While printing new_movie does show the correct scraped data, DBSession.add() doesn't work. In fact, it just returns None.

If I remove the threads and just call store_movie_details() directly, it works fine.

What's going on?

Answer

Firstly, the SQLAlchemy docs on Session.add() do not mention anything about the method's return value, so I would assume it is expected to return None.

Secondly, I think you meant to add new_movie to the session, not movies(**movie_details), whatever that is :)

Thirdly, the standard Pyramid session (the one configured with ZopeTransactionExtension) is tied to Pyramid's request-response cycle: its transaction is committed when the request finishes. Work done in your own thread is most likely never committed, because nothing commits that thread's transaction. You need to configure a separate session and commit it manually in store_movie_details. This session needs to use scoped_session so that the session object is thread-local and is not shared across threads.

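from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker

# some_engine is your application's Engine.
session_factory = sessionmaker(bind=some_engine)
AsyncSession = scoped_session(session_factory)  # one Session per thread

def store_movie_details(movie_id):
    session = AsyncSession()  # this thread's own Session
    try:
        movie_details = scrape_data(movie_id)
        new_movie = Movie(**movie_details)  # Movie is my model.

        session.add(new_movie)
        session.commit()  # not managed by the request's transaction, so commit explicitly
    except Exception:
        session.rollback()
        raise
    finally:
        AsyncSession.remove()  # close and discard this thread's Session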
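To see why scoped_session gives each worker thread its own session, here is a stripped-down, standard-library-only analogue of the thread-local registry it uses by default. ThreadLocalSessionRegistry and FakeSession are hypothetical stand-ins for illustration, not SQLAlchemy classes:

```python
import threading


class FakeSession:
    """Stand-in for a SQLAlchemy Session; only tracks added objects."""

    def __init__(self):
        self.pending = []

    def add(self, obj):
        self.pending.append(obj)

    def close(self):
        self.pending.clear()


class ThreadLocalSessionRegistry:
    """Simplified analogue of scoped_session: one session per thread."""

    def __init__(self, factory):
        self.factory = factory
        self._local = threading.local()

    def __call__(self):
        # Create this thread's session on first use, then keep returning it.
        if not hasattr(self._local, "session"):
            self._local.session = self.factory()
        return self._local.session

    def remove(self):
        # Close and forget this thread's session (like scoped_session.remove()).
        session = getattr(self._local, "session", None)
        if session is not None:
            session.close()
            del self._local.session


Registry = ThreadLocalSessionRegistry(FakeSession)
seen = {}


def worker(name):
    seen[name] = Registry()  # each thread gets its own session object
    Registry.remove()


main_session = Registry()
threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(Registry() is main_session)  # True: same thread, same session
print(seen["a"] is main_session, seen["b"] is main_session)  # False False
```

Because the registry is keyed by thread, a session created in a background thread is never visible to the request thread, and vice versa; that is also why each thread has to commit and remove its own session.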

And, of course, this approach is only suitable for very lightweight tasks, and only if you don't mind occasionally losing a task (when the web server restarts, for example). For anything more serious, have a look at a task queue such as Celery, as Antoine Leclair suggests.