Pyramid: Multi-threaded Database Operation
Question
My application receives one or more URLs (typically 3-4) from the user, scrapes certain data from those URLs, and writes that data to the database. Because scraping takes a little while, I was thinking of running each scrape in a separate thread, so that scraping and writing to the database can continue in the background and the user does not have to wait.
To implement that, I have (relevant parts only):
@view_config(route_name="add_movie", renderer="templates/add_movie.jinja2")
def add_movie(request):
    post_data = request.POST
    if "movies" in post_data:
        movies = post_data["movies"].split(os.linesep)
        for movie_id in movies:
            movie_thread = Thread(target=store_movie_details, args=(movie_id,))
            movie_thread.start()
    return {}

def store_movie_details(movie_id):
    movie_details = scrape_data(movie_id)
    new_movie = Movie(**movie_details)  # Movie is my model.
    print new_movie  # Works fine.
    print DBSession.add(movies(**movie_details))  # Returns None.
While the line print new_movie does print the correct scraped data, DBSession.add() doesn't work. In fact, it just returns None.
If I remove the threads and just call store_movie_details() directly, it works fine.
What is going on?
Recommended answer
Firstly, the SQLAlchemy docs on Session.add() do not mention a return value, so I would assume it is expected to return None.
Secondly, I think you meant to add new_movie to the session, not movies(**movie_details), whatever that is :)
Thirdly, the standard Pyramid session (the one configured with ZopeTransactionExtension) is tied to Pyramid's request-response cycle, which may produce unexpected behavior in your situation, because a worker thread can keep running after the request has finished. You need to configure a separate session and commit it manually in store_movie_details. That session should be created through scoped_session, so that the session object is thread-local and is not shared across threads.
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker

# some_engine is a placeholder for the application's SQLAlchemy engine.
session_factory = sessionmaker(bind=some_engine)
AsyncSession = scoped_session(session_factory)

def store_movie_details(movie_id):
    session = AsyncSession()  # returns this thread's own session
    movie_details = scrape_data(movie_id)
    new_movie = Movie(**movie_details)  # Movie is my model.
    session.add(new_movie)
    session.commit()  # nothing else commits for a background thread
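To make "thread-local" concrete, here is a minimal standard-library sketch of the registry idea behind scoped_session. It is not SQLAlchemy's implementation; ThreadLocalRegistry, FakeSession, and sessions_seen_by_threads are hypothetical names used only for illustration.

```python
import threading


class FakeSession:
    """Stand-in for a database session (hypothetical, for illustration)."""


class ThreadLocalRegistry:
    """Hands out one object per thread, created lazily by `factory`."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        # Each thread sees its own `value` attribute on the local object.
        if not hasattr(self._local, "value"):
            self._local.value = self._factory()
        return self._local.value

    def remove(self):
        # Discard the current thread's object, if it has one.
        if hasattr(self._local, "value"):
            del self._local.value


def sessions_seen_by_threads(registry, n_threads=3):
    """Return the session objects that `n_threads` worker threads received."""
    seen = []
    lock = threading.Lock()

    def worker():
        first, second = registry(), registry()
        assert first is second  # same thread, same object
        with lock:
            seen.append(first)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling the registry twice in one thread yields the same object, while every other thread gets its own. That is the property the answer relies on when each worker calls AsyncSession().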
And, of course, this approach is only suitable for very lightweight tasks, and only if you don't mind occasionally losing a task (for example, when the web server restarts). For anything more serious, have a look at Celery or similar, as Antoine Leclair suggests.
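Because nothing outside a fire-and-forget thread sees its failures, the worker itself has to commit, roll back, and clean up. The sketch below shows that lifecycle using the standard library's sqlite3 in place of SQLAlchemy, purely so it runs without extra dependencies. The movies table, the db_path argument, store_all, and the stub scrape_data are assumptions made for the example, not part of the original code.

```python
import logging
import os
import sqlite3
import tempfile
import threading


def scrape_data(movie_id):
    # Hypothetical stand-in for the real scraper: fails on an empty ID.
    if not movie_id:
        raise ValueError("empty movie id")
    return {"movie_id": movie_id, "title": "Title for " + movie_id}


def store_movie_details(db_path, movie_id):
    # Each thread opens its own connection; nothing is shared across threads.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        details = scrape_data(movie_id)
        conn.execute(
            "INSERT INTO movies (movie_id, title) VALUES (?, ?)",
            (details["movie_id"], details["title"]),
        )
        conn.commit()  # no request cycle commits for a worker thread
    except Exception:
        conn.rollback()  # leave the database untouched if anything failed
        logging.exception("could not store movie %r", movie_id)
    finally:
        conn.close()


def store_all(db_path, movie_ids):
    threads = [
        threading.Thread(target=store_movie_details, args=(db_path, m))
        for m in movie_ids
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # joined only so the example can inspect the result
```

In the view from the question the threads would not be joined, which is exactly why a failed or interrupted task goes unnoticed unless the worker logs it.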