Can SQLAlchemy events be used to update a denormalized data cache?

Question

For performance reasons, I've got a denormalized database where some tables contain data which has been aggregated from many rows in other tables. I'd like to maintain this denormalized data cache by using SQLAlchemy events. As an example, suppose I was writing forum software and wanted each Thread to have a column tracking the combined word count of all comments in the thread in order to efficiently display that information:

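import uuid

from sqlalchemy import Column, ForeignKey, Integer, UnicodeText
from sqlalchemy.orm import relationship

# Assumed, since the question does not show them: `Base` is the declarative
# base, `UUID` is a UUID column type such as sqlalchemy.dialects.postgresql.UUID,
# and the tables are named 'thread' and 'comment'.

class Thread(Base):
    __tablename__ = 'thread'
    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    title = Column(UnicodeText(), nullable=False)
    word_count = Column(Integer, nullable=False, default=0)

class Comment(Base):
    __tablename__ = 'comment'
    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    thread_id = Column(UUID, ForeignKey('thread.id', ondelete='CASCADE'), nullable=False)
    thread = relationship('Thread', backref='comments')
    message = Column(UnicodeText(), nullable=False)

    @property
    def word_count(self):
        # Words in this single comment, split on whitespace.
        return len(self.message.split())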

So every time a comment is inserted (for the sake of simplicity let's say that comments are never edited or deleted), we want to update the word_count attribute on the associated Thread object. So I'd want to do something like

def after_insert(mapper, connection, target):
    thread = target.thread
    thread.word_count = sum(c.word_count for c in thread.comments)
    print("updated cached word count to", thread.word_count)

event.listen(Comment, "after_insert", after_insert)

So when I insert a Comment, I can see the event firing and see that it has correctly calculated the word count, but that change is not saved to the Thread row in the database. I don't see any caveats about updating other tables in the after_insert documentation, though I do see some caveats in some of the others, such as after_delete.

So is there a supported way to do this with SQLAlchemy events? I'm already using SQLAlchemy events for lots of other things, so I'd like to do everything that way instead of having to write database triggers.

Recommended answer

The after_insert() event is one way to do this, and you might notice that it is passed a SQLAlchemy Connection object rather than the Session that other flush-related events receive. Changes made to other mapped objects inside these hooks are not picked up by the flush that is already in progress, which is why the Thread row was not updated. The mapper-level flush events are normally intended to be used to invoke SQL directly on the given Connection:

@event.listens_for(Comment, "after_insert")
def after_insert(mapper, connection, target):
    thread_table = Thread.__table__
    thread = target.thread
    # Compute the total once so the value written and the value logged match.
    word_count = sum(c.word_count for c in thread.comments)
    # Emit the UPDATE on the flush's own connection, bypassing the unit of work.
    connection.execute(
        thread_table.update()
        .where(thread_table.c.id == thread.id)
        .values(word_count=word_count)
    )
    print("updated cached word count to", word_count)

What is notable here is that invoking an UPDATE statement directly is also a lot more performant than running that attribute change through the whole unit of work process again.

However, an event like after_insert() isn't really needed here, because we know the value of "word_count" before the flush even happens. We know it because Comment and Thread objects are associated with each other in memory, so we could just as well keep Thread.word_count completely fresh at all times using attribute events:

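def _word_count(msg):
    # Count only real strings; SQLAlchemy may pass None or a placeholder
    # symbol as oldvalue when the attribute had no previous value.
    return len(msg.split()) if isinstance(msg, str) else 0

@event.listens_for(Comment.message, "set")
def on_message_set(target, value, oldvalue, initiator):
    # Apply the difference between the new and old message to the thread.
    thread = target.thread
    if thread is not None:
        delta = _word_count(value) - _word_count(oldvalue)
        thread.word_count = (thread.word_count or 0) + delta

@event.listens_for(Comment.thread, "set")
def on_thread_set(target, value, oldvalue, initiator):
    words = _word_count(target.message)

    # the new Thread, if any
    if isinstance(value, Thread):
        value.word_count = (value.word_count or 0) + words

    # the old Thread, if any
    if isinstance(oldvalue, Thread):
        oldvalue.word_count = (oldvalue.word_count or 0) - words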

The great advantage of this method is that there's also no need to iterate through thread.comments, which for an unloaded collection means another SELECT is emitted.
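To see the attribute-event approach end to end, here is a minimal self-contained sketch. It makes several assumptions that are not in the original post: an in-memory SQLite database, integer primary keys instead of UUIDs, SQLAlchemy 1.4 or later, and the helper name count_words. It treats anything that is not a string as zero words, because the old value handed to a "set" listener can be None or a placeholder symbol when the attribute was never set or is not currently loaded.

```python
from sqlalchemy import Column, ForeignKey, Integer, UnicodeText, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Thread(Base):
    __tablename__ = "thread"
    id = Column(Integer, primary_key=True)  # assumption: integer keys for brevity
    title = Column(UnicodeText, nullable=False)
    word_count = Column(Integer, nullable=False, default=0)


class Comment(Base):
    __tablename__ = "comment"
    id = Column(Integer, primary_key=True)
    thread_id = Column(Integer, ForeignKey("thread.id"), nullable=False)
    thread = relationship("Thread", backref="comments")
    message = Column(UnicodeText, nullable=False)


def count_words(msg):
    # Hypothetical helper: non-string values (None, placeholder symbols) count as 0.
    return len(msg.split()) if isinstance(msg, str) else 0


@event.listens_for(Comment.message, "set")
def on_message_set(target, value, oldvalue, initiator):
    # Adjust the owning thread by the difference between new and old message.
    thread = target.thread
    if thread is not None:
        delta = count_words(value) - count_words(oldvalue)
        thread.word_count = (thread.word_count or 0) + delta


@event.listens_for(Comment.thread, "set")
def on_thread_set(target, value, oldvalue, initiator):
    # Move this comment's words from the old thread (if any) to the new one.
    words = count_words(target.message)
    if isinstance(value, Thread):
        value.word_count = (value.word_count or 0) + words
    if isinstance(oldvalue, Thread):
        oldvalue.word_count = (oldvalue.word_count or 0) - words


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    thread = Thread(title="demo")
    first = Comment(thread=thread, message="hello world")
    Comment(thread=thread, message="three more words")
    in_memory_total = thread.word_count  # updated before any flush happens
    first.message = "hello"              # editing a comment adjusts the total
    edited_total = thread.word_count
    session.add(thread)                  # comments are cascaded via the backref
    session.commit()
    thread_id = thread.id

with Session(engine) as session:
    stored_total = session.get(Thread, thread_id).word_count
```

The count is maintained entirely in memory and reaches the database as an ordinary column value when the Thread is flushed. Because the delta relies on the old value being available, an edit to an expired or unloaded message may not subtract the previous word count; that case would need extra handling that this sketch does not attempt.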

Still another method is to do it in before_flush(). Below is a quick and dirty version, which can be refined to more carefully analyze what has changed in order to determine whether the word_count needs to be updated:

@event.listens_for(Session, "before_flush")
def before_flush(session, flush_context, instances):
    for obj in session.new | session.dirty:
        if isinstance(obj, Thread):
            obj.word_count = sum(c.word_count for c in obj.comments)
        elif isinstance(obj, Comment):
            # A Comment has no `comments` collection; recount through its thread.
            thread = obj.thread
            thread.word_count = sum(c.word_count for c in thread.comments)
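One possible refinement of the before_flush() idea is sketched below. It is an illustration under assumptions rather than the answer's own code: it uses an in-memory SQLite database, integer primary keys, SQLAlchemy 1.4 or later, and a hypothetical listener name refresh_word_counts. It inspects attribute history so that a thread is recounted only when one of its comments is new or has a changed message or thread, and it recounts each affected thread once per flush.

```python
from sqlalchemy import (Column, ForeignKey, Integer, UnicodeText,
                        create_engine, event, inspect)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Thread(Base):
    __tablename__ = "thread"
    id = Column(Integer, primary_key=True)  # assumption: integer keys for brevity
    title = Column(UnicodeText, nullable=False)
    word_count = Column(Integer, nullable=False, default=0)


class Comment(Base):
    __tablename__ = "comment"
    id = Column(Integer, primary_key=True)
    thread_id = Column(Integer, ForeignKey("thread.id"), nullable=False)
    thread = relationship("Thread", backref="comments")
    message = Column(UnicodeText, nullable=False)

    @property
    def word_count(self):
        return len(self.message.split())


@event.listens_for(Session, "before_flush")
def refresh_word_counts(session, flush_context, instances):
    stale = set()  # threads whose cached count may be out of date
    for obj in session.new | session.dirty:
        if not isinstance(obj, Comment) or obj.thread is None:
            continue
        attrs = inspect(obj).attrs
        if (obj in session.new
                or attrs.message.history.has_changes()
                or attrs.thread.history.has_changes()):
            stale.add(obj.thread)
    # Recount each affected thread once, however many comments changed.
    for thread in stale:
        thread.word_count = sum(c.word_count for c in thread.comments)


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    thread = Thread(title="demo")
    Comment(thread=thread, message="hello world")
    session.add(thread)
    session.commit()
    first_total = thread.word_count

    comment = thread.comments[0]
    comment.message = "one two three four"
    session.commit()
    second_total = thread.word_count
```

This sketch does not recount the thread a comment was moved away from, and it still iterates thread.comments for each affected thread, so the extra SELECT for an unloaded collection remains.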

I'd go with the attribute event method, as it is the most performant and keeps the count up to date in memory at all times.
