SQLAlchemy与辅助表连接行为之间的关系在延迟加载和急切加载之间发生了变化 [英] SQLAlchemy relationship with secondary table joining behaviour changes between lazy and eager loading

查看:95
本文介绍了SQLAlchemy与辅助表连接行为之间的关系在延迟加载和急切加载之间发生了变化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经使用SQL Alchemy几个月了,到目前为止,它给我留下了深刻的印象。

I've been playing with SQL Alchemy for a couple of months now and so far been really impressed with it.

我遇到了一个问题现在这似乎是一个错误,但是我不确定自己在做正确的事情。我们在这里使用MS SQL,通过表反射来定义表类,但是我可以使用内存SQLite数据库来复制问题,我在此处包含了代码。

There is one issue I've run into now that seems to be a bug, but I'm not sure that I'm doing the right thing. We use MS SQL here, with table reflection to define the table classes, however I can replicate the problem using an in-memory SQLite database, code for which I have included here.

如果我定义了与延迟加载的关系,那么

What I am doing is defining a many to many relationship between two tables using a linking table between them. There is one extra piece of information that the linking table contains which I want to use for filtering the links, requiring the use of a primaryjoin statement on the relationship. This works perfectly for lazy loading, however for performance reasons we need eager loading and thats where it all falls over.

>

If I define the relationship with lazy loading:

activefunds = relationship('Fund', secondary='fundbenchmarklink',
                           primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                                       'Benchmark.id==FundBenchmarkLink.benchmarkid,'
                                       'Fund.id==FundBenchmarkLink.fundid)')

并正常查询数据库:

query = session.query(Benchmark)

我需要的行为正是我想要的,尽管由于额外的SQL性能确实很差在遍历所有基准及其各自的资金时查询。

The behaviour I need is exactly what I want, though performance is really bad, due to the extra SQL queries when iterating through all of the benchmarks and their respective funds.

如果我定义了急切加载的关系:

If I define the relationship with eager loading:

activefunds = relationship('Fund', secondary='fundbenchmarklink',
                           primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                                       'Benchmark.id==FundBenchmarkLink.benchmarkid,'
                                       'Fund.id==FundBenchmarkLink.fundid)',
                           lazy='joined')

和正常查询数据库:

query = session.query(Benchmark)

它在我的脸上炸毁:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: fund.id 
    [SQL: 'SELECT benchmark.id AS benchmark_id,
                   benchmark.name AS benchmark_name,
                   fund_1.id AS fund_1_id,
                   fund_1.name AS fund_1_name,
                   fund_2.id AS fund_2_id,
                   fund_2.name AS fund_2_name 
            FROM benchmark
            LEFT OUTER JOIN (fundbenchmarklink AS fundbenchmarklink_1
                             JOIN fund AS fund_1 ON fund_1.id = fundbenchmarklink_1.fundid) ON benchmark.id = fundbenchmarklink_1.benchmarkid
            LEFT OUTER JOIN (fundbenchmarklink AS fundbenchmarklink_2
                             JOIN fund AS fund_2 ON fund_2.id = fundbenchmarklink_2.fundid) ON fundbenchmarklink_2.isactive = 1
            AND benchmark.id = fundbenchmarklink_2.benchmarkid
            AND fund.id = fundbenchmarklink_2.fundid']

上面的SQL清楚地表明,在尝试从链接表访问列之前,未联接该链接表。

The SQL above clearly shows the linked table is not being joined before attempting to access columns from it.

如果我查询数据库,特别是加入链接表:

If I query the DB, specifically joining the linked table:

query = session.query(Benchmark).join(FundBenchmarkLink, Fund, isouter=True)

它有效,但是这意味着我现在必须确保每当查询基准表时,我总是必须定义联接以添加两个额外的表。

It works, however it means I now have to make sure that whenever I query the Benchmark table, I always have to define the join to add both of the extra tables.

我是否缺少某些东西,这是一个潜在的错误,还是仅仅是库的方式

Is there something I'm missing, is this a potential bug, or is it simply the way the library works?

完整的示例代码可以复制问题:

Full working sample code to replicate issue:

import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger('sqlalchemy.engine.base').setLevel(logging.INFO)

from sqlalchemy import Column, DateTime, String, Integer, Boolean, ForeignKey, create_engine
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class FundBenchmarkLink(Base):
    __tablename__ = 'fundbenchmarklink'

    fundid = Column(Integer, ForeignKey('fund.id'), primary_key=True, autoincrement=False)
    benchmarkid = Column(Integer, ForeignKey('benchmark.id'), primary_key=True, autoincrement=False)
    isactive = Column(Boolean, nullable=False, default=True)

    fund = relationship('Fund')
    benchmark = relationship('Benchmark')

    def __repr__(self):
        return "<FundBenchmarkLink(fundid='{}', benchmarkid='{}', isactive='{}')>".format(self.fundid, self.benchmarkid, self.isactive)


class Benchmark(Base):
    __tablename__ = 'benchmark'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    funds = relationship('Fund', secondary='fundbenchmarklink', lazy='joined')

    # activefunds has additional filtering on the secondary table, requiring a primaryjoin statement.
    activefunds = relationship('Fund', secondary='fundbenchmarklink',
                               primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                                           'Benchmark.id==FundBenchmarkLink.benchmarkid,'
                                           'Fund.id==FundBenchmarkLink.fundid)',
                               lazy='joined')

    def __repr__(self):
        return "<Benchmark(id='{}', name='{}')>".format(self.id, self.name)


class Fund(Base):
    __tablename__ = 'fund'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    def __repr__(self):
        return "<Fund(id='{}', name='{}')>".format(self.id, self.name)


if '__main__' == __name__:
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    maker = sessionmaker(bind=engine)

    session = maker()

    # Create some data
    for bmkname in ['foo', 'bar', 'baz']:
        bmk = Benchmark(name=bmkname)
        session.add(bmk)

    for fname in ['fund1', 'fund2', 'fund3']:
        fnd = Fund(name=fname)
        session.add(fnd)

    session.add(FundBenchmarkLink(fundid=1, benchmarkid=1))
    session.add(FundBenchmarkLink(fundid=2, benchmarkid=1))
    session.add(FundBenchmarkLink(fundid=1, benchmarkid=2))
    session.add(FundBenchmarkLink(fundid=2, benchmarkid=2, isactive=False))

    session.commit()

    # This code snippet works when activefunds doesn't exist, or doesn't use eager loading
    # query = session.query(Benchmark)
    # print(query)

    # for bmk in query:
    #     print(bmk)
    #     for fund in bmk.funds:
    #         print('\t{}'.format(fund))

    # This code snippet works for activefunds with eager loading
    query = session.query(Benchmark).join(FundBenchmarkLink, Fund, isouter=True)
    print(query)

    for bmk in query:
        print(bmk)
        for fund in bmk.activefunds:
            print('\t{}'.format(fund))


推荐答案

我认为您混用了主连接二级联接。您的主要对象目前似乎包含两者。删除Fund的谓词,它应该起作用:

I think you've mixed the primary join and the secondary join a bit. Your primary would seem to contain both at the moment. Remove the predicate for Fund and it should work:

activefunds = relationship(
    'Fund',
    secondary='fundbenchmarklink',
    primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                'Benchmark.id==FundBenchmarkLink.benchmarkid)',
    lazy='joined')

显式联接似乎可以解决查询的原因是它在隐性渴望之前引入了表基金正在加载联接,因此他们可以引用它。这并不是真正的解决方法,而是隐藏了错误。如果您确实想使用显式的 Query.join()进行预先加载,请通过 contains_eager() 。请注意要选择包含哪个关系,具体取决于相关查询;无需其他过滤,您也可以将 activefunds 填充为无效。

The reason why your explicit join seems to fix the query is that it introduces the table fund before the implicit eager loading joins and so they can refer to it. It's not really a fix, rather than it hides the error. If you really want to use explicit Query.join() with eagerloading, inform the query about it with contains_eager(). Just be careful which relationship you choose as being contained, depending on the query in question; without additional filtering you could fill activefunds with inactive also.

最后,考虑使用 Query.outerjoin() 而不是 Query.join(...,isouter = True)

这篇关于SQLAlchemy与辅助表连接行为之间的关系在延迟加载和急切加载之间发生了变化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆