Completing object with its relations and avoiding unnecessary queries in sqlalchemy

Question

I have some database structure; as most of it is irrelevant for us, I'll describe just the relevant pieces. Let's take the Item object as an example:

items_table = Table("invtypes", gdata_meta,
                    Column("typeID", Integer, primary_key = True),
                    Column("typeName", String, index=True),
                    Column("marketGroupID", Integer, ForeignKey("invmarketgroups.marketGroupID")),
                    Column("groupID", Integer, ForeignKey("invgroups.groupID"), index=True))

mapper(Item, items_table,
       properties = {"group" : relation(Group, backref = "items"),
                     "_Item__attributes" : relation(Attribute, collection_class = attribute_mapped_collection('name')),
                     "effects" : relation(Effect, collection_class = attribute_mapped_collection('name')),
                     "metaGroup" : relation(MetaType,
                                            primaryjoin = metatypes_table.c.typeID == items_table.c.typeID,
                                            uselist = False),
                     "ID" : synonym("typeID"),
                     "name" : synonym("typeName")})

I want to achieve some performance improvements in the sqlalchemy/database layer, and have a couple of ideas:

1) Requesting the same item twice:

item = session.query(Item).get(11184)
item = None  # reference to item is lost, object is garbage collected
item = session.query(Item).get(11184)

Each request generates and issues an SQL query. To avoid it, I use 2 custom maps for an item object:

itemMapId = {}
itemMapName = {}

@cachedQuery(1, "lookfor")
def getItem(lookfor, eager=None):
    if isinstance(lookfor, (int, float)):
        id = int(lookfor)
        if eager is None and id in itemMapId:
            item = itemMapId[id]
        else:
            item = session.query(Item).options(*processEager(eager)).get(id)
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    elif isinstance(lookfor, basestring):
        if eager is None and lookfor in itemMapName:
            item = itemMapName[lookfor]
        else:
            # Items have unique names, so we can fetch just first result w/o ensuring its uniqueness
            item = session.query(Item).options(*processEager(eager)).filter(Item.name == lookfor).first()
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    return item

I believe sqlalchemy does similar object tracking, at least by primary key (item.ID). If it does, I can wipe both maps (although wiping the name map will require minor modifications to the application which uses these queries) so as not to duplicate functionality, and use stock methods instead. The actual question is: if there's such functionality in sqlalchemy, how do I access it?

2) Eager loading of relationships often helps to save a lot of requests to the database. Say I'll definitely need the following set of item=Item() properties:

item.group (Group object, according to groupID of our item)
item.group.items (fetch all items from items list of our group)
item.group.items.metaGroup (metaGroup object/relation for every item in the list)

If I have some item ID and no item is loaded yet, I can request it from the database, eagerly loading everything I need: sqlalchemy will join the group, its items and the corresponding metaGroups within a single query. If I accessed them with default lazy loading, sqlalchemy would need to issue 1 query to grab an item + 1 to get the group + 1*#items for all items in the list + 1*#items to get the metaGroup of each item, which is wasteful.

2.1) But what if I already have an Item object fetched, and some of the properties which I want to load are already loaded? As far as I understand, when I re-fetch some object from the database, its already loaded relations do not become unloaded. Am I correct?

2.2) If I have an Item object fetched and want to access its group, I can just call getGroup using item.groupID, applying any eager statements I'll need ("items" and "items.metaGroup"). It should properly load the group and its requested relations without touching the item itself. Will sqlalchemy properly map this fetched group to item.group, so that when I access item.group it won't fetch anything from the underlying database?

2.3) Suppose I have the following things fetched from the database: the original item, item.group, and some portion of the items from the item.group.items list, some of which may have metaGroup loaded. What would be the best strategy for completing the data structure so that it matches the eager list above: re-fetch the group with ("items", "items.metaGroup") eager load, or check each item from the items list individually, and load the item or its metaGroup if it is not loaded? It seems to depend on the situation, because if everything has already been loaded some time ago, issuing such a heavy query is pointless. Does sqlalchemy provide a way to track whether some object relation is loaded, with the ability to look deeper than just one level?

As an illustration of 2.3: I can fetch the group with ID 83, eagerly fetching "items" and "items.metaGroup". Is there a way to determine from an item (which has a groupID of 83) whether it has "group", "group.items" and "group.items.metaGroup" loaded or not, using sqlalchemy tools (in this case all of them should be loaded)?

Recommended answer

To force loading of lazy attributes, just access them. This is the simplest way and it works fine for relations, but it is not as efficient for Columns (you will get a separate SQL query for each column in the same table). You can get a list of all unloaded properties (both relations and columns) from sqlalchemy.orm.attributes.instance_state(obj).unloaded.
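As a rough illustration of inspecting unloaded attributes, here is a minimal sketch. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the simplified Group and Item models (and their column names) are stand-ins for illustration, not the mapping from the question.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
from sqlalchemy.orm.attributes import instance_state

Base = declarative_base()

class Group(Base):  # hypothetical, simplified model
    __tablename__ = "invgroups"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Item(Base):  # hypothetical, simplified model
    __tablename__ = "invtypes"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    group_id = Column(Integer, ForeignKey("invgroups.id"))
    group = relationship(Group, backref="items")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

# Seed one item with its group, then start over with a fresh session.
seed = Session()
seed.add(Item(id=1, name="Example item", group=Group(id=10, name="Example group")))
seed.commit()
seed.close()

session = Session()
item = session.query(Item).filter_by(id=1).one()

# Columns were loaded by the query; the lazy "group" relation was not.
before = set(instance_state(item).unloaded)

item.group  # plain attribute access triggers the lazy load

after = set(instance_state(item).unloaded)
print(before, after)
```

Reading `unloaded` does not itself trigger any loading, so it can be checked as often as needed.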

You don't use deferred columns in your example, but I'll describe them here for completeness. The typical scenario for handling deferred columns is the following:


  • Decorate selected columns with deferred(). Combine them into one or several groups by using the group parameter of deferred().
  • Use the undefer() and undefer_group() options in a query when desired.
  • Accessing a deferred column that was put in a group will load all columns in that group.
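The three steps above can be sketched as follows. This is only an illustration under assumptions: SQLAlchemy 1.4 or newer, an in-memory SQLite database, and a made-up Doc model with a group named "content".

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, deferred, sessionmaker, undefer_group
from sqlalchemy.orm.attributes import instance_state

Base = declarative_base()

class Doc(Base):  # hypothetical model, not from the question
    __tablename__ = "docs"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    # Both columns are deferred and share the same group.
    body = deferred(Column(Text), group="content")
    summary = deferred(Column(Text), group="content")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

seed = Session()
seed.add(Doc(id=1, title="t", body="long text", summary="short text"))
seed.commit()
seed.close()

# Plain query: the deferred columns are left out of the SELECT.
s1 = Session()
plain = s1.query(Doc).filter_by(id=1).one()
unloaded_plain = set(instance_state(plain).unloaded)

# Touching one deferred column loads every column of its group.
plain.body
unloaded_after_access = set(instance_state(plain).unloaded)
s1.close()

# undefer_group() pulls the whole group into the initial query.
s2 = Session()
eager = s2.query(Doc).options(undefer_group("content")).filter_by(id=1).one()
unloaded_eager = set(instance_state(eager).unloaded)
s2.close()

print(unloaded_plain, unloaded_after_access, unloaded_eager)
```

Separate sessions are used so that the second query does not simply fill in the object already loaded by the first one.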

Unfortunately this doesn't work in reverse: you can combine columns into groups without deferring their loading by default, using column_property(Column(…), group=…), but the defer() option won't affect them (it works for Columns only, not column properties, at least in 0.6.7).

To force loading of deferred column properties, session.refresh(obj, attribute_names=…), as suggested by Nathan Villaescusa, is probably the best solution. The only disadvantage I see is that it expires the attributes first, so you have to ensure there are no already loaded attributes among those passed in the attribute_names argument (e.g. by intersecting them with state.unloaded).
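A small helper along these lines might look like the sketch below. The function name refresh_unloaded and the Doc model are hypothetical, and the code assumes SQLAlchemy 1.4 or newer with an in-memory SQLite database.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, deferred, sessionmaker
from sqlalchemy.orm.attributes import instance_state

Base = declarative_base()

class Doc(Base):  # hypothetical model
    __tablename__ = "docs"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    body = deferred(Column(Text))

def refresh_unloaded(session, obj, names):
    """Refresh only those of `names` that are still unloaded on `obj`.

    Attributes that are already loaded are skipped, so refresh() never
    gets the chance to expire them.
    """
    missing = instance_state(obj).unloaded.intersection(names)
    if missing:
        session.refresh(obj, attribute_names=sorted(missing))
    return missing

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

seed = Session()
seed.add(Doc(id=1, title="stored title", body="long text"))
seed.commit()
seed.close()

session = Session()
doc = session.query(Doc).filter_by(id=1).one()
doc.title = "edited locally"  # loaded and modified; must not be expired

refreshed = refresh_unloaded(session, doc, ["title", "body"])
print(refreshed, doc.title, doc.body)
```

Here only "body" is passed on to refresh(), so the local edit to title is left alone.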

Update

1) SQLAlchemy does track loaded objects. That's how the ORM works: there must be only one object in the session for each identity. Its internal cache is weak by default (use weak_identity_map=False to change this), so the object is expunged from the cache as soon as there is no reference to it in your code. SQLAlchemy won't issue an SQL request for query.get(pk) when the object is already in the session. But this works for the get() method only, so query.filter_by(id=pk).first() will issue an SQL request and refresh the object in the session with the loaded data.
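The difference between get() and a filtered query can be observed by counting SELECT statements. The sketch below is an illustration under assumptions: SQLAlchemy 1.4 or newer (where Session.get() is the current spelling of query.get()), an in-memory SQLite database, and a simplified Item model.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):  # hypothetical, simplified model
    __tablename__ = "invtypes"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

seed = Session()
seed.add(Item(id=11184, name="Example item"))
seed.commit()
seed.close()

selects = []

@event.listens_for(engine, "before_cursor_execute")
def count_selects(conn, cursor, statement, parameters, context, executemany):
    # Record every SELECT sent to the database from now on.
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)

session = Session()

first = session.get(Item, 11184)    # not in the session yet: emits a SELECT
after_first_get = len(selects)

second = session.get(Item, 11184)   # served from the identity map
after_second_get = len(selects)

third = session.query(Item).filter_by(id=11184).first()  # always hits the DB
after_filter = len(selects)

print(after_first_get, after_second_get, after_filter)
print(first is second is third)
```

All three names refer to the same Python object; only the number of round trips differs. The first reference is kept alive on purpose, since the identity map holds objects weakly.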

2) Eager loading of relations will lead to fewer requests, but it's not always faster. You have to check this for your database and data.

2.1) Refetching data from the database won't unload objects bound via relations.

2.2) item.group is loaded using the query.get() method, so it won't lead to an SQL request if the object is already in the session.

2.3) Yes, it depends on the situation. In most cases it's best to hope SQLAlchemy will use the right strategy :). For an already loaded relation you can check whether the related objects' relations are loaded via state.unloaded, and so on recursively to any depth. But when a relation is not loaded yet, you can't know whether the related objects and their relations are already loaded: even when the relation is not yet loaded, the related object[s] might already be in the session (just imagine you request the first item, load its group and then request another item that has the same group). For your particular example I see no problem with just checking state.unloaded recursively.
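A recursive check of this kind could be sketched as below. The helper path_loaded and the simplified models are hypothetical; the code assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database. It walks a dotted path such as "items.meta_group" and stops at the first relation that is still unloaded, without triggering any load itself.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker
from sqlalchemy.orm.attributes import instance_state

Base = declarative_base()

class Group(Base):  # hypothetical, simplified models
    __tablename__ = "invgroups"
    id = Column(Integer, primary_key=True)
    items = relationship("Item", back_populates="group")

class Item(Base):
    __tablename__ = "invtypes"
    id = Column(Integer, primary_key=True)
    group_id = Column(Integer, ForeignKey("invgroups.id"))
    group = relationship("Group", back_populates="items")
    meta_group = relationship("MetaType", uselist=False)

class MetaType(Base):
    __tablename__ = "invmetatypes"
    type_id = Column(Integer, ForeignKey("invtypes.id"), primary_key=True)
    meta_name = Column(String)

def path_loaded(obj, path):
    """Return True if every relation along the dotted `path` is loaded."""
    key, _, rest = path.partition(".")
    if key in instance_state(obj).unloaded:
        return False
    if not rest:
        return True
    value = getattr(obj, key)  # already loaded, so no SQL is emitted
    if value is None:
        return True
    if isinstance(value, dict):  # e.g. attribute_mapped_collection
        children = list(value.values())
    elif isinstance(value, (list, set, tuple)):
        children = list(value)
    else:
        children = [value]
    return all(path_loaded(child, rest) for child in children)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

seed = Session()
seed.add(Group(id=83, items=[Item(id=1, meta_group=MetaType(meta_name="m")),
                             Item(id=2)]))
seed.commit()
seed.close()

def check(*options):
    # Load group 83 in a fresh session and test both paths.
    session = Session()
    group = session.query(Group).options(*options).filter_by(id=83).one()
    result = (path_loaded(group, "items"), path_loaded(group, "items.meta_group"))
    session.close()
    return result

lazy = check()
partial = check(joinedload(Group.items))
eager = check(joinedload(Group.items).joinedload(Item.meta_group))
print(lazy, partial, eager)
```

As noted above, a False result only means the relation attribute itself has not been populated; the related objects may still be sitting in the session.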
