PonyORM:在不知道哪些项目已经存在的情况下向 pony 数据库添加新项目的最有效方法是什么? [英] PonyORM: What is the most efficient way to add new items to a pony database without knowing which items already exist?

查看:43
本文介绍了PonyORM:在不知道哪些项目已经存在的情况下向 pony 数据库添加新项目的最有效方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果这是一个明显的问题,请原谅我,但我是小马和数据库的新手,并且没有找到回答这个问题的文档的正确部分.

Forgive me if this is an obvious question but I'm new to pony and databases in general and didn't find the right part of the documentation that answers this question.

我正在尝试创建一个包含公司和这些公司设有办事处的位置的数据库.这是一种多对多关系,因为每家公司都位于多个位置,并且每个位置都可以托管多个公司.我这样定义我的实体:

I'm trying to create a database with companies and the locations where those companies have offices. This is a many-to-many relationship since each company is in multiple locations and each location can be host to multiple companies. I'm defining my entities as such:

from pony import orm

class Company(db.Entity):
    '''A company entry in database'''
    name = orm.PrimaryKey(str)
    locations = orm.Set('Location')

class Location(db.Entity):
    '''A location for a company'''
    name = orm.PrimaryKey(str)
    companies = orm.Set('Company')

理想情况下,我希望能够编写一个函数,将公司添加到数据库中,同时添加该公司所在位置的列表,同时确保添加新的位置实例(如果它们尚不存在).我很快就想到了两种方法.

Ideally, I'd like to be able to write a function that adds a company to the database while also adding the list of locations where that company exists while also being sure to add new location instances if they don't already exist. I can quickly think of two ways to do so.

首先尝试输入该位置,即使它存在并处理异常:

First would be to try to enter the location even if it exists and handle the exception:

@orm.db_session
def add_company(name, locations):
    loc_entities = []
    for l in locations:
        try:
            loc = Location[l]
        except orm.core.ObjectNotFound:
            loc = Location(name=l)
        else:
            loc_entities.append(loc)
    comp = Company(name=name, locations=loc_entities)

第二种是查询数据库并询问位置是否存在:

Second would be to query the database and ask whether the locations exist yet:

@orm.db_session
def add_company2(name, locations):
    old_loc_entities = orm.select(l for l in Location if l.name in locations)[:]
    old_locations = [l.name for l in old_loc_entities]
    new_locations = set(locations) - (set(locations) & set(old_locations))
    loc_entities = [Location(name=l) for l in new_locations] + old_loc_entities
    comp = Company(name=name, locations=loc_entities)

在这两个中,我猜想更 Pythonic 的方法是简单地处理异常,但这会遇到 N+1 问题吗?我注意到通过使用名称作为主键,我每次使用索引访问实体时都会进行查询.当我让 pony 选择顺序 ID 时,我似乎不需要查询.我还没有用任何大型数据集对此进行测试,所以我还没有进行基准测试.

Of these two, I'd guess that the more pythonic way to do it would be to simply handle the exception but does this run into the N+1 problem? I'm noticing that by using the name as a primary key, I'm making a query every time I access the entity using an index. When I just let pony pick sequential ids, I don't seem to need to query. I haven't tested this with any large datasets yet so I haven't benchmarked yet.

推荐答案

我注意到,通过使用名称作为主键,我每次使用索引访问实体时都会进行查询.当我只是让 pony 选择顺序 id 时,我似乎不需要查询.

I'm noticing that by using the name as a primary key, I'm making a query every time I access the entity using an index. When I just let pony pick sequential ids, I don't seem to need to query.

Pony 内部以与字符串主键相同的方式缓存顺序主键,所以我认为应该没有区别.每个 db_session 都有单独的缓存(称为身份映射").读取对象后,在同一 db_session 内通过主键(或任何其他唯一键)进行的任何访问都应直接从身份映射返回相同的对象,而无需发出新查询.db_session 结束后,同一键的另一个访问将发出一个新的查询,因为数据库中的对象可能会被并发事务修改.

Internally Pony caches sequential primary keys in the same way as a string primary keys, so I think there should be no difference. Each db_session have separate cache (which is called "identity map"). After an object is read, any access by primary key (or any other unique key) within the same db_session should return the same object directly from the identity map without issuing a new query. After the db_session is over, another access by the same key will issue a new query, because the object could be modified in the database by a concurrent transaction.

关于你的方法,我认为它们都是有效的.如果一家公司只有几个位置(比如大约十个),我会使用第一种方法,因为对我来说它感觉更像 Python.确实是导致N+1查询,但是通过主键检索对象的查询非常快速且易于服务器执行.使用 get 方法可以更紧凑地表达代码:

Regarding your approaches, I think both of them are valid. If a company have just a few location (say, around ten), I'd use the first approach, because it feels more pythonic to me. It is indeed causes N+1 query, but a query which retrieves an object by a primary key is very fast and easy to the server to execute. The code can be expressed a little more compact by using a get method:

@orm.db_session
def add_company(name, locations):
    loc_entities = [Location.get(name=l) or Location(name=l)
                    for l in locations]
    comp = Company(name=name, locations=loc_entities)

使用单个查询检索所有现有位置的第二种方法对我来说似乎是一种过早的优化,但如果您每秒创建数百家公司,并且每家公司有数百个位置,则可能会使用它.

The second approach of retrieving all existing locations with a single query looks like a premature optimization to me, but if you create hundreds a companies per second, and each company has hundreds of locations, it may be used.

这篇关于PonyORM:在不知道哪些项目已经存在的情况下向 pony 数据库添加新项目的最有效方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆