SQLAlchemy: Scan huge tables using ORM?


Question

I am currently playing around with SQLAlchemy a bit, which is really quite neat.

For testing I created a huge table containing my picture archive, indexed by SHA1 hashes (to remove duplicates :-)). That was impressively fast...
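
For reference, here is a minimal sketch of the kind of mapping and session setup the snippets below assume; the table name, column names, and SQLite path are illustrative guesses, not from the original post:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Picture(Base):
    __tablename__ = "pictures"
    id = Column(Integer, primary_key=True)
    path = Column(String, nullable=False)               # hypothetical: file location
    sha1 = Column(String(40), unique=True, index=True)  # unique index deduplicates by hash

    def __repr__(self):
        return f"<Picture {self.sha1}>"

engine = create_engine("sqlite:///pictures.db")  # illustrative database path
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)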

For fun I did the equivalent of a select * over the resulting SQLite database:

session = Session()
# scan every Picture row in the table
for p in session.query(Picture):
    print(p)

I expected to see hashes scrolling by, but instead it just kept scanning the disk. At the same time, memory usage was skyrocketing, reaching 1 GB after a few seconds. This seems to come from the identity map feature of SQLAlchemy, which I thought was only keeping weak references.

Can somebody explain this to me? I thought that each Picture p would be collected after the hash is written out!?

Answer

Okay, I just found a way to do this myself. Changing the code to

session = Session()
# yield_per(5) fetches and materializes rows in batches of 5
for p in session.query(Picture).yield_per(5):
    print(p)

loads only 5 pictures at a time. It seems the query loads all rows at once by default. However, I don't yet understand the disclaimer on that method. Quoting from the SQLAlchemy docs:

WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten. In particular, it's usually impossible to use this setting with eagerly loaded collections (i.e. any lazy=False) since those collections will be cleared for a new load when encountered in a subsequent result batch.
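
The eager-loading part of that warning can at least be neutralized per query. A sketch, assuming Picture had some relationship mapped with lazy=False (the relationship itself is hypothetical; lazyload("*") is SQLAlchemy's wildcard loader option):

from sqlalchemy.orm import lazyload

session = Session()
# force every relationship on Picture back to lazy loading for this query only,
# so a batch boundary from yield_per cannot clear a half-populated collection
for p in session.query(Picture).options(lazyload("*")).yield_per(100):
    print(p)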

So if using yield_per is actually the right way (tm) to scan over copious amounts of SQL data while using the ORM, when is it safe to use it?
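
For what it's worth, in newer SQLAlchemy (1.4/2.0 style) the same batching is spelled as an execution option on a select(); a sketch reusing the Picture model from above:

from sqlalchemy import select

session = Session()
stmt = select(Picture).execution_options(yield_per=100)
# scalars() yields Picture instances; rows come off the cursor in batches of 100
for p in session.scalars(stmt):
    print(p)
session.close()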
