SQLAlchemy: Scan huge tables using ORM?


Question

I am currently playing around with SQLAlchemy a bit, which is really quite neat.

For testing I created a huge table containing my picture archive, indexed by SHA1 hashes (to remove duplicates :-)). That was impressively fast...
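
For reference, here is a minimal sketch of the kind of mapping and session setup the snippets below assume; the table name, column names, and SQLite path are illustrative guesses, not from the original post:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Picture(Base):
    __tablename__ = "pictures"
    id = Column(Integer, primary_key=True)
    path = Column(String, nullable=False)               # hypothetical: file location
    sha1 = Column(String(40), unique=True, index=True)  # unique index deduplicates by hash

    def __repr__(self):
        return f"<Picture {self.sha1}>"

engine = create_engine("sqlite:///pictures.db")  # illustrative database path
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)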

For fun I did the equivalent of a select * over the resulting SQLite database:

session = Session()
# scan every Picture row in the table
for p in session.query(Picture):
    print(p)

I expected to see hashes scrolling by, but instead it just kept scanning the disk. At the same time, memory usage was skyrocketing, reaching 1 GB after a few seconds. This seems to come from the identity map feature of SQLAlchemy, which I thought was only keeping weak references.

Can somebody explain this to me? I thought that each Picture p would be collected after the hash is written out!?

Answer

Okay, I just found a way to do this myself. Changing the code to

session = Session()
# yield_per(5) fetches and materializes rows in batches of 5
for p in session.query(Picture).yield_per(5):
    print(p)

loads only 5 pictures at a time. It seems the query loads all rows at once by default. However, I don't yet understand the disclaimer on that method. Quoting from the SQLAlchemy docs:

WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten. In particular, it's usually impossible to use this setting with eagerly loaded collections (i.e. any lazy=False) since those collections will be cleared for a new load when encountered in a subsequent result batch.
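
The eager-loading part of that warning can at least be neutralized per query. A sketch, assuming Picture had some relationship mapped with lazy=False (the relationship itself is hypothetical; lazyload("*") is SQLAlchemy's wildcard loader option):

from sqlalchemy.orm import lazyload

session = Session()
# force every relationship on Picture back to lazy loading for this query only,
# so a batch boundary from yield_per cannot clear a half-populated collection
for p in session.query(Picture).options(lazyload("*")).yield_per(100):
    print(p)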

So if using yield_per is actually the right way (tm) to scan over copious amounts of SQL data while using the ORM, when is it safe to use it?
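
For what it's worth, in newer SQLAlchemy (1.4/2.0 style) the same batching is spelled as an execution option on a select(); a sketch reusing the Picture model from above:

from sqlalchemy import select

session = Session()
stmt = select(Picture).execution_options(yield_per=100)
# scalars() yields Picture instances; rows come off the cursor in batches of 100
for p in session.scalars(stmt):
    print(p)
session.close()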
