SQLAlchemy: Scan huge tables using ORM?
Question
I am currently playing around with SQLAlchemy a bit, which is really quite neat.
For testing I created a huge table containing my picture archive, indexed by SHA-1 hashes (to remove duplicates :-)). Which was impressively fast...
For fun I did the equivalent of a select * over the resulting SQLite database:
session = Session()
for p in session.query(Picture):
    print(p)
I expected to see hashes scrolling by, but instead it just kept scanning the disk. At the same time, memory usage was skyrocketing, reaching 1GB after a few seconds. This seems to come from the identity map feature of SQLAlchemy, which I thought was only keeping weak references.
Can somebody explain this to me? I thought that each Picture p would be collected after its hash is written out!?
Answer
Okay, I just found a way to do this myself. Changing the code to
session = Session()
for p in session.query(Picture).yield_per(5):
    print(p)
loads only 5 pictures at a time. It seems that the query loads all rows at once by default. However, I don't yet understand the disclaimer on that method. Quote from the SQLAlchemy docs:
WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten. In particular, it’s usually impossible to use this setting with eagerly loaded collections (i.e. any lazy=False) since those collections will be cleared for a new load when encountered in a subsequent result batch.
So if using yield_per is actually the Right Way (tm) to scan over copious amounts of SQL data while using the ORM, when is it safe to use it?
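As a minimal, self-contained sketch of the pattern above (the Picture model here is a reduced stand-in for the original picture-archive table, with assumed column names), yield_per streams the result in small batches instead of materializing every row up front:

```python
# Sketch: streaming an ORM query in batches of 5 with Query.yield_per().
# The Picture model and its columns are assumptions, not the asker's schema.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Picture(Base):
    __tablename__ = "pictures"
    id = Column(Integer, primary_key=True)
    sha1 = Column(String(40), unique=True)  # hex SHA-1 digest of the file

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
# Insert a few fake rows; real code would hash the picture files.
session.add_all(Picture(sha1=f"{i:040x}") for i in range(20))
session.commit()

count = 0
# Rows are fetched and turned into Picture objects 5 at a time,
# so memory stays bounded regardless of the table size.
for p in session.query(Picture).yield_per(5):
    count += 1
print(count)
```

This only stays safe as long as each batch is treated read-only and no eagerly loaded collections are involved, per the warning quoted above.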