How to use SQLAlchemy to dump an SQL file from query expressions to bulk-insert into a DBMS?


Problem Description

Please bear with me as I explain the problem and how I tried to solve it; my question on how to improve it is at the end.

I have a 100,000-line csv file from an offline batch job, and I needed to insert it into the database as its proper models. Ordinarily, if this were a fairly straightforward load, it could be loaded trivially by just munging the CSV file to fit the schema; but I had to do some external processing that requires querying, and it's much more convenient to use SQLAlchemy to generate the data I want.

The data I want here is 3 models that represent 3 pre-existing tables in the database, and each subsequent model depends on the previous model. For example:

Model C --> Foreign Key --> Model B --> Foreign Key --> Model A

So, the models must be inserted in the order A, B, and C. I came up with a producer/consumer approach:

 - instantiate a multiprocessing.Process which contains a threadpool
   of 50 persister threads that have a thread-local connection to
   the database

 - read a line from the file using the csv DictReader

 - enqueue the dictionary to the process, where each thread creates
   the appropriate models by querying the right values and each
   thread persists the models in the appropriate order
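The steps above can be sketched with the stdlib alone. This is a minimal illustration, not the actual job: the CSV fields, the worker count, and the "persist" step are placeholders (the question uses 50 threads and a thread-local database session in place of the `results` list).

```python
import csv
import io
import queue
import threading

NUM_WORKERS = 4   # placeholder; the question uses 50
SENTINEL = None   # tells a worker to shut down

def worker(q, results, lock):
    while True:
        row = q.get()
        if row is SENTINEL:
            break
        # Placeholder for the real work: build Model A, then B, then C
        # from the row, and persist them in that order.
        with lock:
            results.append(row["id"])

def load(csv_text):
    q = queue.Queue(maxsize=100)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    # Producer: read each CSV line as a dict and enqueue it.
    for row in csv.DictReader(io.StringIO(csv_text)):
        q.put(row)
    for _ in threads:
        q.put(SENTINEL)
    for t in threads:
        t.join()
    return sorted(results)

print(load("id,name\n1,a\n2,b\n3,c\n"))  # -> ['1', '2', '3']
```

As the question goes on to note, this layout is faster than a single-threaded loop but still pays one round-trip of Python-side work per row.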

This was faster than a non-threaded read/persist, but it is still way slower than bulk-loading a file into the database. The job finished persisting after about 45 minutes. For fun, I decided to write it in SQL statements; it took 5 minutes.

Writing the SQL statements took me a couple of hours, though. So my question is, could I have used a faster method to insert rows using SQLAlchemy? As I understand it, SQLAlchemy is not designed for bulk insert operations, so this is less than ideal.

This leads to my question: is there a way to generate the SQL statements using SQLAlchemy, throw them in a file, and then just bulk-load them into the database? I know about str(model_object), but it does not show the interpolated values.
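For statements built with SQLAlchemy Core, one way to get the values interpolated (unlike plain `str(model_object)`) is the compiler's `literal_binds` flag, which inlines parameter values into the rendered SQL. The table below is a made-up stand-in for one of the real models:

```python
# Sketch: render an INSERT with its values inlined, suitable for
# writing to a .sql file. The table definition is hypothetical.
from sqlalchemy import Column, Integer, MetaData, String, Table, insert

metadata = MetaData()
model_a = Table(
    "model_a", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

stmt = insert(model_a).values(id=1, name="foo")
# literal_binds asks the compiler to interpolate the bound values
# into the SQL string instead of emitting placeholders.
sql = str(stmt.compile(compile_kwargs={"literal_binds": True}))
print(sql)  # INSERT INTO model_a (id, name) VALUES (1, 'foo')
```

Note that `literal_binds` only handles types the dialect knows how to render as literals; for anything exotic, a custom compilation rule would be needed.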

I would appreciate any guidance for how to do this faster.

Thanks!

Answer

First, unless you actually have a machine with 50 CPU cores, using 50 threads/processes won't help performance -- it will actually make things slower.

Second, I have a feeling that if you used SQLAlchemy's way of inserting multiple values at once, it would be much faster than creating ORM objects and persisting them one-by-one.
