How to use SQLAlchemy to dump an SQL file from query expressions to bulk-insert into a DBMS?


Problem Description

Please bear with me as I explain the problem and how I tried to solve it; my question on how to improve it is at the end.

I have a 100,000-line csv file from an offline batch job, and I needed to insert it into the database as its proper models. Ordinarily, if this were a fairly straightforward load, it could be loaded trivially by just munging the CSV file to fit the schema; but I had to do some external processing that requires querying, and it's much more convenient to use SQLAlchemy to generate the data I want.

The data I want here is 3 models that represent 3 pre-existing tables in the database, and each subsequent model depends on the previous model. For example:

Model C --> Foreign Key --> Model B --> Foreign Key --> Model A

So, the models must be inserted in the order A, B, and C. I came up with a producer/consumer approach:

 - instantiate a multiprocessing.Process which contains a threadpool
   of 50 persister threads that have a thread-local connection to
   the database

 - read a line from the file using the csv DictReader

 - enqueue the dictionary to the process, where each thread creates
   the appropriate models by querying the right values and each
   thread persists the models in the appropriate order
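The steps above can be sketched with the stdlib alone. This is a minimal illustration, not the actual job: the CSV fields, the worker count, and the "persist" step are placeholders (the question uses 50 threads and a thread-local database session in place of the `results` list).

```python
import csv
import io
import queue
import threading

NUM_WORKERS = 4   # placeholder; the question uses 50
SENTINEL = None   # tells a worker to shut down

def worker(q, results, lock):
    while True:
        row = q.get()
        if row is SENTINEL:
            break
        # Placeholder for the real work: build Model A, then B, then C
        # from the row, and persist them in that order.
        with lock:
            results.append(row["id"])

def load(csv_text):
    q = queue.Queue(maxsize=100)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    # Producer: read each CSV line as a dict and enqueue it.
    for row in csv.DictReader(io.StringIO(csv_text)):
        q.put(row)
    for _ in threads:
        q.put(SENTINEL)
    for t in threads:
        t.join()
    return sorted(results)

print(load("id,name\n1,a\n2,b\n3,c\n"))  # -> ['1', '2', '3']
```

As the question goes on to note, this layout is faster than a single-threaded loop but still pays one round-trip of Python-side work per row.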

This was faster than a non-threaded read/persist, but it is still way slower than bulk-loading a file into the database. The job finished persisting after about 45 minutes. For fun, I decided to write it in SQL statements; it took 5 minutes.

Writing the SQL statements took me a couple of hours, though. So my question is, could I have used a faster method to insert rows using SQLAlchemy? As I understand it, SQLAlchemy is not designed for bulk insert operations, so this is less than ideal.

This leads to my question: is there a way to generate the SQL statements using SQLAlchemy, throw them in a file, and then just bulk-load them into the database? I know about str(model_object), but it does not show the interpolated values.
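For statements built with SQLAlchemy Core, one way to get the values interpolated (unlike plain `str(model_object)`) is the compiler's `literal_binds` flag, which inlines parameter values into the rendered SQL. The table below is a made-up stand-in for one of the real models:

```python
# Sketch: render an INSERT with its values inlined, suitable for
# writing to a .sql file. The table definition is hypothetical.
from sqlalchemy import Column, Integer, MetaData, String, Table, insert

metadata = MetaData()
model_a = Table(
    "model_a", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

stmt = insert(model_a).values(id=1, name="foo")
# literal_binds asks the compiler to interpolate the bound values
# into the SQL string instead of emitting placeholders.
sql = str(stmt.compile(compile_kwargs={"literal_binds": True}))
print(sql)  # INSERT INTO model_a (id, name) VALUES (1, 'foo')
```

Note that `literal_binds` only handles types the dialect knows how to render as literals; for anything exotic, a custom compilation rule would be needed.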

I would appreciate any guidance for how to do this faster.

Thanks!

Answer

First, unless you actually have a machine with 50 CPU cores, using 50 threads/processes won't help performance -- it will actually make things slower.

Second, I have a feeling that if you used SQLAlchemy's way of inserting multiple values at once, it would be much faster than creating ORM objects and persisting them one-by-one.
