Efficient Bulk INSERT/COPY into table from HashMap
Question
Task:
Given this HashMap structure: Map<String, Map<String, String>> mainMap = new HashMap<>()
I want to INSERT or COPY each value of the inner Map into its own cell in my database.
- The size() of mainMap is 50,000.
- The size() of the inner Map is 50.
- The table to be inserted into has 50 columns.
- Each column's header is the key for the inner Map.
Edit: Originally the user uploads a large spreadsheet with 35 of the 50 columns. I then "cleanse" that data with various formatting, and I add my own 15 new pairs into the inner Map for each mainMap entry. I can't directly COPY from the user's source file to my database without cleansing/formatting/adding.
Once I'm done iterating the spreadsheet and building mainMap, that's when I need to insert into my database table efficiently.
Research:
I've read that COPY is the best approach to initially bulk populate a table, but I'm stuck on whether my requirements warrant that command.
This post states that Postgres has a Prepared Statement parameter limit of 34464 per query.
I'm assuming I'd need 50 x 50,000 = 2,500,000 parameters in total, which works out to ~73 individual queries!
Questions:
- Is COPY the proper approach here instead of all these parameters?
- If so, do I convert the HashMap values into a .sql file, save it on disk on my web app server, reference that in my COPY command, and then delete the temp file? Or can I directly pass a concatenated String into it, without risking SQL injection?
This command will be happening often, hence the need to optimize it.
I can't find any examples of converting Java objects into a compatible Postgres text file format, so any feedback helps.
How would you approach this problem?
Additional info:
My table is pre-existing and can't be dropped, since it's the back-end for my webapp and multiple users are connected at any given time.
I understand that temporarily removing indexes prior to using COPY can increase performance, but I only need a maximum of 50,000 rows inserted or copied at a time, not millions.
StackExchange told me to ask here.
Answer
While Java is certainly not the best option for this kind of ETL, it is certainly possible, with rather little overhead, using standard INSERT statements and prepared queries:
conn.setAutoCommit(false);
PreparedStatement stmt = conn.prepareStatement(
        "INSERT INTO my_table (col_a, col_b, ...)"
        + " VALUES (?, ?, ...)");
int batchSize = 1000;
int rows = 0;
for (Map<String, String> values : mainMap.values()) {
    int i = 0;
    stmt.setString(++i, values.get("col_a"));
    stmt.setString(++i, values.get("col_b"));
    // ...
    stmt.addBatch(); // add the row to the batch
    if (++rows % batchSize == 0) {
        // batch-sizing: execute...
        stmt.executeBatch();
    }
}
if (rows % batchSize != 0) {
    // a last execution if necessary...
    stmt.executeBatch();
}
conn.commit(); // atomic action - if any record fails, the whole import will fail
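As a side note beyond the original answer: the PostgreSQL JDBC driver can rewrite such batches into multi-row INSERT statements when the reWriteBatchedInserts connection property is enabled, which typically makes executeBatch() noticeably faster. A minimal sketch of obtaining such a connection; the host, database name and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PgConnections {
    // Opens a connection with batched-insert rewriting enabled.
    // URL, user and password below are placeholders for illustration only.
    static Connection open() throws SQLException {
        return DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true",
                "app_user", "secret");
    }
}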
Alternatively, you could write the Map out to a file and use the CopyManager, but I seriously doubt this would be any faster than the batched inserts (it would be a different story for millions of rows, though).
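For completeness, here is a minimal sketch of that CopyManager route, assuming table and column names like those above (abbreviated to two columns) and values that never contain tabs, newlines or backslashes; real code would have to escape those, and all names here are placeholders:

import java.io.Reader;
import java.io.StringReader;
import java.sql.Connection;
import java.util.Map;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyExample {
    // Streams mainMap to the server with COPY ... FROM STDIN (tab-delimited
    // text format). Escaping of delimiters/newlines in values is omitted.
    static long copyAll(Connection conn, Map<String, Map<String, String>> mainMap)
            throws Exception {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> values : mainMap.values()) {
            sb.append(values.get("col_a")).append('\t')
              .append(values.get("col_b")) // ... remaining columns
              .append('\n');
        }
        CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
        Reader reader = new StringReader(sb.toString());
        return copyManager.copyIn(
                "COPY my_table (col_a, col_b) FROM STDIN WITH (FORMAT text)", reader);
    }
}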