Postgresql - IN clause optimization for more than 3000 values


Question

I have an application where the user uploads an Excel file (.xlsx or .csv) with more than 10,000 rows, with a single column "partId" containing the values to look up in the database.

I read the Excel values, store them in a list, and pass the list as a parameter to the Spring Boot JPA repository find method, which builds an IN-clause query internally:

    // Read the uploaded Excel file with Apache POI
    stream = new ByteArrayInputStream(file.getBytes());
    wb = WorkbookFactory.create(stream);
    org.apache.poi.ss.usermodel.Sheet sheet = wb.getSheetAt(wb.getActiveSheetIndex());

    Iterator<Row> rowIterator = sheet.rowIterator();

    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        Cell cell = row.getCell(0);
        System.out.println(cell.getStringCellValue());
        vinList.add(cell.getStringCellValue());
    }

    // JPA repository method that I used
    findByPartIdInAndSecondaryId(List<String> partIds);

I have read in many articles, and experienced in the case above, that an IN query is inefficient for a huge list of values.

How can I optimize the above scenario or write a new optimized query?

Also, please let me know if there is a more efficient way of reading an Excel file than the code snippet above.

That would be a great help! Thanks in advance!

Answer

If the list is truly huge, you will never be lightning fast.

I see several options:

1. Send a query with a large IN list, as you mention in your question.

2. Construct a statement that is a join with a large VALUES clause:

SELECT ... FROM mytable
   JOIN (VALUES (42), (101), (43), ...) AS tmp(col)
      ON mytable.id = tmp.col;
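
If the statement is built from application code, using one bound parameter per value keeps it safe from SQL injection. A minimal sketch of assembling such a query (the table name `part` and column `part_id` are illustrative, not from the original post):

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ValuesJoinBuilder {
    // Builds a parameterized VALUES-join query with one "(?)" placeholder
    // per id, to be filled in via PreparedStatement.setString/setLong.
    public static String buildQuery(int idCount) {
        String placeholders = IntStream.range(0, idCount)
                .mapToObj(i -> "(?)")
                .collect(Collectors.joining(", "));
        return "SELECT p.* FROM part p "
             + "JOIN (VALUES " + placeholders + ") AS tmp(part_id) "
             + "ON p.part_id = tmp.part_id";
    }

    public static void main(String[] args) {
        // Example: a query for three ids
        System.out.println(buildQuery(3));
    }
}
```

Since JDBC parameters are untyped from PostgreSQL's point of view, a cast such as `(?::bigint)` on the first VALUES row may be needed so the joined column gets the intended type.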

3. Create a temporary table with the values and join with that:

    BEGIN;
    CREATE TEMP TABLE tmp(col bigint) ON COMMIT DROP;
    

    then either

    COPY tmp FROM STDIN; -- if Spring supports COPY
    

    INSERT INTO tmp VALUES (42), (101), (43), ...; -- if not
    

    then

    ANALYZE tmp;  -- for good statistics
    SELECT ... FROM mytable
       JOIN tmp ON mytable.id = tmp.col;
    COMMIT;  -- drops the temporary table
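
For the INSERT variant, application code typically loads the temporary table in fixed-size batches rather than with one giant statement. The helper below (my own sketch, not from the answer) only builds the parameterized multi-row INSERTs; executing them via JDBC `PreparedStatement` is left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TempTableLoader {
    // Splits the id list into fixed-size batches and builds one multi-row
    // parameterized INSERT per batch. The table name "tmp" matches the
    // CREATE TEMP TABLE example above.
    public static List<String> buildBatchInserts(List<String> ids, int batchSize) {
        List<String> statements = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int rows = Math.min(batchSize, ids.size() - from);
            String placeholders = IntStream.range(0, rows)
                    .mapToObj(i -> "(?)")
                    .collect(Collectors.joining(", "));
            statements.add("INSERT INTO tmp VALUES " + placeholders);
        }
        return statements;
    }
}
```

Each returned statement is then prepared once per batch size and executed with the corresponding slice of ids bound as parameters.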
    

Which of these is fastest is best determined by trial and error for your case; I don't think it can be said that one of the methods will always beat the others.

Some considerations:

• Solutions 1 and 2 may result in very large statements, while solution 3 can be split into smaller chunks.

• Solution 3 will very likely be slower unless the list is truly large.
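
The chunking idea mentioned for solution 3 also applies to solution 1: issue several IN queries over fixed-size slices of the list and merge the results, instead of sending one statement with 10,000+ parameters. A minimal, generic partition helper (my own sketch; Guava's `Lists.partition` offers the same, though it returns views):

```java
import java.util.ArrayList;
import java.util.List;

public class ListPartitioner {
    // Splits the source list into chunks of at most chunkSize elements,
    // so the repository can be called once per chunk.
    public static <T> List<List<T>> partition(List<T> source, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < source.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(
                    source.subList(i, Math.min(i + chunkSize, source.size()))));
        }
        return chunks;
    }
}
```

With Spring Data JPA this would mean calling the derived finder once per chunk and concatenating the returned lists; each individual statement then stays well under any parameter-count limits.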
