Efficient way to find which values in CSV are NOT in DB?


Question

A vendor sends us a CSV file of their products. One column in the file (e.g. column 3) is the style number. The file has thousands of entries.

We have a database table of products with a column called manufacturer_num, which is the vendor's style number.

I need to find which of the vendor's products we do not currently have.

I know I can loop through each line in the CSV file, extract the style_number, and check whether it is in our database. But then I am making a call to the database for each line. That would be thousands of calls to the database, which I think is inefficient.

I could also build a list of the style numbers (either as a string or an array) to make one DB call. Something like: WHERE manufacturer_num IN (...). But won't PHP run out of memory if the list is too big? And actually this would give me the ones we do have, not the ones we don't have.

What is an efficient way to do this?

Recommended answer

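Bulk load the CSV into a temporary table, do a LEFT JOIN from that table to the products table on the style number, then get the records where the right-hand side of the join is NULL. Those are the style numbers that appear in the CSV but not in the database.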
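
The approach above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python with an in-memory SQLite database as a stand-in for the asker's PHP and (unspecified) database, the staging table name vendor_styles and the helper find_missing_style_numbers are hypothetical, and the CSV is assumed to have no header row, with the style number in the third column. On a real server you would probably use the database's own bulk loader instead of row inserts.

```python
import csv
import io
import sqlite3


def find_missing_style_numbers(conn, csv_file, style_column=2):
    """Return style numbers that are in the CSV but not in products.

    style_column is zero-based, so 2 means "column 3" of the file.
    """
    cur = conn.cursor()
    # Hypothetical staging table; TEMP tables are private to this connection.
    cur.execute("CREATE TEMP TABLE vendor_styles (style_number TEXT)")
    # Stream rows from the CSV so the whole file is never held in memory.
    rows = (
        (row[style_column],)
        for row in csv.reader(csv_file)
        if len(row) > style_column
    )
    cur.executemany("INSERT INTO vendor_styles (style_number) VALUES (?)", rows)
    # LEFT JOIN keeps every CSV row; unmatched rows have NULL on the right side.
    cur.execute(
        """
        SELECT DISTINCT v.style_number
        FROM vendor_styles AS v
        LEFT JOIN products AS p
               ON p.manufacturer_num = v.style_number
        WHERE p.manufacturer_num IS NULL
        ORDER BY v.style_number
        """
    )
    missing = [r[0] for r in cur.fetchall()]
    cur.execute("DROP TABLE vendor_styles")
    return missing


# Tiny demo data: the products table already holds A100 and B200.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (manufacturer_num TEXT)")
conn.executemany("INSERT INTO products VALUES (?)", [("A100",), ("B200",)])

vendor_csv = io.StringIO(
    "Shirt,Blue,A100\nPants,Black,C300\nHat,Red,D400\nSock,White,B200\n"
)
missing = find_missing_style_numbers(conn, vendor_csv)
print(missing)  # ['C300', 'D400']
```

The comparison runs as a single query inside the database, so there is one round trip for the join rather than one per CSV line, and the application never has to build a large IN (...) list. An index on manufacturer_num would likely help the join on a large products table.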
