Best practices for importing large CSV files


Problem Description

My company gets a set of CSV files full of bank account info each month that I need to import into a database. Some of these files can be pretty big. For example, one is about 33MB and about 65,000 lines.

Right now I have a symfony/Doctrine app (PHP) that reads these CSV files and imports them into a database. My database has about 35 different tables, and during the import I take each row, split it up into its constituent objects, and insert them into the database. It all works beautifully, except that it's slow (each row takes about a quarter second) and it uses a lot of memory.

The memory use is so bad that I have to split up my CSV files. A 20,000-line file barely makes it in. By the time it's near the end, I'm at about 95% memory usage. Importing that 65,000-line file is simply not possible.
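For reference, most of that memory is Doctrine's unit of work holding every persisted entity. The usual way to keep memory flat is to stream the file with fgetcsv() and flush/clear in batches. A minimal sketch, with the caveat that the BankAccount entity, its setters, and the column positions are hypothetical, and $em is assumed to be an already-configured EntityManager:

```php
<?php
// Sketch only: BankAccount, its setters, and the CSV column positions are
// hypothetical; $em is assumed to be a configured Doctrine EntityManager.
$batchSize = 500;
$handle = fopen('accounts.csv', 'r');   // stream the file instead of loading it whole
$row = 0;

while (($fields = fgetcsv($handle)) !== false) {
    $account = new BankAccount();
    $account->setNumber($fields[0]);
    $account->setBalance($fields[1]);
    $em->persist($account);

    if ((++$row % $batchSize) === 0) {
        $em->flush();   // push the pending INSERTs to the database
        $em->clear();   // detach managed entities so memory stays flat
    }
}

$em->flush();           // flush the final partial batch
$em->clear();
fclose($handle);
```

Batching like this keeps the unit of work small; in older Doctrine versions, disabling the SQL logger on the connection configuration also stops query logging from growing without bound during long imports.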

I've found symfony to be an exceptional framework for building applications and I normally wouldn't consider using anything else, but in this case I'm willing to throw all my preconceptions out the window in the name of performance. I'm not committed to any specific language, DBMS, or anything.

Stack Overflow doesn't like subjective questions, so I'm going to try to make this as un-subjective as possible: for those of you who have not just an opinion but actual experience importing large CSV files, what tools/practices have you used in the past that have been successful?

For example, do you just use Django's ORM/OOP and you haven't had any problems? Or do you read the entire CSV file into memory and prepare a few humongous INSERT statements?

Again, I want not just an opinion, but something that's actually worked for you in the past.

I'm not just importing an 85-column CSV spreadsheet into one 85-column database table. I'm normalizing the data and putting it into dozens of different tables. For this reason, I can't just use LOAD DATA INFILE (I'm using MySQL) or any other DBMS's feature that just reads in CSV files.
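To make that constraint concrete, a normalized import like this usually ends up as a set of prepared statements reused inside a transaction rather than a single bulk-load command. A rough sketch with PDO; the credentials, table names, columns, and the way one CSV row is split across tables are invented for illustration:

```php
<?php
// Sketch only: credentials, table names, columns, and the row-splitting logic
// are hypothetical; the pattern is prepared statements reused in one transaction.
$pdo = new PDO('mysql:host=localhost;dbname=bank;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$insertHolder  = $pdo->prepare('INSERT INTO holders (name) VALUES (?)');
$insertAccount = $pdo->prepare('INSERT INTO accounts (number, holder_id, balance) VALUES (?, ?, ?)');

$handle = fopen('accounts.csv', 'r');
$pdo->beginTransaction();

while (($fields = fgetcsv($handle)) !== false) {
    // Normalize one wide CSV row into two related tables (hypothetical split).
    $insertHolder->execute([$fields[2]]);
    $holderId = $pdo->lastInsertId();
    $insertAccount->execute([$fields[0], $holderId, $fields[1]]);
}

$pdo->commit();
fclose($handle);
```

Committing once per file (or once every few thousand rows) instead of once per row removes most of the per-row overhead, and reusing prepared statements avoids re-parsing the same INSERT tens of thousands of times.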

Also, I can't use any Microsoft-specific solutions.

Recommended Answer

I had this exact same problem about 2 weeks ago. I wrote some .NET code to do row-by-row inserts, and by my calculations with the amount of data I had, it would have taken around a week to do it this way.

So instead I used a string builder to create one HUGE query and sent it to my relational system all at once. It went from taking a week to taking 5 minutes. Now I don't know what relational system you are using, but with enormous queries you'll probably have to tweak your max_allowed_packet param or similar.
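In PHP/MySQL terms, the same "string builder" idea might look like the sketch below: accumulate VALUES tuples and send one multi-row INSERT per chunk, keeping each statement under max_allowed_packet. The table, columns, and chunk size are assumptions, and values are escaped with PDO::quote():

```php
<?php
// Sketch of the "one huge query" approach: build multi-row INSERTs in chunks.
// Table, columns, and chunk size are assumptions; tune $chunkSize so each
// statement stays under MySQL's max_allowed_packet.
$pdo = new PDO('mysql:host=localhost;dbname=bank', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$chunkSize = 1000;
$values = [];

$flush = function (array $values) use ($pdo) {
    if ($values) {
        $pdo->exec('INSERT INTO accounts (number, balance) VALUES ' . implode(',', $values));
    }
};

$handle = fopen('accounts.csv', 'r');
while (($fields = fgetcsv($handle)) !== false) {
    // Quote each value so the concatenated SQL is safe to send as-is.
    $values[] = '(' . $pdo->quote($fields[0]) . ',' . $pdo->quote($fields[1]) . ')';

    if (count($values) >= $chunkSize) {
        $flush($values);
        $values = [];
    }
}
$flush($values);   // send the final partial chunk
fclose($handle);

// If MySQL rejects a statement as too large, raise the server-side limit, e.g.
// max_allowed_packet = 64M in my.cnf (or SET GLOBAL max_allowed_packet = 67108864;)
// and reconnect.
```

The trade-off is error handling: if one row in a chunk is bad, the whole statement fails, so smaller chunks are easier to debug even though bigger ones are faster.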
