How to process CSV with 100k+ lines in PHP?
Question
I have a CSV file with more than 100.000 lines; each line has 3 values separated by semicolons. The total file size is approx. 5 MB.
The CSV file format is:
stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...
We have 10 stocks, indexed 1-10 in the CSV. In the database they are saved as 22-31.
The CSV is sorted by stock_id, product_id, but I don't think that matters.
What I have
<?php
session_start();
require_once ('db.php');
echo '<meta charset="iso-8859-2">';

// convert table: `CSV stock id => DB stock id`
$stocks = array(
    1 => 22,
    2 => 23,
    3 => 24,
    4 => 25,
    5 => 26,
    6 => 27,
    7 => 28,
    8 => 29,
    9 => 30,
    10 => 31
);

$products = array();
$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");
while ($row = $sql->fetch_assoc()) {
    $products[$row['product_id']] = 1;
}

$csv = file('export.csv');

// go through the CSV file and prepare the SQL UPDATE queries
foreach ($csv as $row) {
    $data = explode(';', $row);
    // $data[0] - stock_id
    // $data[1] - product_id
    // $data[2] - amount
    if (isset($products[$data[1]])) {
        // the CSV contains products which aren't in the database;
        // the echo below should show me the queries
        echo " UPDATE t
               SET value = " . (int)$data[2] . "
               WHERE fielddef_id = " . (int)$stocks[$data[0]] . " AND
                     product_id = '" . $data[1] . "' -- product_id isn't just numeric
               LIMIT 1<br>";
    }
}
The problem is that writing out 100k lines by echo is so slow it takes long minutes. I'm not sure what MySQL will do, whether it will be faster or take about the same time. I have no testing machine here, so I'm worried about testing it on the production server.
My idea was to load the CSV file into several variables (better, arrays) like below, but I'm not sure it would help.
$csv[0] = lines 0 - 10.000;
$csv[1] = lines 10.001 - 20.000;
$csv[2] = lines 20.001 - 30.000;
$csv[3] = lines 30.001 - 40.000;
etc.
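The chunking idea above can be had without loading the whole file at all: `fgetcsv()` streams one line at a time, so memory stays flat regardless of file size, and batches of 10.000 lines can be processed as they fill up. A minimal sketch (the tiny sample file written at the top is only there so it runs stand-alone):

```php
<?php
// Sketch of the chunking idea with a streaming reader instead of file().
// Write a tiny sample so the sketch runs stand-alone.
file_put_contents('export.csv', "1;1234;0\n1;1235;1\n2;1234;3\n");

$handle = fopen('export.csv', 'r');
$chunk = array();
$processed = 0;
while (($data = fgetcsv($handle, 1000, ';')) !== false) {
    $chunk[] = $data;
    if (count($chunk) === 10000) {
        // process one 10.000-line batch here (e.g. build one combined query)
        $processed += count($chunk);
        $chunk = array();
    }
}
$processed += count($chunk); // remaining partial batch
fclose($handle);
```

This replaces the `$csv[0]`, `$csv[1]`, ... splitting with a single loop and a reusable batch buffer.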
I found e.g. Efficiently counting the number of lines of a text file (200mb+), but I'm not sure how it can help me.
When I replace the foreach with print_r, I get the dump in < 1 sec. The task is to make the foreach loop with the database update faster.
Any ideas how to update so many records in the database?
Thanks.
Recommended answer
Thanks to the answers and comments on the question, I have a solution. The base for it is from @Dave; I've only updated it to fit the question better.
<?php
require_once 'include.php';

// stock convert table (key is the ID in the CSV, value the ID in the database)
$stocks = array(
    1 => 22,
    2 => 23,
    3 => 24,
    4 => 25,
    5 => 26,
    6 => 27,
    7 => 28,
    8 => 29,
    9 => 30,
    10 => 31
);

// Product IDs in the CSV (value) and in the database (product_id) are
// different. We take both IDs from the database and build an array of
// e-shop products.
$products = array();
$sql = mysql_query("SELECT product_id, value FROM cms_module_products_fieldvals WHERE fielddef_id = 1") or die(mysql_error());
while ($row = mysql_fetch_assoc($sql)) {
    $products[$row['value']] = $row['product_id'];
}

$handle = fopen('import.csv', 'r');
$i = 1;
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    $p_id = (int)$products[$data[1]];
    // Only continue if the product exists in the database. Without this
    // condition it still works, but we send many useless queries
    // (... WHERE product_id = 0 updates nothing, yet still takes time).
    if ($p_id > 0) {
        if ($i % 300 === 0) {
            sleep(1); // optional, we'll see what it does with real traffic
        }
        $updatesql = "UPDATE table SET value = " . (int)$data[2] . " WHERE fielddef_id = " . $stocks[$data[0]] . " AND product_id = " . (int)$p_id . " LIMIT 1";
        echo "$updatesql<br>"; // for debug only, comment out on live
        $i++;
    }
}
// approx. 1.5 sec to process 100.000+ records
fclose($handle);
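If executing 100k individual UPDATEs is still too slow, a common further step is to batch many rows into a single `INSERT ... ON DUPLICATE KEY UPDATE` statement. This is a hedged sketch, not part of the accepted answer: it assumes a UNIQUE key on `(fielddef_id, product_id)`, reuses the question's column names, and only builds the SQL string rather than talking to a database.

```php
<?php
// Sketch: collapse many per-row UPDATEs into one multi-row statement.
// Assumes a UNIQUE key on (fielddef_id, product_id); no DB call is made here.
$stocks = array(1 => 22, 2 => 23, 3 => 24); // excerpt of the convert table
$rows = array(                              // stand-in for parsed CSV lines
    array(1, 1234, 0),
    array(2, 1234, 3),
    array(2, 1235, 2),
);
$values = array();
foreach ($rows as $data) {
    $values[] = sprintf('(%d, %d, %d)', $stocks[$data[0]], (int)$data[1], (int)$data[2]);
}
$sql = 'INSERT INTO t (fielddef_id, product_id, value) VALUES '
     . implode(', ', $values)
     . ' ON DUPLICATE KEY UPDATE value = VALUES(value)';
```

One round-trip per batch (e.g. per 1.000 rows) instead of one per CSV line usually cuts the wall-clock time dramatically; the right batch size depends on `max_allowed_packet` on the server.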