How to process CSV with 100k+ lines in PHP?


Question

I have a CSV file with more than 100,000 lines; each line has 3 values separated by semicolons. Total file size is approx. 5 MB.

The CSV file format is:

stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...

We have 10 stocks, indexed 1-10 in the CSV. In the database they are saved as 22-31.

The CSV is sorted by stock_id and product_id, but I don't think it matters.

What I have

<?php

session_start();

require_once 'db.php';

echo '<meta charset="iso-8859-2">';

// convert table: `CSV stock id => DB stock id`
$stocks = array(
    1  => 22,
    2  => 23,
    3  => 24,
    4  => 25,
    5  => 26,
    6  => 27,
    7  => 28,
    8  => 29,
    9  => 30,
    10 => 31
);

// build a lookup of product IDs that already exist in the database
$products = array();

$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");

while ($row = $sql->fetch_assoc()) {
    $products[$row['product_id']] = 1;
}

$csv = file('export.csv');

// go thru CSV file and prepare SQL UPDATE query
foreach ($csv as $row) {
    $data = explode(';', $row);
    // $data[0] - stock_id
    // $data[1] - product_id
    // $data[2] - amount

    if (isset($products[$data[1]])) {
        // the CSV contains products that aren't in the database,
        // so only echo queries for products we know about
        echo "  UPDATE t 
                SET value = " . (int)$data[2] . " 
                WHERE   fielddef_id = " . (int)$stocks[$data[0]] . " AND 
                        product_id = '" . $data[1] . "' -- product_id isn't just numeric
                LIMIT 1<br>";
    }
}

The problem is that writing out 100k lines with echo is soooo slow, it takes long minutes. I'm not sure what MySQL will do, whether it will be faster or take about the same time. I have no testing machine here, so I'm worried about trying it on the prod server.
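One way to estimate the cost without touching the production database is to time the parsing loop alone with microtime(), leaving out both the echo and any queries. A minimal sketch, reusing export.csv from the question (timings are machine-dependent):

<?php

// Time only the parsing loop, with no output and no database work,
// to separate PHP overhead from MySQL overhead.
$start = microtime(true);

$csv = file('export.csv');

foreach ($csv as $row) {
    $data = explode(';', $row);
    // build the query string here, but don't echo or execute it
}

printf("Parsed %d lines in %.3f seconds\n", count($csv), microtime(true) - $start);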

My idea was to load the CSV file into more variables (better, an array) like below, but I don't know if it would help.

$csv[0] = lines 0      - 10.000;
$csv[1] = lines 10.001 - 20.000;
$csv[2] = lines 20.001 - 30.000;
$csv[3] = lines 30.001 - 40.000;
etc. 
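If splitting is the goal, array_chunk() produces exactly this layout without hand-written ranges. A minimal sketch, assuming the ~5 MB file fits in memory (it does here):

<?php

// Split the file into blocks of 10,000 lines each:
// $chunks[0] holds lines 1-10,000, $chunks[1] lines 10,001-20,000, etc.
$chunks = array_chunk(file('export.csv'), 10000);

foreach ($chunks as $i => $chunk) {
    echo 'Chunk ' . $i . ' has ' . count($chunk) . " lines\n";
}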

I found, e.g., Efficiently counting the number of lines of a text file (200mb+), but I'm not sure how it could help me.
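The idea in that question is to read the file line by line with fgets() instead of loading it whole, which keeps memory usage flat no matter the file size. A minimal sketch:

<?php

// Count lines without loading the whole file into memory.
$handle = fopen('export.csv', 'r');
$lines = 0;

while (fgets($handle) !== false) {
    $lines++;
}

fclose($handle);
echo $lines . " lines\n";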

When I replace the foreach with print_r, I get the dump in under 1 second. The task is to make the foreach loop with the database update faster.

Any ideas how to update so many records in the database?

Thanks.

Solution

Thanks to the answers and comments on the question, I have a solution. The base for it comes from @Dave; I've only updated it to fit the question better.

<?php

require_once 'include.php';

// stock convert table (key is ID in CSV, value ID in database)
$stocks = array(
    1  => 22,
    2  => 23,
    3  => 24,
    4  => 25,
    5  => 26,
    6  => 27,
    7  => 28,
    8  => 29,
    9  => 30,
    10 => 31
);

// product IDs in the CSV ("value") and in the database ("product_id") differ,
// so take both from the database and build a lookup array of e-shop products
$products = array();

$sql = mysql_query("SELECT product_id, value FROM cms_module_products_fieldvals WHERE fielddef_id = 1") or die(mysql_error());

while ($row = mysql_fetch_assoc($sql)) {
    $products[$row['value']] = $row['product_id'];
}

$handle = fopen('import.csv', 'r');
$i = 1;

while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    // unknown products map to 0, which never matches a real product_id
    $p_id = isset($products[$data[1]]) ? (int)$products[$data[1]] : 0;

    if ($p_id > 0) {
        // only products that exist in the database get a query; without this
        // condition it still works, but we send many useless queries
        // (... WHERE product_id = 0 updates nothing, yet takes time)
        if ($i % 300 === 0) {
            // optional throttle; we'll see how it behaves under real traffic
            sleep(1);
        }

        $updatesql = "UPDATE cms_module_products_fieldvals SET value = " . (int)$data[2] . " WHERE fielddef_id = " . (int)$stocks[$data[0]] . " AND product_id = " . $p_id . " LIMIT 1";
        echo "$updatesql<br>"; // for debug only; comment out on live
        // mysql_query($updatesql); // uncomment to actually run the update
        $i++;
    }
}

// approx. 1.5 sec to process 100k+ records
fclose($handle);
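A follow-up note, not part of the accepted code above: if one UPDATE per row ever becomes the bottleneck, a common MySQL technique is to batch many rows into a single INSERT ... ON DUPLICATE KEY UPDATE statement. The sketch below is an assumption-laden illustration: it reuses $stocks and $products from the code above, uses mysqli instead of the legacy mysql_* API, and requires a UNIQUE KEY over (fielddef_id, product_id) so the INSERT hits the existing row.

<?php

// Sketch: send one statement per 500 rows instead of one UPDATE per row.
// Assumes a UNIQUE KEY on (fielddef_id, product_id); credentials are placeholders.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$values = array();

$handle = fopen('import.csv', 'r');

while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    if (!isset($stocks[$data[0]], $products[$data[1]])) {
        continue; // skip rows we can't map to the database
    }

    $values[] = '(' . (int)$stocks[$data[0]] . ', ' . (int)$products[$data[1]] . ', ' . (int)$data[2] . ')';

    if (count($values) === 500) {
        $mysqli->query("INSERT INTO cms_module_products_fieldvals (fielddef_id, product_id, value) VALUES " . implode(', ', $values) . " ON DUPLICATE KEY UPDATE value = VALUES(value)");
        $values = array();
    }
}

// flush the last partial batch
if ($values) {
    $mysqli->query("INSERT INTO cms_module_products_fieldvals (fielddef_id, product_id, value) VALUES " . implode(', ', $values) . " ON DUPLICATE KEY UPDATE value = VALUES(value)");
}

fclose($handle);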

