How to read large worksheets from large Excel files (27MB+) with PHPExcel?


Question

I have large Excel worksheets that I want to be able to read into MySQL using PHPExcel.

I am using the recent patch which allows you to read in Worksheets without opening the whole file. This way I can read one worksheet at a time.

However, one Excel file is 27MB in size. I can successfully read in the first worksheet since it is small, but the second worksheet is so large that the cron job that started the process at 22:00 had still not finished at 8:00 AM; the worksheet is simply too big.

Is there any way to read in a worksheet line by line, e.g. something like this:

$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);

foreach ($worksheetNames as $sheetName) {
    //BELOW IS "WISH CODE":
    for ($row = 1; $row <= $max_rows; $row += 100) {
        $dataset = $objReader->getWorksheetWithRows($row, $row+100);
        save_dataset_to_database($dataset);
    }
}



Addendum

@mark, I used the code you posted to create the following example:

function readRowsFromWorksheet() {

    $file_name = htmlentities($_POST['file_name']);
    $file_type = htmlentities($_POST['file_type']);

    echo 'Read rows from worksheet:<br />';
    debug_log('----------start');
    $objReader = PHPExcel_IOFactory::createReader($file_type);
    $chunkSize = 20;
    $chunkFilter = new ChunkReadFilter();
    $objReader->setReadFilter($chunkFilter);

    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
        $chunkFilter->setRows($startRow, $chunkSize);
        $objPHPExcel = $objReader->load('data/' . $file_name);
        debug_log('reading chunk starting at row '.$startRow);
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        var_dump($sheetData);
        echo '<hr />';
    }
    debug_log('end');
}

As the following log file shows, it runs fine on a small 8K Excel file, but when I run it on a 3MB Excel file it never gets past the first chunk. Is there any way I can optimize this code for performance? Otherwise it doesn't look performant enough to get chunks out of a large Excel file:

2011-01-12 11:07:15: ----------start
2011-01-12 11:07:15: reading chunk starting at row 2
2011-01-12 11:07:15: reading chunk starting at row 22
2011-01-12 11:07:15: reading chunk starting at row 42
2011-01-12 11:07:15: reading chunk starting at row 62
2011-01-12 11:07:15: reading chunk starting at row 82
2011-01-12 11:07:15: reading chunk starting at row 102
2011-01-12 11:07:15: reading chunk starting at row 122
2011-01-12 11:07:15: reading chunk starting at row 142
2011-01-12 11:07:15: reading chunk starting at row 162
2011-01-12 11:07:15: reading chunk starting at row 182
2011-01-12 11:07:15: reading chunk starting at row 202
2011-01-12 11:07:15: reading chunk starting at row 222
2011-01-12 11:07:15: end
2011-01-12 11:07:52: ----------start
2011-01-12 11:08:01: reading chunk starting at row 2
(...at 11:18, CPU usage at 93% still running...)


Addendum 2

When I comment out:

//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);

Then it parses at an acceptable speed (about 2 rows per second). Is there any way to increase the performance of toArray()?

2011-01-12 11:40:51: ----------start
2011-01-12 11:40:59: reading chunk starting at row 2
2011-01-12 11:41:07: reading chunk starting at row 22
2011-01-12 11:41:14: reading chunk starting at row 42
2011-01-12 11:41:22: reading chunk starting at row 62
2011-01-12 11:41:29: reading chunk starting at row 82
2011-01-12 11:41:37: reading chunk starting at row 102
2011-01-12 11:41:45: reading chunk starting at row 122
2011-01-12 11:41:52: reading chunk starting at row 142
2011-01-12 11:42:00: reading chunk starting at row 162
2011-01-12 11:42:07: reading chunk starting at row 182
2011-01-12 11:42:15: reading chunk starting at row 202
2011-01-12 11:42:22: reading chunk starting at row 222
2011-01-12 11:42:22: end
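One possible optimization for the toArray() cost (a sketch under assumptions, not something from the original thread): tell the reader to skip formatting with setReadDataOnly(), and convert only the chunk's own range with rangeToArray() instead of walking the whole sheet with toArray(). The A:F column span below is an assumption about the data's width.

```php
// Sketch (assumption): restrict conversion to the chunk's own rows rather than
// the whole sheet, and skip cell styling while parsing.
$objReader = PHPExcel_IOFactory::createReader($file_type);
$objReader->setReadDataOnly(true);      // ignore styles/formatting during the parse
$objReader->setReadFilter($chunkFilter);

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load('data/' . $file_name);
    // Convert only rows $startRow .. $endRow (columns A-F are an assumed width)
    $endRow = $startRow + $chunkSize - 1;
    $sheetData = $objPHPExcel->getActiveSheet()
        ->rangeToArray('A' . $startRow . ':F' . $endRow, null, true, true, true);
    var_dump($sheetData);
}
```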



Addendum 3

This seems to work adequately, for instance, at least on the 3 MB file:

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load('data/' . $file_name);
    debug_log('reading chunk starting at row ' . $startRow);
    foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false);
        echo '<tr>';
        foreach ($cellIterator as $cell) {
            if (!is_null($cell)) {
                //$value = $cell->getCalculatedValue();
                $rawValue = $cell->getValue();
                debug_log($rawValue);
            }
        }
    }
}
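One further point worth checking in the loops above (an assumption on my part, not from the thread): each $objReader->load() call creates a fresh PHPExcel object, and the previous chunk's object holds circular cell/worksheet references that PHP's garbage collector cannot easily reclaim. Explicitly disconnecting and unsetting it between iterations may keep memory flat across chunks:

```php
// Sketch (assumption): free each chunk's PHPExcel object before loading the next,
// otherwise memory from earlier chunks accumulates across loader iterations.
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load('data/' . $file_name);

    // ... process the chunk here ...

    $objPHPExcel->disconnectWorksheets();   // break cell <-> worksheet references
    unset($objPHPExcel);                    // let the garbage collector reclaim it
}
```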

Answer

It is possible to read a worksheet in "chunks" using Read Filters, although I can make no guarantees about efficiency.

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example2.xls';


/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}


echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/**  Create a new Reader of the type defined in $inputFileType  **/

$objReader = PHPExcel_IOFactory::createReader($inputFileType);



echo '<hr />';


/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 20;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/
$objReader->setReadFilter($chunkFilter);

/**  Loop to read our worksheet in "chunk size" blocks  **/
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/
    $chunkFilter->setRows($startRow,$chunkSize);
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/
    $objPHPExcel = $objReader->load($inputFileName);

    //    Do some processing here

    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
    var_dump($sheetData);
    echo '<br /><br />';
}

Note that this Read Filter will always read the first row of the worksheet, as well as the rows defined by the chunk rule.

When using a read filter, PHPExcel still parses the entire file, but only loads those cells that match the defined read filter, so it only uses the memory required by that number of cells. However, it will parse the file multiple times, once for each chunk, so it will be slower. This example reads 20 rows at a time: to read line by line, simply set $chunkSize to 1.
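For the line-by-line case just mentioned, a minimal variant of the same loop might look like this (save_row_to_database() is a hypothetical handler, echoing the question's wish code):

```php
// Line-by-line variant of the chunked loop: each load() returns a single data
// row (plus the heading row, which the filter always lets through).
$chunkSize = 1;
$chunkFilter = new chunkReadFilter();
$objReader->setReadFilter($chunkFilter);

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load($inputFileName);
    $rowData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
    save_row_to_database($rowData);   // hypothetical handler
}
```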

This can also cause problems if you have formulae that reference cells in different "chunks", because the data simply isn't available for cells outside of the current "chunk".
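Separately from read filters (an addition of mine, not part of the original answer), PHPExcel also supports cell caching, which moves cell objects out of main PHP memory and can lower the peak footprint of each chunk. It has to be configured before any workbook is loaded:

```php
// Sketch (assumption): cache cell objects in php://temp instead of PHP memory.
// Configure this before the first $objReader->load() call.
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB');  // spill to disk past 8MB
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
```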
