OpenXML的花费比OLEDB更​​长的时间来读取Excel工作表中的行 [英] OpenXML takes much longer than OLEDB to read rows from Excel sheet

查看:544
本文介绍了OpenXML的花费比OLEDB更​​长的时间来读取Excel工作表中的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我使用OLEDB,只需要2 - 3秒钟,从一个Excel工作表读取3200行。我改了的OpenXML格式,现在的时间超过1分钟,从一个Excel工作表读取3200行。

When I used OLEDB, it takes only 2 - 3 seconds to read 3200 rows from an Excel Sheet. I changed to the OpenXML format and now it takes more than 1 minute to read 3200 rows from an Excel Sheet.

下面是我的code:

public static DataTable ReadExcelFileDOM(string filename)
{
    DataTable table;

    using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(filename, true))
    {
        WorkbookPart workbookPart = myDoc.WorkbookPart;
        Sheet worksheet = workbookPart.Workbook.Descendants<Sheet>().First();
        WorksheetPart worksheetPart =
         (WorksheetPart)(workbookPart.GetPartById(worksheet.Id));
        SheetData sheetData =
            worksheetPart.Worksheet.Elements<SheetData>().First();
        List<List<string>> totalRows = new List<List<string>>();
        int maxCol = 0;

        foreach (Row r in sheetData.Elements<Row>())
        {
            // Add the empty row.
            string value = null;
            while (totalRows.Count < r.RowIndex - 1)
            {
                List<string> emptyRowValues = new List<string>();
                for (int i = 0; i < maxCol; i++)
                {
                    emptyRowValues.Add("");
                }
                totalRows.Add(emptyRowValues);
            }


            List<string> tempRowValues = new List<string>();
            foreach (Cell c in r.Elements<Cell>())
            {
                #region get the cell value of c.
                if (c != null)
                {
                    value = c.InnerText;

                    // If the cell represents a numeric value, you are done. 
                    // For dates, this code returns the serialized value that 
                    // represents the date. The code handles strings and Booleans
                    // individually. For shared strings, the code looks up the 
                    // corresponding value in the shared string table. For Booleans, 
                    // the code converts the value into the words TRUE or FALSE.
                    if (c.DataType != null)
                    {
                        switch (c.DataType.Value)
                        {
                            case CellValues.SharedString:
                                // For shared strings, look up the value in the shared 
                                // strings table.
                                var stringTable = workbookPart.
                                    GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

                                // If the shared string table is missing, something is 
                                // wrong. Return the index that you found in the cell.
                                // Otherwise, look up the correct text in the table.
                                if (stringTable != null)
                                {
                                    value = stringTable.SharedStringTable.
                                        ElementAt(int.Parse(value)).InnerText;
                                }
                                break;

                            case CellValues.Boolean:
                                switch (value)
                                {
                                    case "0":
                                        value = "FALSE";
                                        break;
                                    default:
                                        value = "TRUE";
                                        break;
                                }
                                break;
                        }
                    }

                    Console.Write(value + "  ");
                }
                #endregion

                // Add the cell to the row list.
                int i = Convert.ToInt32(c.CellReference.ToString().ToCharArray().First() - 'A');

                // Add the blank cell in the row.
                while (tempRowValues.Count < i)
                {
                    tempRowValues.Add("");
                }
                tempRowValues.Add(value);
            }

            // add the row to the totalRows.
            maxCol = processList(tempRowValues, totalRows, maxCol);

            Console.WriteLine();
        }

        table = ConvertListListStringToDataTable(totalRows, maxCol);
    }
    return table;
}

/// <summary>
/// Add each row to the totalRows.
/// </summary>
/// <param name="tempRows"></param>
/// <param name="totalRows"></param>
/// <param name="MaxCol">the max column number in rows of the totalRows</param>
/// <returns></returns>
private static int processList(List<string> tempRows, List<List<string>> totalRows, int MaxCol)
{
    if (tempRows.Count > MaxCol)
    {
        MaxCol = tempRows.Count;
    }

    totalRows.Add(tempRows);
    return MaxCol;
}

private static DataTable ConvertListListStringToDataTable(List<List<string>> totalRows, int maxCol)
{
    DataTable table = new DataTable();
    for (int i = 0; i < maxCol; i++)
    {
        table.Columns.Add();
    }
    foreach (List<string> row in totalRows)
    {
        while (row.Count < maxCol)
        {
            row.Add("");
        }
        table.Rows.Add(row.ToArray());
    }
    return table;
}

有没有一种有效的方式来某处改变这种code所以这个阅读过程中可以快一点?如何我可以改变这code读取速度更快?

Is there an efficient way to change this code somewhere so that the read process can be a little faster? How I can change this to code to read faster?

推荐答案

你试过了SAX方法? DOM方法比较慢,因为它加载了,好了,DOM。

Have you tried the SAX approach? The DOM approach is slower because it loads in the, well, DOM.

<一个href="http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx" rel="nofollow">http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx

如果你确定每个单元都有一个单元格引用(如A1),然后通过所有的细胞类只是解析(而不是通过排类,那么子细胞类解析)。我相信,Microsoft Excel中做到这一点。该单元格引用是根据开放XML规范一个可选属性。

If you're sure every cell has a cell reference (such as "A1"), then just parse through all the Cell classes (instead of parsing through Row classes, then child Cell classes). I believe Microsoft Excel does this. The cell reference is an optional attribute according to Open XML specifications.

这篇关于OpenXML的花费比OLEDB更​​长的时间来读取Excel工作表中的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆