OpenXML的花费比OLEDB更长的时间来读取Excel工作表中的行 [英] OpenXML takes much longer than OLEDB to read rows from Excel sheet
问题描述
当我使用OLEDB,只需要2 - 3秒钟,从一个Excel工作表读取3200行。我改了的OpenXML格式,现在的时间超过1分钟,从一个Excel工作表读取3200行。
When I used OLEDB, it takes only 2 - 3 seconds to read 3200 rows from an Excel Sheet. I changed to the OpenXML format and now it takes more than 1 minute to read 3200 rows from an Excel Sheet.
下面是我的code:
public static DataTable ReadExcelFileDOM(string filename)
{
DataTable table;
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = myDoc.WorkbookPart;
Sheet worksheet = workbookPart.Workbook.Descendants<Sheet>().First();
WorksheetPart worksheetPart =
(WorksheetPart)(workbookPart.GetPartById(worksheet.Id));
SheetData sheetData =
worksheetPart.Worksheet.Elements<SheetData>().First();
List<List<string>> totalRows = new List<List<string>>();
int maxCol = 0;
foreach (Row r in sheetData.Elements<Row>())
{
// Add the empty row.
string value = null;
while (totalRows.Count < r.RowIndex - 1)
{
List<string> emptyRowValues = new List<string>();
for (int i = 0; i < maxCol; i++)
{
emptyRowValues.Add("");
}
totalRows.Add(emptyRowValues);
}
List<string> tempRowValues = new List<string>();
foreach (Cell c in r.Elements<Cell>())
{
#region get the cell value of c.
if (c != null)
{
value = c.InnerText;
// If the cell represents a numeric value, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and Booleans
// individually. For shared strings, the code looks up the
// corresponding value in the shared string table. For Booleans,
// the code converts the value into the words TRUE or FALSE.
if (c.DataType != null)
{
switch (c.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the shared
// strings table.
var stringTable = workbookPart.
GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// If the shared string table is missing, something is
// wrong. Return the index that you found in the cell.
// Otherwise, look up the correct text in the table.
if (stringTable != null)
{
value = stringTable.SharedStringTable.
ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
switch (value)
{
case "0":
value = "FALSE";
break;
default:
value = "TRUE";
break;
}
break;
}
}
Console.Write(value + " ");
}
#endregion
// Add the cell to the row list.
int i = Convert.ToInt32(c.CellReference.ToString().ToCharArray().First() - 'A');
// Add the blank cell in the row.
while (tempRowValues.Count < i)
{
tempRowValues.Add("");
}
tempRowValues.Add(value);
}
// add the row to the totalRows.
maxCol = processList(tempRowValues, totalRows, maxCol);
Console.WriteLine();
}
table = ConvertListListStringToDataTable(totalRows, maxCol);
}
return table;
}
/// <summary>
/// Add each row to the totalRows.
/// </summary>
/// <param name="tempRows"></param>
/// <param name="totalRows"></param>
/// <param name="MaxCol">the max column number in rows of the totalRows</param>
/// <returns></returns>
private static int processList(List<string> tempRows, List<List<string>> totalRows, int MaxCol)
{
if (tempRows.Count > MaxCol)
{
MaxCol = tempRows.Count;
}
totalRows.Add(tempRows);
return MaxCol;
}
private static DataTable ConvertListListStringToDataTable(List<List<string>> totalRows, int maxCol)
{
DataTable table = new DataTable();
for (int i = 0; i < maxCol; i++)
{
table.Columns.Add();
}
foreach (List<string> row in totalRows)
{
while (row.Count < maxCol)
{
row.Add("");
}
table.Rows.Add(row.ToArray());
}
return table;
}
有没有一种有效的方式来某处改变这种code所以这个阅读过程中可以快一点?如何我可以改变这code读取速度更快?
Is there an efficient way to change this code somewhere so that the read process can be a little faster? How I can change this to code to read faster?
推荐答案
你试过了SAX方法? DOM方法比较慢,因为它加载了,好了,DOM。
Have you tried the SAX approach? The DOM approach is slower because it loads in the, well, DOM.
<一个href="http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx" rel="nofollow">http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx
如果你确定每个单元都有一个单元格引用(如A1),然后通过所有的细胞类只是解析(而不是通过排类,那么子细胞类解析)。我相信,Microsoft Excel中做到这一点。该单元格引用是根据开放XML规范一个可选属性。
If you're sure every cell has a cell reference (such as "A1"), then just parse through all the Cell classes (instead of parsing through Row classes, then child Cell classes). I believe Microsoft Excel does this. The cell reference is an optional attribute according to Open XML specifications.
这篇关于OpenXML的花费比OLEDB更长的时间来读取Excel工作表中的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!