OpenXML SAX method for exporting 100K+ rows to Excel fast


Question

I have been trying to improve the performance of the SAX method for writing to an xlsx file. I know Excel has a limit of 1,048,576 rows, and I have hit that limit only a few times. In most cases I write out only about 125K to 250K rows (a large dataset). The code I have tried doesn't seem to be as fast as it could be, because it writes to the file so many times. I would hope some caching is involved, but the way the code works now still seems to cause far too much disk access.

The code below is similar to the question "Using a template with OpenXML and SAX", because I write the file using ClosedXML and then switch to SAX for the large content. Memory use goes off the charts when I try to use ClosedXML for this many rows, which is why I am using SAX.

        int numCols = dt.Columns.Count;
        int rowCnt = 0;
        //for (curRec = 0; curRec < totalRecs; curRec++)
        foreach (DataRow row in dt.Rows)
        {
            Row xlr = new Row();

            //starting of new row.
            //writer.WriteStartElement(xlr);

            for (int col = 0; col < numCols; ++col)
            {
                Cell cell = new Cell();
                CellValue v = new CellValue(row[col].ToString());

                {
                    string objDataType = row[col].GetType().ToString();
                    if (objDataType.Contains(TypeCode.Int32.ToString()) || objDataType.Contains(TypeCode.Int64.ToString()))
                    {
                        cell.DataType = new EnumValue<CellValues>(CellValues.Number);
                        //cell.CellValue = new CellValue(row[col].ToString());
                        cell.Append(v);
                    }
                    else if (objDataType.Contains(TypeCode.Decimal.ToString()) || objDataType.Contains("Single"))
                    {
                        cell.DataType = new EnumValue<CellValues>(CellValues.Number);
                        cell.Append(v);
                        //TODO: set the decimal qualifier - May be fixed elsewhere
                        cell.StyleIndex = 2;
                    }
                    else
                    {
                        //Add text to text cell
                        cell.DataType = new EnumValue<CellValues>(CellValues.String);
                        cell.Append(v);
                    }
                }

                if (colStyles != null && col < colStyles.Count)
                {
                    cell.StyleIndex = (UInt32Value)colStyles[col];
                }

                //writer.WriteElement(cell);
                xlr.Append(cell);
            }
            writer.WriteElement(xlr);
            //end row element
            //writer.WriteEndElement();
            ++rowCnt;
        }

This code is very close to examples I have seen elsewhere, but it is still pretty slow. Changing from writing each cell individually to appending the cells to the row and then writing the row seems to have improved the process by 10% on 125K rows.

Has anyone found a way to improve the writer or set up a way to write fewer times? Are there methods that could speed up this process?

Has anyone tried to set up some form of caching to improve performance?

Recommended answer

The general issue is that you shouldn't mix DOM and SAX methods. Once you mix them, performance is akin to just using DOM. The performance benefits of SAX appear only when you go all in. To answer your questions first:

"Has anyone found a way to improve the writer or setup a way to write fewer times? Are there methods that could speed up this process?"

Don't mix the SAX writer with DOM manipulations. This means you shouldn't set properties or call functions on the SDK classes at all. So cell.Append() is out. So are cell.DataType and cell.StyleIndex.

When you do SAX, you go all in. (That sounds slightly provocative...) For example:

for (int i = 1; i <= 50000; ++i)
{
    oxa = new List<OpenXmlAttribute>();
    // this is the row index
    oxa.Add(new OpenXmlAttribute("r", null, i.ToString()));

    oxw.WriteStartElement(new Row(), oxa);

    for (int j = 1; j <= 100; ++j)
    {
        oxa = new List<OpenXmlAttribute>();
        // this is the data type ("t"), with CellValues.String ("str")
        oxa.Add(new OpenXmlAttribute("t", null, "str"));

        // it's suggested you also have the cell reference, but
        // you'll have to calculate the correct cell reference yourself.
        // Here's an example:
        //oxa.Add(new OpenXmlAttribute("r", null, "A1"));

        oxw.WriteStartElement(new Cell(), oxa);

        oxw.WriteElement(new CellValue(string.Format("R{0}C{1}", i, j)));

        // this is for Cell
        oxw.WriteEndElement();
    }

    // this is for Row
    oxw.WriteEndElement();
}



where oxa is a List<OpenXmlAttribute> and oxw is the SAX writer class OpenXmlWriter. I go into more detail in an article of mine.
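The comments in the loop above note that the cell reference ("r" attribute) has to be calculated by hand. A minimal sketch of one way to do that follows. The class and method names (CellReference, ColumnName, For) are hypothetical and not part of the Open XML SDK; the code only converts 1-based row and column indexes into A1-style text.

```csharp
using System;

static class CellReference
{
    // Converts a 1-based column index to Excel column letters (1 -> "A", 27 -> "AA").
    public static string ColumnName(int columnIndex)
    {
        if (columnIndex < 1)
            throw new ArgumentOutOfRangeException(nameof(columnIndex));

        string name = string.Empty;
        while (columnIndex > 0)
        {
            // Column letters are a bijective base-26 numbering, hence the "- 1".
            int remainder = (columnIndex - 1) % 26;
            name = (char)('A' + remainder) + name;
            columnIndex = (columnIndex - 1) / 26;
        }
        return name;
    }

    // Builds an A1-style reference from 1-based row and column indexes.
    public static string For(int rowIndex, int columnIndex)
    {
        return ColumnName(columnIndex) + rowIndex;
    }
}
```

Inside the inner loop this could replace the commented-out line, for example oxa.Add(new OpenXmlAttribute("r", null, CellReference.For(i, j))). With many rows it is probably worth computing the column letters once per column before the row loop rather than once per cell.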

There's no real way to cache the SAX operations. They're like a series of printf statements. You can probably write a helper function that performs the WriteStartElement(), WriteElement() and WriteEndElement() calls in one chunk (to write a complete Cell, for example).
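A helper along those lines might look like the sketch below. It is an illustration under assumptions, not the answerer's code: SaxCellHelper, WriteCell and WriteRows are hypothetical names, and the number detection is a simplified stand-in for the type checks in the question. The data type and style index go out as plain "t" and "s" attributes, so no Cell properties are touched.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Spreadsheet;

static class SaxCellHelper
{
    // Writes one complete cell: <c t="..." s="..."><v>text</v></c>.
    // dataType is the raw attribute value, e.g. "str" for strings or "n" for numbers.
    public static void WriteCell(OpenXmlWriter writer, string text,
                                 string dataType, uint? styleIndex = null)
    {
        var attributes = new List<OpenXmlAttribute>
        {
            new OpenXmlAttribute("t", null, dataType)
        };
        if (styleIndex.HasValue)
        {
            // "s" is the style index attribute, replacing cell.StyleIndex.
            attributes.Add(new OpenXmlAttribute("s", null, styleIndex.Value.ToString()));
        }

        writer.WriteStartElement(new Cell(), attributes);
        writer.WriteElement(new CellValue(text));
        writer.WriteEndElement(); // closes the cell
    }

    // Writes every row of a DataTable using only SAX calls.
    public static void WriteRows(OpenXmlWriter writer, DataTable table)
    {
        int columnCount = table.Columns.Count;
        foreach (DataRow row in table.Rows)
        {
            writer.WriteStartElement(new Row());
            for (int col = 0; col < columnCount; ++col)
            {
                object value = row[col];
                bool isNumber = value is int || value is long
                             || value is decimal || value is float;
                // Numbers are written with the invariant culture so the
                // decimal separator is always ".".
                string text = isNumber
                    ? Convert.ToString(value, CultureInfo.InvariantCulture)
                    : value.ToString();
                WriteCell(writer, text, isNumber ? "n" : "str");
            }
            writer.WriteEndElement(); // closes the row
        }
    }
}
```

With this in place, the foreach loop from the question would reduce to a single call such as SaxCellHelper.WriteRows(writer, dt), with per-column styles passed through the optional styleIndex parameter if they are needed.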
