OpenXML Sax方法用于快速导出100K行到Excel [英] OpenXML Sax method for exporting 100K+ rows to Excel fast

查看:305
本文介绍了OpenXML Sax方法用于快速导出100K行到Excel的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在努力提高写入xlsx的SAX方法的性能。我知道Excel中有1048576行的限制。我已经达到这个限制只有几次。在大多数情况下,我只写出大约125K到250K行(大数据集)。我尝试的代码似乎并没有像它会写入文件的次数那么快。我希望有一些缓存涉及,但仍然似乎有这样的方式太多的磁盘访问代码的工作方式现在。

I have been trying to improve the performance of the SAX method for writing to an xlsx. I know there is a limit of 1048576 rows in Excel. I have hit this limit only a few times. In most cases though I only write out about 125K to 250K rows (a large dataset). The code that I have tried doesn't seem to be as fast as it could be because of the many times it will write to the file. I would hope that there is some caching involved but it still seems like there is way too much disk access in the way the code works now.

下面的代码类似于使用模板与OpenXML和SAX ,因为我已经写入一个文件使用ClosedXML然后切换到SAX的大内容。当尝试对此行使用ClosedXML时,内存将关闭图表。所以这就是为什么我使用SAX。

The code below is similar to Using a template with OpenXML and SAX because I have written to a file using ClosedXML and then switch to SAX for the large content. The memory goes off the charts when trying to use ClosedXML for this many rows. So that is why I am using SAX.

        int numCols = dt.Columns.Count;
        int rowCnt = 0;
        //for (curRec = 0; curRec < totalRecs; curRec++)
        foreach (DataRow row in dt.Rows)
        {
            Row xlr = new Row();

            //starting of new row.
            //writer.WriteStartElement(xlr);

            for (int col = 0; col < numCols; ++col)
            {
                Cell cell = new Cell();
                CellValue v = new CellValue(row[col].ToString());

                {
                    string objDataType = row[col].GetType().ToString();
                    if (objDataType.Contains(TypeCode.Int32.ToString()) || objDataType.Contains(TypeCode.Int64.ToString()))
                    {
                        cell.DataType = new EnumValue<CellValues>(CellValues.Number);
                        //cell.CellValue = new CellValue(row[col].ToString());
                        cell.Append(v);
                    }
                    else if (objDataType.Contains(TypeCode.Decimal.ToString()) || objDataType.Contains("Single"))
                    {
                        cell.DataType = new EnumValue<CellValues>(CellValues.Number);
                        cell.Append(v);
                        //TODO: set the decimal qualifier - May be fixed elsewhere
                        cell.StyleIndex = 2;
                    }
                    else
                    {
                        //Add text to text cell
                        cell.DataType = new EnumValue<CellValues>(CellValues.String);
                        cell.Append(v);
                    }
                }

                if (colStyles != null && col < colStyles.Count)
                {
                    cell.StyleIndex = (UInt32Value)colStyles[col];
                }

                //writer.WriteElement(cell);
                xlr.Append(cell);
            }
            writer.WriteElement(xlr);
            //end row element
            //writer.WriteEndElement();
            ++rowCnt;
        }

这段代码非常接近我在那里看到的例子。但问题仍然很慢。从单独的单元格写入到追加到行和写入行的更改似乎在125K行上改进了10%的进程。

This code is very close to examples I have seen out there. But the problem is it is still pretty slow. Changing from the individual cell writing to appending to the row and writing the row seems to improved the process by 10% on 125K rows.

有没有人找到一种方法来改进作家还是设置一种写较少次的方式?有没有方法可以加速这个过程?

Has anyone found a way to improve the writer or setup a way to write fewer times? Are there methods that could speed up this process?

有没有人试图设置某种形式的缓存来提高性能?

Has anyone tried to setup some form of caching to improve performance?

推荐答案

一般问题是您不应该将DOM和SAX方法混合在一起。一旦混合使用,性能类似于使用DOM。 SAX的性能优势发生在您全部进入时。首先要回答您的问题:

The general issue is that you shouldn't mix DOM and SAX methods together. Once you mix them, the performance is akin to just using DOM. The performance benefits of SAX happen when you go all in. To answer your questions first:


有没有人找到一种方法来改善作者或设置一个方式来写
更少的次数?有没有可以加快这个过程的方法?

Has anyone found a way to improve the writer or setup a way to write fewer times? Are there methods that could speed up this process?

不要将SAX writer与DOM操作混合在一起。这意味着你根本不应该对SDK类的属性或函数进行操作。所以cell.Append()出来了。所以是cell.DataType或cell.StyleIndex。

Don't mix the SAX writer with DOM manipulations. This means you shouldn't have manipulations of the SDK class properties or functions at all. So cell.Append() is out. So is cell.DataType or cell.StyleIndex.

当你做SAX时,你全部进入(听起来有点挑衅...)例如:

When you do SAX, you go all in. (that sounds slightly provocative...) For example:

for (int i = 1; i <= 50000; ++i)
{
    oxa = new List<OpenXmlAttribute>();
    // this is the row index
    oxa.Add(new OpenXmlAttribute("r", null, i.ToString()));

    oxw.WriteStartElement(new Row(), oxa);

    for (int j = 1; j <= 100; ++j)
    {
        oxa = new List<OpenXmlAttribute>();
        // this is the data type ("t"), with CellValues.String ("str")
        oxa.Add(new OpenXmlAttribute("t", null, "str"));

        // it's suggested you also have the cell reference, but
        // you'll have to calculate the correct cell reference yourself.
        // Here's an example:
        //oxa.Add(new OpenXmlAttribute("r", null, "A1"));

        oxw.WriteStartElement(new Cell(), oxa);

        oxw.WriteElement(new CellValue(string.Format("R{0}C{1}", i, j)));

        // this is for Cell
        oxw.WriteEndElement();
    }

    // this is for Row
    oxw.WriteEndElement();
}

其中,oxa是List,oxw是SAX writer类OpenXmlWriter。关于我的文章的更多细节这里

where oxa is a List and oxw is the SAX writer class OpenXmlWriter. More details on my article here.

没有真正的方法缓存SAX操作。它们就像一系列的printf语句。你可以编写一个帮助函数,它只是在一个块中写入WriteStartElement(),WriteElement()和WriteEndElement()函数(例如写一个完整的Cell类)。

There's no real way to cache the SAX operations. They're like a series of printf statements. You can probably write a helper function that just do the WriteStartElement(), WriteElement() and WriteEndElement() functions in a chunk (to write a complete Cell class for example).

这篇关于OpenXML Sax方法用于快速导出100K行到Excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆