在实时(未保存)Excel 数据和 C# 对象之间进行交互的最快方法 [英] Fastest way to interface between live (unsaved) Excel data and C# objects

查看:35
本文介绍了在实时(未保存)Excel 数据和 C# 对象之间进行交互的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道在打开的 Excel 工作簿和 C# 对象之间读取和写入数据的最快方法是什么.背景是我想开发一个从 Excel 中使用并使用 Excel 中保存的数据的 c# 应用程序.

I want to know what the fastest way is of reading and writing data to and from an open Excel workbook to c# objects. The background is that I want to develop a c# application that is used from Excel and uses data held in excel.

业务逻辑将驻留在 c# 应用程序中,但数据将驻留在 Excel 工作簿中.用户将使用 Excel 并单击 Excel 工作簿上的按钮(或执行类似操作)以启动 c# 应用程序.然后,c# 应用程序将从 Excel 工作簿中读取数据,处理数据,然后将数据写回 Excel 工作簿.
可能有许多数据块需要读取并写回 Excel 工作簿,但它们通常具有相对较小的大小,例如 10 行和 20 列.有时可能需要处理大量数据,大约 50,000 行和 40 列.

The business logic will reside in the c# application but the data will reside in an Excel workbook. The user will be using Excel and will click a button (or do something similar) on the excel workbook to initiate the c# application. The c# application will then read data off the excel workbook, process the data, and then write data back to the excel workbook.
There may be numerous blocks of data that are required to be read off and written back to the excel workbook but they will normally be of a relatively small size, say 10 rows and 20 columns. Occasionally a large list of data may need to be processed, of the order of 50,000 rows and 40 columns.

我知道使用 VSTO 说起来相对容易,但我想知道最快(但仍然健壮和优雅)的解决方案是什么,并了解速度.我不介意该解决方案是推荐使用第三方产品还是使用 C++.

I know that this is relatively easy to do say using VSTO but I want to know what the fastest (but still robust and elegant) solution is and get an idea of the speed. I don't mind if the solution recommends using third party products or uses C++.

显而易见的解决方案是使用 VSTO 或互操作,但我不知道与我目前用于读取数据的 VBA 相比性能如何,或者是否有任何其他解决方案.

The obvious solution is using VSTO or interop but I don't know what the performance is like versus VBA which I'm currently using to read in the data, or if there are any other solutions.

这是在专家交流上发布的,说 VSTO 比 VBA 慢得多,但那是几年前的事了,我不知道性能是否有所提高.

This was posted on experts exchange saying that VSTO was dramatically slower than VBA but that was a couple of years ago and I don't know if the performance has improved.

http://www.experts-exchange.com/Microsoft/开发/VSTO/Q_23635459.html

谢谢.

推荐答案

如果 C# 应用程序是一个独立的应用程序,那么您将始终涉及到跨进程封送处理,这将淹没您可以通过切换语言进行的任何优化,比如说,C# 到 C++.在这种情况下坚持使用您最喜欢的语言,这听起来像是 C#.

If the C# application is a stand-alone application, then you will always have cross-process marshaling involved that will overwhelm any optimizations you can do by switching languages from, say, C# to C++. Stick to your most preferred language in this situation, which sounds like is C#.

如果您愿意制作一个在 Excel 运行的加载项,那么您的操作将避免跨进程调用并且运行速度提高约 50 倍.

If you are willing to make an add-in that runs within Excel, however, then your operations will avoid cross-process calls and run about 50x faster.

如果您在 Excel 中作为加载项运行,那么 VBA 是最快的选项之一,但它仍然涉及 COM,因此使用 XLL 加载项的 C++ 调用将是最快的.但是在调用 Excel 对象模型方面,VBA 仍然相当快.然而,就实际计算速度而言,VBA 作为 pcode 运行,而不是作为完全编译的代码运行,因此执行速度比本机代码慢 2-3 倍.这听起来很糟糕,但这并不是因为典型 Excel 加载项或应用程序所花费的绝大多数执行时间都涉及对 Excel 对象模型的调用,因此 VBA 与完全编译的 COM 加载项相比,例如使用本机编译的 VB 6.0,只会慢 5-15%,这并不明显.

If you run within Excel as an add-in, then VBA is among the fastest options, but it does still involve COM and so C++ calls using an XLL add-in would be fastest. But VBA is still quite fast in terms of calls to the Excel object model. As for actual calculation speed, however, VBA runs as pcode, not as fully compiled code, and so executes about 2-3x slower than native code. This sounds very bad, but it isn't because the vast majority of the execution time taken with a typical Excel add-in or application involves calls to the Excel object model, so VBA vs. a fully compiled COM add-in, say using natively compiled VB 6.0, would only be about 5-15% slower, which is not noticeable.

VB 6.0 是一种已编译的 COM 方法,对于非 Excel 相关调用,其运行速度比 VBA 快 2-3 倍,但此时 VB 6.0 已有大约 12 年的历史,并且不会在 64 位模式下运行,比如安装Office 2010,可以安装运行 32 位或 64 位.目前 64 位 Excel 的使用很少,但使用量会增加,因此我会避免使用 VB 6.0.

VB 6.0 is a compiled COM approach, and runs 2-3x faster than VBA for non-Excel related calls, but VB 6.0 is about 12 years old at this point and won't run in 64 bit mode, say if installing Office 2010, which can be installed to run 32 bit or 64 bit. Usage of 64 bit Excel is tiny at the moment, but will grow in usage, and so I would avoid VB 6.0 for this reason.

C#,如果在进程内作为 Excel 加载项运行,则对 Excel 对象模型的调用执行速度与 VBA 一样快,并且执行非 Excel 调用的速度比 VBA 快 2-3 倍——如果未进行垫片运行.然而,Microsoft 推荐的方法是完全填充运行,例如,通过使用 COM 垫片向导.通过填充,Excel 可以免受您的代码(如果它有问题)的影响,并且您的代码完全不受其他 3rd 方加载项的影响,否则这些加载项可能会导致问题.然而,这样做的缺点是,垫片解决方案在单独的 AppDomain 中运行,这需要跨 AppDomain 封送处理,这会导致执行速度降低约 40 倍——这在许多情况下非常明显.

C#, if running in-process as an Excel add-in would execute calls to the Excel object model as fast as VBA, and execute non-Excel calls 2-3x faster than VBA -- if running unshimmed. The approach recommended by Microsoft, however, is to run fully shimmed, for example, by making use of the COM Shim Wizard. By being shimmed, Excel is protected from your code (if it's faulty) and your code is fully protected from other 3rd party add-ins that could otherwise potentially cause problems. The down-side to this, however, is that a shimmed solution runs within a separate AppDomain, which requires cross-AppDomain marshaling that incurrs an execution speed penalty of about 40x -- which is very noticeable in many contexts.

使用 Visual Studio Tools for Office (VSTO) 的加载项会自动加载到 shim 中并在单独的 AppDomain 中执行.如果使用 VSTO,则无法避免这种情况.因此,对 Excel 对象模型的调用也会导致执行速度降低大约 40 倍.VSTO 是一个用于制作非常丰富的 Excel 加载项的华丽系统,但执行速度对于您这样的应用程序来说是它的弱点.

Add-ins using Visual Studio Tools for Office (VSTO) are automatically loaded within a shim and executes within a separate AppDomain. There is no avoiding this if using VSTO. Therefore, calls to the Excel object model would also incur an approximately 40x execution speed degradation. VSTO is a gorgeous system for making very rich Excel add-ins, but execution speed is its weakness for applications such as yours.

ExcelDna 是一个免费的开源项目,允许您使用 C# 代码,然后将其转换为使用 C++ 代码的 XLL 加载项.也就是说,ExcelDna 会解析您的 C# 代码并为您创建所需的 C++ 代码.我自己没用过,但对流程很熟悉,印象非常深刻.ExcelDna 从使用它的人那里得到了非常好的评价.

ExcelDna is a free, open source project that allows you to use C# code, which is then converted for you to an XLL add-in that uses C++ code. That is, ExcelDna parses your C# code and creates the required C++ code for you. I've not used it myself, but I am familiar with the process and it's very impressive. ExcelDna gets very good reviews from those that use it.

您可能还想查看 Add-in Express.它不是免费的,但它允许您使用 C# 进行编码,尽管它将您的解决方案填充到一个单独的 AppDomain 中,但我相信它的执行速度非常出色.如果我正确理解了它的执行速度,那么我不确定 Add-in Express 是如何做到这一点的,但它可能正在利用称为 FastPath AppDomain 封送处理的东西.但是,请不要引用我的任何内容,因为我对 Add-in Express 不是很熟悉.不过,您应该检查一下并进行自己的研究.

You also might want to look at Add-in Express. It's not free, but it would allow you to code in C# and although it shims your solution into a separate AppDomain, I believe that it's execution speed is outstanding. If I am understanding its execution speed correctly, then I'm not sure how Add-in Express doing this, but it might be taking advantage of something called FastPath AppDomain marshaling. Don't quote me on any of this, however, as I'm not very familiar with Add-in Express. You should check it out though and do your own research.

我的建议是研究 Add-in Express 和 ExcelDna.这两种方法都允许您使用您似乎最熟悉的 C# 进行编码.

My advice would be to research Add-in Express and ExcelDna. Both approaches would allow you to code using C#, which you seem most familiar with.

另一个主要问题是您如何拨打电话.例如,Excel 在处理作为数组来回传递的整个数据范围时非常快.这比单独循环遍历单元格要高效得多.例如,以下代码使用 Excel.Range.set_Value 访问器方法将 10 x 10 值数组一次性分配给 10 x 10 单元格范围:

The other main issue is how you make your calls. For example, Excel is very fast when handling an entire range of data passed back-and-forth as an array. This is vastly more efficient than looping through the cells individually. For example, the following code makes use of the Excel.Range.set_Value accessor method to assign a 10 x 10 array of values to a 10 x 10 range of cells in one shot:

void AssignArrayToRange()
{
    // Create the array.
    object[,] myArray = new object[10, 10];

    // Initialize the array.
    for (int i = 0; i < myArray.GetLength(0); i++)
    {
        for (int j = 0; j < myArray.GetLength(1); j++)
        {
            myArray[i, j] = i + j;
        }
    }

    // Create a Range of the correct size:
    int rows = myArray.GetLength(0);
    int columns = myArray.GetLength(1);
    Excel.Range range = myWorksheet.get_Range("A1", Type.Missing);
    range = range.get_Resize(rows, columns);

    // Assign the Array to the Range in one shot:
    range.set_Value(Type.Missing, myArray);
}

同样可以使用 Excel.Range.get_Value 访问器方法在一个步骤中从范围中读取一组值.这样做然后循环遍历数组中的值比单独循环遍历区域单元格中的值要快得多.

One can similarly make use of the Excel.Range.get_Value accessor method to read an array of values from a range in one step. Doing this and then looping through the values within the array is vastly faster than looping trough the values within the cells of the range individually.

这篇关于在实时(未保存)Excel 数据和 C# 对象之间进行交互的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆