How to make this custom worksheet initialization faster?
Summary
This question is a follow-up to this one:
How to implement column self-naming from its index?
After testing the code provided in the answers to the question linked above, I ran into a serious performance issue.
Performance issue
The performance issue occurs when a Sheet is initialized, that is, when I initialize the Sheet's cells.
''' <summary>
''' Initialize an instance of the Company.Project.Sheet class.
''' </summary>
''' <param name="nativeSheet">The native worksheet from which to initialize.</param>
Friend Sub New(ByVal nativeSheet As Microsoft.Office.Interop.Excel.Worksheet)
    _nativeSheet = nativeSheet
    Dim cells As IDictionary(Of String, ICell) = New Dictionary(Of String, ICell)()
    ' These iterations hurt the performance of the API:
    ' they visit every row and column of the sheet, one interop call per cell.
    For rowIndex As Integer = 1 To _nativeSheet.Rows.Count Step 1
        For colIndex As Integer = 1 To _nativeSheet.Columns.Count Step 1
            Dim c As ICell = New Cell(_nativeSheet.Cells(rowIndex, colIndex))
            cells.Add(c.Name, c)
        Next
    Next
    _cells = New ReadOnlyDictionary(Of String, ICell)(cells)
End Sub
- ReadOnlyDictionary(Of TKey, TValue):
A custom read-only dictionary that simply wraps an IDictionary(Of TKey, TValue) to prevent modifications.
Discussion
I work this way because each cell of the underlying spreadsheet worksheet exists from the moment the worksheet is initialized until it is disposed or finalized, so I wanted to initialize the cells of a Sheet the same way. I also want to keep the performance benefit of addressing cells by index rather than by name ("A1"), while still letting the API user refer to a cell by its name. That is what the dictionary is for: when I refer to cell "A1", I look up that key in my dictionary and address cell (1, 1) accordingly.
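To make the name-to-index mapping concrete, here is a minimal sketch of converting between an A1-style name and 1-based row/column indexes. The module and function names (CellNaming, ColumnNameFromIndex, ParseCellName) are made up for illustration and are not taken from the linked question; it only handles plain names such as "A1", not absolute ("$A$1") or multi-cell references.

```vb
Imports System
Imports System.Globalization

Module CellNaming

    ' Converts a 1-based column index to its letter name (1 -> "A", 27 -> "AA").
    Function ColumnNameFromIndex(ByVal index As Integer) As String
        If index < 1 Then Throw New ArgumentOutOfRangeException("index")
        Dim name As String = String.Empty
        Dim n As Integer = index
        While n > 0
            ' Bijective base 26: shift by one so that 26 maps to "Z", not "A0".
            Dim remainder As Integer = (n - 1) Mod 26
            name = ChrW(AscW("A"c) + remainder) & name
            n = (n - 1) \ 26
        End While
        Return name
    End Function

    ' Splits an A1-style name into 1-based row and column indexes.
    Sub ParseCellName(ByVal cellName As String, ByRef row As Integer, ByRef column As Integer)
        If String.IsNullOrEmpty(cellName) Then
            Throw New ArgumentException("Empty cell name.", "cellName")
        End If
        Dim i As Integer = 0
        column = 0
        ' Leading letters form the column part.
        While i < cellName.Length
            Dim c As Char = Char.ToUpperInvariant(cellName(i))
            If c < "A"c OrElse c > "Z"c Then Exit While
            column = column * 26 + (AscW(c) - AscW("A"c) + 1)
            i += 1
        End While
        ' The rest must be a positive row number made of digits only.
        If i = 0 OrElse i = cellName.Length OrElse _
           Not Integer.TryParse(cellName.Substring(i), NumberStyles.None, _
                                CultureInfo.InvariantCulture, row) OrElse _
           row < 1 Then
            Throw New FormatException("Not an A1-style cell name: " & cellName)
        End If
    End Sub

    Sub Main()
        Dim row, column As Integer
        ParseCellName("AB12", row, column)
        Console.WriteLine("{0} {1}", row, column)            ' 12 28
        Console.WriteLine(ColumnNameFromIndex(column) & row) ' AB12
    End Sub
End Module
```

With a conversion like this, the name never has to be stored per cell up front; it can be computed from the indexes (or the indexes from the name) only when a caller asks for it.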
As an aside, I know of an even faster way to read from a worksheet: the Worksheet.UsedRange property, which gives access to all of the used cells as a 2D matrix.
If there were an equivalent, or something close to it, that returned a set of cells from which I could initialize multiple instances of my Cell class, that would be great, and fast!
I also thought of initializing only a 100 x 100 block of cells in memory and mapping them in my dictionary, since one rarely uses every cell of a sheet. I am still working out what should happen when I access a cell that has not been initialized yet, say Cells(120, 120). Ideally, I think, the program would initialize all the cells between the largest initialized cell, Cells(100, 100), and Cells(120, 120). Am I clear enough here? Feel free to ask for clarification! =)
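One way to handle access to a cell that has not been initialized yet is to create wrappers on demand rather than by block. The following is only a sketch of that idea: LazyCellCache and its factory delegate are hypothetical names, and it assumes .NET 3.5 or later for Func. In the real Sheet class the factory would presumably wrap the native range for the requested name; here it just builds a string so the example runs without Excel.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical helper: creates cell wrappers on first access instead of up front.
Public Class LazyCellCache(Of TCell)

    Private ReadOnly _cells As New Dictionary(Of String, TCell)(StringComparer.OrdinalIgnoreCase)
    Private ReadOnly _factory As Func(Of String, TCell)

    Public Sub New(ByVal factory As Func(Of String, TCell))
        If factory Is Nothing Then Throw New ArgumentNullException("factory")
        _factory = factory
    End Sub

    ' Returns the cached cell for the given name, creating it on first use.
    Default Public ReadOnly Property Item(ByVal name As String) As TCell
        Get
            Dim cell As TCell = Nothing
            If Not _cells.TryGetValue(name, cell) Then
                cell = _factory.Invoke(name)
                _cells.Add(name, cell)
            End If
            Return cell
        End Get
    End Property

    ' Number of cells materialized so far.
    Public ReadOnly Property Count() As Integer
        Get
            Return _cells.Count
        End Get
    End Property
End Class

Module LazyCellCacheDemo

    Private created As Integer = 0

    ' Stand-in for "New Cell(nativeRange)"; counts how often it is called.
    Private Function CreateCell(ByVal name As String) As String
        created += 1
        Return "cell " & name
    End Function

    Sub Main()
        Dim cache As New LazyCellCache(Of String)(AddressOf CreateCell)
        Console.WriteLine(cache("A1"))   ' cell A1 (created now)
        Console.WriteLine(cache("a1"))   ' cell A1 (served from the cache)
        Console.WriteLine(created)       ' 1
    End Sub
End Module
```

The trade-off is that the dictionary can no longer be handed out as a complete read-only snapshot, but construction cost becomes proportional to the cells actually touched rather than to the size of the sheet.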
Another option would be to initialize only the cells' names in the dictionary and keep their row and column indexes in memory, without creating a Cell instance that wraps its nativeCell (a Range). Here's the code of my Cell class to illustrate what I mean.
''' <summary>
''' Represents a cell in a worksheet.
''' </summary>
Friend Class Cell
    Implements ICell

    Private _nativeCell As Microsoft.Office.Interop.Excel.Range
    Private _name As String

    ''' <summary>
    ''' Initializes a new instance of the Company.Project.Cell class.
    ''' </summary>
    ''' <param name="nativeCell">The Microsoft.Office.Interop.Excel.Range to wrap.</param>
    Friend Sub New(ByVal nativeCell As Microsoft.Office.Interop.Excel.Range)
        _nativeCell = nativeCell
    End Sub

    Public ReadOnly Property NativeCell() As Microsoft.Office.Interop.Excel.Range Implements ICell.NativeCell
        Get
            Return _nativeCell
        End Get
    End Property

    Public ReadOnly Property Column() As Integer Implements ICell.Column
        Get
            Return _nativeCell.Column
        End Get
    End Property

    Public ReadOnly Property Row() As Integer Implements ICell.Row
        Get
            Return _nativeCell.Row
        End Get
    End Property

    Public ReadOnly Property Name() As String Implements ICell.Name
        Get
            ' The name is computed once, on first access, then cached.
            If (String.IsNullOrEmpty(_name) OrElse _name.Trim().Length = 0) Then _
                _name = GetColumnName()
            Return _name
        End Get
    End Property

    Public Property Value() As Object Implements ICell.Value
        Get
            Return _nativeCell.Value2
        End Get
        Set(ByVal value As Object)
            _nativeCell.Value2 = value
        End Set
    End Property

    Public ReadOnly Property FormattedValue() As String Implements ICell.FormattedValue
        Get
            Return _nativeCell.Text
        End Get
    End Property

    Public ReadOnly Property NumericValue() As Double? Implements ICell.NumericValue
        Get
            Return Value
        End Get
    End Property

    ' ... (GetColumnName() and any remaining members are not shown)
End Class
Questions
What are my other options?
Is there any other way to go about this?
Is there a way to make the current approach viable performance-wise?
For your information, this issue made my test time out: the test never finished within an acceptable time frame and would effectively have taken centuries.
Any thoughts are welcome! I'm open to other solutions or approaches that would help me achieve this objective while addressing the performance issue.
Thanks to you all! =)
EDIT #1
Thanks to Maxim Gueivandov, whose solution solves the issue raised in this question.
As an aside, another problem arose from this solution, a System.OutOfMemoryException, which will be addressed in another question.
My sincerest thanks to Maxim Gueivandov.
You could try to get all the cells in the used range in one hop, thus avoiding a call to Cells(rowIndex, colIndex) on each iteration (I guess that Cells hides an interop call, which may have a performance impact).
Dim usedRange As Range = nativeSheet.UsedRange
Dim cells(,) As Object = DirectCast(usedRange.get_Value( _
XlRangeValueDataType.xlRangeValueDefault), Object(,))
[... do your row/col iterations ...]
You'll find some performance tips on which I based these assumptions in the following article: C# Excel Interop Use. Most notably, check the benchmark part:
=== Excel interop benchmark in C# ===
Cells[]: 30.0 seconds
get_Range(), Cells[]: 15.0 seconds
UsedRange, get_Value(): 1.5 seconds [fastest]
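As a rough illustration of the row/column iteration over the array read in one call, here is a sketch that indexes the non-empty values by cell name. IndexValues, R1C1Name and the naming delegate are hypothetical helpers, not part of the Excel interop API. Arrays returned by Excel are 1-based, so the loop reads the bounds from the array instead of assuming 0; the firstRow and firstColumn offsets would presumably come from the used range's Row and Column properties. The demo builds a 1-based array by hand so that it runs without Excel.

```vb
Imports System
Imports System.Collections.Generic

Module BulkReadSketch

    ' Copies the non-empty values of a block read in one call into a dictionary.
    ' firstRow / firstColumn: worksheet position of the block's top-left cell.
    ' cellName: hypothetical delegate that turns (row, column) into a key.
    Function IndexValues(ByVal values As Object(,), _
                         ByVal firstRow As Integer, _
                         ByVal firstColumn As Integer, _
                         ByVal cellName As Func(Of Integer, Integer, String)) _
                         As Dictionary(Of String, Object)
        Dim result As New Dictionary(Of String, Object)()
        Dim rowBase As Integer = values.GetLowerBound(0)
        Dim colBase As Integer = values.GetLowerBound(1)
        For r As Integer = rowBase To values.GetUpperBound(0)
            For c As Integer = colBase To values.GetUpperBound(1)
                Dim value As Object = values(r, c)
                ' Empty cells come back as Nothing; skip them.
                If value IsNot Nothing Then
                    Dim row As Integer = firstRow + (r - rowBase)
                    Dim column As Integer = firstColumn + (c - colBase)
                    result.Add(cellName.Invoke(row, column), value)
                End If
            Next
        Next
        Return result
    End Function

    ' Simple R1C1-style key, used here only to keep the demo self-contained.
    Private Function R1C1Name(ByVal row As Integer, ByVal column As Integer) As String
        Return String.Format("R{0}C{1}", row, column)
    End Function

    Sub Main()
        ' Build a 2 x 2 array with lower bounds of 1, like the ones Excel returns.
        Dim block As Object(,) = DirectCast( _
            Array.CreateInstance(GetType(Object), New Integer() {2, 2}, New Integer() {1, 1}), _
            Object(,))
        block(1, 1) = "header"
        block(2, 2) = 42.0

        ' Pretend the used range starts at row 3, column 2 of the sheet.
        Dim result As Dictionary(Of String, Object) = _
            IndexValues(block, 3, 2, AddressOf R1C1Name)
        Console.WriteLine(result.Count)     ' 2
        Console.WriteLine(result("R3C2"))   ' header
        Console.WriteLine(result("R4C3"))   ' 42
    End Sub
End Module
```

One caveat worth checking in real code: when the used range is a single cell, the value comes back as a scalar rather than a 2D array, so the cast to Object(,) needs a guard.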