How to make this custom worksheet initialization faster?

Problem description


Summary

This question is a follow-up to this one:
How to implement column self-naming from its index?

After testing the code provided in the answers to the linked question, I ran into a serious performance issue.

Performance issue

The performance issue occurs when a Sheet is initialized, that is, when I initialize the Sheet's cells.

    ''' <summary>
    ''' Initialize an instance of the Company.Project.Sheet class.
    ''' </summary>
    ''' <param name="nativeSheet">The native worksheet from which to initialize.</param>
    Friend Sub New(ByVal nativeSheet As Microsoft.Office.Interop.Excel.Worksheet)
        _nativeSheet = nativeSheet
        Dim cells As IDictionary(Of String, ICell) = New Dictionary(Of String, ICell)()

        ' These iterations hurt the performance of the API:
        ' Rows.Count and Columns.Count span the whole grid, not just the used cells.
        For rowIndex As Integer = 1 To _nativeSheet.Rows.Count Step 1
            For colIndex As Integer = 1 To _nativeSheet.Columns.Count Step 1
                Dim c As ICell = New Cell(_nativeSheet.Cells(rowIndex, colIndex))
                cells.Add(c.Name, c)
            Next
        Next

        _cells = New ReadOnlyDictionary(Of String, ICell)(cells)
    End Sub

  • ReadOnlyDictionary(Of TKey, TValue):
    A custom read-only dictionary that simply wraps an IDictionary(Of TKey, TValue) to prevent modifications.

Discussion

I work this way because each cell of the underlying spreadsheet worksheet exists from the moment the worksheet is initialized until it is disposed or finalized, and I want the cells of a Sheet to behave the same way. I also want to keep the performance benefit of addressing cells by index rather than by name ("A1"), while still letting the API user refer to a cell by its name. That is why I intend to use a dictionary: when I refer to cell "A1", I look up that key in my dictionary and address cell (1, 1) accordingly.
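The name-to-index mapping described above can also be computed rather than stored. The following is a minimal sketch, assuming 1-based row and column indexes as Excel uses; the module CellNames and its ToA1 and FromA1 members are hypothetical names chosen for illustration and are not part of the API in the question.

```vbnet
Imports System

' Hypothetical helper module, not part of the original API.
Public Module CellNames
    ' Converts 1-based (row, column) indexes to an A1-style name, e.g. (1, 1) -> "A1".
    Public Function ToA1(ByVal row As Integer, ByVal column As Integer) As String
        If row < 1 OrElse column < 1 Then
            Throw New ArgumentOutOfRangeException("row", "Row and column indexes are 1-based.")
        End If

        Dim letters As String = String.Empty
        Dim remaining As Integer = column
        While remaining > 0
            ' Column letters form a bijective base-26 number: A = 1 ... Z = 26, AA = 27.
            Dim digit As Integer = (remaining - 1) Mod 26
            letters = ChrW(AscW("A"c) + digit) & letters
            remaining = (remaining - 1) \ 26
        End While

        Return letters & row.ToString()
    End Function

    ' Parses an A1-style name back into 1-based indexes, e.g. "AB12" -> row 12, column 28.
    Public Sub FromA1(ByVal name As String, ByRef row As Integer, ByRef column As Integer)
        If String.IsNullOrEmpty(name) Then Throw New ArgumentException("Name is empty.", "name")

        Dim text As String = name.Trim().ToUpperInvariant()
        Dim i As Integer = 0
        column = 0
        While i < text.Length AndAlso text(i) >= "A"c AndAlso text(i) <= "Z"c
            column = column * 26 + (AscW(text(i)) - AscW("A"c) + 1)
            i += 1
        End While

        ' The letters must be followed by a positive row number.
        If column = 0 OrElse i = text.Length OrElse
           Not Integer.TryParse(text.Substring(i), row) OrElse row < 1 Then
            Throw New FormatException("Not an A1-style name: " & name)
        End If
    End Sub
End Module
```

With such a helper, a lookup by "A1" needs no pre-populated dictionary: the name can be translated to (1, 1) at the moment the cell is requested.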

  • Aside, I know of an even faster way to read from a worksheet: the Worksheet.UsedRange property, which lets me read all of the used cells into a 2D matrix.

    If there were something equivalent, or nearly so, that returned a set of cells from which I could initialize multiple instances of my Cell class, that would be great, and performant!

  • I also thought of initializing only, say, a 100 x 100 matrix of cells in memory and mapping them into my dictionary, since one rarely uses all of a sheet's cells. I am still thinking about how to handle access to a cell that is not yet initialized, let's say Cells(120, 120). Ideally, I think, the program would have to initialize all the cells between the largest initialized cell, Cell(100, 100), and Cell(120, 120). Am I clear enough here? Feel free to ask for clarification! =)

  • Another option could be to initialize only the cells' names in the dictionary and keep their row and column indexes in memory, without initializing a Cell instance with its nativeCell, that is, a Range. Here's the code of my Cell class to illustrate what I mean.

    ''' <summary>
    ''' Represents a cell in a worksheet.
    ''' </summary>
    Friend Class Cell
        Implements ICell

        Private _nativeCell As Microsoft.Office.Interop.Excel.Range
        Private _name As String

        ''' <summary>
        ''' Initializes a new instance of the Company.Project.Cell class.
        ''' </summary>
        ''' <param name="nativeCell">The Microsoft.Office.Interop.Excel.Range to wrap.</param>
        Friend Sub New(ByVal nativeCell As Microsoft.Office.Interop.Excel.Range)
            _nativeCell = nativeCell
        End Sub

        Public ReadOnly Property NativeCell() As Microsoft.Office.Interop.Excel.Range Implements ICell.NativeCell
            Get
                Return _nativeCell
            End Get
        End Property

        Public ReadOnly Property Column() As Integer Implements ICell.Column
            Get
                Return _nativeCell.Column
            End Get
        End Property

        Public ReadOnly Property Row() As Integer Implements ICell.Row
            Get
                Return _nativeCell.Row
            End Get
        End Property

        Public ReadOnly Property Name() As String Implements ICell.Name
            Get
                ' The name is computed once and cached. GetColumnName() is not shown in this post.
                If String.IsNullOrEmpty(_name) OrElse _name.Trim().Length = 0 Then
                    _name = GetColumnName()
                End If

                Return _name
            End Get
        End Property

        Public Property Value() As Object Implements ICell.Value
            Get
                Return _nativeCell.Value2
            End Get
            Set(ByVal value As Object)
                _nativeCell.Value2 = value
            End Set
        End Property

        Public ReadOnly Property FormattedValue() As String Implements ICell.FormattedValue
            Get
                Return _nativeCell.Text
            End Get
        End Property

        Public ReadOnly Property NumericValue() As Double? Implements ICell.NumericValue
            Get
                ' Relies on an implicit conversion from Object (Option Strict Off).
                Return Value
            End Get
        End Property

        ' [... rest of the class not shown ...]
    End Class
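One way to approach the "initialize only what is needed" ideas above is to create each cell the first time it is requested, instead of growing a pre-initialized block. This is a different technique from the 100 x 100 block described in the question. The sketch below is only an illustration under assumptions: LazyCellMap is a hypothetical class, and the factory delegate stands in for whatever creates a Cell from a name.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper, not part of the original API: creates cells on first access.
Public Class LazyCellMap(Of TCell)
    ' Names are compared case-insensitively so that "a1" and "A1" hit the same entry.
    Private ReadOnly _cells As New Dictionary(Of String, TCell)(StringComparer.OrdinalIgnoreCase)
    Private ReadOnly _factory As Func(Of String, TCell)

    Public Sub New(ByVal factory As Func(Of String, TCell))
        If factory Is Nothing Then Throw New ArgumentNullException("factory")
        _factory = factory
    End Sub

    ' Returns the cached cell for the given name, creating it on the first request.
    Default Public ReadOnly Property Item(ByVal name As String) As TCell
        Get
            Dim cell As TCell = Nothing
            If Not _cells.TryGetValue(name, cell) Then
                cell = _factory.Invoke(name)
                _cells.Add(name, cell)
            End If
            Return cell
        End Get
    End Property

    ' Number of cells created so far.
    Public ReadOnly Property Count() As Integer
        Get
            Return _cells.Count
        End Get
    End Property
End Class
```

In the Sheet class, the factory might be something like Function(name) New Cell(_nativeSheet.Range(name)), assuming the Cell constructor shown above. Only the cells the API user actually touches would then cost an interop call, and the constructor would no longer loop over the grid at all.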
    

Questions

  1. What are my other options?

  2. Are there any other ways to iterate over the cells?

  3. Is there a way to make the current approach viable performance-wise?

For your information, this code timed out during testing: the test never finished within an acceptable time frame and felt like it would take centuries...

Any thoughts are welcome! I'm open to other solutions or approaches that will help me achieve this objective while addressing the performance issue.

Thanks to you all! =)

EDIT #1

Thanks to Maxim Gueivandov, whose solution solves the issue I raised in this question.

Aside, another problem arose from this solution: a System.OutOfMemoryException, which will be addressed in another question.

My sincerest thanks to Maxim Gueivandov.

Solution

You could try to get all cells in the used range in one hop, thus avoiding a call to Cells(rowIndex, colIndex) on each iteration (I guess that Cells hides an interop call, which may have a performance impact).

    Dim usedRange As Range = nativeSheet.UsedRange
    Dim cells(,) As Object = DirectCast(usedRange.get_Value( _
        XlRangeValueDataType.xlRangeValueDefault), Object(,))
    ' [... do your row/col iterations ...]

I based these assumptions on the performance tips in the following article: C# Excel Interop Use. Most notably, check the benchmark part:

=== Excel interop benchmark in C# ===

Cells[]: 30.0 seconds

get_Range(), Cells[]: 15.0 seconds

UsedRange, get_Value(): 1.5 seconds [fastest]
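The row/column iteration left out of the snippet above could look roughly like the sketch below, which works on the in-memory array only and makes no interop calls inside the loop. The module UsedRangeMapping and its members are hypothetical names. To my knowledge, the array Excel returns is 1-based, the sheet coordinates of its first element are available as usedRange.Row and usedRange.Column, and a used range consisting of a single cell yields a scalar rather than an array, so the DirectCast above would need a guard for that case. Reading array bounds instead of hard-coding them keeps the code valid for both 0-based and 1-based arrays.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper module, not part of the original answer.
Public Module UsedRangeMapping
    ' Converts a 1-based column index to letters, e.g. 1 -> "A", 28 -> "AB".
    Private Function ColumnLetters(ByVal column As Integer) As String
        Dim letters As String = String.Empty
        Dim remaining As Integer = column
        While remaining > 0
            letters = ChrW(AscW("A"c) + (remaining - 1) Mod 26) & letters
            remaining = (remaining - 1) \ 26
        End While
        Return letters
    End Function

    ' Maps each non-empty element of a 2D value array to its A1-style name.
    ' firstRow and firstColumn are the 1-based sheet coordinates of the array's first element.
    Public Function MapValues(ByVal values As Object(,), _
                              ByVal firstRow As Integer, _
                              ByVal firstColumn As Integer) As Dictionary(Of String, Object)
        Dim result As New Dictionary(Of String, Object)()
        Dim rowBase As Integer = values.GetLowerBound(0)
        Dim colBase As Integer = values.GetLowerBound(1)

        For r As Integer = rowBase To values.GetUpperBound(0)
            For c As Integer = colBase To values.GetUpperBound(1)
                Dim value As Object = values(r, c)
                If value IsNot Nothing Then
                    ' Translate the array position back to sheet coordinates.
                    Dim name As String = ColumnLetters(firstColumn + (c - colBase)) & _
                                         (firstRow + (r - rowBase)).ToString()
                    result.Add(name, value)
                End If
            Next
        Next

        Return result
    End Function
End Module
```

A call such as MapValues(cells, usedRange.Row, usedRange.Column) would then produce a name-to-value dictionary for the used cells only. Skipping empty elements is a design choice made here to limit memory use; it is not something the answer prescribes.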
