What architecture to use to address this System.OutOfMemoryException while allowing me to instantiate the cells of a sheet?

Question
Summary

This question follows up on my attempt to architect a simple spreadsheet API that remains user-friendly for those who know Excel well.

To sum it up, this question is related to these two earlier questions:
1. How to implement column self-naming from its index?
2. How to make this custom worksheet initialization faster?
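Column self-naming from an index is a bijective base-26 conversion. There is no zero digit, so 26 maps to "Z" and 27 to "AA". The sketch below is written in Java only so that it runs standalone. The class and method names are hypothetical, and the same loop ports line for line to C# or VB.NET.

```java
/** Hypothetical helper: converts between 1-based column indexes and Excel column letters. */
final class ColumnNames {
    private ColumnNames() {}

    /** 1 -> "A", 26 -> "Z", 27 -> "AA", 256 -> "IV" (bijective base 26: no zero digit). */
    static String toName(int index) {
        if (index < 1) throw new IllegalArgumentException("Column index is 1-based: " + index);
        StringBuilder sb = new StringBuilder();
        while (index > 0) {
            int rem = (index - 1) % 26;          // 0..25 maps to 'A'..'Z'
            sb.append((char) ('A' + rem));
            index = (index - 1) / 26;
        }
        return sb.reverse().toString();
    }

    /** "A" -> 1, "Z" -> 26, "AA" -> 27, "iv" -> 256 (case-insensitive). */
    static int toIndex(String name) {
        if (name == null || name.isEmpty()) throw new IllegalArgumentException("Empty column name");
        int index = 0;
        for (char ch : name.toUpperCase().toCharArray()) {
            if (ch < 'A' || ch > 'Z') throw new IllegalArgumentException("Invalid column name: " + name);
            index = index * 26 + (ch - 'A' + 1);
        }
        return index;
    }
}
```

Round-tripping every index through both methods makes a cheap unit test for this helper.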

Objective

To provide a simplified Excel API, used as a wrapper over the core components such as the Application, Workbook, Worksheet and Range classes/interfaces, while exposing only the most commonly used properties of each.

Usage example

This usage example is inspired by the unit tests that brought this solution to where it stands now.

Dim file as String = "C:\Temp\WriteTest.xls"

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb as IWorkbook = mgr.CreateWorkbook()
    wb.Sheets("Sheet1").Cells("A1").Value = 3.1415926
    wb.SaveAs(file)
End Using

And now we open it:

Dim file as String = "C:\Temp\WriteTest.xls"

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb as IWorkbook = mgr.OpenWorkbook(file)
    ' Work with the workbook here...
End Using

Discussion

When an Excel Workbook is instantiated:

  1. An instance of a Worksheet is automatically initialized in the Workbook.Sheets collection;
  2. Upon initialization, a Worksheet initializes its Cells through the Range object that can represent one or multiple cells.

These Cells are immediately accessible with all their properties as soon as the Worksheet exists.

My wish is to reproduce this behaviour so that:

  1. The Workbook class constructor initializes the Workbook.Sheets collection property with the native sheets;
  2. The Worksheet class constructor initializes the Worksheet.Cells collection property with the native cells.

My problem comes from the Worksheet class constructor, when it initializes the Worksheet.Cells collection property described in #2.

Thoughts

Given the issues encountered in the questions linked above, I would like to find another architecture that would allow me to:

  1. Access specific features of a cell Range when required;
  2. Deliver the most commonly used properties through my ICell interface;
  3. Have access to all the Range cells of a worksheet from its initialization;

all while keeping in mind that accessing the Range.Value property is the fastest possible interaction with the underlying Excel application instance through Interop.

So I thought of initializing my ReadOnlyDictionary(Of String, ICell) with only the cell names, without immediately wrapping an instance of the Range interface. I would simply generate the row and column indexes along with each cell's name to key the dictionary. The Cell.NativeCell property would then be assigned only when one wants to access or format a specific cell or cell range.
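The sketch below illustrates that idea in Java so that it runs standalone. The cell knows its name and indexes from the start, and the native object is resolved only on first access. The `resolver` function stands in for the Interop call (for example, fetching a Range by address). Like the class name `LazyCell`, it is a hypothetical choice for illustration.

```java
import java.util.function.Function;

/**
 * Hypothetical lightweight cell: its name and indexes are known up front,
 * while the native object (e.g. an Interop Range) is resolved only on first use.
 * Not thread-safe; this assumes single-threaded Interop use.
 */
final class LazyCell<N> {
    private final int row;
    private final int column;
    private final String name;
    private final Function<String, N> resolver; // stand-in for an Interop lookup by address
    private N nativeCell;                        // stays null until first requested

    LazyCell(int row, int column, Function<String, N> resolver) {
        this.row = row;
        this.column = column;
        this.name = columnName(column) + row;    // e.g. (1, 27) -> "AA1"
        this.resolver = resolver;
    }

    String name() { return name; }
    int row() { return row; }
    int column() { return column; }
    boolean isResolved() { return nativeCell != null; }

    /** Lazy initialization: call the resolver once, then reuse the cached result. */
    N nativeCell() {
        if (nativeCell == null) {
            nativeCell = resolver.apply(name);
        }
        return nativeCell;
    }

    // 1 -> "A", 26 -> "Z", 27 -> "AA" (bijective base 26).
    private static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        for (; index > 0; index = (index - 1) / 26) {
            sb.append((char) ('A' + (index - 1) % 26));
        }
        return sb.reverse().toString();
    }
}
```

Populating the dictionary then costs one small object per requested cell, and no Interop call is made until `nativeCell()` is actually needed.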

That way, the dictionary would be keyed by the cell names built from the column indexes generated in the Worksheet class constructor. Then, when one would do this:

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb As IWorkbook = mgr.CreateWorkbook()
    wb.Sheets(1).Cells("A1").Value = 3.1415926 ' #1
End Using

#1: This would allow me to use the row and column indexes stored in my Cell class to write the given value to the specific cell, which is faster than using its name directly against the Range.
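Footnote #1 relies on turning an address such as "A1" back into row and column indexes. Below is a minimal Java sketch. It assumes plain relative A1-style addresses (no `$` anchors, no sheet prefix), and `CellAddress` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value type: a single A1-style address as 1-based row and column indexes. */
final class CellAddress {
    // One to three column letters followed by a row number that does not start with 0.
    private static final Pattern A1 = Pattern.compile("([A-Za-z]{1,3})([1-9][0-9]*)");

    private final int row;
    private final int column;

    private CellAddress(int row, int column) {
        this.row = row;
        this.column = column;
    }

    int row() { return row; }
    int column() { return column; }

    static CellAddress parse(String address) {
        Matcher m = A1.matcher(address.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an A1 address: " + address);
        }
        int column = 0;
        for (char ch : m.group(1).toUpperCase().toCharArray()) {
            column = column * 26 + (ch - 'A' + 1);   // bijective base 26: A=1 ... Z=26
        }
        return new CellAddress(Integer.parseInt(m.group(2)), column);
    }
}
```

With the indexes in hand, a write can target the worksheet's Cells[row, column] (or a slot in a cached array) instead of asking Excel to resolve an address string.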

Questions and Concerns

Besides, UsedRange.get_Value() and Cells.get_Value() return Object(,) arrays.

1. So should I just be happy working with Object(,) arrays for cells, without any way to format them?

2. How should I architect these Worksheet and Cell classes to get the best performance when working with Object(,) arrays, while keeping the possibility that a Cell instance may represent or wrap a single-cell Range?
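One possible answer to question 2 is to read the used range once and keep the values in memory. Each cell then becomes a cheap view (owner, row, column) that indexes into that array instead of wrapping its own Range. The Java sketch below rests on these assumptions: the class names are hypothetical, and Java arrays are 0-based, whereas the Object(,) returned by Range.Value is 1-based in both dimensions, so the offsets differ in .NET.

```java
/**
 * Hypothetical sketch: one bulk read of the used range kept in memory,
 * with cells exposed as lightweight views that index into the array.
 */
final class SheetSnapshot {
    private final Object[][] values;   // values[r][c] for the used range only
    private final int firstRow;        // 1-based sheet row of values[0]
    private final int firstColumn;     // 1-based sheet column of values[..][0]

    SheetSnapshot(Object[][] values, int firstRow, int firstColumn) {
        this.values = values;
        this.firstRow = firstRow;
        this.firstColumn = firstColumn;
    }

    /** Value at a 1-based sheet position, or null when outside the used range (an empty cell). */
    Object get(int row, int column) {
        int r = row - firstRow, c = column - firstColumn;
        if (r < 0 || r >= values.length || c < 0 || c >= values[r].length) return null;
        return values[r][c];
    }

    /** Updates the cached value; writing back to Excel would be a separate bulk step. */
    void set(int row, int column, Object value) {
        int r = row - firstRow, c = column - firstColumn;
        if (r < 0 || r >= values.length || c < 0 || c >= values[r].length) {
            throw new IndexOutOfBoundsException("Outside the cached range: " + row + "," + column);
        }
        values[r][c] = value;
    }

    /** A cell is just (snapshot, row, column): no per-cell native object is created. */
    CellView cell(int row, int column) { return new CellView(this, row, column); }

    static final class CellView {
        private final SheetSnapshot owner;
        private final int row, column;

        CellView(SheetSnapshot owner, int row, int column) {
            this.owner = owner;
            this.row = row;
            this.column = column;
        }

        Object getValue() { return owner.get(row, column); }
        void setValue(Object value) { owner.set(row, column, value); }
    }
}
```

Formatting still needs a native Range, which such a view could resolve on demand. Values go through the array and can be written back to Excel in a single bulk assignment to Range.Value.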

Thanks to everyone who takes the time to read my post, and my sincerest thanks to those who answer.

Solution

The architecture I settled on revolves around a class that I named CellCollection. Here's what it does.

Based on these assumptions:

  1. Given that an Excel worksheet has 256 columns and 65,536 rows (the limits of the Excel 97-2003 .xls format);

  2. Given that 16,777,216 (256 * 65,536) cells would have to be instantiated at once;

  3. Given that the most common use of a worksheet takes fewer than 1,000 rows and fewer than 100 columns;

  4. Given that I need to be able to refer to the cells by their addresses ("A1"); and

  5. Given that benchmarks show that reading all the values at once into an in-memory object[,] is the fastest way to work with an underlying Excel worksheet,
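The figures behind the first three assumptions, worked out:

$$256 \times 65\,536 = 2^{8} \times 2^{16} = 2^{24} = 16\,777\,216 \ \text{cells}$$

$$1\,000 \times 100 = 100\,000 \ \text{cells}, \qquad \frac{16\,777\,216}{100\,000} \approx 168$$

So eagerly wrapping every cell creates roughly 168 times more objects than a typical sheet uses. As an illustration only (an assumed figure, not a measurement), at 100 bytes per wrapper, 2^24 wrappers would take about 1.68 GB. That is close to the 2 GB user address space of a default 32-bit Windows process, which is one plausible route to an OutOfMemoryException.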

I decided not to instantiate any cells up front. The CellCollection property of my IWorksheet interface is initialized but left empty upon instantiation, except for an existing workbook. When opening a workbook, I check whether NativeSheet.UsedRange is empty or returns null (Nothing in Visual Basic). If it does not, the used "native cells" are already in memory, and all that remains is to add them to my internal CellCollection dictionary, keyed by their respective addresses.
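Below is a minimal Java sketch of that loading step. It assumes the UsedRange values have already been read in one call and that the range's top-left position is known (in Excel, from UsedRange.Row and UsedRange.Column). Only non-empty values are keyed by their A1 address, and the class name is hypothetical. For a single-cell range, Range.Value returns the value itself rather than a two-dimensional array, so real code has to handle that case too.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader: indexes the non-empty values of one bulk UsedRange read by address. */
final class UsedRangeLoader {
    /**
     * @param values      used range values (0-based here; Excel's object[,] is 1-based)
     * @param firstRow    1-based sheet row of the used range's top-left cell
     * @param firstColumn 1-based sheet column of the used range's top-left cell
     */
    static Map<String, Object> load(Object[][] values, int firstRow, int firstColumn) {
        Map<String, Object> cells = new LinkedHashMap<>();
        if (values == null) return cells;      // nothing was read: leave the collection empty
        for (int r = 0; r < values.length; r++) {
            for (int c = 0; c < values[r].length; c++) {
                Object v = values[r][c];
                if (v != null) {               // skip empty cells
                    cells.put(columnName(firstColumn + c) + (firstRow + r), v);
                }
            }
        }
        return cells;
    }

    // 1 -> "A", 26 -> "Z", 27 -> "AA" (bijective base 26).
    private static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        for (; index > 0; index = (index - 1) / 26) {
            sb.append((char) ('A' + (index - 1) % 26));
        }
        return sb.reverse().toString();
    }
}
```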

Finally, Lazy Initialization Design Pattern to the rescue! =)

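using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text.RegularExpressions;

public class Sheet : ISheet {
    public Sheet(Microsoft.Office.Interop.Excel.Worksheet nativeSheet) {
        NativeSheet = nativeSheet;
        // Starts empty: cells are wrapped only when first requested (lazy initialization).
        Cells = new CellCollection(this);
    }

    public Microsoft.Office.Interop.Excel.Worksheet NativeSheet { get; private set; }

    public CellCollection Cells { get; private set; }
}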

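public sealed class CellCollection {
    // One or more comma-separated addresses or ranges, e.g. "A1", "B2:C4" or "A1,B2:C4".
    private static readonly Regex AddressPattern = new Regex(
        @"^[A-Za-z]{1,3}[0-9]+(:[A-Za-z]{1,3}[0-9]+)?(,[A-Za-z]{1,3}[0-9]+(:[A-Za-z]{1,3}[0-9]+)?)*$");

    private readonly IDictionary<string, ICell> _cells;
    private readonly ReadOnlyDictionary<string, ICell> _readonlyCells;

    public CellCollection(ISheet sheet) {
        // Case-insensitive keys, so "a1" and "A1" refer to the same cell.
        _cells = new Dictionary<string, ICell>(StringComparer.OrdinalIgnoreCase);
        _readonlyCells = new ReadOnlyDictionary<string, ICell>(_cells);
        Sheet = sheet;
    }

    // C# properties cannot take parameters, so this is an indexer (the default Item property in VB).
    public ReadOnlyDictionary<string, ICell> this[string addresses] {
        get {
            if (string.IsNullOrWhiteSpace(addresses))
                throw new ArgumentException("No cell address given.", "addresses");

            addresses = addresses.Replace(" ", string.Empty);
            if (!AddressPattern.IsMatch(addresses))
                throw new FormatException("Invalid cell address: " + addresses);

            foreach (string address in addresses.Split(',')) {
                // Assumes ISheet exposes the native worksheet.
                Microsoft.Office.Interop.Excel.Range range = Sheet.NativeSheet.Range[address];

                foreach (Microsoft.Office.Interop.Excel.Range cell in range) {
                    // Relative address ("A1", not "$A$1") used as the dictionary key.
                    string key = cell.get_Address(false, false);
                    if (!_cells.ContainsKey(key))
                        _cells.Add(key, new Cell(cell));   // wrap each cell only once
                }
            }

            return _readonlyCells;
        }
    }

    public ISheet Sheet { get; private set; }
}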
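The inner `foreach` above relies on Excel to expand a range such as "B2:C4" into individual cells. To unit-test the addressing logic without a running Excel instance, the same expansion can be done in plain code. This Java sketch (the `AddressExpander` name is hypothetical) walks each range row by row and normalizes reversed corners such as "B2:A1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: expands "A1:B2,D4" into single-cell addresses without touching Excel. */
final class AddressExpander {
    private static final Pattern CELL = Pattern.compile("([A-Za-z]{1,3})([1-9][0-9]*)");

    static List<String> expand(String addresses) {
        List<String> result = new ArrayList<>();
        for (String part : addresses.replace(" ", "").split(",")) {
            String[] ends = part.split(":", -1);   // -1 keeps an empty trailing end, e.g. "A1:"
            if (ends.length > 2) throw new IllegalArgumentException("Bad range: " + part);
            int[] a = parse(ends[0]);
            int[] b = ends.length == 2 ? parse(ends[1]) : a;
            // Normalize so "B2:A1" is treated like "A1:B2".
            int top = Math.min(a[0], b[0]), bottom = Math.max(a[0], b[0]);
            int left = Math.min(a[1], b[1]), right = Math.max(a[1], b[1]);
            for (int row = top; row <= bottom; row++) {
                for (int col = left; col <= right; col++) {
                    result.add(columnName(col) + row);
                }
            }
        }
        return result;
    }

    /** Returns {row, column}, both 1-based. */
    private static int[] parse(String address) {
        Matcher m = CELL.matcher(address);
        if (!m.matches()) throw new IllegalArgumentException("Bad address: " + address);
        int col = 0;
        for (char ch : m.group(1).toUpperCase().toCharArray()) col = col * 26 + (ch - 'A' + 1);
        return new int[] { Integer.parseInt(m.group(2)), col };
    }

    private static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        for (; index > 0; index = (index - 1) / 26) sb.append((char) ('A' + (index - 1) % 26));
        return sb.reverse().toString();
    }
}
```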

Obviously, this is a first attempt, and so far it works just fine, with more than acceptable performance. Humbly, though, I feel it could use some optimization; I will use it this way for now and optimize it later if needed.

After writing this collection, I was able to achieve the expected behaviour. Next, I will try to implement some of the .NET interfaces, such as IEnumerable, IEnumerable<T>, ICollection and ICollection<T>, so that it can be used as a true .NET collection.
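In .NET, making the collection enumerable mostly means delegating GetEnumerator() to the inner dictionary's values. The Java analogue below uses `Iterable` in place of `IEnumerable<T>`, and its names are hypothetical. It combines that delegation with the get-or-create lookup that the CellCollection performs, and exposes only a read-only view of the cells created so far.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical lazy cell cache that can be iterated like a regular collection. */
final class LazyCellCollection<C> implements Iterable<C> {
    private final Map<String, C> cells = new LinkedHashMap<>();  // keeps creation order
    private final Function<String, C> factory;  // creates a wrapper for an address on first use

    LazyCellCollection(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Get-or-create: repeated requests for one address return the same wrapper. */
    C get(String address) {
        return cells.computeIfAbsent(address.toUpperCase(), factory);
    }

    int size() {
        return cells.size();
    }

    /** Iterates only over cells created so far; remove() is not supported. */
    @Override
    public Iterator<C> iterator() {
        return Collections.unmodifiableCollection(cells.values()).iterator();
    }
}
```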

Feel free to comment and suggest constructive alternatives and/or changes to this code so that it may become even better than it currently is.

I DO hope this will serve someone's purpose someday.

Thanks for reading! =)
