The right data structure to use for an Excel clone

Problem description

Let's say I'm working on an Excel clone in C#. My grid is represented as follows:

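private struct CellValue
{
    // Fields are public so the containing class can read and write them;
    // private fields on a nested struct are not accessible from outside it.
    public int column;   // column index of the cell
    public int row;      // row index of the cell
    public string text;  // text entered by the user
}
private List<CellValue> cellValues = new List<CellValue>();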

Each time the user adds text, I just package it as a CellValue and add it to cellValues. Given a CellValue, I can determine its row and column in O(1) time, which is great. However, given a column and a row, I need to loop through the entire cellValues list to find the text in that column and row, which is terribly slow. Likewise, given a text, I need to loop through the entire list. Is there any data structure with which I can achieve all three tasks in O(1) time?

Update: Looking through some of the answers, I don't think I have found one that I like. Can I:

  1. Avoid keeping more than 2 copies of a CellValue, so that I don't have to keep them in sync. In the C world I would have made good use of pointers.
  2. Allow rows and columns to be added dynamically (unlike Excel).

Solution

I would opt for a sparse array (a linked list of linked lists) to give maximum flexibility with minimum storage.

In this example, you have a linked list of rows with each element pointing to a linked list of cells in that row (you could reverse the cells and rows depending on your needs).

 |
 V
+-+    +---+             +---+
|1| -> |1.1| ----------> |1.3| -:
+-+    +---+             +---+
 |
 V
+-+             +---+
|7| ----------> |7.2| -:
+-+             +---+
 |
 =

Each row element has the row number in it and each cell element has a pointer to its row element, so that getting the row number from a cell is O(1).

Similarly, each cell element has its column number, making that O(1) as well.

There's no easy way to get O(1) for immediately finding the cell at a given row/column, but a sparse array is about as fast as it's going to get unless you pre-allocate an entry for every possible cell so that you can do indexed lookups on an array, and that would be very wasteful in terms of storage.
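As a rough illustration of the structure described above, here is a minimal sketch of a linked list of rows where each row points to a linked list of cells. The names RowNode, CellNode, SparseGrid, Find and Set are hypothetical and do not come from the original post, and keeping both lists sorted by number is an assumption made here so that a search can stop early.

```csharp
using System;

// Hypothetical types for illustration only.
public sealed class RowNode
{
    public int Row;             // the row number is stored once, on the row element
    public CellNode FirstCell;  // head of this row's list of populated cells
    public RowNode Next;        // next populated row
}

public sealed class CellNode
{
    public int Column;          // each cell stores its own column number: O(1)
    public string Text;
    public RowNode Owner;       // back-pointer, so cell -> row number is O(1)
    public CellNode Next;       // next populated cell in the same row
}

public sealed class SparseGrid
{
    private RowNode firstRow;

    // Returns the cell at (row, column), or null if that cell is empty.
    // Cost grows with the number of populated rows plus the cells in that row.
    public CellNode Find(int row, int column)
    {
        RowNode r = firstRow;
        while (r != null && r.Row < row) r = r.Next;
        if (r == null || r.Row != row) return null;

        CellNode c = r.FirstCell;
        while (c != null && c.Column < column) c = c.Next;
        if (c != null && c.Column == column) return c;
        return null;
    }

    // Inserts or overwrites the text at (row, column), keeping both lists sorted.
    public CellNode Set(int row, int column, string text)
    {
        RowNode prevRow = null;
        RowNode r = firstRow;
        while (r != null && r.Row < row) { prevRow = r; r = r.Next; }
        if (r == null || r.Row != row)
        {
            var newRow = new RowNode { Row = row, Next = r };
            if (prevRow == null) firstRow = newRow; else prevRow.Next = newRow;
            r = newRow;
        }

        CellNode prevCell = null;
        CellNode c = r.FirstCell;
        while (c != null && c.Column < column) { prevCell = c; c = c.Next; }
        if (c != null && c.Column == column) { c.Text = text; return c; }

        var cell = new CellNode { Column = column, Text = text, Owner = r, Next = c };
        if (prevCell == null) r.FirstCell = cell; else prevCell.Next = cell;
        return cell;
    }
}
```

Because only populated rows and cells get a node, rows and columns can be added at any position without resizing anything, which is the flexibility the answer is after.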

One thing you could do is make one dimension non-sparse, for example by making the columns the primary array (rather than a linked list) and limiting them to 1,000. The column lookup then becomes an indexed (fast) access, followed by a search through that column's sparse rows.
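The hybrid layout could look something like the sketch below: a fixed array of 1,000 column heads, each pointing to a sorted linked list of the populated cells in that column. ColumnCell, ColumnIndexedGrid, MaxColumns, Get and Set are hypothetical names chosen for this example, and the fixed limit means columns cannot grow past it.

```csharp
using System;

// Hypothetical types for illustration only.
public sealed class ColumnCell
{
    public int Row;          // row number of this cell
    public string Text;
    public ColumnCell Next;  // next populated cell further down the same column
}

public sealed class ColumnIndexedGrid
{
    public const int MaxColumns = 1000;  // the fixed limit suggested in the answer

    // One list head per column; null means the column has no populated cells.
    private readonly ColumnCell[] columns = new ColumnCell[MaxColumns];

    // Column access is a direct array index; the row is then found by a
    // linear walk over that column's populated cells only.
    public string Get(int row, int column)
    {
        for (ColumnCell c = columns[column]; c != null && c.Row <= row; c = c.Next)
        {
            if (c.Row == row) return c.Text;
        }
        return null;  // empty cell
    }

    // Inserts or overwrites a cell, keeping each column's list sorted by row.
    public void Set(int row, int column, string text)
    {
        ColumnCell prev = null;
        ColumnCell c = columns[column];
        while (c != null && c.Row < row) { prev = c; c = c.Next; }
        if (c != null && c.Row == row) { c.Text = text; return; }

        var cell = new ColumnCell { Row = row, Text = text, Next = c };
        if (prev == null) columns[column] = cell; else prev.Next = cell;
    }
}
```

The trade-off is that the array of column heads is allocated up front even for an empty sheet, in exchange for skipping the search along one dimension.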

I don't think you can ever get O(1) for a text lookup simply because text can be duplicated in multiple cells (unlike row/column). I still believe the sparse array will be the fastest way to search for text, unless you maintain a sorted index of all text values in another array (again, that can make it faster but at the expense of copious amounts of memory).
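For the separate text index mentioned above, one possible sketch is shown below. It swaps the "sorted array" for .NET's SortedDictionary (a sorted tree rather than an array), mapping each distinct text to the list of positions where it occurs, since the same text can appear in several cells. TextIndex and its methods are hypothetical names, and the index has to be updated on every cell edit, which is the synchronization and memory cost the answer warns about.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only.
public sealed class TextIndex
{
    // Sorted map from text to every (row, column) holding that text.
    private readonly SortedDictionary<string, List<(int Row, int Column)>> byText =
        new SortedDictionary<string, List<(int Row, int Column)>>(StringComparer.Ordinal);

    // Records that the cell at (row, column) now contains the given text.
    public void Add(string text, int row, int column)
    {
        if (!byText.TryGetValue(text, out var positions))
        {
            positions = new List<(int Row, int Column)>();
            byText.Add(text, positions);
        }
        positions.Add((row, column));
    }

    // Removes one occurrence; call this before a cell's text changes.
    public void Remove(string text, int row, int column)
    {
        if (byText.TryGetValue(text, out var positions))
        {
            positions.Remove((row, column));
            if (positions.Count == 0) byText.Remove(text);
        }
    }

    // Returns all positions holding the text (empty if none).
    // The key lookup is logarithmic in the number of distinct texts.
    public IReadOnlyList<(int Row, int Column)> Find(string text)
    {
        if (byText.TryGetValue(text, out var positions)) return positions;
        return Array.Empty<(int Row, int Column)>();
    }
}
```

This makes a text lookup logarithmic rather than O(1), and the result is a list because duplicates are allowed.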
