Why do we use arrays instead of other data structures?


Question



In my programming experience, I haven't seen an instance where an array was better for storing information than some other form. I had figured that the added "features" in programming languages had improved upon arrays and thereby replaced them. I see now that they aren't replaced, but rather given new life, so to speak.

So, basically, what's the point of using arrays?

This is not so much why do we use arrays from a computer standpoint, but rather why would we use arrays from a programming standpoint (a subtle difference). What the computer does with the array was not the point of the question.

Solution

Time to go back in time for a lesson. While we don't think about these things much in our fancy managed languages today, they are built on the same foundation, so let's look at how memory is managed in C.

Before I dive in, a quick explanation of what the term "pointer" means. A pointer is simply a variable that "points" to a location in memory. It doesn't contain the actual value at this area of memory, it contains the memory address to it. Think of a block of memory as a mailbox. The pointer would be the address to that mailbox.

In C, an array is simply a pointer with an offset, the offset specifies how far in memory to look. This provides O(1) access time.

  MyArray   [5]
     ^       ^
  Pointer  Offset

All other data structures either build upon this or do not use adjacent memory for storage, resulting in poor random-access lookup time (though there are other benefits to not using sequential memory).

For example, let's say we have an array with 6 numbers (6,4,2,3,1,5) in it, in memory it would look like this:

=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================

In an array, we know that each element is next to the others in memory. A C array (called MyArray here) is simply a pointer to the first element:

=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
   ^
MyArray

If we wanted to look up MyArray[4], internally it would be accessed like this:

   0     1     2     3     4 
=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
                           ^
MyArray + 4 ---------------/
(Pointer + Offset)

Because we can directly access any element in the array by adding the offset to the pointer, we can look up any element in the same amount of time, regardless of the size of the array. This means that getting MyArray[1000] would take the same amount of time as getting MyArray[5].

An alternative data structure is a linked list. This is a linear list of nodes, each holding a pointer to the next node:

========    ========    ========    ========    ========
| Data |    | Data |    | Data |    | Data |    | Data |
|      | -> |      | -> |      | -> |      | -> |      | 
|  P1  |    |  P2  |    |  P3  |    |  P4  |    |  P5  |        
========    ========    ========    ========    ========

P(X) stands for Pointer to next node.

Note that I made each "node" into its own block. This is because they are not guaranteed to be (and most likely won't be) adjacent in memory.

If I want to access P3, I can't directly access it, because I don't know where it is in memory. All I know is where the root (P1) is, so instead I have to start at P1, and follow each pointer to the desired node.

This is an O(N) lookup time (the lookup cost grows as elements are added). It is much more expensive to get to P1000 than to get to P4.

Higher level data structures, such as hashtables, stacks and queues, all may use an array (or multiple arrays) internally, while Linked Lists and Binary Trees usually use nodes and pointers.

You might wonder why anyone would use a data structure that requires linear traversal to look up a value instead of just using an array, but they have their uses.

Take our array again. This time, I want to find the array element that holds the value '5'.

=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
   ^     ^     ^     ^     ^   FOUND!

In this situation, I don't know what offset to add to the pointer to find it, so I have to start at 0, and work my way up until I find it. This means I have to perform 6 checks.

Because of this, searching for a value in an array is considered O(N). The cost of searching increases as the array gets larger.

Remember up above where I said that sometimes using a non-sequential data structure can have advantages? Searching for data is one of those advantages, and one of the best examples is the binary tree.

A binary tree is a data structure similar to a linked list; however, instead of linking to a single node, each node can link to two child nodes.

         ==========
         |  Root  |         
         ==========
        /          \ 
  =========       =========
  | Child |       | Child |
  =========       =========
                  /       \
            =========    =========
            | Child |    | Child |
            =========    =========

 Assume that each connector is really a Pointer

When data is inserted into a binary tree, it uses several rules to decide where to place the new node. The basic concept is that if the new value is greater than the parent's, it is inserted to the left; if it is lower, it is inserted to the right.

This means that the values in a binary tree could look like this:

         ==========
         |   100  |         
         ==========
        /          \ 
  =========       =========
  |  200  |       |   50  |
  =========       =========
                  /       \
            =========    =========
            |   75  |    |   25  |
            =========    =========

When searching the binary tree for the value 75, we only need to visit 3 nodes (O(log N)) because of this structure:

  • Is 75 less than 100? Look at Right Node
  • Is 75 greater than 50? Look at Left Node
  • There is the 75!

Even though there are 5 nodes in our tree, we did not need to look at the remaining two, because we knew that they (and their children) could not possibly contain the value we were looking for. This gives us a search time that in the worst case requires visiting every node, but in the best case only a small portion of them.

That is where arrays get beat: despite their O(1) access time, they only provide a linear O(N) search time.

This is an incredibly high-level overview of data structures in memory, skipping over a lot of details, but hopefully it illustrates an array's strengths and weaknesses compared to other data structures.
