令人惊讶的性能差异:List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select,DataTable.FindBy [英] Surprising performance differences: List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select, DataTable.FindBy

查看:313
本文介绍了令人惊讶的性能差异:List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select,DataTable.FindBy的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

本来我想要求以最快的方式查询数据表中的特殊行。

Originally i wanted to ask for the fastest way to query a Datatable for a special row.

我已经测试了5种不同的方法,他们以惊人的(对我来说)的结果表现。

I have tested 5 different methods for their performance with a surprising(for me) result.

背景: 我已经创建了一个MS SQL-Server 2005数据库视图。这种观点有6318行目前的总数量。因为我必须,如果存在一个给定id经常检查这一观点,我不知道什么是最有效的方式做到。我在一个强类型数据集返回所有行和填充一个DataTable创建一个DataAdapter。 我的第一个方法是创建(的Int32)在一个共享的泛型列表,并与ID的距离,一旦在应用程序启动的视图填充它。然后使用 List.Contains ,以检查目前的ID是在此列表中。因为所有的行都是不同的我想知道,如果它不是更快地使用一个排序列表和它的的containsKey -metod。 后来我也检查了直接访问的性能的确定时代与它的选择-方法时,其自动生成的(当柱被定义为主键)的 FindBy法以及最后但并非最不重要的的 DatarowCollection.Contains - 方法。 所以我有5种方法来检查,如果我的ID是在视图(或映射表/排序列表)。

Background: I have created a View in a MS Sql-Server 2005 Database. This view has current a total count of 6318 rows. Because i must check very often if a given id exists in this view, i wondered what is the most efficient way to do. I created a DataAdapter in a strongly typed dataset which returns all rows and fills a Datatable. My first approach was to create a shared generic List(of Int32) and fill it with the ID's from the view once on Application start. Then use List.Contains to check if the current ID is in this List. Because all rows are distinct i wondered if its not faster to use a SortedList and its ContainsKey-metod. Then i checked also the performance of direct access to the Datable with its Select-Method, its automatically generated(when column is defined as primary-key) FindBy-method and last but not least the DatarowCollection.Contains-Method. So i have 5 Methods to check if my ID is in that View(or mapped List/SortedList).

我测量了他们的 System.Diagnostics.StopWatch <性能/ A>并得到了一些有趣的结果。我以为SortedList.ContainsKey必须比List.Contains更快,因为它们是不同的,整理,但事实正好相反。 但最令人惊讶的,我是,DataRowCollection.Contains法(我第一次忘记了),是目前为止最快的。它比dataTable.FindBy-方法快,甚至50倍。

I measured their performance with the System.Diagnostics.StopWatch and got some interesting results. I thought the SortedList.ContainsKey must be faster than the List.Contains because they're distinct and sorted but the opposite is true. But the most surprising for me was that the DataRowCollection.Contains-Method(that i first had forgotten) is by far the fastest. It is even 50 times faster than the dataTable.FindBy-method.

  1. 是什么导致这些差异?
  2. 没有忘记一个更好的办法?
  3. 是我的测量方法正确的(我想我应该更好地循环,并采取了值)?
  4. 是值转让或依赖于数据表/集合?
  5. 的大小
  6. 在我的更新(1000000迭代)的containsKey是最快之后。这是因为我总是检查相同的ID或有什么看法?是否有某种排序列表中没有字典的键值对的开销?

结果 [达到1000000次迭代*]

Results [for 1000000 iterations*]

  • Timespan 1 = SortedList.ContainsKey = Ø 0.65634 [238.1095] ms
  • Timespan 2 = List.Contains = Ø 0.06802 [57045.37955] ms
  • Timespan 3 = DataTable.FindByIdData(auto-generated method) = Ø 0.31580 [1542.62345] ms
  • Timespan 4 = DataTable.Select = Ø 0.27790 [26029.39635] ms
  • Timespan 5 = DataRowCollection.Contains = Ø 0.00638 [1202.79735] ms

1.)
Timespan 1: 0,6913 ms
Timespan 2: 0,1053 ms
Timespan 3: 0,3279 ms
Timespan 4: 0,1002 ms
Timespan 5: 0,0056 ms


2.)
Timespan 1: 0,6405 ms
Timespan 2: 0,0588 ms
Timespan 3: 0,3112 ms
Timespan 4: 0,3872 ms
Timespan 5: 0,0067 ms


3.)
Timespan 1: 0,6502 ms
Timespan 2: 0,0588 ms
Timespan 3: 0,3092 ms
Timespan 4: 0,1268 ms
Timespan 5: 0,007 ms


4.)
Timespan 1: 0,6504 ms
Timespan 2: 0,0586 ms
Timespan 3: 0,3092 ms
Timespan 4: 0,3893 ms
Timespan 5: 0,0063 ms


5.)
Timespan 1: 0,6493 ms
Timespan 2: 0,0586 ms
Timespan 3: 0,3215 ms
Timespan 4: 0,386 ms
Timespan 5: 0,0063 ms


Timespan 1: 0,6913 0,6405 0,6502 0,6504 0,6493 = Ø 0,65634
Timespan 2: 0,1053 0,0588 0,0588 0,0586 0,0586 = Ø 0,06802
Timespan 3: 0,3279 0,3112 0,3092 0,3092 0,3215 = Ø 0,31580
Timespan 4: 0,1002 0,3872 0,1268 0,3893 0,3860 = Ø 0,27790
Timespan 5: 0,0056 0,0067 0,0070 0,0063 0,0063 = Ø 0,00638

和为客户着想的 VB.Net源的完整性部分

Dim applies As Boolean
Dim clock As New System.Diagnostics.Stopwatch

clock.Start()
For i As Int32 = 1 To 1000000
    applies = sortedListAC17NextClaims.ContainsKey(myClaim.idData)
Next
clock.Stop()
Dim timeSpan1 As String = "Timespan 1: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = listAC17NextClaims.Contains(myClaim.idData)
Next
clock.Stop()
Dim timeSpan2 As String = "Timespan 2: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = Not MyDS.AC17NextClaims.FindByIdData(myClaim.idData) Is Nothing
Next
clock.Stop()
Dim timeSpan3 As String = "Timespan 3: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = MyDS.AC17NextClaims.Select("idData=" & myClaim.idData).Length > 0
Next
clock.Stop()
Dim timeSpan4 As String = "Timespan 4: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = MyDS.AC17NextClaims.Rows.Contains(myClaim.idData)
Next
clock.Stop()
Dim timeSpan5 As String = "Timespan 5: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

  • 更新: 我已经改变了我的结果和上面的源代码。在方括号达到1000000次迭代的值。现在的结果是完全不一样的。最快的方法现在肯定是排序列表的使用containsKey。

    • UPDATE: I have changed my results and the source above. In squared brackets are the values for 1000000 iterations. Now the result is completely different. The fastest method now is definitely is the ContainsKey of the SortedList.

      更新2: 我忘了使用 List.BinarySearch 中的另类。 这似乎是最快的我:

      UPDATE 2: I forgot the alternative to use List.BinarySearch. This seems to be fastest for me:

      clock.Start()
      For i As Int32 = 1 To 1000000
          applies = listAC17NextClaims.BinarySearch(myClaim.idData) > -1
      Next
      clock.Stop()
      

      仅需要219.1805毫秒执行百万次迭代,因此是最快的未经排序列表-KEYVALUE线对的开销。 我可以使用它没有对列表进行排序,因为DataAdapter的填充ORDER BY子句中的数据表。

      needs only 219.1805 ms to perform 1000000 iterations and hence is the fastest without the overhead of a SortedList-KeyValue-Pair. I can use it without to sort the list because the DataAdapter filled the datatable with an Order By clause.

      推荐答案

      你为什么不使用具有的哈希表作为底层的数据结构(词典&LT; TKEY的,TValue&GT; 的HashSet&LT; T&GT; )? 哈希表应提供 O(1)查找时间如果在键没有冲突(如你所述)和不需要开销分类

      Why don't you use a collection that has a HashTable as the underlying data structure (Dictionary<TKey, TValue> or HashSet<T>)? HashTables should provide O(1) look up time if there are no collisions in keys (as you've stated) and doesn't need the overhead to "sort".

      编辑:如果你的只有的要存储密钥,你应该使用的的HashSet&LT; T&GT;这可在.NET 3.5及以上

      If you only want to store the keys, you should use HashSet<T> which is available in .NET 3.5 and above.

      从MSDN 的上排序列表:

      这是一个排序列表对象操作趋向   为比上一个运算慢   因为Hashtable对象   排序。

      Operations on a SortedList object tend to be slower than operations on a Hashtable object because of the sorting.

      要面向.NET 2.0,你可以滚你自己的,或者使用pre建一个像 Wintellect的的Power集合(你可以很容易地只使用源以及)。

      To target .NET 2.0, you could roll your own or use a pre-built one like Wintellect's Power Collections (you could easily just use the source as well).

      这篇关于令人惊讶的性能差异:List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select,DataTable.FindBy的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆