令人惊讶的性能差异:List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select, DataTable.FindBy [英] Surprising performance differences: List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select, DataTable.FindBy

查看:37
本文介绍了令人惊讶的性能差异:List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select, DataTable.FindBy的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

本来我想寻求一种最快的方式来查询一个特殊行的数据表.

我测试了 5 种不同方法的性能,结果令人惊讶(对我来说).

背景:我在 MS Sql-Server 2005 数据库中创建了一个视图.该视图当前共有 6318 行.因为我必须经常检查此视图中是否存在给定的 id,所以我想知道最有效的方法是什么.我在强类型数据集中创建了一个 DataAdapter,它返回所有行并填充一个数据表.我的第一种方法是创建一个共享的通用列表(Int32),并在应用程序启动时使用视图中的 ID 填充它.然后使用 List.Contains 检查是否当前 ID 在此列表中.因为所有行都是不同的,我想知道使用 SortedList 及其 ContainsKey-方法.然后我还检查了直接访问 Datable 的性能 Select-Method,自动生成(当列被定义为主键时)FindBy-method 以及最后但并非最不重要的 DatarowCollection.Contains-方法.所以我有 5 种方法来检查我的 ID 是否在该视图(或映射列表/排序列表)中.

我使用 System.Diagnostics.StopWatch 测量了它们的性能 并得到了一些有趣的结果.我认为 SortedList.ContainsKey 必须比 List.Contains 快,因为它们是不同的且已排序,但事实恰恰相反.但对我来说最令人惊讶的是 DataRowCollection.Contains-Method(我第一次忘记了)是迄今为止最快的.它甚至比 dataTable.FindBy 方法快 50 倍.

  1. 是什么导致了这些差异?
  2. 我忘记了更好的方法吗?
  3. 我的测量方法是否正确(我认为我最好循环它们并采用这些值)?
  4. 值是否可以转移或取决于数据表/集合的大小?
  5. 在我的更新(1000000 次迭代)之后,ContainsKey 是最快的.这是因为我总是检查相同的 ID 还是一般情况下?是否有某种 SortedList 没有字典的 KeyValue-Pair 的开销?

结果 [1000000 次迭代*]

  • 时间跨度 1 = SortedList.ContainsKey = Ø 0.65634 [238.1095] 毫秒
  • 时间跨度 2 = List.Contains = Ø 0.06802 [57045.37955] 毫秒
  • 时间跨度 3 = DataTable.FindByIdData(自动生成的方法)= Ø 0.31580 [1542.62345] 毫秒
  • 时间跨度 4 = DataTable.Select = Ø 0.27790 [26029.39635] 毫秒
  • 时间跨度 5 = DataRowCollection.Contains = Ø 0.00638 [1202.79735] 毫秒

    1.)时间跨度 1:0,6913 毫秒时间跨度 2:0,1053 毫秒时间跨度 3:0,3279 毫秒时间跨度 4:0,1002 毫秒时间跨度 5:0,0056 毫秒2.)时间跨度 1:0,6405 毫秒时间跨度 2:0,0588 毫秒时间跨度 3:0,3112 毫秒时间跨度 4:0,3872 毫秒时间跨度 5:0,0067 毫秒3.)时间跨度 1:0,6502 毫秒时间跨度 2:0,0588 毫秒时间跨度 3:0,3092 毫秒时间跨度 4:0,1268 毫秒时间跨度 5:0,007 毫秒4.)时间跨度 1:0,6504 毫秒时间跨度 2:0,0586 毫秒时间跨度 3:0,3092 毫秒时间跨度 4:0,3893 毫秒时间跨度 5:0,0063 毫秒5.)时间跨度 1:0,6493 毫秒时间跨度 2:0,0586 毫秒时间跨度 3:0,3215 毫秒时间跨度 4:0,386 毫秒时间跨度 5:0,0063 毫秒时间跨度 1:0,6913 0,6405 0,6502 0,6504 0,6493 = Ø 0,65634时间跨度 2:0,1053 0,0588 0,0588 0,0586 0,0586 = Ø 0,06802时间跨度 3:0,3279 0,3112 0,3092 0,3092 0,3215 = Ø 0,31580时间跨度 4:0,1002 0,3872 0,1268 0,3893 0,3860 = Ø 0,27790时间跨度 5:0,0056 0,0067 0,0070 0,0063 0,0063 = Ø 0,00638

为了完整起见,VB.Net 源代码:

Dim 应用为布尔值将时钟调暗为新 System.Diagnostics.Stopwatch时钟.开始()对于 i As Int32 = 1 到 1000000适用 = sortedListAC17NextClaims.ContainsKey(myClaim.idData)下一个时钟.停止()Dim timeSpan1 As String = "Timespan 1: " &clock.Elapsed.TotalMilliseconds.ToString &"多发性硬化症"时钟.重置()时钟.开始()对于 i As Int32 = 1 到 1000000适用 = listAC17NextClaims.Contains(myClaim.idData)下一个时钟.停止()Dim timeSpan2 As String = "Timespan 2: " &clock.Elapsed.TotalMilliseconds.ToString &"多发性硬化症"时钟.重置()时钟.开始()对于 i As Int32 = 1 到 1000000适用 = 不是 MyDS.AC17NextClaims.FindByIdData(myClaim.idData) 什么都不是下一个时钟.停止()Dim timeSpan3 As String = "Timespan 3: " &clock.Elapsed.TotalMilliseconds.ToString &"多发性硬化症"时钟.重置()时钟.开始()对于 i As Int32 = 1 到 1000000适用 = MyDS.AC17NextClaims.Select("idData=" & myClaim.idData).Length >0下一个时钟.停止()Dim timeSpan4 As String = "Timespan 4: " &clock.Elapsed.TotalMilliseconds.ToString &"多发性硬化症"时钟.重置()时钟.开始()对于 i As Int32 = 1 到 1000000适用 = MyDS.AC17NextClaims.Rows.Contains(myClaim.idData)下一个时钟.停止()Dim timeSpan5 As String = "Timespan 5: " &clock.Elapsed.TotalMilliseconds.ToString &"多发性硬化症"

  • 更新:我已经改变了我的结果和上面的来源.方括号中是 1000000 次迭代的值.现在结果完全不同了.现在最快的方法肯定是SortedList的ContainsKey.

  • 更新 2:我忘记了使用 List.BinarySearch 的替代方法.这对我来说似乎是最快的:

    clock.Start()对于 i As Int32 = 1 到 1000000适用 = listAC17NextClaims.BinarySearch(myClaim.idData) >-1下一个时钟.停止()

    执行 1000000 次迭代只需要 219.1805 毫秒,因此是最快的,没有 SortedList-KeyValue-Pair 的开销.我可以使用它而无需对列表进行排序,因为 DataAdapter 使用 Order By 子句填充了数据表.

解决方案

为什么不使用具有 HashTable 作为底层数据结构(DictionaryHashSet)?HashTables 应该提供 O(1) 查找时间,如果键中没有冲突(如您所说)并且不需要排序"的开销.

如果您想存储密钥,您应该使用 HashSet 在 .NET 3.5 及更高版本中可用.

来自 MSDN SortedList:><块引用>

对 SortedList 对象的操作倾向于比在 a 上的操作慢哈希表对象,因为排序.

要面向 .NET 2.0,您可以自行开发或使用预构建的,例如 Wintellect 的 Power Collections(您也可以轻松地使用源代码).

Originally i wanted to ask for the fastest way to query a Datatable for a special row.

I have tested 5 different methods for their performance with a surprising(for me) result.

Background: I have created a View in a MS Sql-Server 2005 Database. This view has current a total count of 6318 rows. Because i must check very often if a given id exists in this view, i wondered what is the most efficient way to do. I created a DataAdapter in a strongly typed dataset which returns all rows and fills a Datatable. My first approach was to create a shared generic List(of Int32) and fill it with the ID's from the view once on Application start. Then use List.Contains to check if the current ID is in this List. Because all rows are distinct i wondered if its not faster to use a SortedList and its ContainsKey-metod. Then i checked also the performance of direct access to the Datable with its Select-Method, its automatically generated(when column is defined as primary-key) FindBy-method and last but not least the DatarowCollection.Contains-Method. So i have 5 Methods to check if my ID is in that View(or mapped List/SortedList).

I measured their performance with the System.Diagnostics.StopWatch and got some interesting results. I thought the SortedList.ContainsKey must be faster than the List.Contains because they're distinct and sorted but the opposite is true. But the most surprising for me was that the DataRowCollection.Contains-Method(that i first had forgotten) is by far the fastest. It is even 50 times faster than the dataTable.FindBy-method.

  1. What caused these differences?
  2. Have i forgotten a better way?
  3. Is my measurement method correct(i think i better should loop them and take that values)?
  4. Are the values transferrable or dependent on the size of the Datatable/Collection?
  5. After my update(1000000 iterations) ContainsKey is the fastest. Is this because i always check for the same id or in general? Is there some kind of SortedList without the overhead of a Dictionary's KeyValue-Pair?

Results [for 1000000 iterations*]

  • Timespan 1 = SortedList.ContainsKey = Ø 0.65634 [238.1095] ms
  • Timespan 2 = List.Contains = Ø 0.06802 [57045.37955] ms
  • Timespan 3 = DataTable.FindByIdData(auto-generated method) = Ø 0.31580 [1542.62345] ms
  • Timespan 4 = DataTable.Select = Ø 0.27790 [26029.39635] ms
  • Timespan 5 = DataRowCollection.Contains = Ø 0.00638 [1202.79735] ms

    1.)
    Timespan 1: 0,6913 ms
    Timespan 2: 0,1053 ms
    Timespan 3: 0,3279 ms
    Timespan 4: 0,1002 ms
    Timespan 5: 0,0056 ms
    
    2.)
    Timespan 1: 0,6405 ms
    Timespan 2: 0,0588 ms
    Timespan 3: 0,3112 ms
    Timespan 4: 0,3872 ms
    Timespan 5: 0,0067 ms
    
    3.)
    Timespan 1: 0,6502 ms
    Timespan 2: 0,0588 ms
    Timespan 3: 0,3092 ms
    Timespan 4: 0,1268 ms
    Timespan 5: 0,007 ms
    
    4.)
    Timespan 1: 0,6504 ms
    Timespan 2: 0,0586 ms
    Timespan 3: 0,3092 ms
    Timespan 4: 0,3893 ms
    Timespan 5: 0,0063 ms
    
    5.)
    Timespan 1: 0,6493 ms
    Timespan 2: 0,0586 ms
    Timespan 3: 0,3215 ms
    Timespan 4: 0,386 ms
    Timespan 5: 0,0063 ms
    
    
    
    Timespan 1: 0,6913 0,6405 0,6502 0,6504 0,6493 = Ø 0,65634
    Timespan 2: 0,1053 0,0588 0,0588 0,0586 0,0586 = Ø 0,06802
    Timespan 3: 0,3279 0,3112 0,3092 0,3092 0,3215 = Ø 0,31580
    Timespan 4: 0,1002 0,3872 0,1268 0,3893 0,3860 = Ø 0,27790
    Timespan 5: 0,0056 0,0067 0,0070 0,0063 0,0063 = Ø 0,00638
    

And for the sake of completeness part of the VB.Net source:

Dim applies As Boolean
Dim clock As New System.Diagnostics.Stopwatch

clock.Start()
For i As Int32 = 1 To 1000000
    applies = sortedListAC17NextClaims.ContainsKey(myClaim.idData)
Next
clock.Stop()
Dim timeSpan1 As String = "Timespan 1: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = listAC17NextClaims.Contains(myClaim.idData)
Next
clock.Stop()
Dim timeSpan2 As String = "Timespan 2: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = Not MyDS.AC17NextClaims.FindByIdData(myClaim.idData) Is Nothing
Next
clock.Stop()
Dim timeSpan3 As String = "Timespan 3: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = MyDS.AC17NextClaims.Select("idData=" & myClaim.idData).Length > 0
Next
clock.Stop()
Dim timeSpan4 As String = "Timespan 4: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

clock.Reset()
clock.Start()
For i As Int32 = 1 To 1000000
    applies = MyDS.AC17NextClaims.Rows.Contains(myClaim.idData)
Next
clock.Stop()
Dim timeSpan5 As String = "Timespan 5: " & clock.Elapsed.TotalMilliseconds.ToString & " ms"

  • UPDATE: I have changed my results and the source above. In squared brackets are the values for 1000000 iterations. Now the result is completely different. The fastest method now is definitely is the ContainsKey of the SortedList.

  • UPDATE 2: I forgot the alternative to use List.BinarySearch. This seems to be fastest for me:

    clock.Start()
    For i As Int32 = 1 To 1000000
        applies = listAC17NextClaims.BinarySearch(myClaim.idData) > -1
    Next
    clock.Stop()
    

    needs only 219.1805 ms to perform 1000000 iterations and hence is the fastest without the overhead of a SortedList-KeyValue-Pair. I can use it without to sort the list because the DataAdapter filled the datatable with an Order By clause.

解决方案

Why don't you use a collection that has a HashTable as the underlying data structure (Dictionary<TKey, TValue> or HashSet<T>)? HashTables should provide O(1) look up time if there are no collisions in keys (as you've stated) and doesn't need the overhead to "sort".

EDIT: If you only want to store the keys, you should use HashSet<T> which is available in .NET 3.5 and above.

From MSDN on SortedList:

Operations on a SortedList object tend to be slower than operations on a Hashtable object because of the sorting.

To target .NET 2.0, you could roll your own or use a pre-built one like Wintellect's Power Collections (you could easily just use the source as well).

这篇关于令人惊讶的性能差异:List.Constains,SortedList.ContainsKey,DataRowCollection.Contains,DataTable.Select, DataTable.FindBy的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆