Powershell 2 and .NET: Optimize for extremely large hash tables?


Question

I am dabbling in PowerShell and am completely new to .NET.

I am running a PS script that starts with an empty hash table. The hash table will grow to at least 15,000 to 20,000 entries. Keys of the hash table will be email addresses in string form, and values will be booleans. (I simply need to track whether or not I've seen an email address.)

So far, I've been growing the hash table one entry at a time. I check that the key doesn't already exist (PowerShell throws an error if you add a duplicate key), then I add the pair.

Here's the portion of my code we're talking about:

...
    if ($ALL_AD_CONTACTS[$emailString] -ne $true) {
      $ALL_AD_CONTACTS += @{$emailString = $true}
    }
...

I am wondering if there is anything one can do from a PowerShell or .NET standpoint that will optimize the performance of this hash table if you KNOW it's going to be huge ahead of time, like 15,000 to 20,000 entries or beyond.

Thanks!

Solution

I performed some basic tests with Measure-Command, using a set of 20,000 random words.

The individual results are shown below, but in summary: adding to one hashtable by first allocating a new hashtable with a single entry is incredibly inefficient, since each += copies every existing entry into a brand-new hashtable :) Although there were some minor differences among options 2 through 5, in general they all performed about the same.

If I were to choose, I might lean toward option 5 for its simplicity (just a single Add call per string), but all the alternatives I tested seem viable.

# Generate 20KB (20,480) random lowercase words of 15-34 letters each.
# Get-Random -Count picks without replacement, so letters within a word are distinct.
$chars = [char[]]('a'[0]..'z'[0])
$words = 1..20KB | foreach {
  $count = Get-Random -Minimum 15 -Maximum 35
  -join (Get-Random $chars -Count $count)
}

# 1) Original, adding to hashtable with "+=".
#     TotalSeconds: ~800
Measure-Command {
  $h = @{}
  $words | foreach { if( $h[$_] -ne $true ) { $h += @{ $_ = $true } } }
}

# 2) Using sharding among sixteen hashtables.
#     TotalSeconds: ~3
Measure-Command {
  [hashtable[]]$hs = 1..16 | foreach { @{} }
  $words | foreach {
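    # Note: String.GetHashCode() can return a negative value, and PowerShell
    # treats a negative index as counting from the end of the array, so the
    # lookup below still lands in one of the sixteen tables.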
    $h = $hs[$_.GetHashCode() % 16]
    if( -not $h.ContainsKey( $_ ) ) { $h.Add( $_, $null ) }
  }
}

# 3) Using ContainsKey and Add on a single hashtable.
#     TotalSeconds: ~3
Measure-Command {
  $h = @{}
  $words | foreach { if( -not $h.ContainsKey( $_ ) ) { $h.Add( $_, $null ) } }
}

# 4) Using ContainsKey and Add on a hashtable constructed with capacity
#    (pre-sizing avoids rehashing as the table grows).
#     TotalSeconds: ~3
Measure-Command {
  $h = New-Object Collections.Hashtable( 21KB )
  $words | foreach { if( -not $h.ContainsKey( $_ ) ) { $h.Add( $_, $null ) } }
}

# 5) Using HashSet<string> and Add (HashSet requires .NET 3.5 or later).
#     TotalSeconds: ~3
Measure-Command {
  $h = New-Object Collections.Generic.HashSet[string]
  $words | foreach { $null = $h.Add( $_ ) }
}
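Mapped back to the original email-address scenario, option 5 is essentially a drop-in replacement for the += loop. Here is a minimal sketch (the $emailStrings collection and the loop around it are illustrative placeholders, not part of the original script); the comparer argument is optional, but email addresses are usually compared case-insensitively:

# Case-insensitive string set; HashSet compares case-sensitively by default.
$seen = New-Object 'Collections.Generic.HashSet[string]' ([StringComparer]::OrdinalIgnoreCase)

foreach ($emailString in $emailStrings) {
  # Add returns $true only the first time a value is seen,
  # so no separate ContainsKey/lookup step is needed.
  if ($seen.Add($emailString)) {
    # first occurrence of this address
  }
}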
