Powershell question - Looking for fastest method to loop through 500k objects looking for a match in another 500k object array


Question

I have two large .csv files that I've imported using the Import-Csv cmdlet. I've done a lot of searching and trying and am finally posting to ask for some help to make this easier.

I need to move through the first array, which will have anywhere from 80k to 500k rows. Each object in these arrays has multiple properties, and I then need to find the corresponding entry in a second array of the same size, matching on one of those properties.

I'm importing them as [System.Collections.ArrayList] and I've tried placing them in hashtables too. I have even tried to muck with LINQ, which was mentioned in several other posts.

Can anyone offer advice or insight on how to make this run faster? It feels like I'm looking in one haystack for matching hay from a different stack.

$ImportTime1 = Measure-Command {
    [System.Collections.ArrayList]$fileList1 = Import-csv file1.csv
    [System.Collections.ArrayList]$fileSorted1 = ($fileList1 | Sort-Object -property 'Property1' -Unique -Descending)
    Remove-Variable fileList1
}

$ImportTime2 = Measure-Command {
    [System.Collections.ArrayList]$fileList2 = Import-csv file2.csv
    [System.Collections.ArrayList]$fileSorted2 = ($fileList2 | Sort-Object -property 'Property1' -Unique -Descending)
    Remove-Variable fileList2
}

$fileSorted1.foreach({
     $varible1 = $_
     $target = $fileSorted2.where({$_ -eq $variable1})
     ###do some other stuff
})
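The nested scan above is what makes this quadratic: for each of up to 500k rows, the inner `.where()` walks the entire second array. The standard fix is to index the second array in a hashtable keyed on the match property, so each lookup is O(1). A minimal sketch, assuming both files share a `Property1` column (the sample rows below are made up for illustration):

```powershell
# Stand-ins for the imported CSVs.
$fileSorted1 = @(
    [pscustomobject]@{Property1='A'; Data=1},
    [pscustomobject]@{Property1='B'; Data=2}
)
$fileSorted2 = @(
    [pscustomobject]@{Property1='B'; Data=20},
    [pscustomobject]@{Property1='C'; Data=30}
)

# Build an index of the second list keyed on the match property.
$index = @{}
foreach ($row in $fileSorted2) { $index[$row.Property1] = $row }

# One O(1) hashtable lookup per row instead of scanning $fileSorted2 each time.
$joined = foreach ($row in $fileSorted1) {
    if ($index.ContainsKey($row.Property1)) {
        [pscustomobject]@{ Left = $row; Right = $index[$row.Property1] }
    }
}
@($joined).Count  # 1 (only 'B' appears in both lists)
```

If the match property is not unique in the second file, the index values would need to be arrays of rows rather than single rows, which is essentially what the answer below does.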

Answer

This may help: the updated solution from comment #27359, plus the change suggested by Max Kozlov in comment #27380.

Function RJ-CombinedCompare() {
    [CmdletBinding()]
    PARAM(
        [Parameter(Mandatory=$True)]$List1,
        [Parameter(Mandatory=$True)]$L1Match,
        [Parameter(Mandatory=$True)]$List2,
        [Parameter(Mandatory=$True)]$L2Match
    )
    # Pool both lists into one hashtable keyed on the match column,
    # tagging each entry with the list it came from.
    $hash = @{}
    foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner=1;Value=$($data)}}
    foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner=2;Value=$($data)}}
    foreach ($kv in $hash.GetEnumerator()) {
        # 'Split' partitions each key's entries back into List1/List2 members in one pass.
        $m1, $m2 = $kv.Value.where({$_.Owner -eq 1}, 'Split')
        [PSCustomObject]@{
            MatchValue = $kv.Key
            L1Matches = $m1.Count
            L2Matches = $m2.Count
            L1MatchObject = $L1Match
            L2MatchObject = $L2Match
            List1 = $m1.Value
            List2 = $m2.Value
        }
    }
}

$fileList1 = Import-csv file1.csv
$fileList2 = Import-csv file2.csv

$newList = RJ-CombinedCompare -List1 $fileList1 -L1Match $(yourcolumnhere) -List2 $fileList2 -L2Match $(yourothercolumnhere)

foreach ($item in $newList) {
    # your logic here
}

It should be fast to pass the lists into this hashtable, and it's fast to iterate through as well.
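The core trick is that both lists land in a single hashtable keyed on the match value, each entry tagged with its owning list, and `.Where(..., 'Split')` then separates the two sides per key in one pass. A stripped-down sketch of just that mechanism, using made-up string keys in place of CSV rows:

```powershell
# Pool two tiny lists into one hashtable, tagging each entry's origin.
$hash = @{}
foreach ($d in 'A','B') { $hash[$d] += ,[pscustomobject]@{Owner=1; Value=$d} }
foreach ($d in 'B','C') { $hash[$d] += ,[pscustomobject]@{Owner=2; Value=$d} }

foreach ($kv in $hash.GetEnumerator()) {
    # 'Split' partitions entries into (Owner -eq 1, everything else) in a single pass.
    $m1, $m2 = $kv.Value.Where({$_.Owner -eq 1}, 'Split')
    "{0}: L1={1} L2={2}" -f $kv.Key, $m1.Count, $m2.Count
}
```

Keys present in both lists (here only 'B') come out with non-zero counts on both sides, which is how the full function distinguishes matched from unmatched rows.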

