Filtering files by partial name match
Question
I have a network share with 20,000 XML files named in the format
username-computername.xml
There are duplicate entries (created when a user received a new computer), for example:
user1-computer1.xml
user1-computer2.xml
or
BLRPPR-SKB52084.xml
BLRSIA-SKB50871.xml
S028DS-SKB51334.xml
s028ds-SKB52424.xml
S02FL6-SKB51644.xml
S02FL6-SKB52197.xml
S02VUD-SKB52083.xml
Since I'm going to manipulate the XML files later, I can't just discard the properties of the array elements; at the very least I need the full path. The aim is that, if a duplicate is found, the file with the newer timestamp is used.
Here is a snippet of the code where I need that logic:
$xmlfiles = Get-ChildItem "network share"
Here I'm just doing a foreach loop:
foreach ($xmlfile in $xmlfiles) {
    [xml]$xmlcontent = Get-Content -Path $xmlfile.FullName -Encoding UTF8
    Select-Xml -Xml $xmlcontent -Xpath " "
    # create [pscustomobject] etc...
}
Basically what I need is (pseudocode):
if ($xmlfiles.Name.Split("-")[0] is a duplicate) {
    # select the one with the higher $xmlfiles.LastWriteTime and store either
    # the full object or the $xmlfiles.FullName
}
Ideally this should be part of the foreach loop, so that I don't have to loop through the files twice.
Recommended answer
You can use Group-Object to group the files by a custom property, here the part of the name before the first hyphen:
$xmlfiles | Group-Object { $_.Name.Split('-')[0] }
The above statement will produce a result like this:
Count Name   Group
----- ----   -----
    1 BLRPPR {BLRPPR-SKB52084.xml}
    1 BLRSIA {BLRSIA-SKB50871.xml}
    2 S028DS {S028DS-SKB51334.xml, s028ds-SKB52424.xml}
    2 S02FL6 {S02FL6-SKB51644.xml, S02FL6-SKB52197.xml}
    1 S02VUD {S02VUD-SKB52083.xml}
where the Group property contains the original FileInfo objects. Group-Object compares names case-insensitively by default, which is why S028DS and s028ds end up in the same group.
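As a small illustration of the point about the Group property, the following sketch builds a throwaway folder and shows that each grouped element still exposes FullName. The temporary folder, the $demoDir and $groups variables, and the three sample files are assumptions made for the demo; they stand in for the real network share.

```powershell
# Hypothetical sample data: empty files in a temporary folder stand in for the share.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $demoDir | Out-Null
'S028DS-SKB51334.xml', 's028ds-SKB52424.xml', 'S02VUD-SKB52083.xml' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $demoDir $_) | Out-Null
}

# Group by the part of the file name before the first hyphen.
$groups = Get-ChildItem -Path $demoDir -Filter *.xml |
    Group-Object { $_.Name.Split('-')[0] }

foreach ($g in $groups) {
    # Each element of Group is a FileInfo, so FullName and LastWriteTime remain available.
    $g.Group | ForEach-Object { '{0}: {1}' -f $g.Name, $_.FullName }
}

# Clean up the demo folder.
Remove-Item -Path $demoDir -Recurse -Force
```

If user names differing only in case should be treated as different users, Group-Object also accepts the -CaseSensitive switch.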
Expand the groups in a ForEach-Object loop, sort each group by LastWriteTime, and select the most recent file from it:
... | ForEach-Object {
    $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -First 1
}
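Put together with the loop from the question, the whole thing might look like the sketch below. It is only an illustration under stated assumptions: the $share folder, the two sample files, their '<root/>' content, and the $latest variable are made up for the demo, and the XPath processing is left as a placeholder because the question does not show it.

```powershell
# Hypothetical stand-in for the network share: a temp folder with two files for one user.
$share = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $share | Out-Null
$old = New-Item -ItemType File -Path (Join-Path $share 'user1-computer1.xml') -Value '<root/>'
$new = New-Item -ItemType File -Path (Join-Path $share 'user1-computer2.xml') -Value '<root/>'
$old.LastWriteTime = (Get-Date).AddDays(-30)   # make computer1 the older file

# Keep only the most recently written file per user name.
$latest = Get-ChildItem -Path $share -Filter *.xml |
    Group-Object { $_.Name.Split('-')[0] } |
    ForEach-Object {
        $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -First 1
    }

# The original loop now runs over the de-duplicated FileInfo objects.
foreach ($xmlfile in $latest) {
    [xml]$xmlcontent = Get-Content -Path $xmlfile.FullName -Encoding UTF8
    # ... Select-Xml / [pscustomobject] processing goes here ...
    $xmlfile.FullName
}

# Clean up the demo folder.
Remove-Item -Path $share -Recurse -Force
```

The files are still traversed once for grouping and once for processing, but only the newest file per user is opened and parsed, which is likely the expensive part on a network share.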