PowerShell: Sorting/Removing Duplicates in a CSV file

Question

First of all, I'm very new to PowerShell, and I would like to thank all the participants of this site for helping me by providing answers across different forums! I've accomplished a lot in a short time because of this site!

Here is the issue, and I'll do my best to explain. I have a CSV file used to create student accounts. Our student management system produces a record each time a student enrolls, is altered, or exits a program. If that student "tries out" a few different programs, they will have multiple records in the CSV file. So my goal is to sort the CSV file by UserID (the UserID never changes) and by CurrentStatusDate (which is when the record was created), using this command:

Import-CSV "C:\students.csv" | sort UserID,CurrentStatusDate
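
One caveat with the command above: Import-Csv returns every field as a string, so CurrentStatusDate is compared character by character rather than as a date. The sketch below illustrates this with two of the sample rows (other columns omitted) and shows a date-aware sort using a calculated property. It assumes every date is in US-style M/d/yyyy format, as in the sample; the variable names are only illustrative.

```powershell
# Two rows copied from the sample (other columns omitted). Import-Csv and
# ConvertFrom-Csv both return every field as a string.
$rows = @'
"UserID","AccountStatus","CurrentStatusDate"
"acampbell6499","Delete","11/1/2011"
"acampbell6499","Add","9/26/2011"
'@ | ConvertFrom-Csv

# Plain string sort: "11/1/2011" lands before "9/26/2011" because '1' < '9'.
$naive = $rows | Sort-Object UserID, CurrentStatusDate

# Date-aware sort: a hashtable with an Expression script block parses the
# string first. Assumes US-style M/d/yyyy dates, as in the sample rows.
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$sorted = $rows | Sort-Object UserID, @{
    Expression = { [datetime]::ParseExact($_.CurrentStatusDate, 'M/d/yyyy', $culture) }
}

$sorted
```

With these rows, `$naive` puts the 11/1/2011 record first, while `$sorted` puts 9/26/2011 first, which is the actual chronological order.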

Sample CSV records:

"UserID","AccountStatus","PersonID","PIN","FirstName","LastName","IDEXPIRY","Term","Role","Course","SectionName","locationDescription","Location","CurrentStatusDate"
"aboggs","Add","xxxxxxx","xxxxxxx","Ashley","Baggs","5/11/2013","xxxxxx","Student","Accounting Technology","xxxxxx","xxxxxx","xxxxxx","9/12/2011"
"aboutilier","Add","xxxxxxx","xxxxxxx","Amelia","Boutilier","5/3/2012","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/15/2011"
"abowtle","Delete","xxxxxxx","xxxxxxx","Aleisha","Bowtle","7/31/2013","xxxxxx","Student","Business Administration","xxxxxx","xxxxxx","xxxxxx","2/1/2011"
"abowtle","Add","xxxxxxx","xxxxxxx","Aleisha","Bowtle","7/31/2012","xxxxxx","Student","General Studies","xxxxxx","xxxxxx","xxxxxx","9/9/2011"
"abradley","Delete","xxxxxxx","xxxxxxx","Anna","Bradley","10/25/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/17/2011"
"abridges","Delete","xxxxxxx","xxxxxxx","Ashley","Bridges","10/5/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/15/2011"
"abrown10165","Add","xxxxxxx","xxxxxxx","Adam","Brown","10/28/2011","xxxxxx","Student","Advanced Firefighting STCW VI/3","xxxxxx","xxxxxx","xxxxxx","10/24/2011"
"abrown10165","Add","xxxxxxx","xxxxxxx","Adam","Brown","12/16/2011","xxxxxx","Student","Simulated Electronic Navigation Level 1, Part B","xxxxxx","xxxxxx","xxxxxx","11/10/2011"
"abrown8081","Add","xxxxxxx","xxxxxxx","Alex","Brown","5/25/2013","xxxxxx","Student","Culinary Arts","xxxxxx","xxxxxx","xxxxxx","9/6/2011"
"abrown8950","Delete","xxxxxxx","xxxxxxx","Ashley","Brown","9/13/2012","xxxxxx","Student","Medical Support Services","xxxxxx","xxxxxx","xxxxxx","9/14/2011"
"acameron2637","Delete","xxxxxxx","xxxxxxx","Anne","Cameron","10/14/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","10/14/2011"
"acameron4368","Add","xxxxxxx","xxxxxxx","Amanda","Cameron","4/20/2013","xxxxxx","Student","Applied Degree in Culinary Operations","xxxxxx","xxxxxx","xxxxxx","10/12/2011"
"acampbell10266","Add","xxxxxxx","xxxxxxx","Amanda","Campbell","5/4/2012","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/7/2011"
"acampbell6499","Delete","xxxxxxx","xxxxxxx","Aaron","Campbell","10/31/2012","xxxxxx","Student","Retail Business Management","xxxxxx","xxxxxx","xxxxxx","11/1/2011"
"acampbell6499","Add","xxxxxxx","xxxxxxx","Aaron","Campbell","12/13/2011","xxxxxx","Student","Complete the Accounting Cycle - Part II","xxxxxx","xxxxxx","xxxxxx","9/26/2011"

This should group all records with the same UserID, then sort them by creation date. I then want to remove the duplicates and retain the last record created. I'm familiar with -Unique, but it doesn't help with the command above, because it only removes records whose UserID and CurrentStatusDate are both duplicated.
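
Since -Unique compares the whole sort key, it cannot express "one row per UserID, keeping the newest". One alternative (not from the original thread) is to sort newest-first and then keep only the first row seen for each UserID, tracked in a .NET HashSet. This is a sketch under the same assumptions as the sample: dates in M/d/yyyy format, and only a few columns shown.

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# A few rows from the sample (other columns omitted). Against the real file,
# use: $records = Import-Csv "C:\students.csv"
$records = @'
"UserID","AccountStatus","CurrentStatusDate"
"aboggs","Add","9/12/2011"
"abowtle","Delete","2/1/2011"
"abowtle","Add","9/9/2011"
"acampbell6499","Delete","11/1/2011"
"acampbell6499","Add","9/26/2011"
'@ | ConvertFrom-Csv

# Tracks UserIDs already emitted; case-insensitive, like PowerShell's own
# string comparisons.
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

# Newest first, then keep only the first row seen for each UserID.
# HashSet.Add returns $false for an ID that is already in the set.
$latest = $records |
    Sort-Object @{
        Expression = { [datetime]::ParseExact($_.CurrentStatusDate, 'M/d/yyyy', $culture) }
        Descending = $true
    } |
    Where-Object { $seen.Add($_.UserID) } |
    Sort-Object UserID

$latest
```

For the rows above this keeps three records: aboggs, the 9/9/2011 "Add" row for abowtle, and the 11/1/2011 "Delete" row for acampbell6499.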

I've been "Google-ing" and banging my head against this for 2 days... I'm starting to think there is no "easy" answer, but my programming-fu is weak... Just looking for a "nudge" in the right direction.

Thanks!

Chris

Answer

As Andy stated, it's a little hard given we don't have a sample of the CSV format. However I'm thinking that something like the below is what you're looking for:

Import-CSV "C:\students.csv" | Group-Object userid | foreach-object { $_.group | sort-object currentstatusdate | select -last 1}

Just as you describe: we group by UserID, sort each group by CurrentStatusDate, then select the most recent record. I'm not sure how CurrentStatusDate is formatted, so I don't know if a plain Sort-Object will be good enough.
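
Given the sample posted in the question, a plain sort is in fact not enough: CurrentStatusDate is a string such as "9/26/2011", so "11/1/2011" sorts before it. A minimal variation of the answer's pipeline, assuming all dates are M/d/yyyy, parses the date inside Sort-Object. The inline rows stand in for the real file, and the output path in the comment is hypothetical.

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Sample rows from the question (most columns omitted). To process the real
# file, replace this with: $records = Import-Csv "C:\students.csv"
$records = @'
"UserID","AccountStatus","Course","CurrentStatusDate"
"abowtle","Delete","Business Administration","2/1/2011"
"abowtle","Add","General Studies","9/9/2011"
"acampbell6499","Delete","Retail Business Management","11/1/2011"
"acampbell6499","Add","Complete the Accounting Cycle - Part II","9/26/2011"
'@ | ConvertFrom-Csv

# Same shape as the answer: group by UserID, sort each group by date, keep
# the last (most recent) record. The date string is parsed before sorting.
$latest = $records |
    Group-Object UserID |
    ForEach-Object {
        $_.Group |
            Sort-Object { [datetime]::ParseExact($_.CurrentStatusDate, 'M/d/yyyy', $culture) } |
            Select-Object -Last 1
    }

# Hypothetical output path; -NoTypeInformation stops Windows PowerShell 5.1
# from writing a "#TYPE" header line.
# $latest | Export-Csv "C:\students_latest.csv" -NoTypeInformation
$latest
```

With the string sort from the answer, acampbell6499 would keep the 9/26/2011 "Add" row; with the parsed date it keeps the later 11/1/2011 "Delete" row.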