如何在powershell中合并两个具有相同标题的csv文件并丢弃重复的行 [英] How to merge two csv files in powershell with same headers and discard duplicate rows

查看:48
本文介绍了如何在powershell中合并两个具有相同标题的csv文件并丢弃重复的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从 NetApp Performance Manager 软件 (OPM) 收集性能计数器.OPM 在 MySQL 数据库中保存了 30 天的数据.所以我必须输入两个查询来检索数据:

I am collecting performance counters from NetApp Performance Manager software (OPM). OPM saves 30 days worth of data in MySQL database. So i have to put two queries to retrieve the data:

  1. 每月 30 日第一次查询并保存在 csv 文件中.
  2. 每月 1 号进行第二次查询并保存在 csv 文件中.

然后合并两个csv文件,如果一个月有31天就得到数据.

Then merge the two csv files to get data if there are 31 days in a month.

两个文件如下所示:

"Time","objid","cpuBusy","cifsOps","avgLatency"
"2016:06:04 00:04","72","50.6196","2069.11","7622.1"
"2016:06:04 00:09","72","30.2233","2075.94","7633.27"
"2016:06:04 00:14","72","35.2559","1980.64","8352.17"

当我使用以下代码合并两个 csv 文件时.我从相同的数据/时间得到重复的行.

When i merge the two csv files with below code. I get duplicate rows with data from same data/time.

@(Import-Csv au2004npsa003-mm-business.csv) + @(Import-Csv au2004npsa003-nn-business.csv) | export-csv joined.csv -NoTypeInformation

如何在不获取重复数据的情况下合并两个 csv 文件?我试过 select -unique 但是,它只给出一行.

How can i merge the two csv files without getting duplicate data? I have tried select -unique however, it gives just one row.

推荐答案

至于 为什么 Select-Object -Unique 不起作用:

As for why Select-Object -Unique didn't work:

  • Select-Object -Unique,当给定引用类型(字符串除外)的实例时,比较它们的.ToString() 以确定唯一性.

  • Select-Object -Unique, when given instances of reference types (other than strings), compares their .ToString() values in order to determine uniqueness.

[pscustomobject] 实例,例如 Import-Csv 创建的实例,遗憾的是 返回 空字符串来自他们的.ToString()方法.

[pscustomobject] instances, such as the ones Import-Csv creates, regrettably return the empty string from their .ToString() method.

因此,所有输入对象都比较相同,并且只会返回第一个输入对象.

Thus, all input objects compare the same, and only the first input object is ever returned.

S9uare 有用的Select-Object -Property * -Unique 方法通过强制所有属性单独比较,但附带一个性能警告:
输入对象被有效地重新创建,在这种情况下比较所有属性值是多余的,因为比较Time值就足够了;对于大型输入文件,处理可能需要很长时间.

S9uare's helpful Select-Object -Property * -Unique approach overcomes this problem by forcing all properties to be compared invidually, but comes with a performance caveat:
The input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time values would suffice; with large input files, processing can take a long time.

由于手头的数据来自 CSV 文件,性能问题可以通过字符串处理来解决,使用 Get-Content 而不是导入-CSV:

Since the data at hand comes from CSV files, the performance problem can be helped with string processing, using Get-Content rather than Import-Csv:

Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
  Select-Object -Unique |
    Set-Content -Encoding ASCII joined.csv

请注意,我使用 -Encoding ASCII 来模仿 Export-Csv 的默认行为;根据需要更改.

Note that I'm using -Encoding ASCII to mimic Export-Csv's default behavior; change as needed.

对于字符串形式的输入对象,Select-Object -Unique 按预期工作 - 并且速度更快.
但是请注意,对于大型输入文件,您可能会耗尽内存,因为 Select-Object 需要构建一个包含所有行的内存数据结构以确定唯一性.

With input objects that are strings, Select-Object -Unique works as expected - and is faster.
Note, however, that with large input files that you may run out of memory, given that Select-Object needs to build up an in-memory data structure containing all rows in order to determine uniqueness.

这篇关于如何在powershell中合并两个具有相同标题的csv文件并丢弃重复的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆