PowerShell CSV row/column transpose and manipulation

Question

I'm new to PowerShell. I am trying to transpose rows into columns for a medium-sized CSV file (around 10,000 rows). The original CSV has 3 columns ("Time", "Id", "IOT"), as shown below:

"Time","Id","IOT" 
"00:03:56","23","26" 
"00:03:56","24","0" 
"00:03:56","25","0" 
"00:03:56","26","1" 
"00:03:56","27","0" 
"00:03:56","28","0" 
"00:03:56","29","0" 
"00:03:56","30","1953" 
"00:03:56","31","22" 
"00:03:56","32","39" 
"00:03:56","33","8" 
"00:03:56","34","5" 
"00:03:56","35","269" 
"00:03:56","36","5" 
"00:03:56","37","0" 
"00:03:56","38","0" 
"00:03:56","39","0" 
"00:03:56","40","1251" 
"00:03:56","41","103" 
"00:03:56","42","0" 
"00:03:56","43","0" 
"00:03:56","44","0" 
"00:03:56","45","0" 
"00:03:56","46","38" 
"00:03:56","47","14" 
"00:03:56","48","0" 
"00:03:56","49","0" 
"00:03:56","2013","0" 
"00:03:56","2378","0" 
"00:03:56","2380","32" 
"00:03:56","2758","0" 
"00:03:56","3127","0" 
"00:03:56","3128","0" 
"00:09:16","23","22" 
"00:09:16","24","0" 
"00:09:16","25","0" 
"00:09:16","26","2" 
"00:09:16","27","0" 
"00:09:16","28","0" 
"00:09:16","29","21" 
"00:09:16","30","48" 
"00:09:16","31","0" 
"00:09:16","32","4" 
"00:09:16","33","4" 
"00:09:16","34","7" 
"00:09:16","35","382" 
"00:09:16","36","12" 
"00:09:16","37","0" 
"00:09:16","38","0" 
"00:09:16","39","0" 
"00:09:16","40","1882" 
"00:09:16","41","42" 
"00:09:16","42","0" 
"00:09:16","43","3" 
"00:09:16","44","0" 
"00:09:16","45","0" 
"00:09:16","46","24" 
"00:09:16","47","22" 
"00:09:16","48","0" 
"00:09:16","49","0" 
"00:09:16","2013","0" 
"00:09:16","2378","0" 
"00:09:16","2380","19" 
"00:09:16","2758","0" 
"00:09:16","3127","0" 
"00:09:16","3128","0" 
... 
... 
... 

I tried to do the transpose with code based on a PowerShell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be
My PowerShell code is essentially as follows:

# $a is assumed to hold the rows imported from the CSV (e.g. via Import-Csv)
$b = @()
foreach ($Time in $a.Time | Select -Unique) {
    $Props = [ordered]@{ Time = $Time }
    foreach ($Id in $a.Id | Select -Unique) {
        # Scans every row of $a for each Time/Id pair
        $IOT = ($a.where({ $_.Id -eq $Id -and $_.Time -eq $Time })).IOT
        $Props += @{ $Id = $IOT }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView

The code above gives the result I expected: every "Id" value becomes a column header, every unique "Time" value becomes a row, and the "IOT" values fill the "Id" x "Time" intersections, as below:

"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128" 
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0" 
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0" 

With only a few hundred rows, the result comes out quickly, as expected. The problem appears when processing the whole CSV file with 10,000 rows: the script keeps executing, does not seem to finish even after a long time (hours), and produces no results. Could some PowerShell experts from Stack Overflow help assess the code above and suggest modifications to speed it up?

Many thanks for your advice.

Recommended answer

10,000 records is a lot, but I don't think it is enough to warrant recommending StreamReader* and parsing the CSV manually. The biggest thing going against you, though, is the following line:

$b += New-Object -TypeName PSObject -Property $Props 

What PowerShell does here is create a new array, copy the existing elements into it, and append the new element. This is a very memory-intensive operation that you are repeating thousands of times. A better approach in this case is to use the pipeline to your advantage.
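To make the difference concrete, here is a minimal sketch that builds the same five objects three ways. The variable names ($slow, $fast, $list) and the Index property are illustrative only and not part of the original scripts; the generic list is shown as one possible alternative, not something the answer itself uses.

```powershell
# Pattern from the question: each += allocates a new array and copies the old one.
$slow = @()
foreach ($i in 1..5) {
    $slow += [pscustomobject]@{ Index = $i }
}

# Pipeline-style collection: emit objects and let PowerShell gather them once.
$fast = foreach ($i in 1..5) {
    [pscustomobject]@{ Index = $i }
}

# Alternative (an assumption, not from the answer): a generic list that grows in place.
$list = New-Object 'System.Collections.Generic.List[object]'
foreach ($i in 1..5) {
    $list.Add([pscustomobject]@{ Index = $i })
}
```

All three collections end up with the same objects; only the way the collection grows differs. Applied to the CSV in question, the pipeline approach looks like this: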

$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID | Sort-Object { [int]$_ } -Unique

$data | Group-Object Time | ForEach-Object {
    $props = [ordered]@{ Time = $_.Name }
    foreach ($header in $headers) {
        $props."$header" = ($_.Group | Where-Object { $_.ID -eq $header }).IOT
    }
    [pscustomobject]$props
} | Export-Csv d:\temp\testing.csv -NoTypeInformation

$data holds your entire file in memory as a collection of objects. $headers collects all the unique ID values, which become the column headers.

Group the data by Time. Then, inside each time group, we get the value for every ID. If an ID does not exist for that time, the entry will be null.
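The grouping and the null behaviour can be checked on a tiny inline sample. This is only a sketch: the three data rows are taken from the question's sample, with ID 24 deliberately left out of the second time slot, and ConvertFrom-Csv stands in for Import-Csv so that no file is needed.

```powershell
# Inline sample: ID 24 is missing for 00:09:16 on purpose.
$csv = @'
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:09:16","23","22"
'@
$data = $csv | ConvertFrom-Csv
$headers = $data.Id | Sort-Object { [int]$_ } -Unique

$rows = $data | Group-Object Time | ForEach-Object {
    $group = $_.Group
    $props = [ordered]@{ Time = $_.Name }
    foreach ($header in $headers) {
        # No matching row yields $null for that column
        $props.Add($header, ($group | Where-Object { $_.Id -eq $header }).IOT)
    }
    [pscustomobject]$props
}
$rows | Format-Table -AutoSize
```

The row for 00:09:16 still has a "24" column, but its value is empty.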

This is not the best way, but it should be faster than yours. I ran 10,000 records in under a minute (51-second average over 3 passes). I will benchmark to show you if I can.
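One possible refinement, offered as an unbenchmarked sketch rather than as part of the original answer: index each time group by ID in a hashtable once, so that each cell is a lookup instead of a Where-Object filter over the group. The $byId and $result names are hypothetical, and the inline sample replaces the real file so the snippet runs on its own. Unlike the original, a duplicated Time/Id pair would keep only the last IOT value.

```powershell
# Inline sample standing in for Import-Csv (assumption: same three columns).
$data = @'
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:09:16","23","22"
'@ | ConvertFrom-Csv

$headers = $data.Id | Sort-Object { [int]$_ } -Unique

$result = $data | Group-Object Time | ForEach-Object {
    # Build an Id -> IOT lookup for this time slot in a single pass
    $byId = @{}
    foreach ($row in $_.Group) { $byId[$row.Id] = $row.IOT }

    $props = [ordered]@{ Time = $_.Name }
    foreach ($header in $headers) {
        # Missing IDs come back as $null from the hashtable
        $props.Add($header, $byId[$header])
    }
    [pscustomobject]$props
}
$result | Format-Table -AutoSize
```

To write the output to disk instead, pipe $result to Export-Csv with -NoTypeInformation as in the answer's script.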

I just ran your code once with my own data and it took 13 minutes. I think it is safe to say that mine performs faster.

FYI, the dummy data was generated with this logic:

1..100 | ForEach-Object {
    $time = Get-Date -Format "hh:mm:ss"
    Start-Sleep -Seconds 1
    1..100 | ForEach-Object {
        [pscustomobject][ordered]@{
            time = $time
            id   = $_
            iot  = Get-Random -Minimum 0 -Maximum 7
        }
    }
} | Export-Csv d:\temp\data.csv -NoTypeInformation

* StreamReader is not a stellar fit for your case. I mention it only because it is the better way to read large files; you just need to parse each line as a string.
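For reference, a line-by-line read with System.IO.StreamReader might look roughly like the sketch below. It writes a small demo file to the temp folder (a hypothetical path, chosen only so the snippet runs on its own) and uses a deliberately naive parser that assumes every field is double-quoted and contains no embedded commas or quotes. The $byTime structure is an illustrative name, not something from the answer.

```powershell
# Hypothetical demo file; point $path at your own CSV instead.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'streamreader-demo.csv'
Set-Content -Path $path -Value @(
    '"Time","Id","IOT"',
    '"00:03:56","23","26"',
    '"00:03:56","24","0"',
    '"00:09:16","23","22"'
)

$byTime = [ordered]@{}   # Time -> (Id -> IOT)
$reader = New-Object System.IO.StreamReader -ArgumentList $path
try {
    $null = $reader.ReadLine()   # skip the header line
    while ($null -ne ($line = $reader.ReadLine())) {
        $line = $line.Trim()
        if (-not $line) { continue }
        # Naive parse: strip the outer quotes, then split on the "," separators
        $time, $id, $iot = $line.Trim('"') -split '","'
        if (-not $byTime.Contains($time)) { $byTime[$time] = @{} }
        $byTime[$time][$id] = $iot
    }
}
finally {
    $reader.Dispose()
}
Remove-Item -Path $path
```

Only one line is held in memory at a time while reading; turning $byTime into output rows would follow the same per-header loop as the Group-Object version.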
