Merge rows and split content from one .csv to multiple files using PowerShell


Question

As mentioned in https://stackoverflow.com/a/66565662/14210760, I'd like to have a second type of output for the given data:

header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654321; EUR
CD; 456789; 22.24; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR
CD; 354345; 85.45; Text; SW;
CD; 123556; 94.63; Text; SW;
CD; 354564; 12.34; Text; SW;
CD; 135344; 32.23; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR
CD; 354564; 12.34; Text; SW;
CD; 852143; 34.97; Text; SW;

This time the AB rows should always be in front of the CD rows. I know it's redundant, but it will make every single row a complete set of data. The desired outcome would be:

BC987654321.csv

header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654321; EUR; 12345; CD; 456789; 22.24; Text; SW;

BC987654322.csv

header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 354345; 85.45; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 123556; 94.63; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 354564; 12.34; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654322; EUR; 12345; CD; 135344; 32.23; Text; SW;

BC987654323.csv

header1; header2; header3; header4; header5; header6; header7; header8; header9; header10; header11; header12; header13;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR; 12345; CD; 354564; 12.34; Text; SW;
AB; 12345; AB123456789; 10.03.2021; GT; BC987654323; EUR; 12345; CD; 852143; 34.97; Text; SW;

Thanks in advance.

Recommended answer

For this we need to be more creative and use temporary hashtables.

Something like this:

$path = 'D:\Test'
$fileIn = Join-Path -Path $path -ChildPath 'input.csv'
$fileOut = $null   # will get a value in the loop
$splitValue = 'AB' # the header1 value that decides to start a new file
$csv = Import-Csv -Path $fileIn -Delimiter ';'
# get an array of the column headers
$allHeaders = $csv[0].PsObject.Properties.Name

# ordered hashtable that will hold the fields of the most recent 'AB' row
$hash = [ordered]@{}
foreach ($item in $csv) {
    if ($item.header1 -eq $splitValue) { 
        # start a new row (build a new hash)
        $hash.Clear()
        $item.PsObject.Properties | Where-Object { $_.Value } | ForEach-Object { $hash[$_.Name] = $_.Value } 
        # get the filename from header6
        $fileOut = Join-Path -Path $path -ChildPath ('{0}.csv' -f $item.header6)
        # if a file with that name already exists, delete it
        if (Test-Path -Path $fileOut -PathType Leaf) { Remove-Item -Path $fileOut }
    }
    elseif ($hash.Count) {
        # copy the hash which holds the beginning of the line to a temporary row hash (the 'AB' line)
        $rowHash = [ordered]@{}
        foreach ($name in $hash.Keys) { $rowHash[$name] = $hash[$name] }
        $headerIndex = $hash.Count
        # append the new fields from this line to the row hash
        $item.PsObject.Properties | Where-Object { $_.Value } | ForEach-Object {
            # for safety: test if we do not index out of the $allHeaders array
            $header = if ($headerIndex -lt $allHeaders.Count) { $allHeaders[$headerIndex] } else { "header$($headerIndex + 1)" }
            $rowHash[$header] = $_.Value 
            $headerIndex++  # increment the counter
        }
        # append trailing headers with empty value
        while ($headerIndex -lt $allHeaders.Count) { 
            $rowHash[$allHeaders[$headerIndex++]] = $null
        }
        # cast the finalized rowhash into a [PsCustomObject]
        $newRow = [PsCustomObject]$rowHash
        # write the completed row in the csv file
        ##$fileOut = Join-Path -Path $path -ChildPath ('{0}.csv' -f $newRow.header6)
        # if the file already exists, we append, otherwise we create a new file
        $append = Test-Path -Path $fileOut -PathType Leaf
        $newRow | Export-Csv -Path $fileOut -Delimiter ';' -NoTypeInformation -Append:$append
    }
    else {
        Write-Warning "Could not find a starting row (header1 = '$splitValue') for the file"
    }
}

Output:

BC987654321.csv

"header1";"header2";"header3";"header4";"header5";"header6";"header7";"header8";"header9";"header10";"header11";"header12";"header13"
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654321";"EUR";"CD";"456789";"22.24";"Text";"SW";

BC987654322.csv

"header1";"header2";"header3";"header4";"header5";"header6";"header7";"header8";"header9";"header10";"header11";"header12";"header13"
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"354345";"85.45";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"123556";"94.63";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"354564";"12.34";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654322";"EUR";"CD";"135344";"32.23";"Text";"SW";

BC987654323.csv

"header1";"header2";"header3";"header4";"header5";"header6";"header7";"header8";"header9";"header10";"header11";"header12";"header13"
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654323";"EUR";"CD";"354564";"12.34";"Text";"SW";
"AB";"12345";"AB123456789";"10.03.2021";"GT";"BC987654323";"EUR";"CD";"852143";"34.97";"Text";"SW";

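The script above depends on two small techniques: copying an ordered hashtable key by key, and casting the copy to a [PsCustomObject] so that the insertion order becomes the column order. The following is a minimal sketch of just that part, using sample values from the question; the variable names $prefix, $row and $obj are chosen here for illustration and do not appear in the answer.

```powershell
# Prefix fields taken from an 'AB' row (sample values from the question)
$prefix = [ordered]@{ header1 = 'AB'; header2 = '12345'; header3 = 'AB123456789' }

# Copy key by key; a plain assignment ($row = $prefix) would only copy the reference
$row = [ordered]@{}
foreach ($name in $prefix.Keys) { $row[$name] = $prefix[$name] }

# Append a field from a 'CD' row to the copy only
$row['header4'] = 'CD'

# Casting keeps the insertion order as the property (column) order
$obj = [PsCustomObject]$row
$obj.PsObject.Properties.Name -join ';'   # header1;header2;header3;header4
$prefix.Count                             # 3, the original prefix is untouched
```

Because the prefix stays unchanged, it can be reused for every following 'CD' row until the next 'AB' row clears it.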

Edit

The above works on the sample data given in the question, but relies heavily on the fact that no fields that matter can be empty.

As you have commented, the real csv does have empty fields and because of that, the code shifts the data into the wrong columns where that happens.
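To see why empty fields cause the shift, here is a small sketch with one made-up row (the header names h1 to h3 and the values are assumptions, not the real data). The first script filters properties with Where-Object { $_.Value }, which drops empty strings, so every later value moves one column to the left.

```powershell
# Hypothetical row whose second field is empty
$row = 'h1;h2;h3', 'CD;;22.24' | ConvertFrom-Csv -Delimiter ';'

# Filtering on truthiness drops the empty field, as the first script does
$kept = @($row.PsObject.Properties | Where-Object { $_.Value } | ForEach-Object { $_.Value })

# Reading the values by position keeps the empty field as a placeholder
$byPosition = @($row.PsObject.Properties.Value)

$kept -join ';'         # CD;22.24   (22.24 would land in the second column)
$byPosition -join ';'   # CD;;22.24  (22.24 stays in the third column)
```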

Using your real data, this should do a lot better:

$path       = 'D:\Test'
$fileIn     = Join-Path -Path $path -ChildPath 'input.csv'
$fileOut    = $null   # will get a value in the loop
$splitValue = 'IH'    # the value in the first column ($idColumn) that decides to start a new file. (in example data 'AB')
$csv        = Import-Csv -Path $fileIn -Delimiter ';'

# get an array of all the column headers
$allHeaders = $csv[0].PsObject.Properties.Name   # a string array of all header names
# get the index of the first column to start appending from ("Identifier")
$idColumn   = $allHeaders[0]                     # --> 'Record Identifier'  (in example data 'header1')

$mergeIndex = [array]::IndexOf($allHeaders, "Identifier")  # this is Case-Sensitive !
# if you want to do this case-insensitive, you need to do something like
# $mergeIndex = [array]::IndexOf((($allHeaders -join ';').ToLowerInvariant() -split ';'), "identifier")

# create an ordered hash that will contain the values up to column no. $mergeIndex
$hash = [ordered]@{}
foreach ($item in $csv) {
    if ($item.$idColumn -eq $splitValue) { 
        # start a new row (build a new hash)
        $hash.Clear()
        for ($i = 0; $i -lt $mergeIndex; $i++) {
            $hash[$allHeaders[$i]] = $item.$($allHeaders[$i])  # we need $(..) because of the spaces in the header names
        }

        # get the filename from the 6th header $item.$($allHeaders[5]) --> 'VAT Number'
        $fileOut = Join-Path -Path $path -ChildPath ('{0}.csv' -f $item.'VAT Number')
        # if a file with that name already exists, delete it
        if (Test-Path -Path $fileOut -PathType Leaf) { Remove-Item -Path $fileOut }
    }
    elseif ($hash.Count) {
        # create a new ordered hashtable to build the entire line with
        $rowHash = [ordered]@{}
        # copy the hash which holds the beginning of the line to a temporary row hash (the 'IH' line)
        # an ordered hashtable does not have a .Clone() method unfortunately..
        foreach ($name in $hash.Keys) { $rowHash[$name] = $hash[$name] }

        # append the fields from this item to the row hash starting at the $mergeIndex column
        $j = 0
        for ($i = $mergeIndex; $i -lt $allHeaders.Count; $i++) {
            $rowHash[$allHeaders[$i]] = $item.PsObject.Properties.Value[$j++]
        }

        # cast the finalized rowhash into a [PsCustomObject] and add to the file
        [PsCustomObject]$rowHash | Export-Csv -Path $fileOut -Delimiter ';' -NoTypeInformation -Append
    }
    else {
        Write-Warning "Could not find a starting row ('$idColumn' = '$splitValue') for the file"
    }
}
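Since the real data is not shown, the position-based merge can be tried on a small in-memory sample first. The sketch below is an assumption-laden illustration: only 'Record Identifier', 'VAT Number', 'Identifier' and 'IH' come from the script above, while the other column names, the 'IL' row type and all values are invented. It emits objects instead of writing files, so the result can be inspected before using Export-Csv.

```powershell
# Hypothetical sample: the 'IH' row fills the first three columns,
# the item row ('IL', invented) has an empty second field
$lines = 'Record Identifier;VAT Number;Currency;Identifier;Amount;Note',
         'IH;BC987654321;EUR;;;',
         'IL;;12.34;;;'

$csv        = $lines | ConvertFrom-Csv -Delimiter ';'
$allHeaders = $csv[0].PsObject.Properties.Name
$idColumn   = $allHeaders[0]
$mergeIndex = [array]::IndexOf($allHeaders, 'Identifier')   # 3 for this sample

$prefix = [ordered]@{}
$merged = foreach ($item in $csv) {
    if ($item.$idColumn -eq 'IH') {
        # remember the columns before $mergeIndex, empty or not
        $prefix.Clear()
        for ($i = 0; $i -lt $mergeIndex; $i++) { $prefix[$allHeaders[$i]] = $item.$($allHeaders[$i]) }
    }
    elseif ($prefix.Count) {
        # copy the prefix, then place the item values by position
        $row = [ordered]@{}
        foreach ($name in $prefix.Keys) { $row[$name] = $prefix[$name] }
        $values = @($item.PsObject.Properties.Value)
        for ($i = $mergeIndex; $i -lt $allHeaders.Count; $i++) {
            $row[$allHeaders[$i]] = $values[$i - $mergeIndex]
        }
        [PsCustomObject]$row
    }
}

$merged | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

In this sample the empty second field of the item row ends up as an empty 'Amount' column, and '12.34' stays in the third merged position ('Note') instead of shifting left.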

Note that I am not showing the output here, because the real csv may reveal sensitive data.
