Reading from the Pipeline Stream in PowerShell


Question

Background

I'm hoping to write code that uses Microsoft.VisualBasic.FileIO.TextFieldParser to parse some CSV data. The system I'm generating this data for doesn't understand quotes, so I can't escape the delimiter and instead have to replace it. I've found a solution using the above text parser, but I've only seen people use it with input from files. Rather than writing my data to a file only to import it again, I'd rather keep everything in memory and use this class's constructor that accepts a stream as input.

Ideally it would take its input directly from whichever memory stream the pipeline uses, but I couldn't work out how to access that. In my current code I create my own memory stream, feed data to it from the pipeline, then attempt to read from it. Unfortunately I'm missing something.

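Questions

  1. How do I read from and write to a memory stream in PowerShell?
  2. Is it possible to read directly from the stream that feeds a function's pipeline?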
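For the first question, here is a minimal sketch of the basic round trip. It assumes one StreamWriter and one StreamReader over the same MemoryStream, and the row '1;2;3' is made-up sample data:

```powershell
# Sketch: write a line of text into a MemoryStream, then read it back.
$memStream = New-Object System.IO.MemoryStream
$writer    = New-Object System.IO.StreamWriter($memStream)
$reader    = New-Object System.IO.StreamReader($memStream)

$writer.WriteLine('1;2;3')
$writer.Flush()              # StreamWriter buffers; Flush pushes the bytes into the MemoryStream

# The writer and the reader share the stream's single Position, which now sits at the
# end of what was just written, so rewind before reading.
$memStream.Position = 0
$line = $reader.ReadLine()   # '1;2;3'

# Disposing the writer flushes it and closes the MemoryStream; disposing the reader afterwards is harmless.
$writer.Dispose()
$reader.Dispose()
```

Rewinding to 0 is enough for a single write. When writing and reading repeatedly, you have to remember where each write started instead of always going back to the beginning.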

Code

clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
#[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null

function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$Line
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
    )
    begin {
        [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
        [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
        [System.IO.StreamReader]$readStream = New-Object System.IO.StreamReader($memStream)
        #[Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
        #$Parser.SetDelimiters($Delimiter)
        #$Parser.HasFieldsEnclosedInQuotes = $true
        #$writeStream.AutoFlush = $true
    }
    process {
        $writeStream.WriteLine($_)
        #$writeStream.Flush() #maybe we need to flush it before the reader will see it?
        write-output $readStream.ReadLine()
        #("Line: {0:000}" -f $Parser.LineNumber)
        #write-output $Parser.ReadFields()
    }
    end {
        #close streams and dispose (dodgy catch all's in case object's disposed before we call Dispose)
        #try {$Parser.Close(); $Parser.Dispose()} catch{} 
        try {$readStream.Close(); $readStream.Dispose()} catch{} 
        try {$writeStream.Close(); $writeStream.Dispose()} catch{} 
        try {$memStream.Close(); $memStream.Dispose()} catch{} 
    }
}
1,2,3,4 | Clean-CsvStream -Delimiter ';' #nothing like the real data, but I'm not interested in actual CSV cleansing at this point

Workaround

In the meantime, my solution is simply to do this replacement on the objects' properties rather than on the CSV rows.

$cols = $objectArray | Get-Member | ?{$_.MemberType -eq 'NoteProperty'} | select -ExpandProperty name
$objectArray | %{$csvRow =$_; ($cols | %{($csvRow.$_ -replace "[`n,]",':')}) -join ',' }
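The same property-level replacement can be sketched as a reusable pipeline function. ConvertTo-CleanCsvLine is a hypothetical name, and the default regex is an assumption: it covers `r and `n plus a comma delimiter, so pass a matching -InvalidCharRegex if you change -Delimiter. Iterating each object's PSObject.Properties replaces the separate Get-Member pass and yields the values in the object's own property order:

```powershell
# Hypothetical helper: one cleaned CSV line (no header row) per input object.
function ConvertTo-CleanCsvLine {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [psobject]$InputObject,
        [string]$Delimiter = ',',
        [regex]$InvalidCharRegex = "[`r`n,]",   # assumed default: line breaks plus the comma delimiter
        [string]$ReplacementString = ':'
    )
    process {
        # Walk this object's properties directly instead of collecting names via Get-Member.
        ($InputObject.PSObject.Properties | ForEach-Object {
            "$($_.Value)" -replace $InvalidCharRegex, $ReplacementString
        }) -join $Delimiter
    }
}
```

Usage would be `$objectArray | ConvertTo-CleanCsvLine`. Like the original one-liner, it emits data rows only.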


Update

I realised the missing code was $memStream.Seek(0, [System.IO.SeekOrigin]::Begin) | out-null;

However, this isn't behaving entirely as expected: the first row of my CSV shows up twice, and the rest of the output comes out in the wrong order, so presumably I've misunderstood how to use Seek.

clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null

function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$CsvRow
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
        ,
        [Parameter(Mandatory = $false)]
        [regex]$InvalidCharRegex 
        ,
        [Parameter(Mandatory = $false)]
        [string]$ReplacementString 

    )
    begin {
        [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
        [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
        [Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
        $Parser.SetDelimiters($Delimiter)
        $Parser.HasFieldsEnclosedInQuotes = $true
        $writeStream.AutoFlush = $true
    }
    process {
        if ($InvalidCharRegex) {
            $writeStream.WriteLine($CsvRow)
            #flush here if not auto
            $memStream.Seek(0, [System.IO.SeekOrigin]::Begin) | out-null;
            write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
        } else { #if we're not replacing anything, keep it simple
            $CsvRow
        }
    }
    end {
        "end {"
        try {$Parser.Close(); $Parser.Dispose()} catch{} 
        try {$writeStream.Close(); $writeStream.Dispose()} catch{} 
        try {$memStream.Close(); $memStream.Dispose()} catch{} 
        "} #end"
    }
}
$csv = @(
    (new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':' 

Recommended Answer

After a lot of playing around, it seems this works:

clear-host
[Reflection.Assembly]::LoadWithPartialName("System.IO") | out-null
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | out-null

function Clean-CsvStream {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline=$true)]
        [string]$CsvRow
        ,
        [Parameter(Mandatory = $false)]
        [char]$Delimiter = ','
        ,
        [Parameter(Mandatory = $false)]
        [regex]$InvalidCharRegex 
        ,
        [Parameter(Mandatory = $false)]
        [string]$ReplacementString 
    )
    begin {
        [bool]$IsSimple = [string]::IsNullOrEmpty($InvalidCharRegex) 
        if(-not $IsSimple) {
            [System.IO.MemoryStream]$memStream = New-Object System.IO.MemoryStream
            [System.IO.StreamWriter]$writeStream = New-Object System.IO.StreamWriter($memStream)
            [Microsoft.VisualBasic.FileIO.TextFieldParser]$Parser = new-object Microsoft.VisualBasic.FileIO.TextFieldParser($memStream)
            $Parser.SetDelimiters($Delimiter)
            $Parser.HasFieldsEnclosedInQuotes = $true
        }
    }
    process {
        if ($IsSimple) { #if we're not replacing anything, keep it simple
            $CsvRow
        } else {
            [long]$seekStart = $memStream.Seek(0, [System.IO.SeekOrigin]::Current) 
            $writeStream.WriteLine($CsvRow)
            $writeStream.Flush()
            $memStream.Seek($seekStart, [System.IO.SeekOrigin]::Begin) | out-null 
            write-output (($Parser.ReadFields() | %{$_ -replace $InvalidCharRegex,$ReplacementString }) -join $Delimiter)
        }
    }
    end {
        if(-not $IsSimple) {
            try {$Parser.Close(); $Parser.Dispose()} catch{} 
            try {$writeStream.Close(); $writeStream.Dispose()} catch{} 
            try {$memStream.Close(); $memStream.Dispose()} catch{} 
        }
    }
}
$csv = @(
    (new-object -TypeName PSCustomObject -Property @{A="this is regular text";B="nothing to see here";C="all should be good"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text2";B="what the`nLine break!";C="all should be good2"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text3";B="ooh`r`nwindows line break!";C="all should be good3"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text4";B="I've got;a semi";C="all should be good4"}) 
    ,(new-object -TypeName PSCustomObject -Property @{A="this is regular text5";B="""You're Joking!"" said the Developer`r`n""No honestly; it's all about the secret VB library"" responded the Google search result";C="all should be good5"})
) | convertto-csv -Delimiter ';' -NoTypeInformation
$csv | Clean-CsvStream -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':' 

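  1. Seek to get the current position before writing
  2. Then write
  3. Then flush (if not auto-flushing)
  4. Then seek back to the start of the data just written
  5. Then read
  6. Repeat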
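The six steps above can be sketched with a plain StreamReader standing in for the TextFieldParser. This is a simplification, and the three sample rows are made up. The point it illustrates is that the writer and the reader share the MemoryStream's single Position, so each read has to start where that row's write started:

```powershell
$memStream = New-Object System.IO.MemoryStream
$writer    = New-Object System.IO.StreamWriter($memStream)
$reader    = New-Object System.IO.StreamReader($memStream)

$echoed = foreach ($row in 'a;b;c', 'd;e;f', 'g;h;i') {
    # 1. Seek(0, Current) does not move the stream; it just returns the current position
    [long]$start = $memStream.Seek(0, [System.IO.SeekOrigin]::Current)
    # 2. write the row
    $writer.WriteLine($row)
    # 3. flush so the bytes actually reach the MemoryStream
    $writer.Flush()
    # 4. seek back to where this row's bytes begin
    [void]$memStream.Seek($start, [System.IO.SeekOrigin]::Begin)
    # 5. read the row back; for rows this small the reader's buffered read consumes
    #    everything written, leaving the position at the end for the next step 1
    $reader.ReadLine()
}   # 6. repeat for the next row

$writer.Dispose()   # also closes the MemoryStream
$reader.Dispose()
```

Seeking to 0 on every row instead (as in the question's update) re-reads from the start of the stream each time, which is consistent with the duplicated first row described there.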

I'm not certain this is correct, though. I couldn't find any good examples or documentation explaining it, so I just played about until something worked that vaguely made sense.

I'm also still interested if anyone knows how to read directly from the pipeline stream, i.e. to remove the overhead of the extra streams.
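PowerShell passes .NET objects (here, strings) between pipeline stages rather than a byte stream, so there is no underlying pipeline stream to hand to TextFieldParser. One way to drop the extra MemoryStream/StreamWriter pair is TextFieldParser's TextReader constructor with a System.IO.StringReader per row. That approach is swapped in here and is not taken from the answer above. The sketch below makes several assumptions:

* Each pipeline string is one complete CSV record. ConvertTo-Csv emits records this way, even when a quoted field contains line breaks.
* `::new` is available, which needs PowerShell 5 or later.
* Clean-CsvRow is a hypothetical name.

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical name; parameters mirror the answer's Clean-CsvStream.
function Clean-CsvRow {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$CsvRow,
        [char]$Delimiter = ',',
        [Parameter(Mandatory = $true)]
        [regex]$InvalidCharRegex,
        [string]$ReplacementString = ''
    )
    process {
        # Wrap just this record in a StringReader: no MemoryStream, StreamWriter or Seek needed.
        $reader = [System.IO.StringReader]::new($CsvRow)
        $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
        try {
            $parser.SetDelimiters([string]$Delimiter)
            $parser.HasFieldsEnclosedInQuotes = $true
            # Clean each field, then rejoin with the original delimiter.
            ($parser.ReadFields() | ForEach-Object {
                $_ -replace $InvalidCharRegex, $ReplacementString
            }) -join $Delimiter
        }
        finally {
            $parser.Dispose()
            $reader.Dispose()
        }
    }
}
```

It would be called the same way as the answer's function, e.g. `$csv | Clean-CsvRow -Delimiter ';' -InvalidCharRegex "[`r`n;]" -ReplacementString ':'`.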

Re: @M.R.'s comment

Sorry this is so late; in case it's of use to others:

If the end-of-line delimiter is CrLf (\r\n) rather than just Cr (\r), then it's easy to distinguish the end of a record/line from the line breaks within a field:

Get-Content -LiteralPath 'D:\test\file to clean.csv' -Delimiter "`r`n" | 
%{$_.ToString().TrimEnd("`r`n")} | #the delimiter is left on the end of the string; remove it
%{('"{0}"' -f $_) -replace '\|','"|"'} | #insert quotes at start and end of line, as well as around delimiters
ConvertFrom-Csv -Delimiter '|' #treat the pipeline content as a valid pipe-delimited csv
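The same CrLf approach can be tried on an in-memory string instead of a file. Here `-split` stands in for `Get-Content -Delimiter`. The sample data is made up, with a bare LF inside the Notes field of the first record:

```powershell
# Made-up sample: records end with CrLf; the first record's Notes field contains a bare LF.
$raw = "Id|Notes|Owner`r`n1|first line`nsecond line|alice`r`n2|single line|bob`r`n"

$records = $raw -split "`r`n" |
    Where-Object { $_ -ne '' } |                                # drop the empty piece after the final CrLf
    ForEach-Object { ('"{0}"' -f $_) -replace '\|', '"|"' } |    # quote every field (assumes no field contains " or |)
    ConvertFrom-Csv -Delimiter '|'
```

Because the bare LF ends up inside a quoted field, ConvertFrom-Csv keeps it as part of the Notes value instead of treating it as a new record.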

However, if not, you'll have no way of telling which Cr is the end of a record and which is just a break in the text. You could partly get around this by counting the number of pipes: e.g. if you have 5 columns, any CRs before the fourth delimiter are line breaks rather than the end of the record. However, if there's another line break after that, you can't be sure whether it's a line break in the last column's data or the end of that row. If you know that the first or the last column (or both) never contains line breaks, you can work around that. For all these more complex scenarios I suspect a regex would be the best option, applied with something like Select-String. If this is required, post a question here giving your exact requirements and what you've already attempted, and others can help you out.
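The delimiter-counting idea can be sketched as follows. The sketch relies on two assumptions: no field ever contains the delimiter, and the last column never contains a line break. Join-SplitRecord and its -Joiner parameter are hypothetical, and the input is assumed to be already split on the line-break character:

```powershell
# Hypothetical helper: re-assembles records whose fields were split by embedded line breaks.
# Assumes (1) no field contains the delimiter and (2) the last column never contains a line break.
function Join-SplitRecord {
    param (
        [string[]]$Lines,          # physical lines, already split on the line-break character
        [int]$ColumnCount,
        [string]$Delimiter = '|',
        [string]$Joiner = ' '      # text that replaces each embedded line break
    )
    $needed  = $ColumnCount - 1                # a complete record has ColumnCount - 1 delimiters
    $pattern = [regex]::Escape($Delimiter)
    $buffer  = $null
    foreach ($line in $Lines) {
        $buffer = if ($null -eq $buffer) { $line } else { $buffer + $Joiner + $line }
        if ([regex]::Matches($buffer, $pattern).Count -ge $needed) {
            $buffer                            # all delimiters seen: by assumption (2) the record ends here
            $buffer = $null
        }
    }
    if ($null -ne $buffer) { Write-Warning "Incomplete trailing record: $buffer" }
}
```

For example, with three columns the lines 'd|line one' and 'line two|f' are joined into the single record 'd|line one line two|f'.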
