PowerShell - Count number of carriage return line feeds in .txt file

Problem description

I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.

Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt

Solution

I just learned that Get-Content splits and streams the lines of a file on CR, CRLF, and LF alike, so that it can read data from different operating systems interchangeably:

"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4

Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:

CR

((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3

LF

((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3

CRLF

((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2

Note the .Trim() method, which removes the trailing newline (whitespace) at the end of the file. Out-File appended that newline when it wrote the test file, and Get-Content -Raw returns it unchanged.
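If only the number of CRLF pairs is needed, the split can be skipped altogether. A minimal sketch, assuming the text fits in memory: it counts the matches of the regular expression \r\n directly, so no .Trim() is required. The sample string mirrors the test file above, with the trailing CRLF that Out-File would append on Windows written out explicitly; the variable names are illustrative only.

```powershell
# Same sample data as the test file, kept in memory so the sketch is self-contained.
# The final `r`n stands in for the newline that Out-File appends on Windows.
$Text = "1`r2`n3`r`n4`r`n"

# Count every CR that is immediately followed by LF; lone CR or LF characters are ignored.
$CrLfCount = [regex]::Matches($Text, "\r\n").Count
$CrLfCount   # 2 for this sample
```

For a real file, the same expression can be applied to (Get-Content -Raw .\Test.txt). Like the one-liners above, that still reads the whole file into memory.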


Addendum

(Update based on the comment about the memory exception)
I am afraid that there is currently no other option than building your own reader on top of the StreamReader ReadBlock method and splitting lines specifically on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content

Get-Lines

A possible way to work around the memory exception errors:

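function Get-Lines {
    [CmdletBinding()][OutputType([string])] param(
        [Parameter(ValueFromPipeline = $True)][string] $Filename,
        [String] $NewLine = [Environment]::NewLine, # -Split treats this as a regular expression
        [Int] $BufferSize = 65536 # Characters per read (the original hard-coded 10)
    )
    Begin {
        [Char[]] $Buffer = New-Object Char[] $BufferSize
    }
    Process {
        # Open the file here: a piped $Filename is not yet bound in the Begin block
        $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item -LiteralPath $Filename).FullName
        $Rest = '' # Unfinished tail; carrying it over also rejoins a CRLF that was split at the end of the buffer
        Try {
            While ($True) {
                $Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
                If (!$Length) { Break }
                $Split = ($Rest + [String]::new($Buffer, 0, $Length)) -Split $NewLine
                If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
                $Rest = $Split[-1]
            }
            If ($Rest.Length) { $Rest } # Skip the empty tail left by a final newline
        }
        Finally { $Reader.Dispose() } # Release the file handle, even if the pipeline stops early
    }
}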

Usage

To prevent the memory exceptions, it is important that you do not assign the results to a variable or wrap the command in parentheses, as this stalls the PowerShell pipeline and stores everything in memory.

$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
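When only the number of CRLF pairs matters, the lines themselves do not have to be produced at all. The following is a rough sketch of that idea and not part of the original answer: Get-CrLfCount is a hypothetical helper name, and the BufferSize default is an arbitrary assumption. It reads the file block by block, counts the pairs inside each block with a regular expression, and remembers whether the previous block ended in CR so that a pair straddling two blocks is still counted once.

```powershell
function Get-CrLfCount {   # hypothetical helper name, for illustration only
    [CmdletBinding()][OutputType([long])] param(
        [Parameter(Mandatory = $True)][string] $Path,
        [int] $BufferSize = 65536   # characters per read; an arbitrary assumption
    )
    $CR = [char]13
    $LF = [char]10
    # Resolve to a full path: .NET resolves relative paths against the process directory
    $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item -LiteralPath $Path).FullName
    $Buffer = New-Object Char[] $BufferSize
    [long] $Count = 0
    $PreviousEndedWithCR = $False
    try {
        while (($Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)) -gt 0) {
            $Chunk = [string]::new($Buffer, 0, $Length)
            # A CR at the end of the previous block plus an LF at the start of this one
            if ($PreviousEndedWithCR -and $Chunk[0] -eq $LF) { $Count++ }
            # Pairs that lie completely inside this block
            $Count += [regex]::Matches($Chunk, "\r\n").Count
            $PreviousEndedWithCR = ($Chunk[$Length - 1] -eq $CR)
        }
    }
    finally { $Reader.Dispose() }
    $Count
}

# Sample run on a temporary file with the same content as the test file above
$SamplePath = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($SamplePath, "1`r2`n3`r`n4`r`n")
Get-CrLfCount -Path $SamplePath   # 2 for this sample
Remove-Item -LiteralPath $SamplePath
```

Only one block is held in memory at a time, so the file size should not matter; the trade-off is that the rows themselves are never emitted, only counted.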
