UTF-8 BOM to UTF-8 Conversion for a Large File

Question

Based on the suggestion from this thread (linked below), I used PowerShell to do the UTF-8 conversion. Now I am running into another problem: I have a very large file, around 18 GB, which I am trying to convert on a machine with about 50 GB of free RAM, but the conversion process eats up all the RAM and the encoding fails. Is there a way to limit RAM usage, or to do the conversion in chunks?

Using PowerShell to write a file in UTF-8 without the BOM

By the way, this is the exact code I am using:

foreach ($file in ls -name $Path\CM*.csv)
{
   $file_content = Get-Content "$Path\$file";
   [System.IO.File]::WriteAllLines("$Path\$file", $file_content);
   
   echo "encoding done : $file"

}

Recommended answer

You can use a StreamReader and a StreamWriter to do the conversion.

By default, the StreamWriter writes UTF-8 without a BOM.
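If relying on the default feels too implicit, the encoding can also be passed to the writer explicitly. The following is a minimal sketch rather than part of the original answer; the variable names and the temp-folder output path are illustrative assumptions.

```powershell
# Illustrative: a UTF-8 encoding object that does not emit a BOM.
$utf8NoBom   = [System.Text.UTF8Encoding]::new($false)
$utf8WithBom = [System.Text.Encoding]::UTF8

# GetPreamble() returns the BOM bytes an encoding would write at the start of a file.
$noBomLength   = $utf8NoBom.GetPreamble().Length     # no preamble bytes
$withBomLength = $utf8WithBom.GetPreamble().Length   # the three bytes EF BB BF

# Hypothetical output path; substitute your own absolute path.
$destinationFile = Join-Path ([System.IO.Path]::GetTempPath()) 'out.txt'

# Pass the encoding explicitly; $false means overwrite instead of append.
$writer = [System.IO.StreamWriter]::new($destinationFile, $false, $utf8NoBom)
$writer.WriteLine('sample line')
$writer.Dispose()
```

On the reading side, a StreamReader created with `[System.Text.Encoding]::UTF8` consumes the BOM itself, so the BOM does not show up in the first line returned by `ReadLine()`.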

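This takes a lot of disk I/O, but it is lean on memory: only one line is held at a time, whereas assigning the output of Get-Content to a variable collects every line of the file before anything is written.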

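Bear in mind that .NET methods need full, absolute paths. They resolve relative paths against the process's working directory, which is often not the same as PowerShell's current location.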
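One way to turn a relative path into an absolute one before handing it to .NET is sketched below. It is an illustration, not part of the original answer, and the `.\out.txt` file name is a placeholder.

```powershell
# Relative path as typed at the prompt; the file does not have to exist yet.
$relative = '.\out.txt'

# Resolve it against PowerShell's current location rather than the
# process working directory that .NET would use.
$absolute = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($relative)

# For a file that already exists, Convert-Path gives the same result:
# $absolute = Convert-Path -LiteralPath $relative
```

The resulting `$absolute` value can then be used wherever the answer uses `$sourceFile` or `$destinationFile`.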

$sourceFile      = 'D:\Test\Blah.txt'  # enter your own in- and output files here
$destinationFile = 'D:\Test\out.txt'

$reader = [System.IO.StreamReader]::new($sourceFile, [System.Text.Encoding]::UTF8)
$writer = [System.IO.StreamWriter]::new($destinationFile)

while ($null -ne ($line = $reader.ReadLine())) {
    $writer.WriteLine($line)
}
# clean up
$writer.Flush()
$reader.Dispose()
$writer.Dispose()
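The question converts every `CM*.csv` file in a folder in place, while the loop above needs separate input and output files. A possible way to combine the two is to write to a temporary file and then replace the original. This is only a sketch: the function names `Convert-ToUtf8NoBom` and `Convert-FolderToUtf8NoBom` and the `.tmp` suffix are hypothetical, and the folder is assumed to be given as an absolute path.

```powershell
# Hypothetical helper wrapping the reader/writer loop from the answer.
function Convert-ToUtf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    $reader = [System.IO.StreamReader]::new($Source, [System.Text.Encoding]::UTF8)
    try {
        $writer = [System.IO.StreamWriter]::new($Destination)
        try {
            # Only one line is held in memory at a time.
            while ($null -ne ($line = $reader.ReadLine())) {
                $writer.WriteLine($line)
            }
        }
        finally { $writer.Dispose() }   # Dispose also flushes pending output
    }
    finally { $reader.Dispose() }
}

# Hypothetical wrapper mirroring the loop in the question.
function Convert-FolderToUtf8NoBom {
    param([Parameter(Mandatory)] [string] $Path)   # absolute folder path

    foreach ($file in Get-ChildItem -LiteralPath $Path -Filter 'CM*.csv' -File) {
        $temp = "$($file.FullName).tmp"            # temporary output next to the source
        Convert-ToUtf8NoBom -Source $file.FullName -Destination $temp
        # Replace the original only after the conversion has finished.
        Move-Item -LiteralPath $temp -Destination $file.FullName -Force
        Write-Output "encoding done : $($file.Name)"
    }
}
```

It would be called as `Convert-FolderToUtf8NoBom -Path $Path`, with `$Path` being the folder variable from the question. The temporary file needs as much free disk space as the file being converted.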

The code above always ends the output file with a newline. If that is unwanted, do this instead:

$sourceFile      = 'D:\Test\Blah.txt'
$destinationFile = 'D:\Test\out.txt'

$reader = [System.IO.StreamReader]::new($sourceFile, [System.Text.Encoding]::UTF8)
$writer = [System.IO.StreamWriter]::new($destinationFile)

while ($null -ne ($line = $reader.ReadLine())) {
    if ($reader.EndOfStream) {
        $writer.Write($line)
    }
    else {
        $writer.WriteLine($line)
    }
}
# clean up
$writer.Flush()
$reader.Dispose()
$writer.Dispose()
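Since the question also asks about converting in chunks, a different technique from the one in the answer is worth sketching: copy the raw bytes and skip the three BOM bytes (EF BB BF) at the start. Nothing is decoded, so line endings and any final newline stay exactly as they were, whereas `ReadLine()`/`WriteLine()` rewrite line endings to the platform default. The function name `Remove-Utf8Bom` and the 1 MB buffer size are assumptions made for illustration.

```powershell
# Hypothetical helper: copies $Source to $Destination, dropping a leading UTF-8 BOM if present.
function Remove-Utf8Bom {
    param(
        [Parameter(Mandatory)] [string] $Source,       # absolute path
        [Parameter(Mandatory)] [string] $Destination   # absolute path, different from $Source
    )
    $inStream = [System.IO.File]::OpenRead($Source)
    try {
        $outStream = [System.IO.File]::Create($Destination)
        try {
            # Read the first three bytes and compare them with the BOM.
            $head = [byte[]]::new(3)
            $read = $inStream.Read($head, 0, 3)
            $hasBom = $read -eq 3 -and $head[0] -eq 0xEF -and $head[1] -eq 0xBB -and $head[2] -eq 0xBF
            if (-not $hasBom) {
                # No BOM: rewind so these bytes are copied as ordinary content.
                $inStream.Position = 0
            }
            # Copy the remainder in fixed-size chunks (1 MB buffer here).
            $inStream.CopyTo($outStream, 1MB)
        }
        finally { $outStream.Dispose() }
    }
    finally { $inStream.Dispose() }
}
```

A call would look like `Remove-Utf8Bom -Source 'D:\Test\Blah.txt' -Destination 'D:\Test\out.txt'`. Memory use is bounded by the buffer size regardless of how large the file is or how long its lines are.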
