Windows command line/shell - discarding the UTF-8 BOM
Problem description
The regex I'm using works fine for matching the lines to keep or discard. The problem is that the file was assembled from a number of other files, and sometimes a line I want to keep started out as the first line of a UTF-8 encoded file. This means that the findstr command returns something like:
∩╗┐LineToKeep that started out as the first line in its file
LineToKeep another
LineToKeep more lines
∩╗┐LineToKeep that started out as the first line in its file
LineToKeep more
It's guaranteed that, apart from the BOM bytes, the line will always begin with "LineToKeep". How can I get rid of those three UTF-8 BOM bytes, given that these Windows shell commands can't handle them properly?
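For context: the UTF-8 BOM is the UTF-8 encoding of U+FEFF, i.e. the three bytes EF BB BF. A console using code page 437 (the usual OEM code page on US-English Windows) shows each byte as its own glyph, producing ∩╗┐, so a BOM-prefixed line no longer starts with "LineToKeep" at the byte level. A small Python illustration, assuming a cp437 console for the rendering part:

```python
# The UTF-8 byte order mark is simply U+FEFF encoded as UTF-8.
BOM = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'

# Interpreted as code page 437, each of the three bytes becomes one glyph.
shown_in_cp437 = BOM.decode("cp437")  # '∩╗┐'

# A line that began a UTF-8 file therefore carries the BOM in front of the text,
# so a plain "starts with LineToKeep" test fails on it.
raw_line = BOM + b"LineToKeep that started out as the first line in its file"
starts_with_prefix = raw_line.startswith(b"LineToKeep")  # False
```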
I'm hoping for a way to remove them in place, or perhaps a modification to the findstr command from that previous question.
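If overwriting the combined file is acceptable, one way to remove the BOMs in place is to delete every EF BB BF sequence from it, after which a filter anchored on "LineToKeep" should see the affected lines correctly. This Python sketch assumes the file fits in memory and that those three bytes only ever occur as leftover BOMs, never as intended content; strip_boms_in_place is a hypothetical helper name, and since it rewrites the file directly, keep a backup:

```python
BOM = b"\xef\xbb\xbf"


def strip_boms_in_place(path):
    """Remove every UTF-8 BOM sequence from the file at `path`.

    Assumes EF BB BF never appears as real content, only as BOMs left over
    from concatenating UTF-8 files. Returns how many sequences were removed.
    """
    # Read raw bytes so no decoding step hides or alters the BOMs.
    with open(path, "rb") as f:
        data = f.read()
    count = data.count(BOM)
    if count:
        # Overwrite the original file with the BOM-free bytes.
        with open(path, "wb") as f:
            f.write(data.replace(BOM, b""))
    return count
```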
Since I know each line must begin with either "LineToKeep" or "∩╗┐LineToKeep", I figure there's a way to compute something like if (Line[3:13] == "LineToKeep") { Line = Line[3:]; } for every line.
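The pseudocode above maps directly onto a per-line filter. A minimal Python sketch, assuming the combined file is read as bytes so that the BOM stays visible as b'\xef\xbb\xbf' instead of being decoded away; keep_lines is a hypothetical name:

```python
BOM = b"\xef\xbb\xbf"
PREFIX = b"LineToKeep"


def keep_lines(data):
    """Return the lines of `data` (bytes) that start with PREFIX, minus any leading BOM."""
    kept = []
    # bytes.splitlines() handles \r\n, \n and \r line endings.
    for line in data.splitlines():
        if line.startswith(BOM):
            line = line[len(BOM):]  # same idea as Line = Line[3:]
        if line.startswith(PREFIX):
            kept.append(line)
    return kept
```

For example, keep_lines(open("combined.txt", "rb").read()) would return the kept lines with their BOMs removed; "combined.txt" is a placeholder file name. Checking for the BOM with startswith rather than fixed slice indices avoids having to get the slice bounds right.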
Recommended answer
I ended up calling PowerShell from Windows cmd:
powershell . "Get-ChildItem . | Select-String '^LineToKeep' | foreach {$_.Line}"
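Note that this command searches the individual files in the current directory rather than the concatenated file, and presumably works because Select-String reads each file as text and decodes the BOM instead of passing it to the regex. For readers without PowerShell, a rough Python analogue follows. It is not what Select-String does internally; it assumes the files are UTF-8 (with or without a BOM), looks only at files directly inside the directory (as Get-ChildItem without -Recurse does), and matches case-insensitively because Select-String does so by default. select_lines is a hypothetical name:

```python
import re
from pathlib import Path

# Select-String is case-insensitive unless -CaseSensitive is given, so mirror that.
PATTERN = re.compile(r"^LineToKeep", re.IGNORECASE)


def select_lines(directory="."):
    """Yield lines matching PATTERN from every regular file directly in `directory`."""
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        # "utf-8-sig" drops a leading BOM if present and otherwise acts like "utf-8";
        # undecodable bytes are replaced instead of raising an error.
        with path.open(encoding="utf-8-sig", errors="replace") as f:
            for line in f:
                line = line.rstrip("\r\n")
                if PATTERN.match(line):
                    yield line
```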