Windows command line/shell - discarding the UTF-8 BOM


Question

The regex that I'm using works just fine at matching the lines to keep/to discard. The problem is that the file was composed from a bunch of other files, and sometimes the line I want to keep started out as the first line of a UTF-8 encoded file. This means that the findstr command returns something like:

∩╗┐LineToKeep that started out as the first line in its file
LineToKeep another
LineToKeep more lines
∩╗┐LineToKeep that started out as the first line in its file
LineToKeep more

It's guaranteed that, apart from the BOM bytes, the line will always begin with "LineToKeep". How can I get rid of those three UTF-8 BOM bytes, since these Windows shell commands can't handle them properly?
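For reference, a small Python sketch of what those three bytes are and why they show up as box-drawing characters. It assumes the console is using code page 437; other OEM code pages would render the same bytes differently.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom = codecs.BOM_UTF8
bom_hex = bom.hex(" ")            # "ef bb bf"

# Assumption: the console decodes output as code page 437.
# Under that code page the three bytes render as three separate characters.
rendered = bom.decode("cp437")    # "∩╗┐"

# Decoded correctly as UTF-8, the same bytes are the single character U+FEFF.
as_utf8 = bom.decode("utf-8")     # "\ufeff"
```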

I'm hoping for a way to remove them in place, or perhaps a modification to the findstr command from that previous question.

Since I know each line must begin with "LineToKeep" or "∩╗┐LineToKeep", I figure there's a way to compute something like if (Line[3:13] == "LineToKeep") { Line = Line[3:]; } for every line.
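The per-line idea above could look roughly like this in Python, working on raw bytes so that the BOM really is three items long. This is only a sketch of the idea, not a Windows shell solution; the names strip_bom and keep_lines are made up for illustration.

```python
import codecs

BOM = codecs.BOM_UTF8       # b"\xef\xbb\xbf"
PREFIX = b"LineToKeep"


def strip_bom(line: bytes) -> bytes:
    """Return the line without a leading UTF-8 BOM, if one is present."""
    if line.startswith(BOM):
        return line[len(BOM):]
    return line


def keep_lines(lines):
    """Yield BOM-free lines that start with PREFIX, dropping all others."""
    for line in lines:
        cleaned = strip_bom(line)
        if cleaned.startswith(PREFIX):
            yield cleaned


if __name__ == "__main__":
    sample = [
        BOM + b"LineToKeep that started out as the first line in its file\n",
        b"LineToKeep another\n",
        b"LineToDiscard something else\n",
    ]
    for kept in keep_lines(sample):
        print(kept.decode("utf-8"), end="")
```

Checking with startswith instead of a fixed slice avoids having to count the length of the prefix by hand.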

Recommended answer

I ended up calling PowerShell from Windows cmd:

powershell . "Get-ChildItem . | Select-String '^LineToKeep' | foreach {$_.Line}"
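For readers without PowerShell, a rough Python analogue of that pipeline might look like the sketch below. It is an approximation under stated assumptions, not a description of how Select-String works internally: it assumes every file in the directory is UTF-8 text, uses the "utf-8-sig" codec to drop a leading BOM, and matches case-insensitively because Select-String does so by default. The function name select_lines is hypothetical.

```python
import re
from pathlib import Path


def select_lines(directory, pattern):
    """Yield matching lines from every file directly inside directory."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Sorted so the output order is predictable across runs.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # "utf-8-sig" decodes UTF-8 and silently skips a BOM at the start.
        with path.open(encoding="utf-8-sig", errors="replace") as handle:
            for line in handle:
                text = line.rstrip("\r\n")
                if regex.search(text):
                    yield text


if __name__ == "__main__":
    for line in select_lines(".", r"^LineToKeep"):
        print(line)
```

Because the BOM is removed while decoding each source file, the ^LineToKeep anchor matches first lines as well, and no BOM bytes end up in the output.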
