Detect encoding by BOM / its absence


Question

I am using this code in a batch script to replace text in a file and then move the file to a location. The code sits inside a loop and reads in new variable values on each pass.

powershell -Command "(gc %inputPath%\%inputFile%) -replace 'Foo', '%bar%' | Out-File '%outputPath%\%outputFile%' -encoding default"

At first, every output file was encoded as Unicode (UCS-2 Little Endian) because I had left out the "-encoding default" argument. After adding that argument, I have no problem with ANSI files, but some of the files are UTF-8, and for those I'm getting the same problems.

These files are configs for executables, and the executables can be VERY picky about the encoding of their configs.

I've searched quite a bit for a way to detect the encoding of the input file, but I have been unable to find a batch solution that works. Does batch have a means of reading the encoding?

I'll accept PowerShell solutions, but ONLY if they can be executed from within the batch file. I'd prefer not to use external modules, but may have to if that's the only way.

Recommended answer

Create a plain ASCII text file named dummy.txt and put just two characters in it. I usually just put AA. Then do a binary compare of your file against it.

fc /b LIttleEndian.txt dummy.txt

You will then see this as your output:

Comparing files LIttleEndian.txt and DUMMY.TXT
00000000: FF 41
00000001: FE 41
FC: LIttleEndian.txt longer than DUMMY.TXT

For a UTF-8 file that starts with a BOM you will see this:

C:\BatchFiles\Encoding>fc /b utf8.txt dummy.txt
Comparing files UTF8.txt and DUMMY.TXT
00000000: EF 41
00000001: BB 41
FC: UTF8.txt longer than DUMMY.TXT

Use a FOR /F command to parse the output; the first two bytes it reports should help you determine the encoding used for your input file.
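The FOR /F parsing step could look roughly like the sketch below. It is only an illustration under a few assumptions: the script name in the usage comment, the temporary dummy file path, and the variable names b0, b1 and enc are all made up here, and the file to inspect is passed as the first argument. Because only two bytes are compared, EF BB is treated as a sign of a UTF-8 BOM without checking the third byte (BF).

```bat
@echo off
setlocal
rem Sketch only: guess a file's encoding from its first two bytes via fc /b.
rem Usage (hypothetical script name): detectbom.cmd somefile.txt

rem Write exactly two bytes ("AA") with no trailing newline.
set "dummy=%temp%\bom_dummy.txt"
<nul set /p "=AA" >"%dummy%"

set "b0="
set "b1="
rem fc /b prints one line per differing byte, in the form OFFSET: XX YY
for /f "tokens=1,2" %%A in ('fc /b "%~1" "%dummy%"') do (
    if "%%A"=="00000000:" set "b0=%%B"
    if "%%A"=="00000001:" set "b1=%%B"
)
del "%dummy%"

rem Anything that is not a recognised BOM prefix falls through to the default.
set "enc=no BOM found"
if /i "%b0%%b1%"=="FFFE" set "enc=UTF-16 LE"
if /i "%b0%%b1%"=="FEFF" set "enc=UTF-16 BE"
if /i "%b0%%b1%"=="EFBB" set "enc=UTF-8 with BOM"
echo %~1: %enc%
```

A file without a BOM (ANSI, ASCII, or BOM-less UTF-8) ends up as "no BOM found"; this approach cannot tell those three apart.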

For plain ASCII text, every byte is 7F or lower, so the hex codes start with a digit (0 to 7) rather than with E or F as the BOM bytes above do.

C:\BatchFiles\Encoding>fc /b Normaltext.txt dummy.txt
Comparing files Normaltext.txt and DUMMY.TXT
00000000: 4E 41
00000001: 6F 41
FC: Normaltext.txt longer than DUMMY.TXT
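Since the question allows PowerShell as long as it runs from the batch file, the same BOM check can also be sketched without fc. This is an alternative illustration, not part of the answer above. It assumes Windows PowerShell (powershell.exe) is on the PATH, that the file path contains no single quotes, and that the Out-File encoding names unicode, bigendianunicode, utf8 and default behave as they do in Windows PowerShell 5.1, where utf8 writes a BOM. The variable name enc is made up here.

```bat
@echo off
setlocal
rem Sketch only: map the BOM of the file given in %1 to an Out-File -Encoding name.
rem Falls back to "default" when no BOM is recognised or PowerShell prints nothing.
set "enc=default"
for /f "usebackq delims=" %%E in (`powershell -NoProfile -Command "$b = [IO.File]::ReadAllBytes('%~f1'); $h = ($b[0..2] | ForEach-Object { $_.ToString('X2') }) -join ''; if ($h -eq 'EFBBBF') {'utf8'} elseif ($h -like 'FFFE*') {'unicode'} elseif ($h -like 'FEFF*') {'bigendianunicode'} else {'default'}"`) do set "enc=%%E"
echo %enc%
```

The value in %enc% could then be passed to the -encoding argument of the Out-File call in the question in place of the hard-coded default. ReadAllBytes loads the whole file, which should be acceptable for small config files.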
