Using awk to remove the byte-order mark
Question
What would an awk script (presumably a one-liner) for removing a BOM look like?
Specification:
- print every line after the first (NR > 1)
- for the first line: if it starts with #FE #FF or #FF #FE, remove those bytes and print the rest
Recommended answer
Try this:
awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}{print}' INFILE > OUTFILE
On the first record (line), remove the BOM characters. Print every record.
Or slightly shorter, relying on the fact that the default action in awk is to print the record:
awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}1' INFILE > OUTFILE
1 is the shortest condition that always evaluates to true, so each record is printed.
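The one-liner can be tried on a small test file, as in the sketch below. Two details here are assumptions rather than part of the answer: the file names bom_input.txt and bom_output.txt are made up for the demonstration, and the BOM is written with octal escapes (\357\273\277, the same bytes as hex EF BB BF) because hex escapes in awk are an extension that not every implementation is guaranteed to support. LC_ALL=C is set so that awk treats the input as plain bytes.

```shell
# Create a test file that starts with the UTF-8 BOM (EF BB BF).
# Octal 357 273 277 are the same three bytes as hex EF BB BF.
printf '\357\273\277hello\nworld\n' > bom_input.txt

# Strip the BOM from the first record only; the trailing 1 prints every record.
LC_ALL=C awk 'NR==1{sub(/^\357\273\277/,"")}1' bom_input.txt > bom_output.txt

# Show the first three bytes of each file in hex to compare before and after.
od -An -tx1 -N3 bom_input.txt
od -An -tx1 -N3 bom_output.txt
```

The input file is 15 bytes long and the output 12, since only the three BOM bytes are dropped.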
Enjoy!
-- Addendum --
The Unicode Byte Order Mark (BOM) FAQ includes the following table listing the exact BOM bytes for each encoding:
Bytes | Encoding Form
--------------------------------------
00 00 FE FF | UTF-32, big-endian
FF FE 00 00 | UTF-32, little-endian
FE FF | UTF-16, big-endian
FF FE | UTF-16, little-endian
EF BB BF | UTF-8
Thus, you can see from the table above how \xef\xbb\xbf corresponds to EF BB BF, the UTF-8 BOM bytes.
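To see which row of the table applies to a given file, it can help to look at its leading bytes first. The following is a minimal sketch, assuming a hypothetical file named sample.txt that is created here only for the demonstration; it uses od to dump the first three bytes in hex and compares them with the UTF-8 row.

```shell
# Create a sample file that begins with the UTF-8 BOM (octal 357 273 277 = hex EF BB BF).
printf '\357\273\277hello\n' > sample.txt

# Dump the first three bytes as hex and remove the whitespace od adds.
first3=$(od -An -tx1 -N3 sample.txt | tr -d ' \n')

# Compare against the UTF-8 row of the table.
if [ "$first3" = "efbbbf" ]; then
    echo "UTF-8 BOM present"
else
    echo "no UTF-8 BOM"
fi
```

The same check could be adapted to the other rows by reading two or four bytes instead of three.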