Using awk to remove the byte-order mark

Views: 32 · Published: 2021/12/26 13:23:37 · Tags: unicode, awk, byte-order-mark

Question

What would an awk script (presumably a one-liner) for removing a BOM look like?

Specification:

Print every line after the first (NR > 1).
For the first line: if it starts with #FE #FF or #FF #FE, remove those bytes and print the rest.

Recommended answer

Try this:

awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}{print}' INFILE > OUTFILE

On the first record (line), remove the BOM bytes. Print every record.

Or slightly shorter, relying on the fact that awk's default action is to print the record:

awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}1' INFILE > OUTFILE

1 is the shortest pattern that always evaluates to true, so every record is printed.
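
As a quick way to check the one-liner, here is a minimal sketch that builds a sample file and compares its first bytes before and after stripping. Everything in it beyond the awk program is an assumption for illustration: the function name strip_utf8_bom and the temporary sample file are hypothetical, the BOM is written with octal escapes (\357\273\277 is EF BB BF) because not every awk accepts \x escapes in a regex, and LC_ALL=C is set so that awk treats the input as raw bytes.

```shell
# Hypothetical helper: print the file given as $1 without a leading UTF-8 BOM.
strip_utf8_bom() {
  # LC_ALL=C: byte semantics; \357\273\277 are the bytes EF BB BF in octal.
  LC_ALL=C awk 'NR==1{sub(/^\357\273\277/,"")}1' "$1"
}

# Build a two-line sample file that starts with the UTF-8 BOM.
sample=$(mktemp)
printf '\357\273\277hello\nworld\n' > "$sample"

# Dump the first three bytes of the original and of the stripped output.
head -c 3 "$sample" | od -An -tx1
strip_utf8_bom "$sample" | head -c 3 | od -An -tx1

rm -f "$sample"
```

The first dump should show the BOM bytes and the second the first three bytes of "hello". A file without a BOM passes through unchanged, because sub() only edits the line when the pattern matches.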

Enjoy!

-- ADDENDUM --

The Unicode Byte Order Mark (BOM) FAQ includes the following table, listing the exact BOM bytes for each encoding form:

Bytes         |  Encoding Form
--------------------------------------
00 00 FE FF   |  UTF-32, big-endian
FF FE 00 00   |  UTF-32, little-endian
FE FF         |  UTF-16, big-endian
FF FE         |  UTF-16, little-endian
EF BB BF      |  UTF-8

Thus, you can see from the table above how \xef\xbb\xbf corresponds to the UTF-8 BOM bytes EF BB BF.
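
To find out which BOM, if any, a file actually starts with, the table can be turned into a small lookup. This is only a sketch under stated assumptions: the function name bom_type is hypothetical, and it relies on head -c and od -An -tx1 being available. The UTF-32 patterns are tested first because FF FE 00 00 also begins with the UTF-16 little-endian BOM, so the result for that prefix is a best guess rather than a certainty.

```shell
# Hypothetical helper: report which BOM from the table the file in $1 starts with.
bom_type() {
  # Read up to four bytes and render them as one lowercase hex string.
  case "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" in
    0000feff) echo "UTF-32, big-endian" ;;
    fffe0000) echo "UTF-32, little-endian" ;;  # must come before fffe*
    feff*)    echo "UTF-16, big-endian" ;;
    fffe*)    echo "UTF-16, little-endian" ;;
    efbbbf*)  echo "UTF-8" ;;
    *)        echo "none" ;;
  esac
}

# Example: a file that starts with the UTF-8 BOM.
sample=$(mktemp)
printf '\357\273\277hello\n' > "$sample"
bom_type "$sample"
rm -f "$sample"
```

Only the "UTF-8" case is something the awk one-liners above handle as written; the other encodings use different BOM bytes.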
