How can I remove the BOM from a UTF-8 file?

Views: 28. Published: 2021/12/27 22:55:16. Tags: linux file command-line utf-8 byte-order-mark

Question

I have a UTF-8 encoded file with a BOM and want to remove the BOM. Are there any Linux command-line tools that can remove the BOM from the file?

$ file test.xml
test.xml:  XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines

Recommended answer

A BOM is the Unicode code point U+FEFF; its UTF-8 encoding consists of the three bytes 0xEF, 0xBB, 0xBF.
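To check whether a file actually starts with those three bytes before changing it, something like the following sketch may help. The function name has_utf8_bom is hypothetical (it is not a standard tool); it assumes a POSIX-style od and tr are available.

```shell
# has_utf8_bom is a hypothetical helper name, not a standard utility.
# It succeeds (exit status 0) when the first three bytes are EF BB BF.
has_utf8_bom() {
    # od prints the first 3 bytes as hex; tr strips spaces and newlines.
    first_bytes=$(od -A n -t x1 -N 3 "$1" | tr -d ' \n')
    [ "$first_bytes" = "efbbbf" ]
}
```

Usage: `has_utf8_bom test.xml && echo "BOM present"`.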

With bash, you can create a UTF-8 BOM with the $'' special quoting form, which implements Unicode escapes (supported since bash 4.2): $'\uFEFF'. So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file would be:

sed -i $'1s/^\uFEFF//' file.txt

This leaves the file unchanged if it does not start with a UTF-8 BOM, and otherwise removes the BOM.

If you are using some other shell, you might find that "$(printf '\ufeff')" produces the BOM character (that works with zsh, as well as with any shell that has no printf builtin, provided that /usr/bin/printf is the GNU version), but if you want a POSIX-compatible version you could use:

sed "$(printf '1s/^\357\273\277//')" file.txt

(The -i in-place edit flag is also a GNU extension; this version writes the possibly modified file to stdout.)
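If the file should still be updated in place without relying on sed -i, one possible approach is to write to a temporary file and copy the result back. This is only a sketch: strip_bom_in_place is a hypothetical helper name, and it assumes mktemp is available (common, but not guaranteed on every system).

```shell
# strip_bom_in_place is a hypothetical helper name, not a standard utility.
# Assumes mktemp exists on the system.
strip_bom_in_place() {
    tmp=$(mktemp) || return 1
    # Strip a leading EF BB BF from line 1 only; other lines pass through.
    # Copy back with cat so the original file keeps its permissions.
    sed "$(printf '1s/^\357\273\277//')" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage: `strip_bom_in_place file.txt`. Running it on a file without a BOM should leave the contents as they were.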
