How to check whether a file is valid UTF-8?

Posted: 2022/1/18 12:58:05 · Tags: validation, utf-8, internationalization

Question

I'm processing some data files that are supposed to be valid UTF-8 but aren't, which causes the parser (not under my control) to fail. I'd like to add a stage of pre-validating the data for UTF-8 well-formedness, but I've not yet found a utility to help do this.

There's a web service at W3C which appears to be dead, and I've found a Windows-only validation tool that reports invalid UTF-8 files but doesn't report which lines/characters to fix.

I'd be happy with either a tool I can drop in and use (ideally cross-platform), or a ruby/perl script I can make part of my data loading process.

Recommended answer

You can use GNU iconv:

$ iconv -f UTF-8 your_file -o /dev/null; echo $?

Or with older versions of iconv, such as on macOS:

$ iconv -f UTF-8 your_file > /dev/null; echo $?

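The command returns 0 if the file could be converted successfully and 1 if not (the echo $? prints that exit status). On failure, iconv also prints an error message on stderr giving the byte offset of the invalid byte sequence. It stops at the first invalid sequence, so each run reports at most one error position.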
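
To turn this into the pre-validation stage the question asks for, the exit status can gate the data load. The following is a minimal sketch, assuming a POSIX shell and an iconv on PATH; the function name check_utf8 is hypothetical and not part of iconv. It redirects the output instead of using -o, so it should work with both GNU and older iconv builds, and it passes -t UTF-8 explicitly so the result does not depend on the current locale.

```sh
# check_utf8 FILE...  (hypothetical helper, not an iconv feature)
# Prints "OK: FILE" or "INVALID: FILE" for each argument and returns 1
# if any file is not valid UTF-8 (or cannot be read by iconv at all).
check_utf8() {
  status=0    # plain global: POSIX sh has no "local"
  for f in "$@"; do
    # Convert UTF-8 -> UTF-8 and throw the result away; only the exit
    # status matters. iconv's own error message still goes to stderr.
    if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null; then
      printf 'OK: %s\n' "$f"
    else
      printf 'INVALID: %s\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```

Called as check_utf8 your_file || exit 1 near the top of a loading script, it stops the run before the parser ever sees a malformed file.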

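Edit: The output encoding (-t) doesn't have to be specified. If it is omitted, iconv uses the current locale's character encoding, which on most systems is UTF-8. Passing -t UTF-8 explicitly avoids spurious failures on files with non-ASCII characters when running under a non-UTF-8 locale such as C.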
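
iconv stops at the first invalid sequence, so it reports at most one error position per run, while the question asks which lines need fixing. One way to get line numbers from the same tool is to validate each line on its own. This works because the newline byte (0x0A) can never appear inside a multi-byte UTF-8 sequence, so a file is valid exactly when every one of its lines is. A minimal sketch under the same assumptions (POSIX shell, iconv on PATH); bad_utf8_lines is a hypothetical name:

```sh
# bad_utf8_lines FILE  (hypothetical helper)
# Prints the 1-based number of every line in FILE that is not valid UTF-8.
bad_utf8_lines() {
  n=0    # plain globals (n, line): POSIX sh has no "local"
  # IFS= and -r keep each line's bytes intact; the [ -n "$line" ] test
  # also processes a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Re-add the newline and validate just this line.
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
      printf '%s\n' "$n"
    fi
  done < "$1"
}
```

It starts one iconv process per line, so it is slow on large files; a reasonable split is to screen each file with a single iconv call first and run this only on the files that fail.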
