How do I convert files between encodings where only some of them are wrong?

Question

I have a large set of nested directories containing PHP, HTML, and Javascript files that should all be encoded as UTF-8. However, someone edited several of the files and saved them with ISO-8859-1 encoding. Unfortunately, they're all mixed in with the UTF-8 files.

I'd like to use the iconv tool to convert the incorrectly-encoded files to UTF-8 (as described here). Primarily, the problems occur with characters that are valid ISO-8859-1 but invalid UTF-8.

I think an appropriate starting point would be to find all files that contain invalid UTF-8. What's a good way to do this?

I realise this won't catch all of the cases where the wrong character might be displayed. Any further tips on how I might fix this mess?

Recommended answer

This would be a bit of a hack, but since it's a one-off occurrence it might be worth it. iconv reports an error when it can't read a file using the source encoding you give it. You could therefore write a wrapper script that iterates over all the files and attempts to convert each one from UTF-8 to some other encoding; any file that fails to convert contains invalid UTF-8.
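As a rough illustration of that wrapper idea, here is a minimal Python sketch. It swaps in Python's strict UTF-8 decoder for the `iconv -f UTF-8` check, so it is not the iconv-based script itself. The extension list and the function names (`is_valid_utf8`, `find_invalid_utf8`, `convert_latin1_to_utf8`) are my own assumptions for the example, and the conversion step assumes that every flagged file really is ISO-8859-1.

```python
import os
import sys

# Assumed extensions, based on the file types named in the question.
EXTENSIONS = (".php", ".html", ".htm", ".js")


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def find_invalid_utf8(root, extensions=EXTENSIONS):
    """Walk root recursively and return the matching files that are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if not is_valid_utf8(fh.read()):
                    bad.append(path)
    return sorted(bad)


def convert_latin1_to_utf8(path):
    """Rewrite one file in place as UTF-8, assuming its bytes are ISO-8859-1."""
    with open(path, "rb") as fh:
        raw = fh.read()
    # Decoding ISO-8859-1 cannot fail: every byte maps to a code point.
    text = raw.decode("iso-8859-1")
    with open(path, "wb") as fh:
        fh.write(text.encode("utf-8"))


if __name__ == "__main__":
    # List the suspect files only; review them before converting anything.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for suspect in find_invalid_utf8(root):
        print(suspect)
```

Run against the top-level directory, this only prints the suspect paths. It seems safer to review that list (and keep a backup or a version-control checkpoint) before calling `convert_latin1_to_utf8` on each one. As the question already notes, this check will not catch everything: an ISO-8859-1 file whose bytes happen to form valid UTF-8 passes unnoticed, and pure-ASCII files are identical in both encodings and need no conversion.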
