Converting Mac Roman characters to equivalent UTF-8

Question

I have been given some HTML files that use the Mac OS Roman file encoding. The files contain French text, but in an editor many of the accented characters look strange (i.e. not French):

Si cette option est sÈlectionnÈe, <removed> tentera de communiquer avec votre tÈlescope seulement ‡ líaide díun ...

The capital E with an accent (È) does display properly in the browser as é, as do the other strange characters.

I also have some UTF-8 French files that look normal in an editor (é looks like é). What I'd like to do is convert all the Mac Roman files to UTF-8 for easier maintenance.

Simply changing the file encoding in the editor doesn't do this. The strange characters are still strange.

Short of making a conversion dictionary and doing a Find/Replace on all the files, is there a way to do this?

Recommended answer

If your editor isn’t showing the text correctly when you specify the encoding, you have given it the wrong encoding. You need to figure out what encoding you really have.

You appear to have a byte valued 0xE9 where you need a Unicode LATIN SMALL LETTER E WITH ACUTE character. A MacRoman 0xE9 byte is a LATIN CAPITAL LETTER E WITH GRAVE character, which is what your editor is displaying because you told it the file was MacRoman. But it is not.

However, Unicode code point U+00E9 is indeed LATIN SMALL LETTER E WITH ACUTE, and ISO-8859-1 maps each byte value to the Unicode code point with the same number.
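The byte-level argument above can be checked directly. The following is a minimal sketch that decodes the single byte 0xE9 under the three encodings mentioned in this answer; the variable names are illustrative only, and the codec names are the ones Python's standard library uses.

```python
# The same byte, interpreted under three candidate encodings.
raw = bytes([0xE9])

as_mac_roman = raw.decode("mac_roman")    # what the editor showed
as_latin1 = raw.decode("iso-8859-1")      # what the browser showed
as_latin9 = raw.decode("iso-8859-15")     # identical to ISO-8859-1 at this byte

print(as_mac_roman, as_latin1, as_latin9)  # È é é
```

Only the MacRoman interpretation yields the È seen in the editor, which is consistent with the editor having been told the wrong encoding.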

Therefore, what you have there is not MacRoman, but almost certainly ISO-8859-1 or ISO-8859-15.
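One way to test a guess about the real encoding is to reverse what the editor did: turn the displayed characters back into bytes using MacRoman, then decode those bytes with the candidate encoding. The sketch below does this for fragments of the sample text from the question. The helper name `undo_wrong_decoding` is made up for illustration, and the choice of `cp1252` (Windows-1252, which agrees with ISO-8859-1 on the accented letters) is an inference from the sample rather than something stated in the answer: the í in the sample corresponds to byte 0x92, which is a control character in ISO-8859-1 but a right single quotation mark in Windows-1252.

```python
def undo_wrong_decoding(displayed: str, assumed: str, actual: str) -> str:
    """Recover text that was decoded as `assumed` but was really `actual`."""
    # Step 1: get back the original bytes. Step 2: decode them correctly.
    return displayed.encode(assumed).decode(actual)


# Fragments of the garbled text shown in the question.
sample = "sÈlectionnÈe ‡ líaide díun"

print(undo_wrong_decoding(sample, "mac_roman", "cp1252"))
# sélectionnée à l’aide d’un
```

If the result reads as ordinary French, the candidate encoding is probably the right one to pass to the converter. Trying `iso-8859-1` on the same sample gives correct accented letters but leaves an invisible control character where the apostrophes should be.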

So use something like this to do the conversion:

$ iconv -f ISO-8859-1 -t UTF-8 < input.latin1 > output.utf8
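Since the question is about converting many files, here is a rough Python equivalent of the iconv command that walks a directory tree. It is a sketch under assumptions: the function names, the `*.html` pattern, and the default source encoding are illustrative choices, not part of the original answer. It writes to a separate output directory so that running it twice cannot re-encode files that are already UTF-8.

```python
from pathlib import Path


def convert_file(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode one file from source_encoding to UTF-8."""
    # Work on bytes so that line endings are left exactly as they were.
    text = src.read_bytes().decode(source_encoding)
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(text.encode("utf-8"))


def convert_tree(src_dir: Path, dst_dir: Path, pattern: str = "*.html",
                 source_encoding: str = "iso-8859-1") -> int:
    """Convert every matching file under src_dir into dst_dir; return the count."""
    count = 0
    # sorted() lists the files up front, before any output is written.
    for src in sorted(src_dir.rglob(pattern)):
        convert_file(src, dst_dir / src.relative_to(src_dir), source_encoding)
        count += 1
    return count


# Example call with hypothetical directory names:
# convert_tree(Path("site_latin1"), Path("site_utf8"))
```

If the HTML files declare a charset in a meta tag, that declaration would need to be updated separately; this sketch does not touch file contents beyond re-encoding them.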
