How to convert a file from ASCII to UTF-8?

Question

I'm trying to transcode a bunch of files from ASCII to UTF-8.

For that, I tried using iconv:

iconv -f US-ASCII -t UTF-8 infile > outfile

-f ENCODING the encoding of the input

-t ENCODING the encoding of the output

Still, the file wasn't converted to UTF-8. It is a .dat file.

Before posting this, I searched Google and found information like:

ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded. The bytes in the ASCII file and the bytes that would result from "encoding it to UTF-8" would be exactly the same bytes. There's no difference between them.

Force encode from US-ASCII to UTF-8 (iconv)

Best way to convert text files between character sets?

Still, the above links didn't help.

Even though the file is ASCII, and ASCII is compatible with UTF-8 because UTF-8 is a superset of it, the other party who is going to receive the files from me needs the file encoding to be UTF-8. He just needs the file format to be UTF-8.

Any suggestions, please?

Thanks in advance.

Solution

I'm a little confused by the question, because, as you indicated, ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded.
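This is easy to confirm: run the iconv command from the question on a small ASCII-only file and compare the result byte for byte. The following is a minimal sketch that assumes the iconv, cmp, and mktemp utilities are installed; the file names are hypothetical.

```sh
#!/bin/sh
# Minimal sketch: "converting" an ASCII-only file to UTF-8 with iconv
# produces exactly the same bytes. File names below are hypothetical.
dir=$(mktemp -d)
printf 'Stuff\n' > "$dir/infile.dat"

# Same command as in the question.
iconv -f US-ASCII -t UTF-8 "$dir/infile.dat" > "$dir/outfile.dat"

# cmp -s prints nothing and exits 0 only when the files are byte-identical.
if cmp -s "$dir/infile.dat" "$dir/outfile.dat"; then
  echo "identical: the ASCII file was already valid UTF-8"
else
  echo "files differ"
fi
```

Because the output is identical, any tool that guesses the encoding from the content alone will report the "converted" file exactly as it reported the original.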

If you're sending files containing only ASCII characters to the other party, but the other party is complaining that they're not 'UTF-8 Encoded', then I would guess that they're referring to the fact that the ASCII file has no byte order mark explicitly indicating the contents are UTF-8.

If that is indeed the case, then you can add a byte order mark using the answer here:

iconv: Converting from Windows ANSI to UTF-8 with BOM
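If a BOM is what the other party wants, the three bytes EF BB BF can simply be written in front of the existing content. The sketch below assumes a POSIX shell with printf, od, cat, and cp; add_utf8_bom is a hypothetical helper name, not a standard command, and it copies files that already start with a BOM unchanged so that running it twice does not add a second one.

```sh
#!/bin/sh
# Minimal sketch: prepend a UTF-8 byte order mark (EF BB BF) to a file.
# add_utf8_bom is a hypothetical helper, not part of any standard tool.
add_utf8_bom() {
  src=$1
  dst=$2
  # Read the first three bytes as lowercase hex, e.g. "efbbbf".
  first3=$(od -An -tx1 -N3 "$src" | tr -d ' \n')
  if [ "$first3" = "efbbbf" ]; then
    cp "$src" "$dst"            # already has a BOM: copy as-is
  else
    # \357\273\277 are the octal escapes for EF BB BF.
    { printf '\357\273\277'; cat "$src"; } > "$dst"
  fi
}

dir=$(mktemp -d)
printf 'Stuff\n' > "$dir/plain.txt"
add_utf8_bom "$dir/plain.txt" "$dir/with_bom.txt"
```

Running file -I (macOS) or file -i (GNU/Linux) on the result should then report charset=utf-8, as in the experiment further down.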

If the other party indicates that he does not need the 'BOM' (Byte Order Mark), but is still complaining that the files are not UTF-8, then another possibility is that your initial file is not actually ASCII, but rather contains characters that are encoded using ANSI or ISO-8859-1.
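One way to test that possibility is to ask iconv to read the file strictly as US-ASCII: it stops with an error on the first byte above 0x7F. If such bytes are present, naming the real source encoding with -f makes iconv rewrite them as UTF-8. In this sketch the source is assumed to be ISO-8859-1, which should be confirmed with whoever produced the file, since the bytes alone cannot prove which 8-bit encoding was used; is_ascii and the file names are hypothetical.

```sh
#!/bin/sh
# Minimal sketch, assuming iconv and od are installed.
# is_ascii (hypothetical helper) succeeds only if every byte is 7-bit ASCII,
# because iconv exits non-zero on input that is illegal in the source encoding.
is_ascii() {
  iconv -f US-ASCII -t US-ASCII "$1" > /dev/null 2>&1
}

dir=$(mktemp -d)
# "café" with é stored as the single ISO-8859-1 byte 0xE9 (octal 351).
printf 'caf\351\n' > "$dir/latin1.dat"

if is_ascii "$dir/latin1.dat"; then
  echo "pure ASCII: already valid UTF-8"
else
  # Name the real source encoding; ISO-8859-1 is an assumption here.
  iconv -f ISO-8859-1 -t UTF-8 "$dir/latin1.dat" > "$dir/utf8.dat"
  # é is now the two-byte UTF-8 sequence C3 A9.
  od -An -tx1 "$dir/utf8.dat"
fi
```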

Edited to add the following experiment, after a comment from Ram saying that the other party checks the file type using the 'file' command:

Tims-MacBook-Pro:~ tjohns$ echo 'Stuff' > deleteme
Tims-MacBook-Pro:~ tjohns$ cat deleteme
Stuff
Tims-MacBook-Pro:~ tjohns$ file -I deleteme
deleteme: text/plain; charset=us-ascii
Tims-MacBook-Pro:~ tjohns$ echo -ne '\xEF\xBB\xBF' > deleteme
Tims-MacBook-Pro:~ tjohns$ echo 'Stuff' >> deleteme
Tims-MacBook-Pro:~ tjohns$ cat deleteme
Stuff
Tims-MacBook-Pro:~ tjohns$ file -I deleteme
deleteme: text/plain; charset=utf-8
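The experiment above shows that file reports charset=us-ascii for ASCII content without a BOM, even though that content is valid UTF-8. If the receiving side needs a check that does not depend on file's guess, one possibility is to have iconv decode the file as UTF-8 and look at its exit status. This sketch assumes iconv is installed; is_valid_utf8 and the file names are hypothetical.

```sh
#!/bin/sh
# Minimal sketch: is_valid_utf8 (hypothetical helper) succeeds when the whole
# file decodes as UTF-8; iconv exits non-zero on an invalid byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

dir=$(mktemp -d)
printf 'Stuff\n' > "$dir/ascii.dat"            # ASCII only, no BOM
printf '\357\273\277Stuff\n' > "$dir/bom.dat"  # UTF-8 with BOM
printf 'caf\351\n' > "$dir/latin1.dat"         # ISO-8859-1 byte 0xE9

for f in "$dir/ascii.dat" "$dir/bom.dat" "$dir/latin1.dat"; do
  if is_valid_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: NOT valid UTF-8"
  fi
done
```

Both the plain ASCII file and the BOM file pass this check; only the file containing a stray ISO-8859-1 byte fails.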