How to convert a file from ASCII to UTF-8?

Question

I'm trying to transcode a bunch of files from ASCII to UTF-8.

For that, I tried using iconv:

iconv -f US-ASCII -t UTF-8 infile > outfile

-f ENCODING: the encoding of the input

-t ENCODING: the encoding of the output

Still, the file was not converted to UTF-8. It is a .dat file.

Before posting this, I searched Google and found information like:

ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded. The bytes in the ASCII file and the bytes that would result from "encoding it to UTF-8" would be exactly the same bytes. There's no difference between them.

Force encode from US-ASCII to UTF-8 (iconv)

Best way to convert text files between character sets?

Still, the above links didn't help.

Even though the file is in ASCII, and so is already valid UTF-8 because UTF-8 is a superset, the other party who is going to receive the files from me needs the file encoding to be UTF-8. He just needs the file format to be UTF-8.

Any suggestions, please.

Recommended answer

I'm a little confused by the question because, as you indicated, ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded.
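The subset relationship can be checked directly. The following is a minimal Python sketch (the sample string is an arbitrary assumption, not taken from the asker's .dat file) that encodes the same ASCII-only text both ways and compares the resulting bytes:

```python
# Sample text containing only ASCII characters (an arbitrary example).
text = "Stuff\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only text, both encodings yield the same byte sequence,
# so there is nothing for a converter such as iconv to change.
same = ascii_bytes == utf8_bytes
print(same)
print(utf8_bytes.hex())
```

If the comparison holds for a file's entire contents, transcoding it from US-ASCII to UTF-8 produces a byte-for-byte identical file.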

If you're sending files containing only ASCII characters to the other party, but the other party is complaining that they're not "UTF-8 encoded", then I would guess that they're referring to the fact that the ASCII file has no byte order mark (BOM) explicitly indicating that the contents are UTF-8.

If that is indeed the case, then you can add a byte order mark using the answer here:

iconv: Converting from Windows ANSI to UTF-8 with BOM
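As an alternative illustration of the same idea, here is a small Python sketch that prepends the UTF-8 byte order mark (the bytes EF BB BF) to existing content. The function names `add_utf8_bom` and `add_bom_to_file` are hypothetical helpers invented for this example, and the sketch assumes the input is already valid ASCII or UTF-8:

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Return data prefixed with the UTF-8 BOM, unless it already has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, adding a UTF-8 BOM at the start."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(add_utf8_bom(data))


# The rest of the content is left untouched; only three bytes are added.
print(add_utf8_bom(b"Stuff\n"))
```

Working on raw bytes rather than decoded text keeps the rest of the file exactly as it was, which matters when the receiver compares checksums or sizes.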

If the other party indicates that he does not need the BOM (byte order mark) but is still complaining that the files are not UTF-8, then another possibility is that your initial file is not actually ASCII, but rather contains characters that are encoded using ANSI or ISO-8859-1.
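One way to test that possibility is to look for any byte above 0x7F, since pure ASCII never uses one. The sketch below is only an illustration: the helper names are hypothetical, the sample bytes are made up, and it assumes the non-ASCII bytes really are ISO-8859-1 (a file in a Windows "ANSI" code page such as Windows-1252 would need that codec instead):

```python
from typing import Optional


def first_non_ascii_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte outside ASCII, or None if all ASCII."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset
    return None


def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode bytes assumed to be ISO-8859-1 into UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Made-up sample: "café" followed by a newline, encoded as ISO-8859-1.
sample = b"caf\xe9\n"
print(first_non_ascii_offset(sample))
print(latin1_to_utf8(sample))
```

If the first function returns None for the whole file, the file is pure ASCII and no transcoding is needed; otherwise the source encoding has to be identified before converting.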

Edited to add the following experiment, after a comment from Ram about the other party using the "file" command to check the type:

Tims-MacBook-Pro:~ tjohns$ echo 'Stuff' > deleteme
Tims-MacBook-Pro:~ tjohns$ cat deleteme
Stuff
Tims-MacBook-Pro:~ tjohns$ file -I deleteme
deleteme: text/plain; charset=us-ascii
Tims-MacBook-Pro:~ tjohns$ echo -ne '\xEF\xBB\xBF' > deleteme
Tims-MacBook-Pro:~ tjohns$ echo 'Stuff' >> deleteme
Tims-MacBook-Pro:~ tjohns$ cat deleteme
Stuff
Tims-MacBook-Pro:~ tjohns$ file -I deleteme
deleteme: text/plain; charset=utf-8
