Force encode from US-ASCII to UTF-8 (iconv)


Question

I'm trying to transcode a bunch of files from US-ASCII to UTF-8.

For that, I'm using iconv:

iconv -f US-ASCII -t UTF-8 file.php > file-utf8.php

My original files are US-ASCII encoded, so the conversion changes nothing. Apparently this is because ASCII is a subset of UTF-8:

iconv US ASCII to UTF-8 or ISO-8859-15

To quote:

There's no need for the text file to appear otherwise until non-ASCII characters are introduced

True. If I introduce a non-ASCII character into the file and save it, say with Eclipse, the file encoding (charset) switches to UTF-8.

In my case, I'd like to force iconv to transcode the files to UTF-8 anyway, whether or not they contain non-ASCII characters.

Note: The reason is that my PHP code (non-ASCII files...) deals with some non-ASCII strings, and those strings are not interpreted correctly (French):

Il était une fois... l'homme série animée mythique d'Albert

Barillé (Procidis), 1ère

...

  • US-ASCII is a subset of UTF-8 (see Ned's answer below)
  • Meaning that US-ASCII files are already encoded in UTF-8
  • My problem came from somewhere else

Recommended answer

ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded. The bytes in the ASCII file and the bytes that would result from "encoding it to UTF-8" would be exactly the same bytes. There's no difference between them, so there's no need to do anything.
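To make the point about identical bytes concrete, here is a minimal Python sketch. It is only an illustration, not part of the original answer; the sample strings and variable names are made up for the example.

```python
# Illustration only: the sample strings below are assumptions, not taken from the real files.
text = "Il etait une fois... l'homme"  # pure ASCII sample (no accents)

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII input the two encodings produce the same byte sequence.
same = ascii_bytes == utf8_bytes
print(same)

# Once a non-ASCII character appears, the encoding starts to matter:
accented = "Il était une fois"
accented_utf8 = accented.encode("utf-8")      # "é" becomes two bytes
accented_latin1 = accented.encode("latin-1")  # "é" becomes one byte
print(len(accented_utf8), len(accented_latin1))
```

This is why running iconv from US-ASCII to UTF-8 on a genuinely ASCII file yields an output that is byte-for-byte the same as the input.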

It looks like your problem is that the files are not actually ASCII. You need to determine what encoding they are using, and transcode them properly.
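One possible way to follow that advice is sketched below in Python. It is a rough sketch under stated assumptions: the helper name `transcode_to_utf8` is hypothetical, and the ISO-8859-1 fallback is only a guess at what a legacy French file might use. A legacy single-byte encoding cannot be detected reliably from the bytes alone, so the fallback should be replaced with whatever encoding the files are actually known to use.

```python
from typing import Tuple


def transcode_to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> Tuple[str, bytes]:
    """Return (encoding assumed for the input, content as UTF-8 bytes).

    Hypothetical helper: the fallback encoding is an assumption, not a detection.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        # Valid ASCII or UTF-8: the bytes are already UTF-8, leave them untouched.
        return encoding, raw
    # Not valid UTF-8: decode with the assumed legacy encoding and re-encode.
    return fallback, raw.decode(fallback).encode("utf-8")


# Example inputs (made up for illustration)
print(transcode_to_utf8(b"plain text"))
print(transcode_to_utf8("série animée".encode("iso-8859-1")))
```

If the files do turn out to be ISO-8859-1, the equivalent iconv call would name that encoding as the source, for example `iconv -f ISO-8859-1 -t UTF-8 file.php > file-utf8.php`.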
