PHP: Problems converting "’" character from ISO-8859-1 to UTF-8

Question

I'm having some issues with using PHP to convert ISO-8859-1 database content to UTF-8. I am running the following code to test:

// Connect to a latin1 charset database 
// and retrieve "Georgia O’Keeffe", which contains a "’" character
$connection = mysql_connect('*****', '*****', '*****');
mysql_select_db('*****', $connection);
mysql_set_charset('latin1', $connection);
$result = mysql_query('SELECT notes FROM categories WHERE id = 16', $connection);
$latin1Str = mysql_result($result, 0);
$latin1Str = substr($latin1Str, strpos($latin1Str, 'Georgia'), 16);

// Try to convert it to UTF-8
$utf8Str = iconv('ISO-8859-1', 'UTF-8', $latin1Str);

// Output both
var_dump($latin1Str);
var_dump($utf8Str);

When I run this in Firefox's source view, making sure Firefox's encoding setting is set to "Western (ISO-8859-1)", I get this:

So far, so good. The first output contains that weird quote and I can see it correctly because it's in ISO-8859-1 and so is Firefox.

After I change Firefox's encoding setting to "UTF-8", it looks like this:

Where did the quote go? Wasn't iconv() supposed to convert it to UTF-8?

Recommended answer

U+2019 RIGHT SINGLE QUOTATION MARK is not a character in ISO-8859-1. It is a character in windows-1252, as 0x92. The actual ISO-8859-1 character 0x92 is a rarely-used C1 control character called "Private Use 2".


It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling but it is not standard behaviour and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.
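One rough way to spot this kind of mislabeling is to look for bytes in the range 0x80 to 0x9F. In real ISO-8859-1 those are C1 control characters that almost never appear in ordinary text, while windows-1252 uses most of them for printable characters such as curly quotes and dashes. The sketch below is only a heuristic, and the function name `looksLikeWindows1252` is hypothetical, not part of PHP:

```php
<?php
// Hypothetical helper: returns true if the byte string contains any byte
// in 0x80-0x9F, which suggests windows-1252 rather than true ISO-8859-1.
// Without the /u modifier, PCRE matches byte by byte, which is what we want.
function looksLikeWindows1252(string $bytes): bool
{
    return preg_match('/[\x80-\x9F]/', $bytes) === 1;
}

// 0x92 is the curly apostrophe in windows-1252.
var_dump(looksLikeWindows1252("Georgia O\x92Keeffe"));

// 0xE9 is "é" in both encodings, so it tells us nothing.
var_dump(looksLikeWindows1252("caf\xE9"));
```

This only makes sense for data already known to be in a single-byte Western encoding; UTF-8 text also contains bytes in this range.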

It appears that this is what's happening here. Change "ISO-8859-1" to "windows-1252".
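A minimal sketch of the difference, using a hard-coded byte string in place of the database value (this assumes your iconv build recognises the name "windows-1252"; some builds call it "CP1252"):

```php
<?php
// Stand-in for the database value: byte 0x92 is "’" in windows-1252.
$raw = "Georgia O\x92Keeffe";

// Read as ISO-8859-1, 0x92 becomes the control character U+0092
// (UTF-8 bytes C2 92), which browsers do not display.
$wrong = iconv('ISO-8859-1', 'UTF-8', $raw);

// Read as windows-1252, 0x92 becomes U+2019 (UTF-8 bytes E2 80 99).
$right = iconv('windows-1252', 'UTF-8', $raw);

// Hex dumps make the difference visible regardless of terminal encoding.
var_dump(bin2hex($wrong));
var_dump(bin2hex($right));
```

Comparing the two hex dumps shows `c292` in the first and `e28099` in the second at the position of the quote.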
