UTF-8 problems in PHP: var_export() returns \0 null characters, and ucfirst(), strtoupper(), etc. behave strangely

Question

We are dealing with a strange bug in a Joyent Solaris server that never happened before (doesn't happen in localhost or two other Solaris servers with identical php configuration). Actually, I'm not sure if we have to look at php or solaris, and if it is a software or hardware problem...

I just want to post this in case somebody can point us in the right direction.

So, the problem seems to be in var_export() when dealing with non-ASCII characters. Running this in the CLI, we get the expected result on our localhost machines and on two of the servers, but not on the third one. All of them are configured to work with UTF-8.

$ php -r "echo var_export('ñu', true);"

Gives this in older servers and localhost (expected):

'ñu'

But on the server we are having problems with (PHP Version => 5.3.6), it outputs a \0 null character in place of each byte of an "uncommon" character: è, á, ç, ... you name it.

'' . "\0" . '' . "\0" . 'u'

Any idea where we should be looking? Thanks in advance.

More info:

  • PHP version 5.3.6.
  • setlocale() is not solving anything.
  • default_charset is UTF-8 in php.ini.
  • mbstring.internal_encoding is set to UTF-8 in php.ini.
  • mbstring.func_overload = 0.
  • This happens in both the CLI (example) and the web application (php-fpm + nginx).
  • iconv encoding is also UTF-8.
  • All files are UTF-8 encoded.
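To confirm that the settings listed above are what the failing binary actually sees (rather than what php.ini is believed to contain), a small diagnostic script can print them from inside PHP. This is only a sketch: the function name `charsetReport()` is hypothetical, and the mbstring and iconv values are read only when those extensions are loaded. It avoids short array syntax so it also runs on PHP 5.3.

```php
<?php
// Hypothetical helper: collects the charset-related settings PHP itself reports.
function charsetReport()
{
    $report = array(
        'php_version'     => PHP_VERSION,
        'default_charset' => ini_get('default_charset'),
        // Passing "0" queries the current LC_CTYPE locale without changing it.
        'LC_CTYPE'        => setlocale(LC_CTYPE, '0'),
    );
    if (extension_loaded('mbstring')) {
        $report['mb_internal_encoding'] = mb_internal_encoding();
    }
    if (extension_loaded('iconv')) {
        $report['iconv'] = iconv_get_encoding('all');
    }
    return $report;
}

print_r(charsetReport());
```

Running it on both a working and the failing server and diffing the output narrows down which setting differs.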

system('locale') returns:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=


Some tests done so far (CLI):

Normal behavior:

$ php -r "echo bin2hex('ñu');" => 'c3b175'
$ php -r "echo mb_strtoupper('ñu');" => 'ÑU'
$ php -r "echo serialize(\"\\xC3\\xB1\");" => 's:2:"ñ";'
$ php -r "echo bin2hex(addcslashes(b\"\\xC3\\xB1\", \"'\\\\\"));" => 'c3b1'
$ php -r "echo ucfirst('iñu');" => 'Iñu'

Not normal:

$ php -r "echo strtoupper('ñu');" => 'U' 
$ php -r "echo ucfirst('ñu');" => '?u' 
$ php -r "echo ucfirst(b\"\\xC3\\xB1u\");" => '?u' 
$ php -r "echo bin2hex(ucfirst('ñu'));" => '00b175'
$ php -r "echo bin2hex(var_export('ñ', 1));" => '2727202e20225c3022202e202727202e20225c3022202e202727'
$ php -r "echo bin2hex(var_export(b\"\\xC3\\xB1\", 1));" => '2727202e20225c3022202e202727202e20225c3022202e202727'
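The tests above mix terminal rendering with hex output. A sketch that runs the same byte-wise functions in one script and reports only hex keeps the terminal's own decoding out of the picture; the helper name `hexReport()` is hypothetical.

```php
<?php
// Hypothetical helper: hex-encode each function's output for the same input,
// so the result does not depend on how the terminal renders the bytes.
function hexReport($input)
{
    return array(
        'input'      => bin2hex($input),
        'ucfirst'    => bin2hex(ucfirst($input)),
        'strtoupper' => bin2hex(strtoupper($input)),
        'var_export' => bin2hex(var_export($input, true)),
        'serialize'  => bin2hex(serialize($input)),
    );
}

// "\xC3\xB1" is UTF-8 for ñ, written as escapes so the source file encoding does not matter.
print_r(hexReport("\xC3\xB1u"));
```

In the C locale, ucfirst() should return c3b175 (the lead byte \xC3 is left alone) and var_export() should return 27c3b17527; the failing server instead produced 00b175 and the "\0" sequence shown above.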

So the problem seems to be in var_export() and in the "string functions that use the current locale but operate byte-by-byte" (PHP docs; see @hakre's answer).

Recommended answer

I suggest you verify the PHP binary you've got problems with. Check the compiler flags and the libraries it makes use of.

Normally PHP uses binary strings internally, which means that functions like ucfirst work byte-by-byte and only support what your locale supports (if, and as, it is configured). See "Details of the String Type" in the PHP docs.
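To see what "binary string" means in practice: strlen() counts bytes, so the two-character string ñu has length 3, and ucfirst() only ever looks at the first byte (\xC3), which is not a character on its own. The sketch below counts characters with a PCRE /u pattern instead of mbstring so it runs without extensions; the helper name `utf8Length()` is hypothetical.

```php
<?php
// Hypothetical helper: number of UTF-8 characters, or -1 if $s is not valid UTF-8.
function utf8Length($s)
{
    // With /u PCRE treats the subject as UTF-8 and returns false on malformed input;
    // /s lets "." match newlines too.
    $count = preg_match_all('/./us', $s, $matches);
    return $count === false ? -1 : $count;
}

$s = "\xC3\xB1u";            // "ñu"
echo strlen($s), "\n";       // 3 (bytes)
echo utf8Length($s), "\n";   // 2 (characters)
echo bin2hex($s[0]), "\n";   // c3: the single byte ucfirst() operates on
```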

$ php -r "echo ucfirst('ñu');" 

returns

?u

That makes sense, since ñ is

LATIN SMALL LETTER N WITH TILDE (U+00F1)    UTF8: \xC3\xB1
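Where the two bytes come from: code points U+0080 to U+07FF are encoded as 110xxxxx 10xxxxxx, with the top five bits of the 11-bit value in the lead byte and the low six bits in the continuation byte. U+00F1 is 000 1111 0001 as an 11-bit value, so the lead byte gets 00011 (0xC3) and the continuation byte gets 110001 (0xB1). A minimal two-byte encoder makes the arithmetic explicit; the name `utf8TwoByte()` is hypothetical.

```php
<?php
// Hypothetical helper: UTF-8 encoding for code points U+0080..U+07FF (two bytes).
function utf8TwoByte($codePoint)
{
    if ($codePoint < 0x80 || $codePoint > 0x7FF) {
        throw new InvalidArgumentException('Only U+0080..U+07FF use two bytes');
    }
    $lead = 0xC0 | ($codePoint >> 6);   // 110xxxxx: top five bits
    $cont = 0x80 | ($codePoint & 0x3F); // 10xxxxxx: low six bits
    return chr($lead) . chr($cont);
}

echo bin2hex(utf8TwoByte(0xF1)), "\n"; // c3b1 (ñ)
echo bin2hex(utf8TwoByte(0xD1)), "\n"; // c391 (Ñ)
```

Ñ (U+00D1) shares the lead byte 0xC3, so upper-casing ñ in UTF-8 only changes the second byte; a byte-wise function that rewrites the lead byte cannot produce a valid result.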

You have some locale configured that makes PHP change \xC3 into something else, breaking the UTF-8 byte sequence and making your shell display the � replacement character (see Wikipedia).
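One way to confirm that the byte sequence was actually broken, rather than merely displayed badly, is to test the output for UTF-8 validity. This sketch uses the preg_match('//u', ...) idiom (an empty pattern with the /u modifier fails on malformed UTF-8); mb_check_encoding($s, 'UTF-8') does the same where mbstring is available. The helper name `isValidUtf8()` is hypothetical, and the second input is the 00b175 result from the question.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
function isValidUtf8($s)
{
    // With /u, PCRE rejects malformed UTF-8 subjects and preg_match() returns false.
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("\xC3\xB1u")); // original "ñu": valid
var_dump(isValidUtf8("\x00\xB1u")); // damaged ucfirst() output: stray continuation byte
```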

If you really want to analyze the issue, I suggest you start with hexdumps next to how things get displayed in the shell and elsewhere. Note that you can explicitly define binary strings with b"string" (that's for forward compatibility; maybe you have some compile flag enabled and are running an experimental Unicode build?), and you can also write strings literally, here in hex notation for UTF-8:

 $ php -r "echo ucfirst(b\"\\xC3\\xB1u\");"
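Following that suggestion, a small sketch that prints each value raw and as hex side by side, so shell rendering can be compared with the actual bytes. It also shows that the b prefix and the \x escapes produce the same ordinary string as a plain literal in released PHP versions. The function name `dumpBytes()` is hypothetical.

```php
<?php
// Hypothetical helper: print a label, the raw bytes, and their hex form.
function dumpBytes($label, $s)
{
    printf("%-10s %-8s %s\n", $label, $s, bin2hex($s));
}

$literal = 'ñu';          // same bytes only if this file is saved as UTF-8
$escaped = "\xC3\xB1u";   // same bytes regardless of file encoding
$binary  = b"\xC3\xB1u";  // the b prefix is accepted but yields an ordinary string

dumpBytes('literal', $literal);
dumpBytes('escaped', $escaped);
dumpBytes('binary', $binary);
dumpBytes('ucfirst', ucfirst($escaped)); // compare this row with the failing server
var_dump($escaped === $binary);
```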

And there are a lot more settings that can play a role; I started listing some of them in an answer to "Preparing PHP application to use with UTF-8".

Example of a multibyte ucfirst variant:

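if (!function_exists('mb_ucfirst')) { // PHP 8.4+ ships its own mb_ucfirst()
    /**
     * multibyte ucfirst
     *
     * @param string $str
     * @param string|null $encoding (optional, defaults to mb_internal_encoding())
     * @return string
     */
    function mb_ucfirst($str, $encoding = NULL)
    {
        // Older PHP releases may reject a NULL encoding, so resolve the default explicitly.
        if ($encoding === NULL) {
            $encoding = mb_internal_encoding();
        }
        $first = mb_substr($str, 0, 1, $encoding);
        // Pass a length in characters; strlen() would count bytes.
        $rest = mb_substr($str, 1, mb_strlen($str, $encoding), $encoding);
        return mb_strtoupper($first, $encoding) . $rest;
    }
}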

See mb_strtoupper and also mb_convert_case in the PHP docs.
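For comparison, a sketch of the two mbstring case functions with an explicit encoding (it requires the mbstring extension). Note that MB_CASE_TITLE capitalizes every word and lower-cases the remaining letters, so it is not a drop-in replacement for ucfirst(), which leaves the rest of the string alone.

```php
<?php
// Requires ext/mbstring. "\xC3\xB1" is UTF-8 for ñ, "\xC3\x91" for Ñ.
$s = "\xC3\xB1u \xC3\xB1U"; // "ñu ñU"

$upper = mb_strtoupper($s, 'UTF-8');                  // every character upper-cased
$same  = mb_convert_case($s, MB_CASE_UPPER, 'UTF-8'); // same result as mb_strtoupper()
$title = mb_convert_case($s, MB_CASE_TITLE, 'UTF-8'); // each word capitalized, rest lower-cased

echo $upper, "\n", $same, "\n", $title, "\n";
```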
