How to keep json_encode() from dropping strings with invalid characters

Question

Is there a way to keep json_encode() from returning null for a string that contains an invalid (non-UTF-8) character?

It can be a pain in the ass to debug in a complex system. It would be much more fitting to actually see the invalid character, or at least have it omitted. As it stands, json_encode() will silently drop the entire string.

Example (in UTF-8):

$string = 
  array(utf8_decode("Düsseldorf"), // Deliberately produce broken string
        "Washington",
        "Nairobi"); 

print_r(json_encode($string));

Result:

[null,"Washington","Nairobi"]

Desired result:

["D�sseldorf","Washington","Nairobi"]

Note: I am not looking to make broken strings work in json_encode(). I am looking for ways to make it easier to diagnose encoding errors. A null string isn't helpful for that.

Recommended answer

PHP does try to raise an error, but only if you turn display_errors off. This is odd, because the display_errors setting is only meant to control whether errors are printed to standard output, not whether an error is triggered. I want to emphasize that when display_errors is on, even though you may see all kinds of other PHP errors, PHP doesn't just hide this error: it does not trigger it at all. That means it will not show up in any error logs, nor will any custom error handlers get called. The error just never occurs.

Here's some code that demonstrates this:

error_reporting(-1);//report all errors
$invalid_utf8_char = chr(193);

ini_set('display_errors', 1);//display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());//nothing

ini_set('display_errors', 0);//do not display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());// json_encode(): Invalid UTF-8 sequence in argument

That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.
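
For diagnosis on newer PHP versions, json_last_error_msg() (available since PHP 5.5) reports why the last call failed, independently of display_errors. The following is only a sketch, assuming PHP 7+ with the mbstring extension loaded; find_invalid_utf8() is a hypothetical helper name, not something PHP provides.

```php
<?php
// Hypothetical helper: return the keys of all string values that are not valid UTF-8.
function find_invalid_utf8(array $items): array
{
    $bad = [];
    foreach ($items as $key => $value) {
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $bad[] = $key;
        }
    }
    return $bad;
}

// "\xFC" is "ü" in ISO-8859-1 and is not valid as UTF-8.
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

$json = json_encode($cities);
$badKeys = [];
if ($json === false) {
    // Describe the failure and point at the offending array keys.
    $badKeys = find_invalid_utf8($cities);
    error_log('json_encode failed: ' . json_last_error_msg()
        . '; invalid UTF-8 at keys: ' . implode(', ', $badKeys));
}
```

Checking each value separately is slower than a single json_encode() call, so it is probably best kept on the failure path, as here.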

Workaround:

Cleaning the string before passing it to json_encode may be a workable solution.

$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
    // one or more chars were invalid, and so they were stripped out.
    // if you need to know where in the string the first stripped character was, 
    // then see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);

http://php.net/manual/en/function.iconv.php

The manual says:

//IGNORE silently discards characters that are illegal in the target charset.

So by first removing the problematic characters, in theory json_encode() shouldn't get anything it will choke on. I haven't verified that the output of iconv with the //IGNORE flag is perfectly compatible with json_encode's notion of what valid UTF-8 characters are, so buyer beware: there may be edge cases where it still fails. Ugh, I hate character set issues.
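
If you would rather see where the broken bytes were than have them silently removed, one alternative is to substitute them. This is a sketch only, assuming the mbstring extension is available; it swaps iconv for mb_convert_encoding(), and scrub_utf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: replace every invalid byte sequence with U+FFFD so it stays visible.
function scrub_utf8(string $input): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character(0xFFFD);         // use U+FFFD instead of the default "?"
    $clean = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);      // restore the setting
    return $clean;
}

// "\xFC" is "ü" in ISO-8859-1 and is not valid as UTF-8.
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

// JSON_UNESCAPED_UNICODE keeps the replacement character readable in the output.
$json = json_encode(array_map('scrub_utf8', $cities), JSON_UNESCAPED_UNICODE);
print_r($json);
```

Comparing the scrubbed string with the original, as in the iconv version above, still tells you whether anything was replaced.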

Edit
In PHP 7.2+, there seem to be some new flags for json_encode: JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE.
There's not much documentation yet, but for now, this test should help you understand the expected behavior: https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt
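
A minimal sketch of the two flags side by side, assuming PHP 7.2 or later; the sample string is only an illustration:

```php
<?php
// "\xFC" is "ü" in ISO-8859-1 and is not valid as UTF-8.
$broken = "D\xFCsseldorf";

// Without a flag, encoding fails and json_encode() returns false.
$plain = json_encode($broken);

// Drop the invalid bytes from the output.
$ignored = json_encode($broken, JSON_INVALID_UTF8_IGNORE);

// Replace invalid sequences with U+FFFD (REPLACEMENT CHARACTER).
$substituted = json_encode($broken, JSON_INVALID_UTF8_SUBSTITUTE);

var_dump($plain, $ignored, $substituted);
```

For diagnosing encoding problems, JSON_INVALID_UTF8_SUBSTITUTE is probably the closer match to the desired result in the question, since the damaged position stays visible.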

And in PHP 7.3+ there's the new flag JSON_THROW_ON_ERROR. See http://php.net/manual/en/class.jsonexception.php
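
A short sketch of how that flag might be used, assuming PHP 7.3 or later; the variable names are only illustrative:

```php
<?php
// "\xFC" is "ü" in ISO-8859-1 and is not valid as UTF-8.
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

$caught = null;
try {
    // With JSON_THROW_ON_ERROR, a failure throws instead of returning false.
    $json = json_encode($cities, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    $caught = $e;
    error_log('json_encode failed: ' . $e->getMessage());
}
```

The exception code carries the same JSON_ERROR_* constant that json_last_error() would otherwise return, so existing error handling can be reused.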
