What is the character set if default_charset is empty?

Question

From PHP 5.6 onwards, the default_charset setting defaults to "UTF-8", as explained e.g. in the php.ini documentation. It says that the value is empty in earlier versions.

As I am creating a Java library to communicate with PHP, I need to know which byte values to expect when a string is handled as bytes internally. What happens if default_charset is empty and a (literal) string contains characters outside the ASCII range? Should I expect the default character encoding of the platform, or the character encoding used for the source file?

Recommended answer

Short answer

For literal strings, the bytes always follow the source file encoding. The default_charset value has no effect here.

PHP strings are "binary safe", meaning they carry no internal string encoding. Basically, strings in PHP are just buffers of bytes.

For literal strings, e.g. $s = "Ä", this means the string will contain whatever bytes were saved in the file between the quotes. If the file was saved in UTF-8, this is equivalent to $s = "\xc3\x84"; if the file was saved in ISO-8859-1 (latin1), it is equivalent to $s = "\xc4".
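Since the question is asked from the Java side, the two byte sequences above can be reproduced there. This is only an illustrative sketch: the class name LiteralBytes and the hex helper are made up for this example, and the character is written as a Unicode escape so that the encoding of the Java source file itself does not influence the result.

```java
import java.nio.charset.StandardCharsets;

public class LiteralBytes {
    // Hypothetical helper: render bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00C4 is the character "Ä" used in the PHP example.
        String s = "\u00c4";

        // Bytes a PHP literal would hold if the .php file was saved as UTF-8.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // c3 84

        // Bytes a PHP literal would hold if the .php file was saved as ISO-8859-1.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // c4
    }
}
```

The same character therefore arrives as two bytes or one byte depending only on how the PHP source file was saved.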

Setting the default_charset value does not affect the bytes stored in strings in any way.

Some functions that have to treat strings as text, and are therefore encoding-aware, accept an $encoding argument (usually optional). It tells the function which encoding the text in the string uses.

Before PHP 5.6, the default values of these optional $encoding arguments were either fixed in the function definition (e.g. htmlspecialchars()) or configurable through separate php.ini settings for each extension (e.g. mbstring.internal_encoding, iconv.input_encoding).

In PHP 5.6, the php.ini setting default_charset became the common default and its own default value changed to "UTF-8". The old per-extension settings were deprecated, and all functions that accept an optional $encoding argument should now fall back to the default_charset value when no encoding is specified explicitly.

However, the developer remains responsible for making sure that the text in a string is actually encoded in the encoding that was specified.
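For a Java library receiving such bytes, one way to act on this is to decode with an explicitly chosen charset and to reject input that does not fit it, rather than relying on the platform default. The sketch below assumes the bytes are expected to be UTF-8; StrictDecode and decodeStrict are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decode bytes with the given charset. Invalid input raises an exception
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8Bytes = {(byte) 0xc3, (byte) 0x84};   // "Ä" saved as UTF-8
        byte[] latin1Bytes = {(byte) 0xc4};              // "Ä" saved as ISO-8859-1

        // Both decode to the same character when the matching charset is used.
        System.out.println(decodeStrict(utf8Bytes, StandardCharsets.UTF_8).equals("\u00c4"));
        System.out.println(decodeStrict(latin1Bytes, StandardCharsets.ISO_8859_1).equals("\u00c4"));

        // Decoding the latin1 byte as UTF-8 fails: 0xc4 starts a two-byte
        // sequence that is never completed.
        try {
            decodeStrict(latin1Bytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("input is not valid UTF-8");
        }
    }
}
```

Strict decoding only detects bytes that are invalid for the chosen charset. It cannot tell which encoding the PHP side intended, so the encoding still has to be agreed between both ends.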

Links:

  • Details of the String Type
    More details on the nature of PHP strings (does not mention default_charset at the time of writing).
  • New features in PHP 5.6: Default character encoding
    Short introduction of the default_charset option in the PHP 5.6 release notes.
  • Deprecated features in PHP 5.6: iconv and mbstring encoding settings
    List of php.ini options deprecated in favour of the default_charset option.
