mb_detect_encoding detects ASCII as UTF-8?


Question

I'm trying to automatically convert IPTC metadata imported from images to UTF-8 for storage in a database, using the PHP mb_ functions.

It currently looks like this:

$val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));

However, when mb_detect_encoding() is given an ASCII string (with special characters in the Latin1 range, 192-255), it detects it as UTF-8, so the subsequent attempt to convert everything to proper UTF-8 removes all the special characters.
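
Note that a string containing bytes in the 192-255 range is not strictly ASCII, since ASCII only covers 0-127. Here is a minimal sketch that checks which encodings such a string is valid in. It assumes the mbstring extension is loaded, and the sample string is made up for illustration:

```php
<?php
// "café" encoded as Latin1: the last byte, 0xE9, lies outside 7-bit ASCII.
$latin1 = "caf\xE9";

// mb_check_encoding() reports whether the bytes are valid in a given encoding.
$isAscii  = mb_check_encoding($latin1, 'ASCII');      // false: 0xE9 > 127
$isUtf8   = mb_check_encoding($latin1, 'UTF-8');      // false: 0xE9 starts a sequence that never completes
$isLatin1 = mb_check_encoding($latin1, 'ISO-8859-1'); // true: every byte maps to a character

var_dump($isAscii, $isUtf8, $isLatin1);
```

So the detection result in the question is a wrong guess, not a sign that the data really is UTF-8.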

I tried writing my own method that looks for Latin1 values and, if none occurred, lets mb_detect_encoding decide what the encoding is. But I stopped midway when I realized that I can't be sure other encodings don't use the same byte values for other things.

So, is there a way to properly detect ASCII to feed to mb_convert_encoding as the source encoding?

Recommended answer

Specifying a custom detection order, in which ASCII is tried first, works.

mb_detect_encoding($val, 'ASCII,UTF-8,ISO-8859-15');
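
Combined with the conversion line from the question, this could look roughly like the sketch below. The helper name to_utf8, the strict-mode flag (third argument) and the fallback for undetected input are assumptions for illustration, not part of the original answer. Detection results can also differ between PHP versions, so it is worth testing against real data.

```php
<?php
// Hypothetical helper: detect with an explicit order, then convert to UTF-8.
function to_utf8(string $val, string $order = 'ASCII,UTF-8,ISO-8859-15'): string
{
    // Strict mode (assumption) asks mbstring to reject encodings
    // in which the string is not valid.
    $from = mb_detect_encoding($val, $order, true);

    if ($from === false) {
        // Nothing in the list matched: return the bytes unchanged.
        return $val;
    }

    return mb_convert_encoding($val, 'UTF-8', $from);
}

// Latin1 "café" (the last byte is 0xE9); print the converted bytes as hex.
echo bin2hex(to_utf8("caf\xE9")), "\n";
```

The last encoding in the list acts as a catch-all for single-byte input, so it should be the one the image metadata is most likely to use.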

For completeness, the list of available encodings is at http://www.php.net/manual/en/mbstring.supported-encodings.php
