Get file encoding

Question

Possible duplicate:

Detect file encoding in PHP

How can I figure out with PHP what file encoding a file has?

Answer

Detecting the encoding is really hard for all 8-bit character sets except UTF-8 (because not every byte sequence is valid UTF-8), and it usually requires semantic knowledge of the text whose encoding is to be detected.
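
To see why UTF-8 is the exception, compare a strict UTF-8 check with a single-byte encoding such as ISO-8859-1, in which every byte value is defined. A minimal sketch, assuming the mbstring extension is available; the sample strings are made up for illustration:

```php
<?php
// Requires the mbstring extension.
$latin1Bytes = "caf\xE9";      // "café" encoded as ISO-8859-1
$utf8Bytes   = "caf\xC3\xA9";  // "café" encoded as UTF-8

// 0xE9 announces a 3-byte UTF-8 sequence, but no continuation bytes follow,
// so the Latin-1 bytes are rejected as UTF-8.
var_dump(mb_check_encoding($latin1Bytes, 'UTF-8'));       // bool(false)
var_dump(mb_check_encoding($utf8Bytes, 'UTF-8'));         // bool(true)

// Every byte 0x00-0xFF is defined in ISO-8859-1, so both inputs "pass":
// a single-byte charset can never be ruled out this way.
var_dump(mb_check_encoding($latin1Bytes, 'ISO-8859-1'));  // bool(true)
var_dump(mb_check_encoding($utf8Bytes, 'ISO-8859-1'));    // bool(true)
```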

Think of it this way: any particular piece of plain text is just a bunch of bytes with no encoding information attached. Any individual byte could mean anything, so to have a chance at detecting the encoding, you have to look at that byte in the context of the other bytes and apply heuristics based on plausible language combinations.
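
A small illustration of "a byte could mean anything": the same single byte decodes to a different character in each 8-bit charset. The three charsets below are arbitrary examples, and the sketch assumes mbstring supports them (it does in standard builds):

```php
<?php
// Requires mbstring. The byte 0xE9 has no meaning on its own:
// each 8-bit character set assigns it a different character.
$byte = "\xE9";

foreach (['ISO-8859-1', 'Windows-1251', 'ISO-8859-7'] as $charset) {
    // Re-encode as UTF-8 so the result can be printed.
    echo $charset, ': ', mb_convert_encoding($byte, 'UTF-8', $charset), "\n";
}
// Prints é (Latin-1), й (Cyrillic) and ι (Greek) respectively.
```

Only the surrounding bytes, and some knowledge of which language the text is likely in, can tell these readings apart.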

For 8-bit character sets, though, you can never be sure.

Here, for example, is a demonstration of heuristics going wrong:

http://www.hoax-slayer.com/bush-hid-the-facts-notepad.html

Some 16-bit encodings you do have a chance of detecting, because they may include a byte order mark (BOM) or have every second byte set to 0 (as happens with mostly-ASCII text in UTF-16).
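
Both clues can be sketched in a few lines. `guessUtf16` is a hypothetical helper, not a library function; it only considers UTF-16 (a UTF-32LE BOM, which starts with the same two bytes as the UTF-16LE one, would be misreported), and the 90% threshold is an arbitrary assumption:

```php
<?php
// Hypothetical helper: guess whether a byte string is UTF-16 using a BOM
// or the zero-byte pattern. Returns 'UTF-16BE', 'UTF-16LE' or null.
function guessUtf16(string $bytes): ?string
{
    // Clue 1: a byte order mark at the very start.
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }

    // Clue 2: mostly-ASCII UTF-16 text has a 0x00 byte in every pair,
    // first in the pair for big-endian, second for little-endian.
    $pairs = intdiv(strlen($bytes), 2);
    if ($pairs === 0) {
        return null;
    }
    $zerosFirst = 0;
    $zerosSecond = 0;
    for ($i = 0; $i < $pairs * 2; $i += 2) {
        if ($bytes[$i] === "\0") {
            $zerosFirst++;
        }
        if ($bytes[$i + 1] === "\0") {
            $zerosSecond++;
        }
    }

    // Assumed threshold: at least 90% of the pairs must fit the pattern.
    $threshold = 0.9 * $pairs;
    if ($zerosFirst >= $threshold && $zerosSecond === 0) {
        return 'UTF-16BE';
    }
    if ($zerosSecond >= $threshold && $zerosFirst === 0) {
        return 'UTF-16LE';
    }
    return null;
}
```

For example, `guessUtf16("h\0i\0")` returns `'UTF-16LE'`, while plain ASCII such as `"hello"` returns `null`.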

If you just want to detect UTF-8, you can either use mb_detect_encoding as already explained, or you can use this handy little function:
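
With `mb_detect_encoding`, passing `true` as the third (strict) argument makes it return `false` instead of a best guess when the bytes are not valid in any listed encoding. A minimal sketch for a whole file; `fileIsUtf8` is a hypothetical helper name, and it assumes the file fits in memory:

```php
<?php
// Hypothetical helper: true if the file's bytes are valid UTF-8.
// Requires mbstring; reads the whole file into memory.
function fileIsUtf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Strict mode: returns 'UTF-8' only if the bytes are valid UTF-8,
    // otherwise false. Pure ASCII files also count as UTF-8.
    return mb_detect_encoding($bytes, 'UTF-8', true) === 'UTF-8';
}
```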

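// Returns true only if the whole string is well-formed UTF-8.
// Without the \A ... \z anchors and the ASCII branch, the pattern would
// merely report whether the string contains at least one multibyte sequence.
function isUTF8(string $string): bool
{
    return preg_match('%\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*\z%xs', $string) === 1;
}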
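
A shorter alternative, assuming PHP's bundled PCRE with UTF-8 support (the default): with the `u` modifier, PCRE validates the subject as UTF-8 before matching, so an empty pattern acts as a validity check. `isValidUtf8` is a hypothetical name; it also avoids the backtracking cost the long pattern can incur on very large inputs.

```php
<?php
// With /u, invalid UTF-8 in the subject makes preg_match return false;
// an empty pattern matches any valid string, returning 1.
function isValidUtf8(string $string): bool
{
    return preg_match('//u', $string) === 1;
}
```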
