Detect encoding and make everything UTF-8

Views: 150 Published: 2016/11/19 12:27:23 Tags: php encoding utf-8 character-encoding

Problem description

I'm reading out lots of texts from various RSS feeds and inserting them into my database.

Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO-8859-1.

Unfortunately, there are sometimes problems with the encodings of the texts. Example:

1) The "ß" in "Fußball" should look like this in my database: "ÂŸ". If it is a "ÂŸ", it is displayed correctly.

2) Sometimes, the "ß" in "Fußball" looks like this in my database: "ÃƒÂŸ". Then it is displayed wrongly, of course.

3) In other cases, the "ß" is saved as a "ß" - so without any change. Then it is also displayed wrongly.

What can I do to avoid the cases 2 and 3?

How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is but when must I use the functions?) and when must I do nothing with the input?

Can you help me and tell me how to make everything the same encoding? Perhaps with the function mb_detect_encoding()? Can I write a function for this? So my problems are:

1) How to find out what encoding the text uses

2) How to convert it to UTF-8, whatever the old encoding is

EDIT: Would a function like this work?

function correct_encoding($text) {
    $current_encoding = mb_detect_encoding($text, 'auto');
    $text = iconv($current_encoding, 'UTF-8', $text);
    return $text;
}

I've tested it but it doesn't work. What's wrong with it?

Solution

If you apply utf8_encode() to a string that is already UTF-8, it will return garbled UTF-8 output.
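To make the double-encoding effect concrete, here is a minimal sketch. It assumes the mbstring extension is available and uses mb_convert_encoding() with ISO-8859-1 as the source encoding, which is the same conversion utf8_encode() performs (utf8_encode() itself is deprecated as of PHP 8.2, so it is avoided here). The variable names are only illustrative.

```php
<?php
// "ß" encoded as UTF-8 is the two bytes C3 9F.
$utf8 = "\xC3\x9F";

// Treat every byte as one ISO-8859-1 character and re-encode it as UTF-8.
// This is what happens when utf8_encode() is applied to text that is
// already UTF-8: each of the two bytes becomes its own two-byte sequence.
$encodedTwice = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";          // c39f
echo bin2hex($encodedTwice), "\n";  // c383c29f
```

The four resulting bytes decode to "Ã" followed by the control character U+009F, which is the kind of garbling described in the question.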

I made a function that addresses all these issues. It's called Encoding::toUTF8().

You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF8, or the string can have a mix of them. Encoding::toUTF8() will convert everything to UTF8.

I did it because a service was giving me a feed of data all messed up, mixing UTF8 and Latin1 in the same string.
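For the simpler case where each string is in a single encoding, the idea of "convert only what is not already UTF-8" can be sketched with mbstring alone. This is not how Encoding::toUTF8() is implemented; to_utf8_sketch() and its $fallback parameter are hypothetical names, and Windows-1252 as the fallback is an assumption about the feeds.

```php
<?php
// Hypothetical helper, NOT part of the ForceUTF8 library.
// Assumption: input is either valid UTF-8 or entirely in $fallback.
function to_utf8_sketch(string $text, string $fallback = 'Windows-1252'): string
{
    // Valid UTF-8 is returned untouched, so it is never encoded twice.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Anything else is assumed to be in the fallback encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

echo bin2hex(to_utf8_sketch("Fu\xDFball")), "\n";      // Latin-1 input
echo bin2hex(to_utf8_sketch("Fu\xC3\x9Fball")), "\n";  // UTF-8 input
```

Both calls print the same bytes. Unlike the library, this sketch checks the string as a whole, so it cannot repair a string that mixes UTF-8 and Latin1 sequences.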

Usage:

require_once('Encoding.php'); 
use \ForceUTF8\Encoding;  // It's namespaced now.

$utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);

$latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);

Download:

https://github.com/neitanod/forceutf8

Update:

I've included another function, Encoding::fixUTF8(), which will fix every UTF8 string that looks garbled.

Usage:

require_once('Encoding.php'); 
use \ForceUTF8\Encoding;  // It's namespaced now.

$utf8_string = Encoding::fixUTF8($garbled_utf8_string);

Examples:

echo Encoding::fixUTF8("FÃ©dÃ©ration Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂ©dÃÂ©ration Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂÃÂ©dÃÂÃÂ©ration Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂ©dération Camerounaise de Football");

will output:

Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
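The repeated garbling in these examples comes from UTF-8 bytes being read as Latin1 and encoded again, possibly several times. A rough sketch of undoing that, assuming only ISO-8859-1 was involved, is shown below. It is not the library's algorithm, and undo_double_encoding_sketch() and $maxPasses are hypothetical names.

```php
<?php
// Hypothetical helper, NOT the implementation of Encoding::fixUTF8().
// Assumption: the garbling came from UTF-8 bytes being read as ISO-8859-1.
function undo_double_encoding_sketch(string $text, int $maxPasses = 5): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return $text; // not valid UTF-8, nothing this sketch can do
    }
    for ($i = 0; $i < $maxPasses; $i++) {
        // Pure ASCII cannot be double-encoded.
        if (!preg_match('/[\x80-\xFF]/', $text)) {
            break;
        }
        // Characters above U+00FF cannot come from a Latin-1 misreading.
        if (preg_match('/[\x{100}-\x{10FFFF}]/u', $text)) {
            break;
        }
        // Map each character back to the single byte it was read from.
        $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
        // Accept the step only if the result is itself valid UTF-8.
        if (!mb_check_encoding($candidate, 'UTF-8')) {
            break;
        }
        $text = $candidate;
    }
    return $text;
}

$clean = "F\xC3\xA9d\xC3\xA9ration";  // "Fédération" in UTF-8
$garbled = mb_convert_encoding($clean, 'UTF-8', 'ISO-8859-1');
echo undo_double_encoding_sketch($garbled) === $clean ? "fixed\n" : "not fixed\n";
```

A heuristic like this will also rewrite text that legitimately contains a sequence such as "Ã©", and it does not cover Windows-1252 specifics, which the answer says the library handles.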

Update: I've transformed the function (forceUTF8) into a family of static functions on a class called Encoding. The new function is Encoding::toUTF8().
