file_get_contents not working with utf8

Question

I'm trying to get Thai characters from a website. I've tried:

$rawChapter = file_get_contents("URL");
$rawChapter = mb_convert_encoding($rawChapter, 'UTF-8', mb_detect_encoding($rawChapter, 'UTF-8, ISO-8859-1', true));

When I do this, the characters come back like:

¡ÅѺ˹éÒáá¾ÃФÑÁÀÕÃìÀÒÉÒä·Â©ºÑº

But if I take the source of the page I'm trying to load and save it as a UTF-8 .htm file on my localhost, the Thai characters load correctly. It only breaks when I load the page directly from the site.

How can I fix this? What could be the problem?

I've also tried adding this context:

$context = stream_context_create(array(
            'http' => array(
                'method' => 'POST',
                'header' => implode("\r\n", array(
                    'Content-type: application/x-www-form-urlencoded',
                    'Accept-Language: en-us,en;q=0.5',
                    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7'
                ))
            )
        ));

I've tried adding it alone, I've tried adding it with the mb_convert_encoding()... I feel like I've tried all combinations of this stuff and no success.

Recommended answer

Change your Accept-Charset to UTF-8, because ISO-8859-1 does not support Thai characters. If you are running your PHP script on a Windows machine, you may also use the windows-874 charset. You may also try adding this header:

Content-Language: th
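
To illustrate why the conversion in the question produces garbage, here is a minimal sketch. It assumes the page is served as TIS-620 (as the update below suggests) and that the local iconv build knows that charset; the sample bytes are an illustration, not data fetched from the site.

```php
<?php
// The Thai word "กลับ" as raw TIS-620 bytes, written as escapes so that the
// encoding of this source file does not matter. (Sample data, an assumption.)
$raw = "\xA1\xC5\xD1\xBA";

// The detection step from the question: these bytes are not valid UTF-8,
// so the only remaining candidate, ISO-8859-1, is reported.
$detected = mb_detect_encoding($raw, 'UTF-8, ISO-8859-1', true);

// Reading the bytes as ISO-8859-1 maps every Thai byte to an accented Latin character.
$wrong = mb_convert_encoding($raw, 'UTF-8', $detected);

// Naming the actual source charset yields the intended Thai text.
$right = iconv('TIS-620', 'UTF-8', $raw);

echo $detected, "\n", $wrong, "\n", $right, "\n";
```

The second line printed should start the same way as the garbled output in the question, which suggests the bytes from the site were intact and only the assumed source charset was wrong.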

**UPDATE**

Very strange, but this works for me.

$opts = array(
  'http'=>array(
    'method'=>"GET",
    'header'=> implode("\r\n", array(
                   'Content-type: text/plain; charset=TIS-620'
                   //'Content-type: text/plain; charset=windows-874'  // same thing
                ))
  )
);

$context = stream_context_create($opts);

//$fp = fopen('http://thaipope.org/webbible/01_002.htm', 'rb', false, $context);
//$contents = stream_get_contents($fp);
//fclose($fp);
$contents = file_get_contents("http://thaipope.org/webbible/01_002.htm",false, $context);

header('Content-type: text/html; charset=TIS-620');
//header('Content-type: text/html; charset=windows-874');  // same thing

echo $contents;

Apparently, I was wrong about UTF-8 in this case. See here for more details. You can still produce UTF-8 output, though:

$in_charset = 'TIS-620';   // == 'windows-874'
$out_charset = 'utf-8';

$opts = array(
  'http'=>array(
    'method'=>"GET",
    'header'=> implode("\r\n", array(
                   'Content-type: text/plain; charset=' . $in_charset
                ))
  )
);

$context = stream_context_create($opts);

$contents = file_get_contents("http://thaipope.org/webbible/01_002.htm",false, $context);
if ($in_charset != $out_charset) {
    $contents = iconv($in_charset, $out_charset, $contents);
}

header('Content-type: text/html; charset=' . $out_charset);

echo $contents;   // output in UTF-8
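
Instead of hard-coding $in_charset, the charset could be read from the response headers when the server sends one. The following is only a sketch: charsetFromHeaders() and toUtf8() are hypothetical helper names, and the TIS-620 fallback is an assumption for pages like this one that may not declare a charset in the Content-Type header.

```php
<?php
// Hypothetical helper: find the charset in a list of raw HTTP header lines.
function charsetFromHeaders(array $headers, string $default = 'TIS-620'): string
{
    $charset = $default; // assumed fallback when no charset is declared
    foreach ($headers as $line) {
        // Matches e.g. "Content-Type: text/html; charset=windows-874"
        if (preg_match('/^content-type:.*?charset\s*=\s*"?([\w.:-]+)/i', $line, $m)) {
            $charset = $m[1]; // keep the last match, since redirects add several header sets
        }
    }
    return $charset;
}

// Hypothetical helper: convert a fetched body to UTF-8 unless it already is UTF-8.
function toUtf8(string $body, string $charset): string
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body;
    }
    $converted = iconv($charset, 'UTF-8', $body);
    // iconv() returns false on failure; fall back to the untouched body.
    return $converted === false ? $body : $converted;
}

// Possible usage; PHP fills $http_response_header after an HTTP file_get_contents():
// $body = file_get_contents('http://thaipope.org/webbible/01_002.htm');
// header('Content-type: text/html; charset=utf-8');
// echo toUtf8($body, charsetFromHeaders($http_response_header));
```

If the page declares its charset only in a meta tag, the header lookup finds nothing and the fallback value is used, so the default still has to be chosen per site.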
