How to get file content with a proper utf-8 encoding using file_get_contents?


Question

I need to get the content of a remote file in UTF-8 encoding. The file is in UTF-8. When I display that file on screen, it has the proper encoding:

http://www.parfumeriafox.sk/source_file.html

(notice the ň and č characters, for example, these are alright).

When I run this code:

<?php

$url = 'http://parfumeriafox.sk/source_file.html';

$csv = file_get_contents_utf8($url);
header('Content-type: text/html; charset=utf-8');
print $csv;

function file_get_contents_utf8($fn) {
  $content = file_get_contents($fn);
  return mb_convert_encoding($content, 'utf-8');
}

(you can run it at http://www.parfumeriafox.sk/encoding.php), then I get question marks instead of those special characters. I have done a lot of research on this: I tried the standard file_get_contents function, some PHP stream context functions, and also fopen and fread to read the file at the binary level. Nothing seems to work. I have tried it both with and without sending the header. This is supposed to be perfectly simple, so what am I doing wrong? When I check the string with an encoding detection function, it returns UTF-8.

Solution

How about this one????

For this one I used header('Content-Type: text/plain; charset=Windows-1250');

bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn



This code works for me

<?php
header('Content-Type: text/plain;charset=Windows-1250');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
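If the page has to be served as UTF-8, as the question asks, one option is to convert the fetched bytes instead of changing the declared charset. The following is only a sketch: it assumes the remote file really is Windows-1250 (as the analysis further down suggests) and that the iconv extension is available. The helper name windows1250_to_utf8 is made up for illustration.

```php
<?php
// Hypothetical helper: converts a Windows-1250 byte string to UTF-8.
function windows1250_to_utf8(string $bytes): string
{
    $utf8 = iconv('Windows-1250', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Conversion from Windows-1250 failed');
    }
    return $utf8;
}

// "levanduľa" as Windows-1250 bytes: ľ is the single byte 0xBE.
$sample = "levandu\xBEa";
echo windows1250_to_utf8($sample), "\n";

// With the remote file, the same helper could be used like this:
// header('Content-Type: text/html; charset=utf-8');
// echo windows1250_to_utf8(file_get_contents('http://www.parfumeriafox.sk/source_file.html'));
```

With this approach the charset=utf-8 header from the question can stay, because the bytes that are sent now match it.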



The problem is not with file_get_contents()

I saved $data to a file and the bytes were correct, but my text editor still did not decode them correctly.

$data = file_get_contents('http://www.parfumeriafox.sk/source_file.html');
file_put_contents('doc.txt',$data);
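A quick way to see whether the fetched bytes are UTF-8 at all is a strict validity check plus a hex dump. This is a sketch that assumes the mbstring extension (already used in the question) is loaded. is_valid_utf8 is a hypothetical helper name, and the sample strings stand in for the real $data.

```php
<?php
// Hypothetical helper: strict check that a byte string is well-formed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$legacy = "levandu\xBEa";      // ľ as one byte (0xBE), not UTF-8
$utf8   = "levandu\u{013E}a";  // ľ as the UTF-8 sequence C4 BE

var_dump(is_valid_utf8($legacy)); // a lone 0xBE is not a valid UTF-8 sequence
var_dump(is_valid_utf8($utf8));

// The hex dump shows which bytes are actually stored.
echo bin2hex($legacy), "\n";
echo bin2hex($utf8), "\n";
```

If the check fails for the downloaded data, the file is in some single-byte encoding and has to be converted before it is labelled as UTF-8.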

UPDATE

There seems to be one problematic character, which renders as ¾.

Its hex value is 0xBE (190 decimal).

I tried these two character sets. Neither worked.

header('Content-Type: text/plain; charset=ISO-8859-1');
header('Content-Type: text/plain; charset=ISO-8859-2');
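To illustrate why the choice of charset matters for this byte, the sketch below decodes 0xBE under three candidate encodings. It assumes the iconv extension is available, and decode_byte is a hypothetical helper name. The byte comes out as ¾ in ISO-8859-1, ž in ISO-8859-2 and ľ in Windows-1250.

```php
<?php
// Hypothetical helper: decode raw bytes from a single-byte charset and return UTF-8.
function decode_byte(string $byte, string $charset): string
{
    $utf8 = iconv($charset, 'UTF-8', $byte);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot decode byte as $charset");
    }
    return $utf8;
}

foreach (['ISO-8859-1', 'ISO-8859-2', 'Windows-1250'] as $charset) {
    printf("0xBE in %-12s => %s\n", $charset, decode_byte("\xBE", $charset));
}
```

Only Windows-1250 gives the ľ expected in "levanduľa". Other letters differ as well: š is byte 0x9A in Windows-1250 but 0xB9 in ISO-8859-2, so the two encodings are not interchangeable for Slovak text.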




END OF UPDATE


It works by adding a header WITHOUT charset=utf-8.

These two headers work

header('Content-Type: text/plain');
header('Content-Type: text/html');

These two headers do NOT work

header('Content-Type: text/plain; charset=utf-8');
header('Content-Type: text/html; charset=utf-8');

This code is tested and displays all characters.

<?php
header('Content-Type: text/plain');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>

<?php
header('Content-Type: text/html');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
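Rather than leaving the choice to the browser, it can help to check whether the remote server declares a charset itself. The sketch below parses a list of response header lines such as the one PHP fills into $http_response_header after an HTTP file_get_contents call. charset_from_headers is a hypothetical helper name, and the server may well send no charset at all.

```php
<?php
// Hypothetical helper: returns the charset declared in a Content-Type header
// line, or null if none is declared. When several Content-Type lines are
// present (for example after redirects), the last one wins.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*;\s*charset=["\']?([^"\';\s]+)/i', $line, $m)) {
            $charset = $m[1];
        }
    }
    return $charset;
}

// Sample header lines (illustrative, not captured from the real server).
$sample = ['HTTP/1.1 200 OK', 'Content-Type: text/html; charset=windows-1250'];
var_dump(charset_from_headers($sample));
var_dump(charset_from_headers(['HTTP/1.1 200 OK', 'Content-Type: text/html']));

// With a real request:
// $data = file_get_contents('http://www.parfumeriafox.sk/source_file.html');
// $charset = charset_from_headers($http_response_header);
```

If a charset is reported, it can be passed on in your own Content-Type header or used as the source encoding for a conversion to UTF-8.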



These are some of the problematic characters with their hex values, taken from the saved file viewed in Notepad++ with UTF-8 encoding.

Check the hex values against the candidate character sets.

From the character set table I saw that the character set was Latin2.

I went to the Wikipedia page on Windows code pages and found that the Windows Latin 2 (Central European) code page is Windows-1250.


bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
