How to get file content with a proper utf-8 encoding using file_get_contents?
I need to get the content of a remote file in UTF-8 encoding. The file is in UTF-8. When I display that file on screen, it has the proper encoding:
http://www.parfumeriafox.sk/source_file.html
(notice the ň and č characters, for example; these are fine).
When I run this code:
<?php
$url = 'http://parfumeriafox.sk/source_file.html';
$csv = file_get_contents_utf8($url);
header('Content-type: text/html; charset=utf-8');
print $csv;
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'utf-8');
}
(you can run it using http://www.parfumeriafox.sk/encoding.php), then I get question marks instead of those special characters. I have done a lot of research on this: I have tried the standard file_read_contents function, I have even used some PHP stream context functions, and I also tried fopen and fread to read the file at the binary level; nothing seems to work. I have tried with and without sending the header. This is supposed to be perfectly simple, so what am I doing wrong? When I check the string with an encoding detection function, it returns UTF-8.
How about this one?
For this one I used header('Content-Type: text/plain; charset=Windows-1250');
bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
This code works for me
<?php
header('Content-Type: text/plain;charset=Windows-1250');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
The problem is not with file_get_contents()
I saved $data to a file, and the characters were correct, but my text editor still did not decode them correctly. See image below.
$data = file_get_contents('http://www.parfumeriafox.sk/source_file.html');
file_put_contents('doc.txt',$data);
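Rather than relying on how an editor renders the saved file, the bytes can be inspected directly. The following is a minimal sketch, assuming the mbstring extension is available. The helper name describe_bytes and the sample string are illustrative and are not taken from the remote file.

```php
<?php
// Hypothetical helper: reports whether a byte string is valid UTF-8
// and lists the hex values of its non-ASCII bytes.
function describe_bytes(string $data): array {
    $nonAscii = [];
    foreach (str_split($data) as $byte) {
        if (ord($byte) > 0x7F) {
            $nonAscii[] = strtoupper(bin2hex($byte));
        }
    }
    return [
        'valid_utf8'    => mb_check_encoding($data, 'UTF-8'),
        'non_ascii_hex' => $nonAscii,
    ];
}

// Illustrative sample: "levanduľa" with ľ stored as the single byte 0xBE.
$sample = "levandu\xBEa";
print_r(describe_bytes($sample));
```

If valid_utf8 comes back false, the data is in some single-byte encoding and sending charset=utf-8 will not display it correctly.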
UPDATE
There seems to be one problematic character, as shown here. It can also be seen in the HTML image below. It renders as ¾.
Its hex value is 0xBE (190 decimal).
I tried these two character sets. Neither worked.
header('Content-Type: text/plain; charset=ISO 8859-1');
header('Content-Type: text/plain; charset=ISO 8859-2');
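To see why the charset choice matters for this byte, it can be decoded under each candidate charset. This is a sketch that assumes the iconv extension is present and that the local iconv build knows all three charset names. Note that the registered names are written with hyphens (ISO-8859-1, ISO-8859-2).

```php
<?php
// Decode the single byte 0xBE under three candidate charsets
// and collect the UTF-8 result for each.
$byte = "\xBE";
$results = [];
foreach (['ISO-8859-1', 'ISO-8859-2', 'Windows-1250'] as $charset) {
    $results[$charset] = iconv($charset, 'UTF-8', $byte);
    echo $charset, ' => ', $results[$charset], PHP_EOL;
}
```

In those code tables 0xBE is ¾ in ISO-8859-1, ž in ISO-8859-2, and ľ in Windows-1250, so only the last one produces the character expected in "levanduľa".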
END OF UPDATE
It works by adding a header WITHOUT charset=utf-8.
These two headers work
header('Content-Type: text/plain');
header('Content-Type: text/html');
These two headers do NOT work
header('Content-Type: text/plain; charset=utf-8');
header('Content-Type: text/html; charset=utf-8');
This code has been tested and displays all characters.
<?php
header('Content-Type: text/plain');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
<?php
header('Content-Type: text/html');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
These are some of the problematic characters with their Hex values.
This is the saved file viewed in Notepad++ with UTF-8 Encoding.
Check the Hex values against these character sets.
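From the above table I saw that the character set was Latin 2.
On Wikipedia's Windows code page article I found that the Windows Latin 2 (Central European) code page is Windows-1250, which is distinct from ISO-8859-2 (also called Latin-2).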
bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
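If the goal is still to end up with UTF-8, as the question asks, one option is to convert explicitly from the source charset after downloading. This is a minimal sketch using iconv rather than mb_convert_encoding, since the available mbstring encodings vary by build. The Windows-1250 default is an assumption based on the finding above, the helper to_utf8 is hypothetical, and file_get_contents_utf8 reuses the name from the question.

```php
<?php
// Convert bytes from an assumed source charset to UTF-8.
// The Windows-1250 default is an assumption about this particular file,
// not something detected automatically.
function to_utf8(string $bytes, string $sourceCharset = 'Windows-1250'): string {
    $converted = iconv($sourceCharset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// Variant of the question's helper that converts after downloading.
function file_get_contents_utf8(string $fn): string {
    $content = file_get_contents($fn);
    if ($content === false) {
        throw new RuntimeException("Could not read $fn");
    }
    return to_utf8($content);
}

// Local demonstration with Windows-1250 bytes, so no network access is needed.
$sample = "levandu\xBEa, \x9Akorica, hru\x9Aka, pi\x9Emo";
echo to_utf8($sample), PHP_EOL;
```

With a conversion like this in place, the output can be sent with header('Content-Type: text/html; charset=utf-8'), because the bytes then match the declared charset.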