How to get file content with a proper utf-8 encoding using file_get_contents?

Views: 1053. Published: 2020/7/13 6:36:50. Tags: php, utf-8, file-get-contents


Problem description

I need to get the content of a remote file in UTF-8 encoding. The file is in UTF-8. When I display that file on screen, it has the proper encoding:

http://www.parfumeriafox.sk/source_file.html

(notice the ň and č characters, for example, these are alright).

When I run this code:

<?php

$url = 'http://parfumeriafox.sk/source_file.html';

$csv = file_get_contents_utf8($url);
header('Content-type: text/html; charset=utf-8');
print $csv;

function file_get_contents_utf8($fn) {
  $content = file_get_contents($fn);
  return mb_convert_encoding($content, 'utf-8');
}

(you can run it using http://www.parfumeriafox.sk/encoding.php), then I get question marks instead of those special characters. I have done a lot of research on this: I have tried the standard file_get_contents function, I have used PHP stream context functions, and I have also tried fopen and fread to read the file at the binary level. Nothing seems to work. I have tried it with and without sending the header. This is supposed to be perfectly simple, so what am I doing wrong? When I check that string with an encoding detection function, it returns UTF-8.

Solution

How about this one?

For this one I used header('Content-Type: text/plain; charset=Windows-1250');

bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn

This code works for me

<?php
header('Content-Type: text/plain;charset=Windows-1250');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>

The problem is not with file_get_contents()

I saved $data to a file. The bytes in the file were correct, but my text editor still did not display them with the right encoding. See image below.

$data = file_get_contents('http://www.parfumeriafox.sk/source_file.html');
file_put_contents('doc.txt',$data);
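
To look at the saved bytes without relying on how an editor renders them, something like the following could be used. This is only a sketch: non_ascii_bytes is a hypothetical helper name, the sample string is made up to mimic two words from the file, and the code assumes PHP 7.4+ with the mbstring extension (which the question's code already uses).

```php
<?php
// Hypothetical helper: list the distinct non-ASCII bytes in a string as hex.
function non_ascii_bytes(string $data): array
{
    // No /u modifier, so the pattern matches raw bytes, not code points.
    preg_match_all('/[\x80-\xFF]/', $data, $matches);
    $hex = array_map(
        static fn (string $b): string => '0x' . strtoupper(bin2hex($b)),
        array_unique($matches[0])
    );
    sort($hex);
    return $hex;
}

// Made-up sample: "levanduľa, škorica" as single-byte Central European text.
$sample = "levandu\xBEa, \x9Akorica";

// A lone 0xBE or 0x9A byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($sample, 'UTF-8')); // bool(false)
print_r(non_ascii_bytes($sample));             // 0x9A and 0xBE
```

If mb_check_encoding reports false for the fetched data, the file is not UTF-8, whatever a looser detection function says.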

UPDATE

There seems to be one problematic character, as shown here. It can also be seen in the HTML image below, where it renders as ¾.

Its hex value is 0xBE (190 decimal).

I tried these two character sets. Neither worked.

header('Content-Type: text/plain; charset=ISO 8859-1');
header('Content-Type: text/plain; charset=ISO 8859-2');

END OF UPDATE
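
The effect of the charset on that one byte can be checked directly. The sketch below decodes 0xBE under three candidate charsets; decode_byte is a hypothetical helper, and the code assumes the iconv extension is available and knows these charset names.

```php
<?php
// Hypothetical helper: interpret one byte in the given charset, return UTF-8.
function decode_byte(string $byte, string $charset): string
{
    $decoded = iconv($charset, 'UTF-8', $byte);
    if ($decoded === false) {
        throw new RuntimeException("iconv cannot convert from $charset");
    }
    return $decoded;
}

foreach (['ISO-8859-1', 'ISO-8859-2', 'WINDOWS-1250'] as $charset) {
    echo str_pad($charset, 13), decode_byte("\xBE", $charset), "\n";
}
// ISO-8859-1   ¾
// ISO-8859-2   ž
// WINDOWS-1250 ľ
```

Only Windows-1250 maps 0xBE to the ľ that "levanduľa" needs, which is consistent with the two ISO charsets not giving the expected text.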

It works by adding a header WITHOUT charset=utf-8.

These two headers work

header('Content-Type: text/plain');
header('Content-Type: text/html');

These two headers do NOT work

header('Content-Type: text/plain; charset=utf-8');
header('Content-Type: text/html; charset=utf-8');

This code was tested and displayed all characters correctly.

<?php
header('Content-Type: text/plain');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>

<?php
header('Content-Type: text/html');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>

These are some of the problematic characters with their hex values.
This is the saved file viewed in Notepad++ with UTF-8 encoding.

Check the hex values against these character sets.

From the above table I saw that the character set was Latin2.

I went to the Wikipedia page on Windows code pages and found that the Windows code page for Latin 2 (Central European) is Windows-1250.

bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
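
If the goal is to end up with real UTF-8, as the question asks, an alternative to sending a Windows-1250 header is to convert the fetched bytes before output. This is a minimal sketch under stated assumptions: the source really is Windows-1250 as found above, the iconv extension is available, and fetch_as_utf8 is a hypothetical helper name. The question's mb_convert_encoding($content, 'utf-8') names no source encoding, so it most likely treated the input as already being UTF-8, which would explain the question marks.

```php
<?php
// Hypothetical helper: read a file or URL and transcode it to UTF-8.
// The default source encoding is an assumption taken from the answer above.
function fetch_as_utf8(string $url, string $sourceEncoding = 'WINDOWS-1250'): string
{
    $raw = file_get_contents($url);
    if ($raw === false) {
        throw new RuntimeException("Could not read $url");
    }

    // Name the source encoding explicitly instead of letting PHP guess.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from $sourceEncoding");
    }
    return $utf8;
}

// Usage (needs network access and allow_url_fopen):
// header('Content-Type: text/html; charset=utf-8');
// echo fetch_as_utf8('http://www.parfumeriafox.sk/source_file.html');
```

With this approach the charset=utf-8 header from the question matches the bytes that are actually sent.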
