Corrupted UTF-8 encoding when reading Google feed / alerts

Views: 154 Published: 2016/11/19 16:23:06 Tags: php utf-8 character-encoding google-api google-alerts

Problem description

Whenever I try to read a Google alert via PHP using something like:

$feed = file_get_contents("http://www.google.com/alerts/feeds/01445174399729103044/950192755411504138");

Regardless of whether I save $feed to a file or echo the result to the output, all non-ASCII UTF-8 characters (i.e. those with diacritics) are represented by white space. I have tried, without success, various combinations of:

utf8_encode

utf8_decode

iconv

mb_convert_encoding

I think the wrong characters have come from the stream, but I'm lost because if I try this URI in a browser then everything is fine. Can anyone shed some light on the issue?

Recommended answer

Sorry, you are absolutely correct - there is something untoward happening! Though it is not what you would first suspect... For reference, given that:

echo mb_detect_encoding($feed); // prints: ASCII

The Unicode data is lost before the remote server even sends it. It appears that Google looks at the User-Agent string in the request header, and file_get_contents sends no such header by default when it is called without a stream context.

Because it cannot identify the client making the request, it defaults to and forces ASCII encoding. This is presumably a necessary fallback in the event of some kind of cataclysmic cock-up. [citation needed...]
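One way to check this diagnosis on your own data is to test whether the fetched bytes contain anything outside the ASCII range at all. The following is a minimal sketch, not part of the original answer: the helper name describePayload is hypothetical, and it assumes the mbstring extension is available (as mb_detect_encoding above already does).

```php
<?php
// Hypothetical helper: classify a fetched payload by its byte content.
function describePayload(string $bytes): string
{
    // No byte >= 0x80 means the payload is plain ASCII: any accented
    // characters were already dropped or replaced before it arrived.
    if (preg_match('/[\x80-\xFF]/', $bytes) !== 1) {
        return 'ascii-only';
    }

    // Otherwise check that the multi-byte sequences are well-formed UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? 'valid-utf8' : 'invalid-utf8';
}

echo describePayload("cafe"), "\n";        // ASCII bytes only
echo describePayload("caf\xC3\xA9"), "\n"; // "café" encoded as UTF-8
echo describePayload("caf\xE9"), "\n";     // "café" encoded as ISO-8859-1
```

If the feed body comes back as ascii-only, no amount of utf8_encode, iconv, or mb_convert_encoding on the client side can restore the missing characters, which would be consistent with the failed attempts listed in the question.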

Simply naming your application is not enough, however; you need to include a known vendor. I'm unsure of the full extent of this, but I believe most folks include "Mozilla [version]" to work around the issue, for example:

$url = 'http://www.google.com/...';

$feed = file_get_contents($url, false, stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => 'Accept-Charset: UTF-8' . "\r\n"
                  . 'User-Agent: (Mozilla/5.0 compatible) MyFeedReader/1.0'
    ]
]));

file_put_contents('test.txt', $feed); // should now work as expected
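The same request options can be kept in one place by wrapping them in a small function. This is only a sketch under stated assumptions: buildFeedContextOptions is a hypothetical name, the MyFeedReader/1.0 token is the illustrative one from the answer, and whether any particular User-Agent value is accepted by Google is not something this code can guarantee. It relies on the http wrapper's user_agent context option and on the header option accepting an array of lines.

```php
<?php
// Hypothetical helper: build the options array for stream_context_create().
function buildFeedContextOptions(string $userAgent): array
{
    return [
        'http' => [
            'method' => 'GET',
            // The http wrapper accepts headers as an array of lines.
            'header' => ['Accept-Charset: UTF-8'],
            // Dedicated option that sets the User-Agent request header.
            'user_agent' => $userAgent,
        ],
    ];
}

$options = buildFeedContextOptions('(Mozilla/5.0 compatible) MyFeedReader/1.0');
$context = stream_context_create($options);

// Usage (performs a network request, so it is left commented out):
// $feed = file_get_contents($url, false, $context);
```

If every HTTP request in a script should carry the same identifier, setting PHP's user_agent ini directive, for example with ini_set('user_agent', ...), is an alternative to passing a context on each call.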
