Curl: get UTF-8 data from site with incorrect charset

Views: 3260 Published: 2016/11/19 16:43:32 php curl character-encoding

Problem description

I scrape some sites that occasionally have UTF-8 characters in the title, but that don't specify UTF-8 as the charset (qq.com is an example). When I look at the website in my browser, the data I want to copy (i.e. the title) looks correct (Japanese or Chinese, I'm not too sure which). I can copy the title and paste it into the terminal and it looks exactly the same. I can even write it to the DB, and when I retrieve it from the DB it still looks the same, and correct.

However, when I use cURL, the data that gets printed is wrong. I can run cURL from the command line or use PHP; either way, when the result is printed to the terminal it's clearly incorrect, and it remains that way when I store it to the DB (remember: the terminal can display these characters properly). I've tried all eligible combinations of the following:

Setting CURLOPT_BINARYTRANSFER to true
mb_convert_encoding($html, 'UTF-8')
utf8_encode($html)
utf8_decode($html)

None of these display the characters as expected. This is very frustrating, since I can get the right characters so easily just by visiting the site, but cURL can't. I've read a lot of suggestions such as this one: How to get web-page-title with CURL in PHP from web-sites of different CHARSET?

The solution in general seems to be "convert the data to UTF-8." To be honest, I don't actually know what that means. Don't the above functions convert the data to UTF-8? Why isn't it already UTF-8? What is it, and why does it display properly in some circumstances, but not for cURL?

Recommended answer

Have you tried:

    $html = iconv("gb2312", "utf-8", $html);

The gb2312 was taken from the qq.com headers.
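
To avoid hard-coding gb2312 for every site, the same idea can be generalised a little. The following is only a minimal sketch, not a definitive implementation: the helper names `detect_charset`, `to_utf8` and `fetch_as_utf8` are made up for this illustration, and it assumes the declared charset (in the Content-Type header or a meta tag) is actually the one the page uses and that the local iconv build supports it.

```php
<?php
// Hypothetical helper: read the charset from a Content-Type header value,
// or failing that from a <meta> tag inside the HTML itself.
function detect_charset(?string $contentType, string $html): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // Matches <meta charset="..."> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: convert the body to UTF-8 when a different
// charset was declared; otherwise return it unchanged.
function to_utf8(string $html, ?string $contentType = null): string
{
    $charset = detect_charset($contentType, $html) ?? 'UTF-8';
    if ($charset === 'UTF-8') {
        return $html;
    }
    // iconv() returns false when the conversion fails.
    $converted = iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// Hypothetical helper: fetch a page with cURL and normalise it to UTF-8.
function fetch_as_utf8(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    if ($html === false) {
        return null;
    }
    // Content-Type as sent by the server, e.g. "text/html; charset=GB2312".
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return to_utf8($html, is_string($contentType) ? $contentType : null);
}
```

This probably also explains why the attempts in the question did not help: `utf8_encode()` treats its input as ISO-8859-1, and `mb_convert_encoding($html, 'UTF-8')` without a source encoding falls back to the internal default, so neither call knows the bytes are GB2312. The browser shows the title correctly because it reads the declared charset and decodes accordingly. If a page labelled gb2312 fails to convert, trying GBK or GB18030 as the source charset may be worthwhile, since such pages often contain characters outside strict GB2312.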
