How to decode "Content-Encoding: gzip, gzip" using curl?

Views: 2390 Published: 2017/3/6 1:53:59 Tags: php html curl nginx gzip

Problem description

I am trying to fetch and decode the web page www.dealstan.com with cURL, using the code below:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch); 
$info = curl_getinfo($ch); 
curl_close($ch); 

$html = str_get_html("$return");
echo $html;

But it prints junk characters:

"��}{w�6��9�X�n��.........." for about 100 lines.

I inspected the response with hurl.it and noticed one interesting point: it looks like the HTML is encoded twice (just a guess, based on the response headers).

Here is the response:

GET http://www.dealstan.com/

200 OK 18.87 kB 490 ms

HEADERS

Cache-Control: max-age=0, no-cache
Cf-Ray: 18be7f54f8d80f1b-IAD
Connection: keep-alive
Content-Encoding: gzip, gzip   <== I suspect this; does anyone know about it?
Content-Type: text/html; charset=UTF-8
Date: Wed, 19 Nov 2014 18:33:39 GMT
Server: cloudflare-nginx
Set-Cookie: __cfduid=d1cff1e3134c5f32d2bddc10207bae0681416422019; expires=Thu, 19-Nov-15 18:33:39 GMT; path=/; domain=.dealstan.com; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Page-Speed: 1.8.31.2-3973
X-Pingback: http://www.dealstan.com/xmlrpc.php
X-Powered-By: HHVM/3.2.0

BODY

H4sIAAAAAAAAA5V8Q5AoWrBk27Ztu/u2bdu2bdu2bdu2bds2583f/pjFVOQqozZnUxkVJ7PwoyAA/qeAb3y83LbYHs/3Hv79wKm/2N5cZyJVtCWu1xyteyzLNqYuWbdtHeELCyIZRRp/1Fe7es3+wL3Vfb
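The body preview above is base64 text, and its first characters can be checked for a gzip signature. The following is a minimal sketch, not part of the original question, that decodes only the first 16 characters of that preview. The variable names are illustrative. It shows that the displayed body is itself a gzip stream. If the tool had already undone one layer of compression before displaying the body, that would be consistent with the double-compression guess, but the check alone does not prove it.

```php
<?php
// First 16 base64 characters of the body preview (they decode to 12 bytes).
$prefix = 'H4sIAAAAAAAAA5V8';
$bytes  = base64_decode($prefix, true);

// A gzip stream starts with the magic bytes 1f 8b followed by
// the compression method 08 (deflate).
$isGzip = strncmp($bytes, "\x1f\x8b\x08", 3) === 0;

// Print the leading bytes in hex, then the verdict.
echo bin2hex(substr($bytes, 0, 3)), "\n";
echo $isGzip ? "gzip header found\n" : "no gzip header\n";
```

Running it prints `1f8b08` followed by `gzip header found`.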

Does anyone know how to decode a response with the header "Content-Encoding: gzip, gzip"?

The site loads properly in Firefox, Chrome, etc., but I am not able to decode it using cURL.

Please help me decode this response.

Solution

You can decode it by trimming the 10-byte gzip header off the data that cURL returns and passing the rest to gzinflate.

$url = "http://www.dealstan.com";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

// Skip the 10-byte gzip header, then inflate the remaining deflate data
$return = gzinflate(substr($return, 10));
print_r($return);
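The fixed offset of 10 in the code above matches the minimal gzip header length, and it assumes exactly one extra gzip layer remains after cURL has decoded the response. A possibly more defensive variant is sketched below, assuming PHP 5.4 or later with the zlib extension so that gzdecode is available. The helper name gunzip_all and its $maxLayers limit are hypothetical and not part of the original answer. The demonstration uses locally generated data instead of the real site.

```php
<?php
// Hypothetical helper (not from the original answer): peel off gzip layers
// for as long as the data starts with the gzip magic bytes 0x1f 0x8b.
function gunzip_all(string $data, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers; $i++) {
        if (strncmp($data, "\x1f\x8b", 2) !== 0) {
            break; // no gzip signature, so stop unwrapping
        }
        $decoded = @gzdecode($data);
        if ($decoded === false) {
            break; // looked like gzip but failed to decode; keep what we have
        }
        $data = $decoded;
    }
    return $data;
}

// Self-contained demonstration with locally generated data (no network access).
$html   = '<html><body>Hello</body></html>';
$double = gzencode(gzencode($html)); // simulates "Content-Encoding: gzip, gzip"
$once   = gzdecode($double);         // what remains after one layer is undone

echo gunzip_all($once), "\n";   // one remaining layer
echo gunzip_all($double), "\n"; // two layers
echo gunzip_all($html), "\n";   // already plain, returned unchanged
```

Each of the three lines prints the original HTML string. In the cURL script, the same helper could be applied to $return in place of the gzinflate call. The signature check makes it a no-op when the response is already plain HTML, which may matter if the server or the installed libcurl version later handles both layers.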
