How to decode "Content-Encoding: gzip, gzip" using curl?
Problem description
I am trying to fetch and decode the web page www.dealstan.com with cURL, using the code below:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$html = str_get_html("$return");
echo $html;
However, the output is junk characters:
"��}{w�6����9�X�n���.........." for about 100 lines.
I inspected the response with hurl.it and noticed something interesting: the HTML appears to be gzip-encoded twice (just a guess, based on the response headers).
The response is below:
GET http://www.dealstan.com/
200 OK 18.87 kB 490 ms
HEADERS
Cache-Control: max-age=0, no-cache
Cf-Ray: 18be7f54f8d80f1b-IAD
Connection: keep-alive
Content-Encoding: gzip, gzip   <== I suspect this header. Does anyone know about it?
Content-Type: text/html; charset=UTF-8
Date: Wed, 19 Nov 2014 18:33:39 GMT
Server: cloudflare-nginx
Set-Cookie: __cfduid=d1cff1e3134c5f32d2bddc10207bae0681416422019; expires=Thu, 19-Nov-15 18:33:39 GMT; path=/; domain=.dealstan.com; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Page-Speed: 1.8.31.2-3973
X-Pingback: http://www.dealstan.com/xmlrpc.php
X-Powered-By: HHVM/3.2.0
BODY
H4sIAAAAAAAAA5V8Q5AoWrBk27Ztu/u2bdu2bdu2bdu2bds2583f/pjFVOQqozZnUxkVJ7PwoyAA/qeAb3y83LbYHs/3Hv79wKm/2N5cZyJVtCWu1xyteyzLNqYuWbdtHeELCyIZRRp/1Fe7es3+wL3Vfb
Does anyone know how to decode a response with the header "Content-Encoding: gzip, gzip"?
The site loads properly in Firefox, Chrome, etc., but I am not able to decode it using cURL.
Please help me decode this response.
The string returned by curl_exec is still a gzip stream. You can decode it by trimming off the 10-byte gzip header and passing the rest to gzinflate:
$url = "http://www.dealstan.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // Ask for gzip and let cURL decode the response
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
// Skip the 10-byte gzip header and inflate the remaining deflate data
$return = gzinflate(substr($return, 10));
print_r($return);
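To see why trimming the header works, the situation can be reproduced offline. The sketch below is an illustration, not part of the original answer: it assumes PHP 7 or later with the zlib extension, simulates a doubly gzipped body with gzencode, and compares the header-trimming approach with gzdecode (available since PHP 5.4). The helper name decode_nested_gzip and the $maxLayers limit are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper (not from the original answer): peel off gzip layers
// for as long as the data starts with the gzip magic bytes 0x1f 0x8b.
function decode_nested_gzip(string $data, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers; $i++) {
        if (strncmp($data, "\x1f\x8b", 2) !== 0) {
            break; // no gzip header left, so stop
        }
        $decoded = @gzdecode($data);
        if ($decoded === false) {
            break; // looked like gzip but failed to decode; keep what we have
        }
        $data = $decoded;
    }
    return $data;
}

// Simulate a body that was gzip-compressed twice ("Content-Encoding: gzip, gzip").
$original = '<html><body>Hello</body></html>';
$double   = gzencode(gzencode($original));

// Assumption: the client removes only one layer, as in the question.
$afterOneLayer = gzdecode($double);

// The answer's approach: skip the 10-byte gzip header, inflate the rest.
$viaTrim = gzinflate(substr($afterOneLayer, 10));

// Alternative: let the helper remove every gzip layer it finds.
$viaHelper = decode_nested_gzip($double);

var_dump($viaTrim === $original, $viaHelper === $original);
```

In the cURL code above, the same helper could be applied to the result of curl_exec($ch) instead of the fixed substr offset. That may be more robust if the number of gzip layers varies or if the gzip header carries optional fields that make it longer than 10 bytes.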