cURL returns 404 while the page is found in the browser


Question

There are already similar questions on Stack Overflow, but none of their solutions have worked for me. I'm trying to grab a page on LoveIt.com with cURL, but it returns a 404 error, while the URL works fine in the browser:

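        $url = 'http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV';

        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        // Send a browser-like User-Agent
        curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($curl, CURLOPT_HEADER, false);         // keep response headers out of the body
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
        curl_setopt($curl, CURLOPT_REFERER, 'http://loveit.com/');

        // Assumed from the output below: run the request and dump the transfer info
        $html = curl_exec($curl);
        print_r(curl_getinfo($curl));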

Here are the headers I receive:


        Array
        (
            [url] => http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV
            [content_type] => text/html; charset=utf-8
            [http_code] => 404
            [header_size] => 667
            [request_size] => 172
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 0.320466
            [namelookup_time] => 0.000326
            [connect_time] => 0.119046
            [pretransfer_time] => 0.119089
            [size_upload] => 0
            [size_download] => 499
            [speed_download] => 1557
            [speed_upload] => 0
            [download_content_length] => 499
            [upload_content_length] => 0
            [starttransfer_time] => 0.320438
            [redirect_time] => 0
            [certinfo] => Array
                (
                )
            [primary_ip] => ---
            [primary_port] => 80
            [local_ip] => ---
            [local_port] => 53837
            [redirect_url] =>
        )

I read that some websites have protections against this kind of script, and I tested some of the proposed solutions, but none worked for me (CURLOPT_USERAGENT, CURLOPT_REFERER...).

Any ideas about what's happening here?

I would like to back up my LoveIt account, which is why I'm doing this (there is no export function, and LoveIt.com has not replied about the health of the website).

Recommended answer

I quickly checked that page with LiveHeaders enabled and noticed a bunch of cookies being set. I suspect that, since it's not a "normal" URL, you need to hand those cookies back while being redirected; otherwise you end up being kicked out with a 404. Use CURLOPT_COOKIEJAR with your cURL instance from the start. See: http://php.net/manual/pl/function.curl-setopt.php
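
A minimal sketch of that suggestion, assuming the missing cookies really are what triggers the 404 (this has not been verified against LoveIt.com). The helper names cookieAwareOptions and fetchWithCookies and the cookie file path are hypothetical, introduced only for illustration. CURLOPT_COOKIEFILE is set alongside CURLOPT_COOKIEJAR so that cookies saved by an earlier run are also sent back.

```php
<?php
// Hypothetical helper: option set for a cURL handle that keeps cookies
// across redirects and across runs.
function cookieAwareOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile,  // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile,  // cookies are read from here and sent with requests
    ];
}

// Hypothetical helper: performs the request and returns status code and body.
function fetchWithCookies(string $url, string $cookieFile): array
{
    $curl = curl_init();
    curl_setopt_array($curl, cookieAwareOptions($url, $cookieFile));

    $body   = curl_exec($curl);                          // false on transport failure
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    unset($curl);  // release the handle so the cookie jar is flushed to disk

    return ['status' => $status, 'body' => $body];
}
```

Calling fetchWithCookies($url, sys_get_temp_dir() . '/loveit_cookies.txt') with the URL from the question would then show whether the status changes from 404 once cookies are handed back during the redirects. If it does not, comparing the request headers a browser sends with those sent by cURL would be the next thing to check.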
