file_get_contents returns 404 when URL is opened with the Browser and URL is valid


Problem description

I am getting the following error message:

Warning: file_get_contents(https://www.readability.com/api/content/v1/parser?url=http://www.redmondpie.com/ps1-and-ps2-games-will-be-playable-on-playstation-4-very-soon/?utm_source=dlvr.it&utm_medium=twitter&token=MYAPIKEY) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 NOT FOUND in /home/DIR/htdocs/readability.php on line 23

With some echo statements I got the URL parsed by the function, and it is fine and valid; when I make the request from my browser it works OK.

The thing is that I get the error above with file_get_contents, and I really don't understand why.

The URL is valid, and the function is not blocked by the free hosting service (so I shouldn't need cURL).

If someone could spot the error in my code, I would appreciate it! Thanks...

Here is my code:

<?php

class jsonRes{
    public $url;      // fixed: $url was declared twice, which is a fatal error in PHP
    public $author;
    public $image;
    public $excerpt;
}

function getReadable($url){
    $api_key = 'MYAPIKEY';
    if(isset($url) && !empty($url)){

        // I tried changing to http, no 'www' etc... -THE URL IS VALID/The browser opens it normally-

        $requesturl = 'https://www.readability.com/api/content/v1/parser?url=' . urlencode($url) . '&token=' . $api_key;
        $response = file_get_contents($requesturl);   // * here the code FAILS! *

        $g = json_decode($response);

        $article_url = $g->url;
        $article_author = '';
        if($g->author != null){
            $article_author = $g->author;
        }

        $article_image = '';
        if($g->lead_image_url != null){
            $article_image = $g->lead_image_url;
        }
        $article_excerpt = $g->excerpt;

        $toJSON = new jsonRes();
        $toJSON->url = $article_url;          // fixed: url was assigned twice with the same value
        $toJSON->author = $article_author;
        $toJSON->image = $article_image;
        $toJSON->excerpt = $article_excerpt;  // fixed: was "$toJSON->excerpt->$article_excerpt;"

        return json_encode($toJSON);
    }
}
?>

Answer

Sometimes a website will block crawlers (requests from remote servers) from getting to its pages.

The way to work around this is to spoof a browser's headers: pretend to be Mozilla Firefox instead of the sneaky PHP web scraper you are.

This is a function which uses the cURL library to do just that:

function get_data($url) {
    // Pretend to be a real browser by sending a Firefox User-Agent header.
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);      // treat HTTP codes >= 400 as failures
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    if (!$html) {
        echo "<br />cURL error number:" . curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
        curl_close($ch);
        exit;
    }
    curl_close($ch);
    return $html;
}

And then call it like this:

$response = get_data($requesturl);

cURL offers many more options for fetching remote content and for error checking than file_get_contents does. If you want to customize it further, check out the list of cURL options here - Abridged list of cURL options
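If switching to cURL is not an option, the same header spoofing can be done with file_get_contents itself through a stream context. This is a sketch; the function name and the User-Agent string are example values, not part of the original answer:

```php
<?php
// Sketch: make file_get_contents() send a browser-like User-Agent
// via a stream context instead of using cURL.
function get_data_fgc($url) {
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            // Example UA string -- any realistic browser string will do:
            'header'        => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0\r\n",
            'timeout'       => 10,
            // Return the body even on 4xx/5xx instead of raising a warning:
            'ignore_errors' => true,
        ],
    ]);
    return file_get_contents($url, false, $context);
}
```

After the call, PHP populates `$http_response_header` with the response's status line and headers, so a 404 can be detected and handled without a warning being emitted.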
