Why does cURL return an empty string?


Question

I'm having a problem with PHP's cURL returning an empty string for some URLs. I'm trying to parse the Open Graph (OG) metadata of different web pages, and it works with every website I've tried except NYTimes. Here is my code so far.

print_r(get_og_metadata('http://somewebsite.com'));


public function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    // the url to fetch
    curl_setopt($ch, CURLOPT_URL, $url);
    // return result as a string rather than direct output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // set the maximum time (in seconds) to wait while connecting
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

public function get_og_metadata($url)
{
    libxml_use_internal_errors(TRUE);
    $data = $this->get_data($url);
    $doc = new DOMDocument();
    $doc->loadHTML($data);

    $xpath = new DOMXPath($doc);
    $query = '//*/meta[starts-with(@property, \'og:\')]';

    $metadatas = $xpath->query($query);
    $result = array();
    foreach($metadatas as $metadata)
    {
        $property = $metadata->getAttribute('property');
        $content = $metadata->getAttribute('content');
        $result[$property] = $content;
    }

    return $result;
}


Recommended answer

My guess is that a site like the New York Times has protection against this kind of automated request. Most likely it is based on the user agent, which you can fake like so:

curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');

This is one of the most common user agent strings, by the way.
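
If setting the user agent alone does not help, it can be useful to see what cURL actually received. The following is a minimal sketch, not the original poster's code: `build_curl_options` and `get_data_verbose` are hypothetical helper names, and the redirect and timeout settings are assumptions (an empty body can also come from an unfollowed redirect, since cURL does not follow 3xx responses unless `CURLOPT_FOLLOWLOCATION` is set).

```php
<?php
// Hypothetical helper (not from the original post): collects the cURL options
// in one place so they can be inspected without making a request.
function build_curl_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow 3xx redirects (assumption: a redirect may be involved)
        CURLOPT_MAXREDIRS      => 5,     // stop after a few redirects (arbitrary limit)
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 15,    // seconds allowed for the whole transfer (arbitrary)
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Hypothetical variant of get_data() that also reports the HTTP status code
// and any cURL error message alongside the body.
function get_data_verbose(string $url, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));

    $body   = curl_exec($ch);                        // false on transport failure
    $error  = curl_error($ch);                       // '' when no error occurred
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received
    curl_close($ch);

    return ['body' => $body, 'status' => $status, 'error' => $error];
}
```

Calling `get_data_verbose($url, $userAgent)` and printing the `status` and `error` entries should show whether the empty result comes from a failed transfer (`body` is `false`), a redirect, or a response the server deliberately left empty.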
