file_get_contents script works with some websites but not others


Question

I'm looking to build a PHP script that parses HTML for particular tags. I've been using this code block, adapted from this tutorial:

<?php 
$data = file_get_contents('http://www.google.com');
$regex = '/<title>(.+?)</';
preg_match($regex,$data,$match);
var_dump($match); 
echo $match[1];
?>

The script works with some websites (like Google, above), but when I try it with other websites (say, freshdirect), I get this error:

Warning: file_get_contents(http://www.freshdirect.com) [function.file-get-contents]: failed to open stream: HTTP request failed!

I've seen a bunch of great suggestions on StackOverflow, for example to enable extension=php_openssl.dll in php.ini. But (1) my version of php.ini didn't have extension=php_openssl.dll in it, and (2) when I added it to the extensions section and restarted the WAMP server, per this thread, I still had no success.

Would someone mind pointing me in the right direction? Thank you very much!

Recommended answer

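// file_get_html() is provided by the PHP Simple HTML DOM Parser (simple_html_dom.php), which must be included first.
$html = file_get_html('http://google.com/');
// find() returns an array unless an index is given; index 0 selects the first <title> element.
$title = $html->find('title', 0)->innertext;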

Or, if you prefer preg_match, you should really be using cURL instead of file_get_contents (fgc):

function curl($url){

    // Browser-like request headers, for servers that refuse requests without them.
    $headers = array();
    $headers[]  = "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    $headers[]  = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $headers[]  = "Accept-Language: en-us,en;q=0.5";
    $headers[]  = "Accept-Encoding: gzip,deflate";
    $headers[]  = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $headers[]  = "Keep-Alive: 115";
    $headers[]  = "Connection: keep-alive";
    $headers[]  = "Cache-Control: max-age=0";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_ENCODING, "gzip");     // decode compressed responses
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);    // follow HTTP redirects
    $data = curl_exec($curl);                         // false if the request failed
    curl_close($curl);
    return $data;

}


$data = curl('http://www.google.com');
$regex = '#<title>(.*?)</title>#mis';
preg_match($regex,$data,$match);
var_dump($match); 
echo $match[1];
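
If you would rather keep file_get_contents, the same idea (sending browser-like headers) can be sketched with a stream context. This is only a sketch under an assumption: that the failing site rejects requests without a User-Agent, which the error message alone does not confirm. The helper names browser_context_options, extract_title and fetch_title, and the User-Agent string, are made up for illustration.

```php
<?php
// Hypothetical helpers; none of these names come from the original answer.

// Options for PHP's HTTP stream wrapper, sending a browser-like User-Agent.
// Assumption: the remote server refuses requests that carry no User-Agent.
function browser_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'header'          => "Accept: text/html\r\n",
            'follow_location' => 1,   // follow redirects, like CURLOPT_FOLLOWLOCATION
            'timeout'         => 10,  // seconds
        ],
    ];
}

// Return the trimmed contents of the first <title> element, or null if none is found.
function extract_title(string $html): ?string
{
    // i: case-insensitive, s: let "." match newlines inside the title.
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $match)) {
        return trim($match[1]);
    }
    return null;
}

// Fetch a page with file_get_contents and return its title, or null on failure.
function fetch_title(string $url): ?string
{
    $options = browser_context_options('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
    $context = stream_context_create($options);
    $data = @file_get_contents($url, false, $context);
    if ($data === false) {
        return null; // request failed; the warning is suppressed by @
    }
    return extract_title($data);
}
```

Calling fetch_title('http://www.freshdirect.com') would then return either the page title or null. Checking for null before printing avoids the undefined-index notice that echo $match[1] raises when nothing matches.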
